Agentic Refactor Loop

Pattern · engineering-culture

Core Concept

A template for driving large-scale, mechanical refactors across a codebase using AI agents with human pair review. The plan is a first-class reviewable artifact, committed to git, and the agent executes the approved plan in bite-size batches whose review comments feed back into the plan.

The Loop

Pair on framing — human + human agree on what the refactor is and why
Robot drafts the plan — specific steps, rules, exceptions
Pair reviews the plan — catch obvious gaps
A different robot analyzes the plan — cross-model check (a different model / different prompt finds what the first one missed)
Commit the plan as a PR — the plan lives in git, is reviewable on its own, and becomes durable documentation
Approved plan goes live — robot executes across the codebase
Pair reviews the changes — could be large, so keep batches bite-size
Ship — then iterate: review comments on one batch feed back into the plan for the next
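The execute-review-feedback part of the loop can be sketched as a small driver. This is a minimal sketch, not a definitive implementation: `Plan`, `chunk`, `run_loop`, and the `execute` / `review` callables are all hypothetical names standing in for whatever agent and review tooling a team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Plan:
    """Hypothetical stand-in for the committed plan: a rule plus learned exceptions."""
    rule: str
    exceptions: list[str] = field(default_factory=list)


def chunk(paths: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` paths."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def run_loop(
    plan: Plan,
    paths: list[str],
    batch_size: int,
    execute: Callable[[Plan, list[str]], None],
    review: Callable[[list[str]], list[str]],
) -> int:
    """Run the agent batch by batch, folding review comments back into the plan.

    `execute` applies the plan to one batch (assumed to be the agent call);
    `review` returns the human review comments for that batch.
    Returns the number of batches processed.
    """
    shipped = 0
    for batch in chunk(paths, batch_size):
        execute(plan, batch)
        for comment in review(batch):
            # Each new comment becomes part of the plan for later batches.
            if comment not in plan.exceptions:
                plan.exceptions.append(comment)
        shipped += 1
    return shipped
```

The design point is that the plan is mutated between batches, so batch N+1 runs against everything reviewers said about batch N.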

Why It Works

Plan-as-PR turns the spec into an artifact. Reviewing a plan is cheaper than reviewing code; reviewing the plan catches errors before they become 8,000 lines of robot output.
Two-model cross-check catches one-model blind spots. The second robot isn’t there to rubber-stamp; a different model or a different prompt tends to find what the first one missed.
Bite-size batches create a feedback loop. Review comments scale inversely with PR size (the larger the PR, the less scrutiny each line gets), so chunking keeps every batch small enough to review properly. Each batch’s review comments go back into the plan for future batches, so the process improves itself.
Guardrails mature over time. Early batches stay small while rails are uncertain. As confidence grows, batch size can scale up.
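One way to make "start small, scale up with confidence" concrete is to size the next batch from the last review. This is a sketch under stated assumptions: the function name `next_batch_size` and every number in it (doubling, halving, the floor of 5 and ceiling of 200 files) are illustrative defaults, not values from the pattern itself.

```python
def next_batch_size(
    current: int,
    comments: int,
    *,
    clean_threshold: int = 0,
    growth: float = 2.0,
    floor: int = 5,
    ceiling: int = 200,
) -> int:
    """Suggest the next batch size from the last batch's review outcome.

    A batch with at most `clean_threshold` review comments counts as clean
    and lets the batch grow; anything noisier halves it. The result is
    clamped to [floor, ceiling]. All defaults are assumptions.
    """
    if comments <= clean_threshold:
        proposed = int(current * growth)
    else:
        proposed = current // 2
    return max(floor, min(ceiling, proposed))
```

A clean batch of 10 files would suggest 20 next; a batch of 10 that drew three comments would drop back to 5.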

Variations / Extensions

GitHub Action auto-PR: on merge of batch N, automatically open batch N+1 — keeps the queue full, minimizes human scheduling cost.
Custom-instruction guardrails in PR review: in parallel with the migration, add AI PR-review instructions that flag any new introduction of the old pattern. This closes the “someone adds the thing you’re removing” back door.
Accessibility check variant: same loop, but the robot checks for accessibility violations in the diff rather than executing a mechanical change.
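The "flag new introductions of the old pattern" guardrail can also be approximated without a model, as a plain check over a unified diff. A minimal sketch, assuming the old pattern can be expressed as a regular expression; `new_uses_of_old_pattern` and the `legacy_fetch` name in the usage note are hypothetical.

```python
import re


def new_uses_of_old_pattern(diff_text: str, old_pattern: str) -> list[tuple[str, str]]:
    """Return (file, added line) pairs where a diff introduces the old pattern.

    Only added lines (prefixed with "+") are checked, so existing uses that
    the migration has not reached yet are not reported. Removed and context
    lines are ignored.
    """
    pattern = re.compile(old_pattern)
    current_file = ""
    hits: list[tuple[str, str]] = []
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the file as it exists after the change.
            current_file = line[4:].removeprefix("b/")
        elif line.startswith("+") and pattern.search(line[1:]):
            hits.append((current_file, line[1:].strip()))
    return hits
```

Called with the output of `git diff` and a pattern such as `r"\blegacy_fetch\("`, a non-empty result could fail the check or post a review comment. It needs Python 3.9+ for `str.removeprefix`, and an added line whose own content begins with `++ ` would be misread as a file header, which is acceptable for a sketch.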

When to Use / When Not To

Use:

Mechanical refactors with a clear rule (old pattern → new pattern)
Non-critical code paths where a mistake isn’t a production outage
Codebases with enough test coverage to catch regressions

Don’t use:

Logic changes where the “right answer” requires domain judgment
Irreversible migrations (data migrations, schema changes with backfill)
First time trying agentic work in a codebase — start smaller

Related

AI-Ready-Engineering — testing infrastructure is the prerequisite quality gate
Testing-Infrastructure-As-AI-Enabler — same principle, applied to test infrastructure
Iteration-Speed-Is-The-Strategy — bite-size batches are the “fast iteration” variant for refactors
Maintainability-Over-Comprehension — the outcome this pattern serves: consistency across the codebase