Agentic Refactor Loop
Core Concept
A template for driving large-scale, mechanical refactors across a codebase using AI agents with human pair review. The plan is a first-class, reviewable artifact committed to git, and the agent executes the approved plan in bite-size batches whose review feedback flows back into the plan.
The Loop
- Pair on framing — human + human agree on what the refactor is and why
- Robot drafts the plan — specific steps, rules, exceptions
- Pair reviews the plan — catch obvious gaps
- A different robot analyzes the plan — cross-model check (a different model / different prompt finds what the first one missed)
- Commit the plan as a PR — the plan lives in git, is reviewable on its own, and becomes durable documentation
- Approved plan goes live — robot executes across the codebase
- Pair reviews the changes — could be large, so keep batches bite-size
- Ship — then iterate: review comments on one batch feed back into the plan for the next
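The execution half of the loop (approved plan, batched execution, review, feedback into the plan) can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: `Plan`, `chunk`, `run_loop`, and the `execute` / `review` callbacks are hypothetical names standing in for the agent run and the pair review.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Plan:
    """Hypothetical stand-in for the committed plan: rules plus learned exceptions."""
    rules: List[str]
    exceptions: List[str] = field(default_factory=list)


def chunk(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_loop(
    files: List[str],
    plan: Plan,
    batch_size: int,
    execute: Callable[[Plan, List[str]], None],
    review: Callable[[List[str]], List[str]],
) -> List[List[str]]:
    """Execute the plan batch by batch, folding review comments back into it."""
    shipped: List[List[str]] = []
    for batch in chunk(files, batch_size):
        execute(plan, batch)              # robot applies the current plan to this batch
        comments = review(batch)          # pair reviews the resulting changes
        plan.exceptions.extend(comments)  # feedback updates the plan for later batches
        shipped.append(batch)
    return shipped
```

Because the plan is mutated between batches, a comment raised on batch N is already part of the plan when batch N+1 is executed.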
Why It Works
- Plan-as-PR turns the spec into an artifact. Reviewing a plan is cheaper than reviewing code, and plan review catches errors before they become 8,000 lines of robot output.
- Two-model cross-check catches one-model blindspots. The second robot isn’t rubber-stamping — it’s genuinely finding what the first missed.
- Bite-size batches create a feedback loop. Review comments scale inversely with PR size: large PRs tend to draw fewer comments, so problems slip through. Chunking keeps every batch small enough to review properly, and each batch’s review comments go back into the plan for future batches, so the process self-improves.
- Guardrails mature over time. Early batches stay small while rails are uncertain. As confidence grows, batch size can scale up.
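One way to make "batch size scales with confidence" concrete is a simple grow-and-shrink policy. The doubling and halving rule and the bounds below are assumptions chosen for illustration; the pattern itself does not prescribe a formula.

```python
def next_batch_size(
    current: int,
    review_comments: int,
    minimum: int = 5,
    maximum: int = 200,
) -> int:
    """Hypothetical policy: grow after a clean review, shrink after any comments."""
    if review_comments == 0:
        # Clean batch: the rails held, so try a larger batch next time.
        return min(current * 2, maximum)
    # Comments mean the plan is still changing: back off to smaller batches.
    return max(current // 2, minimum)
```

A team would tune the thresholds to its own review capacity. The point is only that batch size responds to review outcomes rather than being fixed up front.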
Variations / Extensions
- GitHub Action auto-PR: on merge of batch N, automatically open batch N+1 — keeps the queue full, minimizes human scheduling cost.
- Custom-instruction guardrails in PR review: in parallel with the migration, add AI PR-review instructions that flag any new introduction of the old pattern. This closes the “someone adds the thing you’re removing” back door.
- Accessibility check variant: same loop, but the robot checks for accessibility violations in the diff rather than executing a mechanical change.
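The guardrail variation above relies on AI review instructions. A deterministic complement, swapped in here purely as an illustration, is a small check that scans a unified diff for added lines matching the old pattern. The function name `find_reintroductions` and the example regex are hypothetical, and the diff parsing is deliberately simplified.

```python
import re
from typing import List, Tuple


def find_reintroductions(diff_text: str, old_pattern: str) -> List[Tuple[str, str]]:
    """Return (file, line) pairs for added lines that match the old pattern.

    Simplified parsing: any line starting with "+++ " is treated as a file
    header, which is true for ordinary `git diff` output.
    """
    pattern = re.compile(old_pattern)
    current_file = ""
    hits: List[Tuple[str, str]] = []
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # New-side file header, e.g. "+++ b/src/app.py"
            path = line[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and pattern.search(line[1:]):
            # An added line that brings the old pattern back
            hits.append((current_file, line[1:].strip()))
    return hits
```

Run against the diff of an incoming PR, a non-empty result is a signal to block or flag the change. Removed lines (prefixed with `-`) are ignored, so the migration's own batches do not trip the check.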
When to Use / When Not To
Use:
- Mechanical refactors with a clear rule (old pattern → new pattern)
- Non-critical code paths where a mistake isn’t a production outage
- Codebases with enough test coverage to catch regressions
Don’t use:
- Logic changes where the “right answer” requires domain judgment
- Irreversible migrations (data migrations, schema changes with backfill)
- First time trying agentic work in a codebase — start smaller
Related Patterns
- AI-Ready-Engineering — testing infrastructure is the prerequisite quality gate
- Testing-Infrastructure-As-AI-Enabler — same principle, applied to test infrastructure
- Iteration-Speed-Is-The-Strategy — bite-size batches are the “fast iteration” variant for refactors
- Maintainability-Over-Comprehension — the outcome this pattern serves: consistency across the codebase