June 9, 2026

Stop Prompting Coding Agents. Start Designing Loops.

The real shift in AI coding is not from bad prompts to better prompts. It is from human-driven conversations to systems that can observe, verify, remember, and decide when to ask the agent again.

  • Agents
  • LLMs
  • AI Engineering
  • Coding Agents
  • Loop Engineering

If you use coding agents for real work, you have probably felt the strange 70% problem.

The agent can write code. It can explain unfamiliar APIs. It can make reasonable edits across multiple files. Then it stops. You run the tests. Something fails. You paste the error back. It tries again. You inspect the diff. You notice it touched the wrong layer. You steer it back. It apologizes, patches, forgets one constraint, and waits for the next instruction.

The agent looks autonomous, but the loop is still you.

That is why Peter Steinberger’s June 2026 post landed so hard:

“You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.”

The line is easy to flatten into a slogan. The useful reading is more specific: prompts are not dead; prompts are being demoted. A prompt is no longer the main product of the work. It is one message generated inside a larger operating system.

The leverage is moving from what you say to the agent toward the world you build around it.

The Misread: This Is Not Anti-Prompt

A good loop still needs good prompts. A bad instruction inside a well-designed system can still waste money, touch the wrong files, and produce plausible nonsense.

But a prompt by itself cannot carry time.

It cannot remember every attempt. It cannot know which tests are authoritative. It cannot enforce a budget. It cannot decide that the third retry is no longer useful. It cannot tell whether a visual layout actually renders correctly on mobile. It cannot maintain a durable record of why one approach was rejected last week.

When all of that lives in the human operator’s head, the human remains the scheduler, debugger, reviewer, memory system, and stop condition. The agent is powerful, but the workflow is still manual.

A loop changes where those responsibilities live.

It gives the agent a controlled environment: a trigger, a context builder, tools, permissions, verifiers, memory, budgets, and escalation rules. The prompt becomes a key to that environment, not the environment itself.

This is the same deeper shift that appears in context-heavy AI work more broadly. The prompt is not a magic spell. It is a key. What matters is the room it opens: the project documents, tests, logs, prior attempts, conventions, constraints, and feedback signals that make the agent’s next move less like a guess.

The Three Stages of Coding-Agent Use

Most teams do not jump straight to loop design. They move through three rough stages.

In the first stage, the agent is a helper for isolated tasks. You ask it to write a function, explain a stack trace, or draft a test. The bottleneck is prompt clarity. If you ask vaguely, you get vague work.

In the second stage, the agent becomes a collaborator. It can read files, edit code, run commands, and respond to failures. The bottleneck shifts from “can it generate code?” to “can it observe reality?” A coding agent without tests, logs, browser screenshots, or type checks is still guessing in the dark.

In the third stage, the human becomes the architect of the agent’s operating conditions. The question is no longer “what should I ask next?” It becomes “what should happen next, given the state of the work?” The answer may be: run a narrower test, ask a reviewer agent, update the task memory, stop because the budget is exhausted, or escalate because the decision is subjective.

This is the real meaning of designing loops. It is not just repeating prompts. It is redesigning the workflow so the next prompt is produced by state, evidence, and policy.
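To make "produced by state, evidence, and policy" concrete, here is a minimal sketch of a next-step policy. The names (WorkState, NextStep, decide_next_step) and the ordering of the rules are illustrative assumptions, not taken from any particular agent framework; a real policy would look at far richer evidence.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    ESCALATE = "escalate to a human"
    STOP = "stop: budget exhausted"
    UPDATE_MEMORY = "update task memory"
    ASK_REVIEWER = "ask a reviewer agent"
    RUN_NARROWER_TEST = "run a narrower test"
    PROMPT_AGENT = "prompt the agent again"


@dataclass
class WorkState:
    # Hypothetical fields; each one is evidence the loop has gathered so far.
    attempts_used: int
    max_attempts: int
    tests_passing: bool
    failure_is_broad: bool = False        # many tests failed, not one
    decision_is_subjective: bool = False  # taste, risk, or product direction
    unrecorded_lesson: bool = False       # something learned but not yet stored


def decide_next_step(state: WorkState) -> NextStep:
    """Pick the next action from state instead of from a human's intuition."""
    if state.decision_is_subjective:
        return NextStep.ESCALATE
    if state.unrecorded_lesson:
        return NextStep.UPDATE_MEMORY
    if state.tests_passing:
        return NextStep.ASK_REVIEWER
    if state.attempts_used >= state.max_attempts:
        return NextStep.STOP
    if state.failure_is_broad:
        return NextStep.RUN_NARROWER_TEST
    return NextStep.PROMPT_AGENT
```

The prompt is only one of six possible outcomes here, and it is the fallback rather than the default.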

AI adoption often looks like a tool upgrade at first, but the meaningful productivity gains come when the surrounding system is redesigned. Factories did not get the full value of electric motors by replacing one steam engine with one big motor. They got it when the layout of work changed around distributed power. Coding agents are similar. The first instinct is to put an agent into the old human workflow. The bigger change is to redesign the workflow around agents.

What a Loop Actually Owns

A useful agent loop owns the pieces that humans otherwise keep doing by hand.

It owns the trigger. What starts the loop: a failing CI job, a new pull request, a dependency release, a benchmark regression, a support ticket, or a human request?

It owns the context. What should the agent see right now: the full repository, a small diff, a stack trace, a design note, prior failed attempts, or a screenshot?

It owns the action policy. What can the agent do without asking: edit code, run tests, open a browser, create a branch, call an API, comment on a PR, or modify production configuration?

It owns the verifier. How does reality push back: tests, type checks, linting, screenshot comparison, schema validation, eval scores, security scans, or human review?

It owns memory. What should survive this run: rejected approaches, flaky tests, project conventions, root causes, useful commands, or decisions that future agents should not rediscover?

It owns budget and stopping. How many attempts, tokens, minutes, changed files, or risk levels are acceptable before the system stops pretending it is making progress?

It owns escalation. When should the agent ask a human because the decision is about product taste, architecture direction, security risk, cost, or reputation?

If these responsibilities are implicit, the human is still the loop. If they are explicit, the agent can operate inside a system.

That is the difference between a clever prompt and an engineering artifact.
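One way to make the seven responsibilities explicit is to write them down as a spec and check which ones are still missing. The sketch below is a hypothetical structure for illustration only: LoopSpec and implicit_responsibilities are invented names, and the free-text fields stand in for whatever configuration a real system would use.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LoopSpec:
    # Each field is one responsibility; None means "still in a human's head".
    trigger: Optional[str] = None
    context: Optional[str] = None
    action_policy: Optional[str] = None
    verifier: Optional[str] = None
    memory: Optional[str] = None
    budget: Optional[str] = None
    escalation: Optional[str] = None


def implicit_responsibilities(spec: LoopSpec) -> list[str]:
    """Return the responsibilities the spec leaves undefined."""
    return [f.name for f in fields(spec) if getattr(spec, f.name) is None]


# Example: a loop that only defines how it starts and how it is checked.
spec = LoopSpec(trigger="failing CI job", verifier="pytest on the failing test")
print(implicit_responsibilities(spec))
```

Anything this check reports is work a person is still doing by hand on every run.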

Why Software Is the First Serious Testbed

Coding agents are where this shift becomes visible first because software has unusually good feedback surfaces.

Code can be run. Diffs can be inspected. Tests can fail. Types can be checked. Browsers can render pages. Git can isolate changes. CI can reproduce failures. Rollbacks are usually possible. Compared with medicine, finance, hiring, or strategy, software is relatively reversible and instrumented.

That does not make coding agents easy. It makes them measurable.

This is why the stronger agent workflows are not just chat windows with larger context. They are harnesses. Thoughtworks calls the surrounding machinery “harness engineering”: the context management, tool integration, orchestration, evaluation, and safety boundaries that let a model participate in a production system. The paper “Code as Agent Harness” makes a related point: code is not only what agents produce; code can also structure how agents reason, act, store state, and verify their work.

That framing matters because it puts the model in the right place. The model supplies judgment inside the loop. The loop supplies continuity, constraints, and contact with reality.

Software is full of work that can be partially automated because the verifier is nearby. Fix the failing test. Update the docs for changed APIs. Review this diff for regressions. Re-run the eval set after a retrieval change. Capture screenshots after a UI patch. These are not vague wishes. They can be turned into closed loops.

Outside software, the same idea applies more carefully. The less reversible the domain, and the weaker the verifier, the more conservative the loop must be.

The Community Pushback Is a Feature, Not a Distraction

The best objections to loop engineering are mostly right.

One objection is that loops can become expensive. A bad loop is just a token-burning machine that keeps asking the same question with a slightly different stack trace. If it has no attempt cap, no memory of failed approaches, and no cost visibility, it is not automation. It is an infinite meeting.

Another objection is that a loop without a real verifier automates self-deception. If the only judge of the agent’s work is another agent reading the first agent’s explanation, the system may merely produce more confident prose. Verification has to touch the world: tests, execution, screenshots, schemas, user behavior, production metrics, or human judgment.

A third objection is that not every important decision is verifiable. UI polish, API taste, architecture boundaries, product positioning, and naming still require judgment. A loop can generate options and reveal evidence, but it should not pretend that every decision can be reduced to a green check.

A fourth objection is that this can sound like “prompt engineering with extra steps.” Sometimes it is. If the loop does not add state, verification, memory, or a better stop condition, it is only ceremony.

The Hacker News discussion around “Agents need control flow, not more prompts” captured the practical version of the debate. One failure mode was simple: ask an agent to process many requirement files and it eventually skips, repeats, or loses the thread. The fix is not necessarily a more poetic prompt. The fix is ordinary software: iterate through files deterministically, call the model for the ambiguous part, store outputs, validate completeness, and continue.

Use code for control flow. Use the model for judgment.
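A minimal sketch of that split, assuming the requirement files are already loaded into a dict and the model call is passed in as a plain function. The names process_requirements and ask_model are hypothetical; ask_model stands in for whatever client a real system would use.

```python
from typing import Callable


def process_requirements(
    files: dict[str, str],
    ask_model: Callable[[str], str],
) -> dict[str, str]:
    """Iterate deterministically, delegate judgment, validate completeness."""
    results: dict[str, str] = {}
    # Code owns the control flow: every file is visited once, in a fixed order.
    for name in sorted(files):
        # The model owns only the ambiguous part: interpreting one file.
        results[name] = ask_model(files[name])

    # Completeness check: an empty answer counts as a skipped file.
    missing = sorted(name for name, out in results.items() if not out.strip())
    if missing:
        raise ValueError(f"incomplete outputs for: {missing}")
    return results
```

The model never has to remember which files it has already seen, so it cannot skip, repeat, or lose the thread.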

Examples of Loops That Are Worth Building

The most useful loops start where humans are already doing the same steering action repeatedly.

The failing-test repair loop. CI fails. The loop checks out the branch, finds the smallest reproducible failing command, gives the agent the failure context, applies a patch, re-runs the test, and repeats until the test passes or the attempt cap is hit. The important detail is not that the agent writes a fix. It is that the loop prevents the task from expanding into “wander around the repo until something looks plausible.”
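The core of such a loop can be sketched in a few lines. This is an illustration under assumptions, not a reference implementation: run_test and propose_and_apply_patch are hypothetical callables that a real system would back with a test runner and an agent call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RepairResult:
    fixed: bool
    attempts: int
    history: list[str] = field(default_factory=list)  # failure output per attempt


def repair_loop(
    run_test: Callable[[], tuple[bool, str]],
    propose_and_apply_patch: Callable[[str, list[str]], None],
    max_attempts: int = 3,
) -> RepairResult:
    """Patch and re-run one narrow test until it passes or the cap is hit."""
    history: list[str] = []
    attempts = 0
    passed, output = run_test()
    while not passed and attempts < max_attempts:
        # The agent sees the current failure plus earlier failures,
        # so it can avoid repeating an approach that already did not work.
        propose_and_apply_patch(output, history)
        attempts += 1
        history.append(output)
        passed, output = run_test()
    return RepairResult(fixed=passed, attempts=attempts, history=history)
```

The attempt cap and the single test command are what keep the task narrow; when the cap is hit, the returned history is the material for escalation.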

The pull-request review loop. A PR arrives. The loop reads the diff, routes different parts to specialized reviewers, asks for concrete failure modes, filters low-evidence comments, and posts only findings that are specific enough for a human to act on. Raw model review is noisy. A useful review loop ranks severity, deduplicates findings, and asks for proof.
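The filtering step might look like the sketch below. The Finding fields, the three severity labels, and the rule that "no evidence means no comment" are assumptions chosen for illustration; a real review loop would define its own schema.

```python
from dataclasses import dataclass

# Hypothetical severity scale; lower number means more urgent.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    message: str
    evidence: str  # e.g. a failing input or reproduction; empty if none given


def filter_findings(findings: list[Finding]) -> list[Finding]:
    """Drop unsupported and duplicate findings, then rank by severity."""
    seen: set[tuple[str, int, str]] = set()
    kept: list[Finding] = []
    for f in findings:
        key = (f.file, f.line, f.message.strip().lower())
        if not f.evidence.strip() or key in seen:
            continue  # no proof, or already reported by another reviewer
        seen.add(key)
        kept.append(f)
    # Unknown severity labels sort last rather than raising.
    return sorted(kept, key=lambda f: SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)))
```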

The dependency-upgrade loop. A package has a security patch. The loop reads the changelog, updates the dependency, runs type checks and tests, asks the agent to repair breakages, and produces a migration note. If failures drift into unrelated areas, the loop stops. The boundary is part of the product.

The documentation-drift loop. Public APIs change. The loop detects changed exports, compares docs and examples, asks the agent to update only affected pages, validates links, and opens a small docs PR. “Update the docs” is a prompt. “These three symbols changed; update these two docs and validate links” is a loop-generated task.
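A rough sketch of how a loop could generate that narrower task, assuming the changed symbols are already known and the docs are available as path-to-text pairs. The function name build_docs_task and the wording of the generated task are hypothetical.

```python
import re
from typing import Optional


def build_docs_task(changed_symbols: list[str], docs: dict[str, str]) -> Optional[str]:
    """Turn a set of changed symbols into a scoped docs task, or None."""
    # A page is affected if it mentions any changed symbol as a whole word.
    affected = sorted(
        path
        for path, text in docs.items()
        if any(re.search(rf"\b{re.escape(sym)}\b", text) for sym in changed_symbols)
    )
    if not affected:
        return None  # nothing to update, so the agent is never prompted
    symbols = ", ".join(sorted(changed_symbols))
    return (
        f"Changed symbols: {symbols}. "
        f"Update only these pages: {', '.join(affected)}. "
        "Validate links before opening the PR."
    )
```

Whole-word matching is a deliberately crude heuristic; it keeps a change to load from dragging in every page that mentions reload.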

The LLM-evaluation loop. A prompt, model, retrieval step, or tool schema changes. The loop runs an eval set, clusters regressions, asks the agent to explain likely causes, proposes a small change, runs the eval again, and compares deltas. The loop prevents cherry-picking by forcing aggregate comparison.
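The aggregate comparison can be as simple as the sketch below, which assumes each eval run is recorded as a mapping from case ID to pass or fail. The function name and the result keys are illustrative, not a standard format.

```python
def compare_eval_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict:
    """Compare two runs over the same eval cases and report the deltas."""
    if not baseline:
        raise ValueError("eval set is empty")
    if baseline.keys() != candidate.keys():
        # Comparing different case sets is how cherry-picking sneaks in.
        raise ValueError("runs must cover the same eval cases")
    total = len(baseline)
    return {
        "baseline_pass_rate": sum(baseline.values()) / total,
        "candidate_pass_rate": sum(candidate.values()) / total,
        # Cases that passed before and fail now.
        "regressions": sorted(c for c in baseline if baseline[c] and not candidate[c]),
        # Cases that failed before and pass now.
        "improvements": sorted(c for c in baseline if not baseline[c] and candidate[c]),
    }
```

Reporting regressions alongside the pass rate matters: a change can raise the aggregate score while still breaking cases that used to work.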

The frontend visual-QA loop. A UI change lands. The loop starts the app, captures desktop and mobile screenshots, runs accessibility checks, asks the agent to fix overflow or contrast failures, and repeats. Without screenshots, the model is guessing. With screenshots, the loop gives it eyes.

The context-improvement loop. An agent run fails or needs too many human interventions. The loop reads the transcript, identifies missing project knowledge, proposes updates to project instructions, tests, docs, or scripts, and opens a small improvement PR. This is where the system starts compounding. It does not only finish tasks; it improves the environment for the next task.

The pattern is the same in every case: the loop turns repeated human steering into system behavior.

A Good Loop Makes the Invisible Work Visible

There is also a communication reason to design loops well.

When an AI result appears magically correct, observers often assign credit to the model or to a lucky prompt. They do not see the hard part: the context curation, failure handling, validation, memory, permissions, evals, and design choices that made the result reliable.

If you want people to understand the engineering value, show the pipeline.

Show what triggered the run. Show what context was selected and what was excluded. Show which checks passed and failed. Show where the agent was allowed to act and where it had to stop. Show how a failed run updates future context. The goal is not to drown users in logs. The goal is to make the difference between “we asked an AI” and “we built a system” visible.

This matters for teams too. A visible loop is easier to trust, debug, and improve. A hidden loop becomes folklore. Nobody knows why it works until it stops working.

The Practical Rule

Before prompting a coding agent, ask one question:

What would I do after it responds?

If you would run tests, let the loop run tests.

If you would paste the error back, let the loop feed the error back.

If you would inspect the diff, let a review loop inspect the diff.

If you would stop after three failed attempts, encode that stop condition.

If you would search the docs every time, make the docs part of the context builder.

If you would ask a human because the answer depends on taste, risk, or product direction, encode the escalation point.

Every repeated steering action is a candidate for automation. Every failure mode is a candidate for a verifier. Every recurring explanation is a candidate for durable context. Every subjective judgment is a candidate for human review.

That is the shift from prompting agents to designing loops.

The Takeaway

The future of AI coding is not promptless. It is less human-prompted.

The best builders will still write clear instructions. But their main advantage will come from designing the conditions under which agents can keep working: context, tools, tests, memory, budgets, review gates, and escalation paths.

Steinberger’s post landed because it compressed a larger transition into two sentences. We are moving from conversational AI as a helper to agentic systems as operational infrastructure. The human role moves up a layer: from typing the next instruction to defining the contract, the feedback signal, and the boundaries.

The unit of leverage is no longer the prompt.

It is the loop that knows when to prompt again.

Further Reading