Your agent works in the demo. It breaks in production. You swap in a bigger model. It still breaks. This story was repeated throughout AI Engineer World’s Fair 2026. One team found that expanding their agent’s tool list from 10 to 741 tools dropped accuracy from 78% to 13%. Same model. It never got dumber. It just got overwhelmed.
Model capability is no longer the constraint. The bottleneck has shifted to keeping models reliable in production. As one attendee put it: “The model is no longer the hard part.” Most of the conference agenda wasn’t about which frontier model is smartest. It was about logs, memory, retrieval, evals, tool routing, contracts, and approval gates. The AI race is moving from model capability to the engineering of deployment.
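Tool routing is one response to the 10-versus-741 problem above: rather than handing the model every tool, shortlist a few relevant ones per request. The following is a minimal sketch of that idea, not any team's actual method. The tool names, descriptions, and the word-overlap scoring are all assumptions made for illustration; a production router would more likely use embeddings or a trained classifier.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Return up to k tool names that share the most words with the query."""
    query_words = tokenize(query)
    scored = []
    for name, description in tools.items():
        overlap = len(query_words & tokenize(name + " " + description))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical tool catalog, invented for this example.
TOOLS = {
    "create_invoice": "Create a new invoice for a customer",
    "refund_payment": "Refund a payment to a customer card",
    "search_docs": "Search internal documentation by keyword",
    "restart_service": "Restart a deployed service by name",
}

if __name__ == "__main__":
    print(shortlist_tools("refund the payment for this customer", TOOLS, k=2))
```

Only the shortlisted tools would then be placed in the model's context, which keeps the list the model sees small even as the catalog grows.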

Agents Are Entering Production — and No One Has the Playbook
Amplify’s industry survey revealed that 95% of respondents are already using agents, double last year’s number. Among them, 89% said their agents aren’t just reading and writing data: they’re taking action inside systems. Agents are becoming part of production infrastructure.
But production brings real problems. Observability is the biggest bottleneck. When agents handle real business, “behavior you can’t explain” is the biggest risk. Agents need receipts — verifiable logs of every tool call and decision. One developer described it: “You need to log what the agent saw, what tools it called, why it picked that action, and how to reproduce the result.” A reproducible failure is far easier to fix than a bug you can only describe.
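One way to picture a “receipt” is a structured log entry per agent step, covering what the developer quoted above lists: what the agent saw, which tool it called, why, and what came back. The sketch below is an assumption-laden illustration, not a standard format. The `Receipt` fields and the `ReceiptLog` class are hypothetical names, and the hash chaining is one simple way to make the log tamper-evident.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Receipt:
    step: int
    observation: str   # what the agent saw
    tool: str          # which tool it called
    arguments: dict    # the arguments passed to that tool
    rationale: str     # why it picked that action
    result: str        # what the tool returned
    timestamp: float = field(default_factory=time.time)


class ReceiptLog:
    """Append-only log where each entry's hash covers the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self._last_hash = ""

    def record(self, receipt: Receipt) -> str:
        body = asdict(receipt)
        payload = json.dumps(body, sort_keys=True)
        digest = hashlib.sha256((self._last_hash + payload).encode("utf-8")).hexdigest()
        self.entries.append({"receipt": body, "prev": self._last_hash, "hash": digest})
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = ""
        for entry in self.entries:
            payload = json.dumps(entry["receipt"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = ReceiptLog()
    log.record(Receipt(1, "ticket: double charge", "lookup_order", {"id": "A1"},
                       "need order details first", "order found"))
    log.record(Receipt(2, "order found", "refund_payment", {"id": "A1"},
                       "charge was duplicated", "refund issued"))
    print(log.verify())
```

Replaying the recorded observations and arguments in order is what turns a described bug into a reproducible one.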
No one has drawn a box around the control layer yet. One presenter said: “No one has put a period on the sentence for the control layer.” The industry is still looking for best practices.
Cost is a real constraint. Forty percent of respondents said AI costs frequently limit their ambitions; another 36% said it happens sometimes. Token consumption is now the second most-monitored metric after output quality. As agents start doing work, token bills surge.
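If token consumption is a monitored metric, the simplest form of monitoring is a per-task budget that the agent loop checks before each model call. The sketch below assumes a hypothetical `TokenBudget` class and made-up limits; real token counts would come from whatever usage data the model provider returns.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    limit: int      # maximum tokens allowed for this task (assumed value)
    used: int = 0   # tokens consumed so far

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record the tokens consumed by one model call."""
        self.used += prompt_tokens + completion_tokens

    @property
    def remaining(self) -> int:
        """Tokens left before the limit, never negative."""
        return max(self.limit - self.used, 0)

    def can_afford(self, estimated_tokens: int) -> bool:
        """Check whether another call of the estimated size fits the budget."""
        return self.used + estimated_tokens <= self.limit


if __name__ == "__main__":
    budget = TokenBudget(limit=1000)
    budget.charge(prompt_tokens=300, completion_tokens=200)
    print(budget.remaining, budget.can_afford(600))
```

An agent loop could stop, or escalate to a human, when `can_afford` returns `False` instead of letting the bill grow unobserved.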
Long-term debt is a growing concern. Fifty-nine percent of respondents worry that today’s AI-generated code is creating long-term technical debt. AI makes experimentation cheaper and teams more productive, but at the cost of future maintainability. More software is being shipped. Its maintainability isn’t keeping up.

The Loop Debate: Are We Ready for the Software Factory?
The conference’s most heated debate centered on “loops.” How close are we to a software factory, where AI agents write, review, and deploy code autonomously? The pro side argued that loops are inevitable. Geoffrey Huntley said: “It’s inevitable. It’s going to be there for the long haul. I don’t see myself going back to hand-coding.” He compared the new role of the software engineer to that of a train driver: keeping the train on the tracks.
The con side warned that the hype is ahead of the discipline. Dex Horthy pointed out that Kubernetes is also built on control loops, but those are deterministic. Agent loops are non-deterministic. You can’t predict what they’ll do next. Horthy said: “I haven’t seen evidence that we’re at the point where we can raise the abstraction layer.” In the vote at the end of the debate, the “con” side won by a narrow margin. The industry is still cautious about full automation. But even the skeptics agree that loops are inevitable. The debate isn’t whether to use them. It’s when and how.

Open Source and the Narrative Shift
Another notable shift at this year’s conference: international developers are moving from “Does China have AI?” to “Why is Chinese AI growing so fast?” People’s Daily noted that the U.S. still leads in frontier foundation models, but China is building a competitive edge through open-source models, low-cost training, and rapid commercialization.
Z.ai positioned ZCode as a harness for all frontier models, a direct counter to OpenAI’s Codex. HuggingFace’s Thomas Wolf interviewed the MiniMax team at the conference, and the team’s recently released M3 open-weight model became a talking point. Analysts at the conference noted that AI competitiveness is no longer just about model scale and compute. It’s about engineering capability, industrial ecosystem, open collaboration, and real-world applications.
P.S. 95% of teams are using agents, but no one has figured out how to manage them at scale. The model is no longer the bottleneck; engineering capability is. The software factory debate isn’t about whether we should build it. It’s about when and how. The model race is still happening. But the engineering race just started. And the industry is still figuring out the rules.