TL;DR in plain English
- Agents that work in demos usually fail at scale because the engineering around them is missing, not because the LLM itself is the limiting factor. See Rafael Lopes’ summary: https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- The three hard engineering walls are: deterministic guardrails for non-deterministic outputs, enterprise data integration (most enterprise data is unstructured), and a resilient orchestration layer that handles mid-chain failures. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Practical rule for small teams: scope narrowly, make one customer flow reliable end-to-end, add simple guardrails and idempotency, log chain events, and gate rollout with feature flags. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Quick artefact you can produce in 15 minutes: a one-page checklist that lists the flow, top data sources, three guardrail rules, the primary observability metrics, and a rollout gate. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
What changed
Rafael Lopes reframes the scalability problem: the LLM is usually the easy piece; the operational gap is engineering. The core fixes are engineering problems — deterministic guardrails, connectors that make unstructured enterprise data usable, and orchestration that decides which agent/step runs and what to do when a step fails. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Concretely, teams must budget engineering time for:
- deterministic guardrails around non-deterministic outputs;
- connectors, access controls and curated indexes for enterprise data (the post notes most enterprise data is unstructured: 90%+); and
- an orchestration layer that handles retries, mid-chain failures and responsibility boundaries. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
That changes prioritization: invest in plumbing (idempotency, observability, access control) before widening agent scope. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Why this matters (for real teams)
Treating agent failures as engineering failures reframes the risk and the operational playbook. Lopes explains that without these systems the following happen: cascading error modes across chained steps, unclear MTTR, and surprise cost spikes as retrievals and external API calls multiply. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Practical consequences for product and ops teams:
- Chain failures multiply troubleshooting scope and increase mean time to recovery (MTTR). https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Costs increase when many retrievals, LLM calls, and external API calls run per conversation; monitor per-conversation spend. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Compliance and procurement will block rollouts without clear data-access contracts and residency commitments. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Operational priorities: track chain-failure rate, fail external actions closed-by-default, and require data-access contracts before broad rollouts. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Concrete example: what this looks like in practice
Scenario: a support agent performs three chained steps:
- retrieval from a private document store;
- the LLM drafts a suggested reply; and
- the agent calls the ticketing API to create or update a ticket.
Failure mode: a bad retrieval supplies wrong context, the LLM confidently drafts an incorrect reply, and the agent creates an incorrect ticket. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Practical fixes that come from the engineering framing:
- Deterministic guardrails: require explicit preconditions before automatic external actions, and short-circuit to a human when confidence is low. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Idempotency: attach deduplication keys to external calls so retries don’t create duplicate tickets. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Compensating and observability actions: flag conversations for human follow-up and log chain events so root cause spans the entire chain. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Decision guidance: short-circuit bad retrievals to a human; retry failed actions with backoff and idempotency; block external actions on suspected hallucination and require verification. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
What small teams and solo founders should do now
Scope narrowly and build depth, not breadth. Rafael Lopes’ framing points to plumbing first. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Three concrete, actionable steps:
-
Pick one customer-facing flow and map it end-to-end. Deliverable: a one-page flow map showing inputs, data sources, decision points, and external actions. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
-
Add simple deterministic guardrails and safe defaults. Examples: input validation, explicit action preconditions, and human hand-off when confidence is low. Implement idempotency for any external call. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
-
Implement lightweight observability and gated rollout. Log chain events, expose a few dashboard metrics, and control traffic with a feature flag before expanding. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Starter checklist (copy into your task tracker):
- [ ] Define the single critical flow (the top customer-impacting path)
- [ ] Capture top data sources and access constraints for that flow
- [ ] Add guardrail rules and idempotency for external actions
- [ ] Add chain-event logging and three dashboard metrics
- [ ] Gate rollout behind a feature flag and a simple rollout criterion
- [ ] Configure a cost alert tied to your daily test budget
These minimal steps address the three walls Lopes identifies and reduce common cascading failures. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Regional lens (UK)
UK teams must treat data access and residency as first-class constraints. The engineering problems Lopes highlights—data accessibility and unstructured sources—intersect directly with UK GDPR and ICO expectations. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Practical UK checklist items:
- Map data flows to UK GDPR categories (PII in transcripts, contracts, support logs).
- Define lawful basis and retention schedules before enabling retrievals.
- Implement telemetry redaction and limit trace sampling when residency rules require it.
Procurement note: do not expand beyond an early rollout without a DPA and any residency commitments enterprise customers require. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
US, UK, FR comparison
A compact comparison for sales and technical scoping. The engineering work (guardrails, connectors, orchestration) is the same; legal gating changes timing and required artifacts. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
| Country | Regulator / rule | Main compliance checkpoints | Typical procurement asks | Required artifact | |---|---:|---|---|---| | US | Fragmented (state + sector) | Per-customer config; sector rules (e.g., HIPAA, GLBA) | Per-customer DPA; flexible configs | Per-customer data-flow doc | | UK | ICO / UK GDPR | Lawful basis, residency, retention | Data residency and clear DPA | DPA + compliance checklist | | FR | CNIL (strict) | DPIA likely; consent and processing records | Demonstrable DPIA, redaction | DPIA + processing records |
Legal requirements change rollout timing; the platform engineering work is consistent. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Technical notes + this-week checklist
Assumptions / Hypotheses
- Core hypothesis (from the source): scaling agents is primarily an engineering problem; the LLM is the easy piece and failures come from missing guardrails, data wiring, and orchestration. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Enterprise data is mostly unstructured (90%+), so plan connectors and curated indexes. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Operational hypotheses and concrete thresholds (these are recommendations to validate in your environment):
- Rollout steps to validate: 1% → 10% → 100% traffic expansion.
- Early monitoring window: 48 hours of close observation after each expansion.
- Observability set: 3 primary dashboard metrics (chain-failure rate, median chain latency, cost-per-conversation).
- Example latency target for interactive flows: ~500 ms median chain latency (tune to product needs).
- Example cost cap for early tests: $200/day hard stop on infrastructure+API spend.
- Max tokens per conversation cap example: 4,000 tokens to limit per-conversation inference cost.
- Quick artefact timebox: a 15-minute one-page checklist to document the flow and guardrails.
Methodology note: these numeric thresholds are operational hypotheses informed by the engineering framing in the source; tune them to your product and user expectations. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Risks / Mitigations
- Risk: chain multiplication of failures increases MTTR.
- Mitigation: instrument chain-failure rate and enforce automated rollback triggers at early rollout steps. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Risk: uncontrolled costs as chains call many APIs.
- Mitigation: add cost-per-conversation alerts and a hard stop if daily spend exceeds the test budget ×2; keep a small daily budget cap in early tests. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
- Risk: customer compliance pushback.
- Mitigation: include compliance milestones in rollout gates and keep a simple data-flow map available for customer review. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
Next steps
If you do one thing this week: pick a single high-impact flow, add a deterministic guardrail that prevents external actions when retrieval confidence is low, and run a small gated rollout behind a feature flag. This single action prevents many cascading failures the post describes. https://blog.r-lopes.com/posts/2026-06-11-why-agents-dont-scale
If you want a ready short plan, copy the starter checklist above into your issue tracker and begin with the first task: map the flow end-to-end and capture the top data sources and access constraints.