Blog

Insights on production AI.

Practical knowledge from the field. No hype, no fluff, just lessons learned from shipping enterprise AI systems.

Agents That Crash and Restart From Zero Are Not Production Agents

A transient 429 mid-workflow shouldn't be a five-alarm fire. For most agents shipped in 2024-2025, it was, because state was never designed. Here's the maturity model AWS now treats as baseline.

July 21, 2026

AI Ops · 9 min read

The Document Is the Exploit: Indirect Prompt Injection as a Supply Chain Attack

Your RAG pipeline has a WAF. Your API has rate limits. The PDF a vendor emailed your agent last Tuesday had none of that scrutiny, and it can issue instructions with the same authority as your system prompt.

July 7, 2026

Automation · 8 min read

Better Reasoning, Worse Tool Use: The Hidden Tradeoff in Capable Agents

Reinforcement learning that sharpens an LLM's reasoning also raises tool hallucination proportionally. That's a causal finding, peer-reviewed at ACL 2026. Here's the architectural fix and what to add to your eval harness today.

June 9, 2026

Automation · 8 min read

From RAG to Enriched: The Retrivon Blueprint

Five RAG failure modes, five architectural decisions. How we built Retrivon, a production enterprise knowledge platform with 100% source-grounded answers, and the blueprint you can steal.

May 15, 2026

AI Ops · 10 min read

Context Engineering: Why Your Agent Degrades at Turn 40

57% of engineering teams now run agents in production. The dominant failure mode is not the model. It is what is in the context window. Here is the five-criteria framework to test before you call your agent production-ready.

May 14, 2026

Governance · 9 min read

Audit logs for AI: the contract that survives a compliance review

Six months after a bad inference, a regulator will ask what model decided, on which input, on whose behalf, with what downstream effect. Most teams cannot answer. Here is the schema we ship on every regulated build.

May 13, 2026

AI Ops · 10 min read

Trust is a UX layer, not a model property

The cleanest data point on AI trust failure is from a study where the model never failed. 758 consultants, GPT-4, a poisoned task: 84% correct without the model, 60 to 70% correct with it. The surface owns trust. Here are the four design moves we ship on every build.

May 13, 2026

AI Ops · 10 min read

The production AI checklist for 2026: from demo to deployment

Most AI projects die between the POC demo and the on-call rotation. Not because the model is wrong, but because the operating discipline around it was never built. Here is the checklist DAD applies before we call a build shippable.

May 13, 2026

Automation · 10 min read

Five RAG failure modes we still find in 2026 audits

RAG is a mature pattern. The systems we audit are not. Five failure modes keep showing up at companies that already shipped a v1, and each has a symptom, a fix, and a cheap measurement.

May 13, 2026

AI Ops · 9 min read

Your agent passes the benchmark. It will fail in production.

Pass@1 is a single coin flip dressed up as an SLA. Early-2026 research puts numbers on the gap, and on what to instrument instead before you ship.

May 12, 2026
