Insights on production AI.
Practical knowledge from the field. No hype, no fluff, just lessons learned from shipping enterprise AI systems.
Better Reasoning, Worse Tool Use: The Hidden Tradeoff in Capable Agents
Reinforcement learning that sharpens an LLM's reasoning also raises tool hallucination proportionally. That's a causal finding, peer-reviewed at ACL 2026. Here's the architectural fix and what to add to your eval harness today.
Audit logs for AI: the contract that survives a compliance review
Six months after a bad inference, a regulator will ask what model decided, on which input, on whose behalf, with what downstream effect. Most teams cannot answer. Here is the schema we ship on every regulated build.
Trust is a UX layer, not a model property
The cleanest data point on AI trust failure is from a study where the model never failed. 758 consultants, GPT-4, a poisoned task: 84% right without the model, 60 to 70% right with it. The surface owns trust. Here are the four design moves we ship on every build.
The production AI checklist for 2026: from demo to deployment
Most AI projects die between the POC demo and the on-call rotation. Not because the model is wrong, but because the operating discipline around it was never built. Here is the checklist DAD applies before we call a build shippable.
Five RAG failure modes we still find in 2026 audits
RAG is a mature pattern. The systems we audit are not. Five failure modes keep showing up at companies that already shipped a v1, and each has a symptom, a fix, and a cheap measurement.
Your agent passes the benchmark. It will fail in production.
Pass@1 is a single coin flip dressed up as an SLA. Early-2026 research puts numbers on the gap, and on what to instrument instead before you ship.