The demo always works. A model reads a few documents, answers a question, drafts a reply, and the room nods. Then the project meets the system of record, the ERP, the CRM, the claims engine, the core banking ledger, and the momentum drains out of it. This is the integration tax, and it is the single most underestimated line item in enterprise AI. The intelligence was never the hard part. Getting that intelligence to read trustworthy data and write durable changes into the systems that run the business is where quarters disappear.
The reason is that systems of record were built around a different contract than the one AI brings. They assume a known caller, a fixed schema, a transaction that either commits or rolls back, and an audit trail that names a human or a service account. An AI agent shows up with probabilistic output, a fuzzy sense of intent, and a tendency to improvise field values that look plausible but violate a constraint three tables deep. The model is not wrong because it is bad. It is wrong because nobody gave it the same guardrails that every other integration into that system has had to earn over twenty years.
So the work shifts to where it always belonged. You need a typed boundary between the model and the record, a layer that validates every proposed write against the real schema, enforces business rules the model has never seen, and rejects anything that does not conform before it touches production data. You need idempotency so a retried agent does not double post an invoice. You need the same identity and authorization story you would demand of any service, so that every action the agent takes is attributable and revocable. None of this is glamorous, and none of it shows up in a model benchmark.
This is also why reads matter as much as writes. An agent grounded on stale, duplicated, or partially permissioned data will produce confident answers that quietly contradict the ledger. The integration layer has to expose a clean, current, access-aware view of enterprise state, not a raw dump of whatever the connector returns. Teams that treat retrieval as a one-time plumbing task discover later that the model was reasoning over the wrong version of reality the whole time.
The organizations getting this right have stopped framing AI as a model problem and started framing it as a systems integration problem with a model attached. They invest in the boundary first, the connectors, the contracts, the validation, the observability, and they let the model plug into a surface that is already safe. The result is less impressive in a demo and far more durable in production, which is exactly the trade a serious enterprise should be making.
If your AI pilots keep looking brilliant in the sandbox and fragile against your real systems, the model is not the bottleneck. The boundary is, and that is where the next investment should go.