From “What Was Revenue Last Quarter?” to “Generate 30% More Revenue This Quarter”
The a16z piece on context layers is spot on. Here’s what it means when you follow the argument all the way through.
a16z just published a piece that every data practitioner should read. “Your Data Agents Need Context” makes the case that AI agents have been failing not because the models are bad, but because they’ve been operating without the business context they need to reason well — no understanding of how revenue is defined, which tables are the source of truth, what a fiscal quarter actually means for this organization. The fix, they argue, is a modern context layer: richer than a semantic layer, encompassing canonical entities, tribal knowledge, governance logic, and identity resolution.
They’re right. And I want to extend that argument into territory I think deserves more attention — because the implications of taking it seriously go well beyond the examples that are easiest to reach for.
This Is Data Teams’ Moment — The Question Is How Big We Think
I wrote recently about how agentic AI is finally giving data teams their long-overdue moment. For years, data teams spent budget cycles defending their existence. That is changing fast. AI agents are only as good as the data they run on, and the foundational work data teams have been quietly doing for years — quality, governance, lineage, entity resolution — is now the prerequisite for every enterprise AI initiative that matters.
The budget conversation is flipping — from “justify why we need you” to “how fast can you get us AI-ready?” That’s a genuine shift. But it comes with a risk: if the measure of success for data agents settles at “answering questions BI could already answer,” we’ll have spent enormous energy rebuilding, more expensively and with more failure modes, something that already worked.
Revenue last quarter? Your BI team has a Looker dashboard for that. It refreshes nightly, the fiscal calendar is baked in, and the CFO trusts it. The a16z piece walks through exactly why an agent struggles to replicate even this — wrong revenue definition, stale YAML, ambiguous tables. That’s a real problem worth solving. But it’s also worth asking: once we’ve solved it, what have we actually unlocked?
The answer, I’d argue, is: the foundation for something far more valuable.
The Questions That Context-Plus-Identity Actually Makes Possible
The a16z piece mentions that a modern context layer should go beyond metric definitions to include canonical entities and identity resolution. I want to take that seriously and spell out what it actually unlocks. Walk into any commercial or strategy meeting and listen for the questions that make the room go quiet — not because the answer is sensitive, but because nobody actually knows:

How many real customers do we have — not accounts or rows in a CRM, but distinct humans or organisations with an active relationship?
Who are our most valuable customers, across product lines, referrals, and renewal behaviour?
What is our customers’ lifetime value when “Acme Corp” in the CRM and “ACME Inc.” in billing are finally recognised as the same entity?
Who can we upsell, and when, based on what similar cohorts did next?
Which customers are three months from churning, based on signals scattered across product, support, and finance systems?

These questions couldn’t be asked before, not because agents were too dumb, but because the identity layer that makes them answerable was never in place.
What Agents Actually Do with a Resolved Identity Graph
This is where the discussion stops being theoretical. Once entity resolution is in place — once the agent knows who it’s reasoning about across every system — a new class of autonomous workflows becomes possible. Here are four that illustrate the shift.
The Churn Prevention Agent. The agent monitors product usage daily — login frequency, feature adoption, support ticket volume — and when a customer’s pattern matches historical churn signatures, it autonomously cross-references their contract renewal date, checks their NPS history across systems, and drafts a personalised outreach for the customer success manager with a suggested intervention. The entire workflow collapses if the agent is seeing three different records for the same customer across the product database, CRM, and support tool. Entity resolution is not a prerequisite step — it is the agent’s eyes.
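To make the point concrete, here is a minimal sketch of why the resolved ID is load-bearing in this workflow: every lookup below is keyed by one canonical customer ID, so usage, CRM, and support data actually join. The signature thresholds, field names, and data shapes are all hypothetical, an illustration rather than a real agent.

```python
# Hypothetical churn signature: thresholds the agent compares usage against.
CHURN_SIGNATURE = {"max_weekly_logins": 2, "min_open_tickets": 3}

def churn_risk(canonical_id, usage, crm, support):
    """Flag a customer whose current pattern matches the churn signature.

    All three mappings are keyed by the same canonical customer ID --
    the output of entity resolution. With three different local keys,
    this join is impossible and the workflow collapses.
    """
    u = usage[canonical_id]          # product telemetry
    tickets = support[canonical_id]  # support system, same key
    at_risk = (u["weekly_logins"] <= CHURN_SIGNATURE["max_weekly_logins"]
               and len(tickets) >= CHURN_SIGNATURE["min_open_tickets"])
    if not at_risk:
        return None
    # Cross-reference the CRM renewal date before drafting outreach.
    return {
        "customer": canonical_id,
        "renewal_date": crm[canonical_id]["renewal_date"],
        "suggested_action": "CSM outreach before renewal",
    }
```

The interesting part is not the threshold logic; it is that `canonical_id` is the same key in all three systems.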
The Upsell Timing Agent. A customer’s usage of Product A crosses a threshold. The agent compares their trajectory against cohorts of similar customers who went on to adopt Product B within 90 days, finds the pattern match, creates a qualified opportunity in the CRM, and alerts the account manager — without anyone pulling a report. The cohort comparison is only valid if the purchase history and usage data it’s drawing on are resolved to the same underlying customer. A fragmented identity graph means the agent is comparing apples to a mix of apples and ghosts.
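The cohort comparison can be sketched in a few lines. Assume, hypothetically, that each customer's Product A usage is a fixed-length trajectory of weekly event counts, and the cohort is the set of resolved customers who went on to adopt Product B:

```python
def matches_adopter_cohort(trajectory, cohort, tolerance=0.2):
    """Return True if a customer's Product A usage trajectory tracks the
    average trajectory of prior Product B adopters.

    `cohort` is a list of equal-length trajectories, one per adopter.
    The comparison is only meaningful if each cohort member is a
    resolved, distinct customer -- duplicates (the "ghosts") skew the
    average and invalidate the pattern match.
    """
    avg = [sum(xs) / len(xs) for xs in zip(*cohort)]
    # Mean relative deviation from the adopter average trajectory.
    dev = sum(abs(a, ) if False else abs(a - b) / max(b, 1e-9)
              for a, b in zip(trajectory, avg)) / len(avg)
    return dev <= tolerance
```

A real agent would use a proper distance over richer features; the sketch only shows where resolved identity enters the comparison.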
The Cross-Sell Across Household or Org Agent. In B2C — a bank, an insurer, a retailer — one household member has a mortgage, another has a current account, a third just opened a savings product. An agent that understands household relationships, not just individual records, can surface the cross-sell opportunity and route it to the right team before a competitor does. In B2B, the same logic applies at the organisational level: one division is already a customer, another is in a competitor’s trial. The agent identifies the relationship and acts. Both scenarios are structurally impossible without entity resolution at the household or org level — and both are routine once it’s in place.
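A toy version of the household logic, assuming household-level resolution has already assigned each person a household ID (which is the hard part this piece is about):

```python
from collections import defaultdict

def household_cross_sell(memberships, holdings, product_line):
    """For each person, list products some other household member holds
    but they do not -- the cross-sell surface.

    `memberships` maps person -> household ID (the output of
    household-level entity resolution); `holdings` maps person -> set
    of products they already have.
    """
    by_household = defaultdict(set)
    for person, hh in memberships.items():
        by_household[hh] |= holdings.get(person, set())
    opportunities = {}
    for person, hh in memberships.items():
        gaps = (by_household[hh] & product_line) - holdings.get(person, set())
        if gaps:
            opportunities[person] = gaps
    return opportunities
```

The same shape works in B2B with organisation IDs in place of household IDs.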
The Supplier Risk Agent. The agent continuously watches news feeds, financial filings, and logistics data for signals about key suppliers — credit downgrades, port disruptions, leadership changes. When a risk threshold is crossed, it maps which SKUs are affected, what inventory buffer exists, and which alternative suppliers are pre-qualified, then surfaces a recommended action to the procurement team. The catch: this requires knowing that “Acme Logistics Ltd” in the ERP, “ACME Logistics” in the contracts system, and “Acme Log.” in the invoicing platform are the same supplier. Without that, the agent is monitoring fragments, not entities — and the risk signal it’s supposed to catch slips through the gaps.
What “Canonical Entities” Actually Requires
It’s worth dwelling on the practical aspects of having canonical entities and identity resolution as components of a modern context layer — because this is where most implementations will either succeed or quietly fail.
I’ve written before about why building a unified customer identity is genuinely hard, and why so many teams feel like imposters for still wrestling with it. The industry narrative treats unified identity as a prerequisite — something that should already be done before you get to the interesting work. In reality, the vast majority of organisations are still fighting this battle. They’re just not talking about it publicly because it feels like admitting failure.
Entity resolution — sometimes called record linkage, identity resolution, or master data management — is the discipline of determining that two records refer to the same real-world person, company, or object, even when names, formats, and identifiers don’t match. It requires probabilistic matching across messy strings, address normalisation, fuzzy deduplication, graph-based clustering, and ongoing maintenance as records evolve. It’s one of the oldest unsolved problems in enterprise data, not because people haven’t tried, but because the messiness is structural. Organisations grow through acquisition. Customers interact across channels. Nobody standardised data entry in 1997.
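To make the mechanics concrete, here is a deliberately naive sketch of the matching step: normalise names, score pairs with a string-similarity ratio, and single-link cluster the matches. Real systems replace the similarity function with a trained probabilistic model and add blocking to avoid the O(n²) comparison, but the shape of the problem is the same.

```python
import re
from difflib import SequenceMatcher

LEGAL_SUFFIXES = re.compile(r"\b(ltd|inc|llc|corp|co)\b\.?", re.I)

def normalise(name):
    """Strip legal suffixes and punctuation before comparison."""
    name = LEGAL_SUFFIXES.sub("", name.lower())
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name).split())

def similarity(a, b):
    """Toy stand-in for a learned pairwise match score in [0, 1]."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def cluster(names, threshold=0.85):
    """Single-link clustering via union-find over matching pairs."""
    parent = list(range(len(names)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

Run on ["Acme Corp", "ACME Inc.", "Zenith Ltd"], this groups the two Acme variants and leaves Zenith alone; the hard cases in production (shared addresses, abbreviations, acquisitions) are exactly where the naive version breaks down.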
For the context layer to include this, entity resolution:
Cannot be hard-coded into YAML or a semantic layer — it requires ML-powered probabilistic matching trained on actual data
Cannot be a one-time batch job — it needs to run continuously as new records arrive and existing ones change
Cannot live outside the warehouse — moving billions of records to a separate resolution system reintroduces the very data silos and lack of context that AI agents are struggling with
Cannot be fully automated — it needs human-in-the-loop workflows to handle edge cases and feed corrections back into the model
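The human-in-the-loop requirement in particular tends to get designed out. A minimal sketch of the triage pattern it implies: auto-merge only high-confidence pairs, queue the ambiguous band for human review, and feed reviewer decisions back as labelled training data. The thresholds here are illustrative, not recommendations.

```python
def triage(pair, score, auto=0.95, review=0.75):
    """Route a candidate match by its model score."""
    if score >= auto:
        return ("merge", pair)
    if score >= review:
        return ("review", pair)  # goes to the human review queue
    return ("reject", pair)

def apply_feedback(labelled_pairs, training_set):
    """Reviewer decisions become labelled examples for the matcher,
    closing the loop that keeps the model honest as data drifts."""
    training_set.extend(labelled_pairs)
    return training_set
```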
This is not background infrastructure that someone configures once and forgets. It’s an ongoing data capability — closer in character to a production ML system than to a semantic layer definition. Teams that treat it as the former will find their context layers actually working. Teams that treat it as the latter will find their agents confidently wrong.
What This Looks Like When Done Right
Two examples illustrate what the context layer vision looks like when the entity resolution component is actually implemented well.
Fortnum & Mason, the 300-year-old luxury British retailer, had customer data fragmented across restaurant bookings, email signups, online transactions, and in-store purchases. They tried a third-party resolution service first — it created non-persistent identifiers, offered limited visibility, and raised privacy concerns about sending their entire customer dataset externally. By implementing Zingg natively in their Databricks environment, they built persistent, unified customer identifiers across all touchpoints, with full control over the matching process and privacy-compliant resolution that kept sensitive data in their own infrastructure. For the first time in their history, they could understand how customers were shopping across every channel and build clienteling and personalisation around that. That’s not a BI question answered better; it’s a question that was previously unanswerable, and an action that was previously impossible.
Orthodox Union, operating over 40 websites and 5 mobile applications across disparate CRMs, used Zingg on Snowflake to power their golden records — resolving not just individual identities but household relationships across their entire digital estate. Their agents now reason about constituents as whole people with family connections, not as disconnected records across systems.
These outcomes are exactly what the context layer is promising. They require taking the entity resolution component as seriously as the metric definition component — which means treating it as an engineering discipline, not a checkbox.
The Architecture That Makes This Real
The a16z piece lays out a thoughtful five-step architecture: access the right data, build context automatically, refine with human input, connect to agents, keep it self-updating. That framework is right. The entity resolution layer sits at the foundation of step two — and its quality determines the ceiling for everything above it.
Concretely: before an agent reasons about revenue, lifetime value, churn risk, or upsell opportunity, it needs a stable, persistent entity graph that tells it who “this customer” actually is across every system. That graph needs to be built where the data lives, continuously maintained, and grounded in human judgement for edge cases.
Only on top of this identity foundation does the rest of the context layer — the metric definitions, the table routing, the tribal knowledge — become genuinely trustworthy. Knowing what “revenue” means is important. Knowing that the revenue from “Acme Corp” and “ACME Inc.” should be attributed to the same customer is what makes the answer actually right.
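As a sketch of that last point, assume a hypothetical identity crosswalk maintained by the resolution pipeline, mapping every system-local key to one persistent entity ID; revenue attribution then reduces to a lookup. The table contents and key names below are invented for illustration.

```python
# Hypothetical crosswalk: each (system, local key) maps to one
# persistent canonical entity ID maintained by the resolution pipeline.
CROSSWALK = {
    ("crm", "acct-001"): "ent-42",      # "Acme Corp" in the CRM
    ("billing", "ACME-INC"): "ent-42",  # "ACME Inc." in billing
    ("support", "acme-corp"): "ent-42",
}

def canonical_id(system, local_key):
    """Resolve a system-local record key to the persistent entity ID.
    Agents should reason about ent-42, never about raw local keys."""
    return CROSSWALK.get((system, local_key))

def total_revenue(lines):
    """Attribute revenue lines from any system to canonical entities."""
    totals = {}
    for system, key, amount in lines:
        ent = canonical_id(system, key)
        totals[ent] = totals.get(ent, 0) + amount
    return totals
```

With the crosswalk in place, revenue from “Acme Corp” and “ACME Inc.” lands on the same entity; without it, every metric definition downstream is silently split across fragments.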
The Real Payoff
As I argued in The Data Team’s Moment, AI doesn’t paper over data problems — it runs them at scale, in production, with consequences. A customer success agent operating on a fragmented identity layer doesn’t just give one bad recommendation; it systematically misjudges an entire customer segment, at machine speed, before anyone notices. The autonomous nature of agents is what makes the identity foundation so critical — errors compound rather than get caught.
But the inverse is equally true, and more exciting: an agent operating on a trustworthy identity graph can answer questions that were structurally impossible before. Not because the model got smarter, but because the data it’s reasoning on finally reflects reality.
AI agents that can tell you which accounts are actually the same company, which of those companies are in an expansion window, which customers share a behavioural signature with cohorts that churned six months ago — that’s the payoff the context layer is reaching for. The revenue-growth question is the proof of concept. The customer intelligence actions are the business transformation.
The context layer is necessary. Entity resolution is what makes it sufficient. And the questions worth asking, and the agents worth running — about customers, not metrics — are what make the whole thing worth building.
Building on the a16z post “Your Data Agents Need Context” by Jason Cui and Jennifer Li. Further reading: The Data Team’s Moment and The Identity Crisis. If you’re building the entity resolution layer, Zingg is open source and fully documented.

