The Real Threat to SaaS: It’s Not Vibe Coding. It’s Access To Your Data.
And why we built Zingg the way we did.
Vibe coding has everyone debating whether SaaS is dead.
One camp says: when you can build a CRM in an afternoon, why pay Salesforce $150 a seat? The other fires back: a POC is not a product. Security, compliance, scalability, ongoing maintenance — the real cost of DIY reveals itself fast.
Both sides have a point. But they’re arguing about the wrong thing.
The real threat to SaaS isn’t that customers will build replacements. It’s that customers will demand their data back.
The New Requirement Nobody Saw Coming
Most enterprises are not going to build their own CRM, their own support desk, or their own payroll system. That’s not happening. But they are going to build custom AI agents — agents tailored to their workflows, their industry, their competitive edge. And those agents are useless without access to the data that lives inside enterprise SaaS systems.
Think about what that means in practice. A procurement agent that can’t read your ERP. A sales agent that can’t query your CRM. A support agent that can’t see your ticketing history. You end up with AI that is powerful in theory and crippled in practice — not because the models aren’t good enough, but because the data is locked behind walls that the enterprise itself cannot breach.
This is the structural tension SaaS has quietly avoided for two decades. Data lock-in was never just a feature — it was the moat. Salesforce, Workday, ServiceNow, SAP built empires not just on software, but on the gravitational pull of accumulated enterprise data. The switching cost wasn’t the product. It was your own history, trapped inside someone else’s system.
AI is now making that lock-in impossible to ignore.
Enterprises building custom agents will demand data portability as a first-class requirement — not a nice-to-have, not a paid add-on, but a baseline expectation. The competitive pressure will shift toward SaaS vendors who open their data, and away from those who treat it as a proprietary asset. MCP (the Model Context Protocol) is an early signal of this direction, but it is barely the beginning. Real openness means real-time access, structured APIs, granular permissions, and data that can move across systems without friction.
Why We Built Zingg the Way We Did
I want to share something personal here, because it’s directly relevant.
When we built Zingg — an open source entity resolution and data matching tool — one of our earliest and most deliberate architectural decisions was this: Zingg would run natively inside the customer’s own environment. Their data would never move.
Not to our servers. Not to a cloud we managed. Not through a pipeline that touched our infrastructure. Zingg would go to the data, not the other way around.
At the time, this felt like swimming against the tide. The SaaS playbook was clear: centralise everything, host it yourself, lock in the customer through the platform. Everyone was building managed services where customers uploaded their data and got results back. It was convenient. It was scalable. It was the default.
We chose differently — and we took some friction for it. Prospects would ask why they couldn’t just hand us their data and get clean results. Investors would ask why we weren’t building a managed pipeline. The market was optimised for data movement, and we were building for data residency.
Our reasoning was straightforward. Entity resolution deals with some of the most sensitive data an enterprise has — customer records, patient data, financial identities, supplier profiles. The enterprises that needed this most were also the ones with the strictest data governance requirements: banks, healthcare providers, insurers, government agencies. For them, sending data to a third-party system wasn’t a preference issue. It was a compliance wall.
Beyond compliance, there was a deeper principle at work. The enterprise owns its data. The enterprise should control its data. A tool that requires data to leave the enterprise in order to function is not really serving the enterprise — it’s holding the enterprise’s data hostage to deliver a service.
What We Learned From Building This Way
Running natively inside the customer’s environment changed everything about how we thought about the product.
It forced us to build something that was genuinely portable — that could run on on-premises Spark clusters, on Databricks, on AWS EMR, on GCP, on Snowflake, wherever the customer's data actually lived. It couldn't be opinionated about infrastructure, because the whole point was to meet the data where it was.
It meant our integration surface was the data itself — structured, well-defined, queryable — not a proprietary pipeline that the customer had to trust and maintain.
And it gave customers something that is genuinely rare in enterprise software: full control and transparency. They could see exactly what Zingg was doing with their data, because Zingg was running on their machines, under their monitoring, in their environment, at their schedule.
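To make this concrete, here is a minimal sketch in the spirit of Zingg's Python API. Treat it as illustrative: the field names, paths, and model id are invented, and exact signatures vary across Zingg versions. The point is that input, output, and model artifacts all stay on infrastructure the customer controls, and the job runs on the customer's own Spark environment.

from zingg.client import Arguments, ClientOptions, FieldDefinition, MatchType, Zingg
from zingg.pipes import CsvPipe

# Configure matching over data that never leaves the customer's environment.
args = Arguments()
args.setFieldDefinition([
    FieldDefinition("name", "string", MatchType.FUZZY),
    FieldDefinition("email", "string", MatchType.EMAIL),
])
args.setModelId("100")        # illustrative model id
args.setZinggDir("/models")   # model artifacts also stay on customer infra
args.setNumPartitions(4)

# Input and output both point at stores the customer controls.
args.setData(CsvPipe("customers", "/data/crm/customers.csv"))
args.setOutput(CsvPipe("resolved", "/data/crm/customers_resolved"))

# Run the match phase on the customer's own Spark cluster.
Zingg(args, ClientOptions([ClientOptions.PHASE, "match"])).initAndExecute()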
There was also a dimension we hadn’t fully appreciated until customers started telling us: it was significantly faster and significantly cheaper. When your tool runs inside the enterprise’s existing environment, the enterprise leverages infrastructure it has already paid for — compute clusters, data quality frameworks, observability stacks. There is no new data pipeline to design, build, or maintain. There is no ongoing cost of extracting data, shipping it somewhere, and syncing it back. No ETL tax. No vendor markup on compute. The enterprise uses what it already has, and the tool fits around it. For large organisations running entity resolution at scale, this wasn’t a marginal saving — it was a fundamental difference in total cost of ownership.
Interestingly, this also made Zingg stickier in a different way. Not because we had the data — we never did — but because the customer’s confidence in the product grew over time.
Why This Matters Now More Than Ever
Fast forward to today, and the AI agent conversation is making this architectural principle mainstream.
Every enterprise that wants to build custom agents is asking the same question we were thinking about years ago: how do I give my AI access to my data without giving someone else access to my data?
The answer is the same one we arrived at for Zingg. The data should never need to leave.
This is what makes the current MCP momentum so significant. MCP is a protocol that lets AI agents communicate with data systems without requiring data to be extracted, copied, or centralised. It is, in essence, a formalisation of the principle we built Zingg on: bring the compute to the data, not the data to the compute.
But MCP alone is not enough. A faithful MCP implementation means more than exposing an endpoint — it means real-time access, granular permissions, data that reflects the current state of the system, and APIs that are robust enough to support production AI workloads, not just demos.
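As a sketch of what this looks like in practice, here is a minimal MCP server built with the FastMCP helper from the official MCP Python SDK. The lookup_customer tool and its return value are stand-ins: a real deployment would query the live system of record under the caller's permissions and return only the answer, never a bulk export.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-data")

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Return the current record for one customer, read live at call time."""
    # Stand-in for a real query that runs inside the enterprise boundary.
    # Only this answer crosses to the agent; the dataset itself never moves.
    record = {"id": customer_id, "name": "Acme Corp", "segment": "enterprise"}
    return str(record)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; the agent connects as a client

The server runs next to the data, the agent connects as a client, and every call is mediated by whatever permissioning the enterprise wraps around the tool.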
The enterprises of the next decade will not tolerate data opacity. They have regulators demanding audit trails. They have security teams demanding residency guarantees. And now they have AI ambitions that demand data access. These forces are all pointing in the same direction.
SaaS vendors who open their data thoughtfully — with proper permissioning, auditability, and reliability — will find that openness builds a different kind of loyalty. One that is harder to compete with, because it’s based on trust rather than captivity.
The vendors who resist will find themselves on the wrong side of procurement conversations as AI becomes a standard enterprise requirement. The question won’t be “does your product have AI features?” It will be “can my AI agents use your data?”
If the answer is no, or not without significant friction, the buyer will start to look elsewhere.
The Output Was Always the Point
There is one more dimension to this that feels especially relevant now.
Entity resolution is not a peripheral problem. It is one of the most foundational challenges in enterprise data — the question of whether the “Acme Corp” in your CRM is the same as the “Acme Corporation” in your ERP, whether the customer in your support desk is the same person as the one in your billing system, whether your supplier records are duplicated across three different procurement tools. Resolved, unified data is the prerequisite for almost everything else an enterprise wants to do with its data. Analytics, reporting, compliance, machine learning — none of it works well on dirty, fragmented, unresolved data.
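A few lines of standard-library Python illustrate why this is harder than it looks: naive string equality misses even the most obvious duplicates, while a crude similarity score (nothing like the trained, multi-field models a real tool uses) at least hints that two records belong together.

from difflib import SequenceMatcher

# Naive equality misses obvious duplicates across systems...
print("Acme Corp" == "Acme Corporation")  # False

# ...while even a simple character-level ratio suggests a match.
# Illustrative only: production entity resolution, Zingg included,
# learns from many fields, not a single string comparison.
score = SequenceMatcher(None, "acme corp", "acme corporation").ratio()
print(round(score, 2))  # 0.72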
This is why we were always deliberate about how Zingg surfaces its output. We didn’t just want Zingg to solve the matching problem — we wanted the resolved data to be genuinely portable, consumable by whatever system or workflow the enterprise chose to use next. Clean, resolved entities written back to the data store of the enterprise’s choice, in formats that fit naturally into existing pipelines, without any proprietary wrapper that would create a new dependency.
We did not see AI agents coming when we made that decision. But in hindsight, it was exactly the right call. A resolved, unified, trustworthy view of an enterprise’s core entities — customers, suppliers, products, locations — is precisely the kind of context foundation that AI agents need to function well. An agent working from fragmented, duplicated, unresolved data will produce fragmented, unreliable answers. An agent working from a clean, resolved data layer can actually be trusted.
We built Zingg to ensure its output could be consumed anywhere the enterprise wanted. It turns out that “anywhere the enterprise wants” now includes AI agents, RAG pipelines, and real-time reasoning systems we couldn’t have imagined when we started. The principle held.
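As a small illustration of that portability, here is a sketch of downstream consumption in PySpark. It assumes Zingg has written resolved records to Parquet at an invented path, with a z_cluster column grouping records that resolved to the same entity, and it uses a deliberately naive first-value survivorship rule.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("consume-resolved").getOrCreate()

# Read Zingg's output from a store the enterprise chose (path is invented).
# Each row carries a z_cluster id linking records that resolve to the
# same real-world entity.
resolved = spark.read.parquet("/data/zingg/customers_resolved")

# Collapse each cluster into one golden record. First-non-null survivorship
# is deliberately naive; real rules weigh source trust, recency, and so on.
golden = resolved.groupBy("z_cluster").agg(
    F.first("name", ignorenulls=True).alias("name"),
    F.first("email", ignorenulls=True).alias("email"),
)

# Write back somewhere any downstream consumer, agents included, can read.
golden.write.mode("overwrite").parquet("/data/golden/customers")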
A Principle Worth Standing Behind
When we made the architectural choice for Zingg, we weren’t predicting AI agents. We were just trying to build something enterprises could actually trust with their most sensitive data.
But the principle was right then, and it is right now: data should live where the enterprise decides it lives. Tools — whether they are matching algorithms or AI agents — should go to the data, not the other way around.
The market is catching up to this idea. And those who embrace it, rather than fight it, will be the ones who matter in the next era of enterprise software.
Zingg is an open source entity resolution framework that runs natively in your environment — no data movement required. You can find it at github.com/zinggAI/zingg.
