More persistence for Zingg IDs!
Introducing Reassign - taking the identity graph one step further.
In my last few posts, I wrote about the persistent ZINGG_ID and incremental flows. Quick recap for those who missed them. As data keeps changing in a growing business - new customers, updated addresses, records arriving from new channels - the identity graph has to keep up. The incremental flow in Zingg Enterprise handles exactly this. New and updated records get matched against existing clusters, clusters merge and split automatically, and the ZINGG_ID travels with the entity through all of it. The identity graph stays consistent even as the data underneath it keeps moving.
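If it helps to picture it, here is a toy sketch of the incremental idea in plain Python. To be clear, this is not Zingg's API or implementation - `match_score`, `assign_incremental`, and the threshold are all stand-ins invented for the example, with `match_score` playing the role of the trained model. An incoming record either joins an existing cluster and inherits its ZINGG_ID, or starts a new one.

```python
import uuid

def match_score(a: dict, b: dict) -> float:
    # Hypothetical similarity: fraction of fields with identical non-empty values.
    # In Zingg this role is played by the trained matching model.
    fields = set(a) | set(b)
    hits = sum(1 for f in fields if a.get(f) and a.get(f) == b.get(f))
    return hits / len(fields) if fields else 0.0

def assign_incremental(record: dict, clusters: dict[str, list[dict]],
                       threshold: float = 0.5) -> str:
    # Match the incoming record against members of each existing cluster.
    for zingg_id, members in clusters.items():
        if any(match_score(record, m) >= threshold for m in members):
            members.append(record)        # the ZINGG_ID travels with the entity
            return zingg_id
    new_id = str(uuid.uuid4())            # no match: a genuinely new entity
    clusters[new_id] = [record]
    return new_id

# Example: the updated record still lands in Jane's existing cluster.
graph = {"Z-100": [{"name": "jane doe", "email": "jane@x.com"}]}
print(assign_incremental({"name": "jane doe", "email": "jane@y.com"}, graph))  # Z-100
```

The real incremental flow also merges and splits clusters as evidence changes, which this toy version skips - the point is only that the identifier stays attached to the entity as records arrive.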
That was a big piece of work to build and ship. But as we rolled it out to more customers, we started hearing about a different scenario. One we had not fully planned for.
Some of our customers had been running Zingg for a while. Their identity graphs were in good shape. Incremental flows were humming. And then, their data got better.
Phone numbers that were unreliable at the start of the project - maybe coming from a poorly integrated source system, or just sparsely captured - were now being collected consistently. Email fields that were mostly empty were now populated. A new column from a recently onboarded data source turned out to carry a strong matching signal. The data engineering team had done some serious cleanup work on one of the source systems.
In other words, the matching could now be sharper than it was when the graph was first built.
And yet, teams were not acting on it. We started hearing the same thing again and again. Customers who loved their ZINGG_IDs - and were relying on them deeply - were finding that the very thing they valued was also holding them back. One customer had made significant architecture changes and needed to run a full refresh, but was worried about losing their ZINGG_IDs. Others wanted to add a new matching signal but held off. Some wanted to move to a different data platform entirely, but the ZINGG_ID dependency made the migration feel too risky.
The Zingg identifier had become load-bearing infrastructure. Which we loved to see - it meant the graph was genuinely embedded in how the business operated. But it also meant that any change to the graph felt dangerous. Customers could not move from one model to another, capture more signals, or even change their data platform, because everything downstream depended on the IDs. That was not the position we wanted our customers to be in.
So, what do you do when you want a full refresh - running matching again on the full dataset, this time with the richer fields, the better signal, the cleaner data - but your downstream systems are wired to the existing ZINGG_IDs?
Reporting tables, Customer 360 dashboards, ML feature stores, operational workflows at the call centre - all of them reference the ZINGG_IDs from the previous run. If you blow away the graph and rebuild it, you get better clusters but you lose identifier continuity. Every downstream system breaks or needs a migration. That is a huge cost, and it tends to make teams postpone the refresh indefinitely. The identity graph stays frozen at the quality it had when it was first built, because improving it feels too disruptive.
This is the gap that Reassign closes.
After you run a full refresh with your updated fields and retrained model, Reassign maps the new cluster assignments back to the existing ZINGG_IDs. It carries the established identifiers forward wherever entity continuity holds. Downstream systems keep their reference point. The identity graph gets the benefit of the improved matching signal. The refresh is no longer a disruptive operation.
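To make that concrete, here is a minimal sketch of what a reassign step has to do. Again, this is the concept, not Zingg's actual implementation or API - the stable record keys, the majority-overlap rule, and the `NEW-` prefix for freshly minted IDs are all assumptions made for the example.

```python
# Map each cluster from the full refresh back to an established ZINGG_ID
# wherever continuity holds, minting new IDs only where it does not.
from collections import Counter, defaultdict

def reassign(old_ids: dict[str, str], new_clusters: dict[str, str]) -> dict[str, str]:
    """old_ids:      record key -> ZINGG_ID from the previous graph
    new_clusters:    record key -> cluster id from the full refresh
    Returns:         record key -> final ZINGG_ID
    """
    # Group the refreshed graph's records by their new cluster.
    members = defaultdict(list)
    for key, cluster in new_clusters.items():
        members[cluster].append(key)

    final, used = {}, set()
    for cluster, keys in members.items():
        # Which old ZINGG_ID did this cluster's records mostly come from?
        overlap = Counter(old_ids[k] for k in keys if k in old_ids)
        carried = next((zid for zid, _ in overlap.most_common() if zid not in used), None)
        # Carry the established identifier forward if continuity holds; else mint one.
        zingg_id = carried if carried else f"NEW-{cluster}"
        used.add(zingg_id)
        for k in keys:
            final[k] = zingg_id
    return final

# Example: r1 and r2 stay together, and the sharper model now pulls r3 in too.
old = {"r1": "Z-100", "r2": "Z-100", "r3": "Z-200"}
new = {"r1": "c1", "r2": "c1", "r3": "c1"}
print(reassign(old, new))  # {'r1': 'Z-100', 'r2': 'Z-100', 'r3': 'Z-100'}
```

One simplification worth calling out: when an old cluster splits, this sketch lets whichever fragment is processed first keep the old ID. A production reassign would need a deterministic rule - say, the largest fragment wins - but the core idea is the same: the established identifier follows the entity wherever the overlap supports it.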
Here is the even more interesting bit - what taking the persistent ZINGG_ID one step further really means. Persistence, as we built it earlier, means the identifier travels with the entity through routine changes - new records, updated records, incremental loads. Reassign extends that guarantee to cover a deliberate, intentional rebuild of the graph itself. The identifier now survives not just the churn of normal operations, but also the data team’s decision to go back and do it better.
We think of this as data maturity support built into the identity graph. Because in practice, the signals get better over time. Source systems get cleaned up. New integrations bring in new attributes. The identity graph should be able to evolve with that maturity - not fight against it.
If you have a full refresh sitting in your backlog because the downstream impact felt too risky, we would love to hear from you. Hit reply and tell us what triggered it - what improved signal are you sitting on that you have not yet been able to incorporate?

