I did not realize that a year had gone by since I started working publicly on Zingg. But now that the calendar has made the fact sink in, it feels surreal. Wasn’t it just yesterday that I was fretting about the documentation before announcing the open source release and penning down my reasons for building Zingg? Now here I am, recruiting the team for the enterprise version :-)
It feels great to complete an entire year! The days have passed fairly quickly, spread between writing, lots of reading, speaking, giving demos, and meeting people. We have done a lot of development work and also spent time on incorporation and funding. I have thoroughly enjoyed interacting with users and learning about the different use cases enabled by identity resolution. We have more than 200 members on our Slack now, with representation from Fortune 500s, digital natives, data platforms, consulting companies and startups. Besides the Customer 360, personalisation, AML, KYC, GDPR and master data management use cases, the trend towards composable customer data platforms is fuelling a lot of interest in Zingg, which is very promising. The data stack has proven patterns for data ingestion, transformation and activation, but identity is a core missing piece there. Zingg fits very well into this space.
One of the fundamental issues with identity resolution is scalability. While building Zingg, we used example datasets and tested our algorithms on them. To test scalability, we copy-pasted an example multiple times to build a dummy dataset of roughly 15 million rows. Most enterprise master data is under 10m rows, and Zingg scaled well on this. However, dummy datasets with repeated rows perform differently from actual data with a real distribution of values, so there were some lingering doubts in the back of my mind. These were soon resolved when we consulted with a Fortune 10 company with 12.5m records, where Zingg ran like a breeze. So I was confident of scalability while releasing the open source, but I wasn’t sure how far it would go beyond 15m records without tuning the knobs. That is why it is extremely uplifting to see our users resolve identities on 70-80m records in a few hours with Zingg. Due to the quadratic nature of the comparisons, this is roughly 25 times more complex than our 15m endeavours, and seeing it work well is a great confidence boost! Another measure for Zingg is the accuracy of predictions, and users have done head-to-head comparisons with their multi-million dollar MDM tools to confirm our results. Yay!
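To make the quadratic growth concrete, here is a minimal back-of-the-envelope sketch. It assumes a naive all-pairs comparison, where n records produce n(n-1)/2 candidate pairs; this is only an illustration of the scaling argument, not a description of how Zingg actually schedules its comparisons. The function names `pair_count` and `relative_cost` are made up for this example.

```python
def pair_count(n: int) -> int:
    """Number of distinct record pairs in a naive all-pairs comparison."""
    return n * (n - 1) // 2


def relative_cost(n: int, baseline: int) -> float:
    """How many times more pairs n records produce than the baseline size."""
    return pair_count(n) / pair_count(baseline)


if __name__ == "__main__":
    baseline = 15_000_000  # size of the dummy dataset mentioned above
    for n in (70_000_000, 75_000_000, 80_000_000):
        ratio = relative_cost(n, baseline)
        print(f"{n:,} rows vs {baseline:,} rows: {ratio:.1f}x the pairs")
```

Under this assumption, 70m to 80m records work out to roughly 22 to 28 times the pairs of the 15m dataset, with 75m landing at about 25 times.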
One very important aspect for us is arming our users with the power of machine learning and distributed systems in a self-serve, no-code way. Many of our users are first timers with Java, Spark or ML. Zingg abstracts away the technical complexity, and gives the superpower of identity resolution at scale directly to the end user. Now that our Python release is out, we are actively working on a native Snowflake implementation. This opens up a new way for Snowflake users to use Zingg and identify their customers and personalize their journeys and experiences.
There have been some mistakes last year, and I plan to do better in the coming year. Due to my flow state, I missed applying to some conferences but the one I am most disappointed about missing is Coalesce from dbt Labs. I would have met some amazing data practitioners there and made many friends. This time I will clear my calendar and attend virtually. If you are joining, do let me know.
There have been some other disappointments, but they had been in the making for some time. Can’t complain.
On a philosophical note, identity resolution is the discovery of self as well as building relationships. Working on Zingg enables me to learn a lot of things around communication, positioning, customer needs, technology, hiring and sales. It pushes my boundaries and makes me discover myself. It also helps me talk to a lot of people and build relationships with customers, data leaders, practitioners, evangelists, contributors, team, investors and mentors. Many of us feel lost at the thought of networking - I am not sure how I would do if I had the charter to meet x new people every week! But when the baseline is common ground on data or ML, it is much easier to connect.
Zingg is made with meraki. It has been a lot of hard work, but it doesn’t feel that way at all. The last year has been nice, and we are having a good time. This coming year will decide how things shape up for us as a project and a startup.
Thanks to all of you for investing your time and effort in Zingg. We are incredibly grateful and motivated to do even more this year.
Wish us luck!
love this so much:
> identity resolution is the discovery of self as well as building relationships