The last few months have been packed with action for me. Zingg Labs, Inc got incorporated (lots and LOTS of paperwork!). We have closed our first institutional funding and onboarded phenomenal angels. The website is up and we are actively hiring. I have been working with early adopters and learning about different use cases and data stacks. We have users on Databricks, EMR, EC2, GCP, Azure, and private machines. Besides the Databricks and Azure datalakes, we see users with Snowflake, BigQuery, Redshift and Postgres data sources and sinks. Zingg’s pipe abstraction has held up pretty well so far for these.
I have a lot of stories about the learnings of the last few months but have been heads down to keep the focus on building Zingg. For a change, the next week is going to be different. I will be traveling for the Databricks Data and AI Summit and so it is time to go out and learn what others are building! To reconnect with old friends. And make new ones.
I have to admit that I am a reluctant traveler, preferring the comfort of remote work to packing my bags for international travel. But my excitement is growing as the summit approaches! Probably it is the magic of the keynotes? With Spark and Delta Lake creators like Matei Zaharia, Reynold Xin, Ali Ghodsi and Michael Armbrust, the Tuesday opening should be packed with updates on the platform. I still remember the 2014 keynotes, where there was so much buzz after the demonstration of the Databricks platform that everyone I talked with in the corridors was raving about the product. Tristan Handy from dbt Labs will deliver a keynote too, and since he co-founded one of the most loved tools of the data stack, this is going to be a very interesting convergence of the SQL and non-SQL worlds!
The Wednesday keynotes are going to be spectacular too with Prof. Andrew Ng, Peter Norvig, Christopher Manning, Daphne Koller, Hilary Mason and others. I am also excited to hear Manish Amde, who I first met at the conference a few years ago. It should be great to catch up! My talk on Zingg is in the afternoon, and I think post lunch I will be too distracted to attend any session properly until it's over. So I will hang out at the expo hall and check out the products.
Besides the travel and talk preparations, I have been going over the agenda and my current plan is to:
a) pick up topics I know absolutely nothing about
b) learn topics which I tried to learn earlier but failed
c) attend sessions that will help with work
The session on biomarkers fits (a) above. In school, all I did with biology was to mug up the lessons to get good grades. I am hoping this session will get me restarted in this area, something I have wanted to do but never got around to doing. I am also very tempted to attend Streaming ML Enrichment Framework and FutureMetrics, which are in the same timeslot. Maybe there is a version of the metaverse that allows me to attend them all at the same time? ;-)
Applications of ML in trading and markets - the session by the NASDAQ team looks very interesting. Or maybe the lure of knowledge graphs will make me hop into The Semantics of Biology. Itai’s talk on ingesting, processing and maintaining a data lake should be packed with practical content. I am very conflicted about which session to attend in this slot as all of them are so intriguing.
There are a few sessions related to privacy which I will most likely attend - top of my list are Protecting PII and Lessons Learnt from Deidentifying 700m Patient Notes. I often think about privacy-preserving entity resolution. Who knows, I may get a few ideas from these talks!
Though we do not do any deep learning in Zingg currently, we may do so in the future, and I am curious to learn about scaling Deep Learning as well as Running TensorFlow with Apache Spark.
It would be an anomaly to miss a session on anomaly detection, so I will hang in there :-) Adi’s talk should be super insightful, really looking forward to that one! And there are a lot of data mesh talks, some of which I hope to attend. Zhamak is keynoting too. What could be a better avenue to learn about decentralized data ownership, self-serve data infrastructure and governance?
Overall, it should be a great place to learn what people are building, the kind of challenges they see and how they are solving them. I am also excited about meeting in person the folks I have been collaborating with remotely over the last few months for Zingg.
Let me know if you are attending too and want to meet. Or if you have session recommendations to share.
See you there!