Entity Resolution and the sea
In "The Death of the Author", French critic and theorist Roland Barthes argues that a writer’s work is merely that of a scribe. The author is only a means, and it is limiting and inaccurate to go by her interpretation of her work alone.
To quote Wikipedia:
"To give a text an author" and assign a single, corresponding interpretation to it "is to impose a limit on that text."
The meaning derived by the readers is essentially what brings the text to life.
Every work is "eternally written here and now," with each re-reading, because the "origin" of meaning lies exclusively in "language itself" and its impressions on the reader.
The true effect of a novel is the emotion the work brings out. The feelings it evokes. The usage we put it to in terms of the lessons we derive from it. All these are far beyond what the author had planned or anticipated. I believe Barthes’ views are also equally applicable to painting and other forms of art.
A good majority of us programmers have eulogized Paul Graham’s Hackers and Painters, putting ourselves closer to artists and treating our work as a piece of art. Utilitarian, but beautiful in its own abstractions. A little magic and a lot of uncertainty. I have always felt pretty inspired while coding. Won’t say I am an artist, but I surely enjoy the process of building things from the ground up. In many ways, I feel that Barthes’ views are also applicable to software. We must build with our views, and then sit back and let the users interpret our work. There are ways people will use the software which the programmer had never thought about. There will be environments where the software is more valuable than what it was originally designed for. There will be problems that will get solved which the developer never knew of. This is particularly true of open source software, where the software finds its path to the set of problems it is best suited to solve. To the user group which finds it the most valuable.
Thankfully, software, unlike novels or paintings, can be done iteratively. We have the cushion of versioning to involve user feedback in our build process. So unlike a “limit on the text”, the software can grow and developers can add the user learnings back into the flow. And this new version leads to even more diverse applications by the users. As this feedback loop progresses, the developer of the software can learn and grow so much too from the user experiences.
It is to this end that talking to early adopters is key: to learn about their interpretation of the software, how they are using it, and in which context. The twenty-odd-member Zingg Slack community is one of my biggest sources of learning. When I put Zingg out, I had some applications in mind. Entity Resolution as a means to attribute revenue to customers in a typical enterprise setting was the main one. Figuring out the most valuable customers. A better micro-view understanding of each individual customer. But after talking to some users over the past weeks, I have realized that there is more to it. There is a macro view too! Resolving customers across quarters is essential to learning how many new customers have been added, which in turn helps in tuning the right knobs in sales and marketing. I have learnt that counting new customers each quarter is as relevant for a food chain as it is for a hospital counting its patients. Very diverse settings indeed.
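To make the macro view concrete, here is a minimal sketch of counting new customers across two quarters. It is not how Zingg works: the `resolve_key` function is a hypothetical, deliberately crude stand-in for real entity resolution (it only normalises the email and falls back to the name), and the records and field names are made up for illustration.

```python
def resolve_key(name: str, email: str) -> str:
    """Crude stand-in for entity resolution (an assumption, not Zingg's method):
    use the normalised email as the identity, falling back to the normalised name."""
    email = email.strip().lower()
    if email:
        return email
    return " ".join(name.lower().split())


def new_customers(previous: list[dict], current: list[dict]) -> set[str]:
    """Return the resolved identities seen this quarter but not in the previous one."""
    prev_keys = {resolve_key(r["name"], r["email"]) for r in previous}
    curr_keys = {resolve_key(r["name"], r["email"]) for r in current}
    return curr_keys - prev_keys


# Hypothetical records: the same person appears in both quarters with messy formatting.
q1 = [
    {"name": "Asha Rao", "email": "asha@example.com"},
    {"name": "Ben Lee", "email": "ben@example.com"},
]
q2 = [
    {"name": "ASHA RAO", "email": " Asha@Example.com "},
    {"name": "Carla Diaz", "email": "carla@example.com"},
]

added = new_customers(q1, q2)
print(len(added), sorted(added))
```

Comparing the raw email strings would count Asha as a new customer in the second quarter. Resolving identities first leaves only Carla, which is the number sales and marketing actually care about.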
Another big revelation for me is that there is a fairly large world of users who are not comfortable with Spark or even Java. That users will use Docker even if you do not provide a neat Dockerfile! Also, the underlying technology can be Java, which is best suited for the problem, but Python interfaces are the need of the hour. And how could I forget SQL? All the listening and talking with early users is a great conduit for fresh ideas to build.
I have also been reading. Really loved the modern data spaghetti. Admittedly I am a bit biased as Anna talks about resolving entities in the modern data stack.
Growing complexity in our pipelines leads to a new set of data problems: the need to resolve identities across these many systems,
In a nutshell: the more SaaS tools we have, the greater our need for identity and entity resolution. I am completely subscribed to this point of view, and if you are curious, please do read Anna’s post.
Having come this far, you may be wondering about the title of this post. I have talked about art and software, but whatever got me to borrow from Hemingway? :-) Without much ado, here is the post that inspired my title: Singapore’s maritime threat detection system. The Singapore Maritime Crisis Centre (SMCC) is developing a next-generation sense-making system to improve its abilities in detecting maritime security threats “as early and as far away” from Singapore as possible.
It automatically ingests and consolidates maritime information from more sources, including proprietary whole-of-government systems, commercial platforms and publicly available ship data. These sources include surveillance cameras belonging to Home Team agencies, as well as commercial multi-country maritime information sharing platforms. The system then uses data analytics and a technique called entity resolution to match the data to the correct ship, before trawling through it to detect tell-tale signs that raise suspicion.
Not just customers, ships at sea need entity resolution too.
“Some of the sources do not fact check the data, including the names or numbers that come in,” said Mr Tan Yang Zhi, a programme manager at the Defence Science and Technology Agency, which developed the system with DSO National Laboratories. “But the entity resolution algorithm allows us to match (the data even if) there are spelling errors or missing digits.”
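To get a feel for what matching despite spelling errors or missing digits can look like, here is a minimal sketch using Python's standard `difflib`. It is only an illustration and not the algorithm the SMCC system or Zingg uses: the record fields (`name`, `ship_id`), the sample values, the equal weighting of the two scores and the 0.8 threshold are all assumptions picked for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_ship(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Treat two records as the same ship when the average of the name and
    ID similarities reaches the (assumed) threshold."""
    name_score = similarity(rec_a["name"], rec_b["name"])
    id_score = similarity(rec_a["ship_id"], rec_b["ship_id"])
    return (name_score + id_score) / 2 >= threshold


# Hypothetical records from two different sources.
camera = {"name": "Ocean Voyager", "ship_id": "9074729"}
feed = {"name": "Ocaen Voyager", "ship_id": "907472"}  # swapped letters, missing digit
other = {"name": "Pacific Dawn", "ship_id": "9152301"}

print(same_ship(camera, feed))   # the misspelt record still matches
print(same_ship(camera, other))  # a genuinely different ship does not
```

A real system compares many more fields and cannot afford to compare every record with every other one, which is where purpose-built entity resolution tools come in.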
What a neat usage of entity resolution! I hope Zingg gets used in some of these scenarios. Signing off with a quote from The Old Man and the Sea:
It's silly not to hope. It's a sin.