Kirchhoff’s laws for data

I was at a meetup recently where a woman gave a talk about systems diagrams. These are diagrams that show how a resource flows through a network of systems. The terms are pretty broad because they are pretty broadly applied.

Whenever I think of flows on a network, I think of Kirchhoff’s laws. The high-level description: Kirchhoff says that at every junction, flow in has to equal flow out. You probably know this intuitively… it is implied by the word “flow.” But that is what makes it such a beautiful concept: it is something you kind of knew already, but grounding yourself in it can lead to deeper understanding.

Kirchhoff’s laws are best known from electrical circuits, but the same principle is applied to fluid flow, trading algorithms, and search engines. At a lower level, it is a large part of what gives Markov chains their power.
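To make the Markov chain remark concrete, here is a minimal sketch in plain Python. The 3-state transition matrix `P` and the helper names (`step`, `stationary`, `flow_in`, `flow_out`) are made up for illustration. The idea it shows: at the long-run (stationary) distribution, the probability flowing into each state equals the probability flowing out of it, which is the Kirchhoff-style balance.

```python
# Hypothetical 3-state chain: P[i][j] is the probability of moving from i to j.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

def step(dist, P):
    """Push a probability distribution one step through the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iterations=200):
    """Approximate the stationary distribution by repeated stepping."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = step(dist, P)
    return dist

def flow_in(pi, P, j):
    """Probability mass arriving at state j from the other states."""
    return sum(pi[i] * P[i][j] for i in range(len(P)) if i != j)

def flow_out(pi, P, j):
    """Probability mass leaving state j for the other states."""
    return sum(pi[j] * P[j][k] for k in range(len(P)) if k != j)

pi = stationary(P)
for j in range(len(P)):
    print(j, round(flow_in(pi, P, j), 6), round(flow_out(pi, P, j), 6))
```

Repeated stepping only converges for well-behaved chains like this one; it is a sketch, not a general-purpose solver.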

This got me thinking: to what extent can we think of data as a resource? The obvious objection is that data itself doesn’t necessarily follow Kirchhoff’s laws: the flow out can, theoretically, exceed the flow in. Unlike a lot of resources, data does not deplete with use; instead, it loses potency with time.

That said, there is a sense in which data flows through a team. But usually each member/group transforms the data in some way. So perhaps the way to view this flow is the flow of information, or knowledge, or insight.

So while the data that flows into the data engineering team is not the data that comes out, the information that flows in should be consistent with the information passed to the data scientist, which in turn should be at the heart of the insights passed on to management. While the physical data flowing in need not match the data that comes out, you should not be able to draw more information out than is theoretically contained in the data. For example, you can’t find someone’s location from the ambient temperature, although you might be able to narrow it down (e.g. Key West, San Diego, indoors).
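One standard way to formalize “you can’t draw out more than went in” is the data processing inequality from information theory. The post itself does not name it, so take this as one possible reading. The symbols are assumptions for illustration: X is the thing we care about (the location), Y is the raw data (the temperature), and Z is whatever a team computes from Y alone.

```latex
% Assumption: Z depends on X only through Y, i.e. X -> Y -> Z is a Markov chain.
% Then no processing of Y can increase the information about X:
\[
  X \to Y \to Z
  \quad\Longrightarrow\quad
  I(X; Z) \le I(X; Y)
\]
```

Here I(·;·) is mutual information. Each transformation step can preserve or lose information about X, but never add to it.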

This allows you to understand the services that a team or an individual provides. Where are they getting their information from? Where is it going to? How is it transformed? Is anything lost? Are we drawing conclusions that are too strong?

You can also understand the interaction between teams. Are there loops? How the data team uses its own insights should be carefully considered, and a loop might mean that we are building models based on previous models. Are we pseudo-labeling responsibly? Or is there a feedback loop leading to the data version of confirmation bias, where people use their own insights to confirm what they already believed?
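If the flows between teams are written down as a directed graph, checking for loops is mechanical. Below is a minimal sketch using depth-first search. The team names and the `flows` dictionary are hypothetical, and `find_cycle` is just an illustrative helper, not something from the talk.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:
                # nxt is already on the current path, so we found a loop.
                return path[path.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical flows: an edge A -> B means "A passes information to B".
flows = {
    "data_engineering": ["data_science"],
    "data_science": ["management", "data_engineering"],
    "management": [],
}
loop = find_cycle(flows)
print(loop)
```

A loop is not automatically a problem; it only flags where to ask whether a team is feeding its own conclusions back in as evidence.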

Is everyone getting the data they need/use? This complicates things a little, because the information value of data isn’t necessarily additive. For example, if we are trying to locate someone, their longitude by itself is not particularly useful. Nor is their latitude taken by itself. But together, they are exactly the information we need.
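A toy case makes the non-additivity easy to check numerically. The sketch below is a stylized stand-in for the longitude/latitude example, not a model of it: two fair coin flips `a` and `b`, and a target `t = a XOR b`. Each flip alone carries zero information about the target, while the pair carries a full bit. The function name `mutual_information` and the setup are assumptions for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(X;Y) in bits, treating the (x, y) pairs as equally likely outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# All four equally likely outcomes of two fair bits, plus the target t = a XOR b.
rows = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]

info_a = mutual_information([(a, t) for a, b, t in rows])        # a alone
info_b = mutual_information([(b, t) for a, b, t in rows])        # b alone
info_ab = mutual_information([((a, b), t) for a, b, t in rows])  # both together

print(info_a, info_b, info_ab)
```

So the information in the pair is not the sum of the parts (0 + 0), which is why “who has which pieces of data” matters more than a simple tally of sources.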

I am not necessarily treading new ground here, but information is one of the biggest resources many modern companies hold, and it is interesting to understand data in terms of what it means and who can use it.

Author: djkelleher
