The Difference That Makes a Difference: The Case for Data in Enterprise AI

In 1909, biologist Jakob von Uexküll coined the term umwelt: the perspective-dependent universe of salient information that an organism uses to navigate the world. Every species constructs its own. The electromagnetic spectrum is vast, but humans perceive only a narrow band that we call ‘light’. That band is our umwelt, and it defines our experience of the outside world. Bees, by contrast, rely on ultraviolet to locate nectar on flowers, a part of the spectrum that is invisible to us but semantically central to them. The signals are all there. What differs is which ones carry meaning. I have been thinking about this concept a lot lately, because I see a striking parallel to how Enterprises are approaching AI today.

The Distraction

Whenever we are in the midst of a fundamental technology transition, the Promethean lure is hard to resist: the feeling of bringing something magical and divine to humankind. It is no surprise, then, that AI industry insiders are talking up the imminent arrival of AGI and ASI. Meanwhile, most Enterprises are still struggling to make AI systems relevant to their context and to get them to deliver meaningful business impact, a far cry from super-intelligent machines taking over the world. Worse still, all the speculation and hand-wringing around AGI takes attention away from the real task of harnessing AI for actual, meaningful work.

From Biology to Systems Theory

In systems theory, there is the notion that not every bit of information has the same value. More importantly, what sets salient information apart is not just a statistical distinction but a difference in kind: it is semantically meaningful. The visible part of the electromagnetic spectrum is semantically more meaningful to humans than infrared. UV is semantically more meaningful to bees. The level of granularity matters as well: the umwelt of an individual cell is different from that of the bee to which the cell belongs, which is in turn different from that of the hive to which the bee belongs.

This leads to one more concept worth unpacking: any process involving causal manipulation of information can be understood as computation, or the running of code. The information that such a unit of computation actually processes is its data, and being processed is what distinguishes it as salient. We receive signals from both the ‘red’ (visible) and ‘infrared’ (not visible) parts of the spectrum, but from our perspective, only ‘red’ carries semantic information: it can be perceived and acted upon, and can thus ‘make a difference’.

The Parallel

It is useful to think of LLMs as computation engines that excel at reasoning and inference. But what gives them salience is the data they have access to. In today’s AI arms race, it is tempting for Enterprises to obsess over the “best” LLM as defined by general benchmarks. Model capability matters, but relying on it alone misses the point.

Just as an organism’s umwelt determines what it can perceive and act on, an Enterprise AI system’s umwelt — its access to contextual, proprietary data — determines what it can reason about and deliver. A powerful LLM with generic training data is like a human eye pointed at infrared: the machinery is sophisticated, but the signal carries no meaning in that context.

What Enterprises really need to be focusing on is their umwelt: the data. Data is what gives Enterprises salience — the real context with which they can continue to evolve, differentiate, and compete. Gregory Bateson, an influential thinker in systems theory and psychology, described a bit of salient information as “a difference that makes a difference.”

And as those of us who have been in Enterprise AI know so well by now, it is the data, and not the LLMs themselves, that makes all the difference. So the question worth sitting with is not “which LLM should we use?” but “what is our umwelt, and how do we make it richer?”
