Over the holidays, I read A Brief History of Intelligence by Max Bennett, a fantastic book that traces the evolution of intelligence from worm-like nematodes (microscopic worms) all the way to us humans. The author does a brilliant job of narrating the journey by breaking it into five key breakthroughs, starting from the minute organisms capable only of directional movement. As I read through the book, I couldn’t help noticing the parallels to the evolution of AI – in fact, the author makes that point at a few junctures in the book. And given the current narrative around AGI, it is tempting to think of AI along the same evolutionary trajectory. I am firmly in the camp that the AGI hype is a totally unnecessary distraction – but that is a topic for another day.
To me, the most striking point in this evolutionary journey of intelligence was the leap from the early vertebrates to mammals with the evolution of the neocortex, which turned out to be a massive breakthrough and set the mammalian world on a trajectory that has culminated in us humans. The early vertebrate brains could learn through positive and negative reinforcement, coupled with the ability to recognize patterns (shapes, smells etc.) even when they were naive to the stimulus. However, they operated purely in a stimulus-response mode, without developing any mental model of the world. And then the mammals branched out with their neocortex edge. This game-changing capability helped the mammals evolve and survive by letting them simulate other organisms in their brains through imagining, which gave them the ability to learn before they acted. The neocortex allowed these organisms to evaluate various choices of action and consider which one was advantageous. This capability, combined with reinforcement learning (together, model-based learning), was a breakout transition in the evolutionary arc, and set the mammals on a path that eventually led to humans. I tend to think of model-free learning as differing from model-based learning in three fundamental ways:
| Model-free learning | Model-based learning |
| --- | --- |
| General idea of desired and acceptable outcomes (via prompts and guardrails) | + Planning in rich detail with a desired objective function |
| Use past events to learn and determine next course of action (or search domain space to find the most relevant action aka RAG) | + Consider possible futures by developing simulations |
| Act in response to an external stimulus (or a prompt with context window) | + Simulate options and plan out a course of action from an objective |
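The contrast in the table can be made concrete with a toy sketch. The code below is purely illustrative (the world, the goal, and the agent functions are all my own assumptions, not from the book): a model-free agent reacts to a stimulus with whatever response was reinforced in the past, while a model-based agent imagines the outcome of each candidate action with an internal model and scores it against an objective before acting.

```python
import random

# A toy world: an agent at position `state` wants to reach position 10.
def world_step(state, action):
    """True environment dynamics: action is -1 or +1."""
    return state + action

# Model-free: react via a lookup of past reinforcement (stimulus -> response).
def model_free_action(state, learned_policy):
    # Falls back to a random response when the stimulus was never seen before.
    return learned_policy.get(state, random.choice([-1, 1]))

# Model-based: imagine each candidate action with an internal model of the
# world, score the simulated outcome against an objective, then act.
def model_based_action(state, internal_model, objective):
    candidates = [-1, 1]
    # "Learn before acting": evaluate imagined futures, not just past rewards.
    return max(candidates, key=lambda a: objective(internal_model(state, a)))

objective = lambda s: -abs(10 - s)  # closer to the goal at 10 is better
action = model_based_action(3, world_step, objective)
print(action)  # the simulated futures favor moving toward the goal
```

The model-free agent is helpless on a novel stimulus; the model-based agent handles it by simulating, which is exactly the "learn before you act" advantage described above.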
I think of the first wave of GenAI experiences (prompt engineering, RAG) as model-free learning systems. While they are useful for applications like virtual assistants and chatbots, it is also clear that we need a new paradigm for tackling some of the more challenging problems in the enterprise. We are increasingly seeing Compound Systems (see previous post), which I think of as model-based learning systems. Lest all this sound theoretical, let’s take a supply chain example. Supply chains are truly complex systems, which is why they have been notoriously difficult to crack when it comes to software-driven optimization. However, I think the notion of Compound Systems presents an opportunity to make a meaningful dent.
Retail: Store Inventory Intelligent Assistant
Store-level inventory management is one of the thorniest problems at the edge of retail supply chains, presenting the following challenges: 1/ the need to cater to hyper-localized demand patterns, which requires signals at a store level 2/ retail shelf-space constraints, which demand ruthless prioritization 3/ small order quantities. All of this leads to a high noise-to-signal ratio and constant nightmares for Store Managers.
Current Solutions: Model-free learning
In the current solution paradigm, Machine Learning models recommend target safety stock levels at a store/product SKU (stock keeping unit) level based on historical demand patterns. These systems have gotten better over the years, incorporating capabilities like SKU/location-specific demand distributions (not just Gaussian), simulation of re-order points and order quantities at a granular level, and alerts on potential stock-outs, to name a few. These are typically delivered through a dashboard or a report that is consumed by the Store Managers to drive decisions. This is a learning system in that the safety stock levels are updated as the ML models are refreshed. It is apparent that this approach has several limitations:
- The safety stock replenishment algorithm is designed to optimize a localized objective (e.g. availability for a given SKU) as opposed to working towards a global objective (e.g. minimize stock-outs at an overall store level, given the space and capital constraints)
- No simulation capabilities for the Store Manager to explore alternative objectives and evaluate scenarios (e.g. store-layout and placement of merchandise for specific events)
- The user interface is via dashboards that lack a natural language conversational interface.

All of these are system-level limitations.
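To ground the current paradigm, here is a minimal sketch of the textbook safety stock and reorder point calculation for a single SKU at a single store, assuming Gaussian daily demand (the systems described above use richer, SKU-specific distributions; the numbers are hypothetical). Note how the objective is purely local: one SKU, one node, no store-wide trade-offs.

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(daily_demand_std, lead_time_days, service_level):
    """Textbook safety stock for one SKU at one node, assuming Gaussian
    daily demand -- a localized, per-SKU objective."""
    z = NormalDist().inv_cdf(service_level)  # z-score for target service level
    return z * daily_demand_std * sqrt(lead_time_days)

def reorder_point(daily_demand_mean, daily_demand_std, lead_time_days,
                  service_level):
    """Trigger replenishment when on-hand inventory drops to this level:
    expected demand over the lead time, plus the safety buffer."""
    return (daily_demand_mean * lead_time_days
            + safety_stock(daily_demand_std, lead_time_days, service_level))

# Hypothetical SKU: mean 20 units/day, std dev 5, 4-day lead time, 95% target
rop = reorder_point(20, 5, 4, 0.95)
print(round(rop, 1))  # ~96.4 units
```

Refreshing the ML models amounts to re-estimating the demand mean and standard deviation per SKU/location; the formula itself never sees the store-level budget or shelf-space constraints.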
The art of the possible: Model-based learning
A model-based approach to this domain encourages a Compound System design, which takes a system view of the problem. The design would include (but not be limited to) the following domain-specific agents:
- Objective-agent that would generate an objective function and constraints based on business priority. A typical objective function would be to maximize the overall Service Levels across a set of stores and SKUs for a given budget, subject to operational constraints like logistics, warehouse space and so on.
- Profiler-agent that would ingest historical demand patterns and determine the probability distribution for the SKUs at every node in the Supply chain.
- Safety Stock-agent that would call ML models that are trained on historical data and predict safety stock, at a SKU/node combination.
- Simulator-agent would generate a model of the supply chain at an aggregate level, explore variants, rank them based on the objective function and pick the most optimal alternative.
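A hypothetical sketch of how the Objective-agent and Simulator-agent might interact: the objective scores a candidate plan globally (across stores and SKUs, subject to a budget constraint), and the simulator ranks imagined variants against it. All names, data shapes, and numbers here are illustrative assumptions, not a specific framework or implementation.

```python
def objective(plan, budget):
    """Objective-agent output: maximize the overall service level across
    stores/SKUs, subject to a hard budget (operational) constraint."""
    total_cost = sum(item["cost"] for item in plan)
    if total_cost > budget:
        return float("-inf")  # infeasible plans are rejected outright
    # Average expected service level across every store/SKU in the plan.
    return sum(item["service_level"] for item in plan) / len(plan)

def simulate_and_rank(candidate_plans, budget):
    """Simulator-agent: score every imagined variant against the objective
    and return the best feasible one."""
    return max(candidate_plans, key=lambda p: objective(p, budget))

# Two hypothetical allocation plans for two SKUs.
plans = [
    [{"sku": "A", "cost": 60, "service_level": 0.99},
     {"sku": "B", "cost": 70, "service_level": 0.98}],  # over budget
    [{"sku": "A", "cost": 50, "service_level": 0.95},
     {"sku": "B", "cost": 40, "service_level": 0.93}],  # feasible
]
best = simulate_and_rank(plans, budget=100)
print(best[0]["sku"], best[0]["service_level"])
```

The contrast with the model-free paradigm is the point: instead of each SKU optimizing its own availability, candidate futures are simulated and ranked against one global objective.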
Supporting these domain-specific agents would be a class of horizontal capabilities:
- Text-to-SQL to translate natural language questions from end-users to SQL queries.
- Evaluation systems to measure and monitor quality. This is critical, given the financial and business implications of over-stocking (which ties up working capital) or under-stocking (lost sales and erosion of customer trust).
All of this could be managed by a Supervisor Agent that can monitor the overall metrics, and trigger necessary remedial actions (e.g. readjustment of safety stocks, trigger replenishment orders) based on real-time signals.
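The Supervisor Agent's control loop could be sketched as follows. This is a minimal, hypothetical illustration, assuming thresholds, metric names, and stub agents of my own invention; a production system would route these through the actual agent framework and governance layer.

```python
def supervisor_step(metrics, agents):
    """Supervisor control loop: inspect real-time signals and dispatch
    remedial actions to the appropriate domain-specific agent."""
    actions = []
    if metrics["stockout_risk"] > 0.2:   # illustrative threshold
        actions.append(agents["safety_stock"]("readjust safety stocks"))
    if metrics["on_hand_days"] < 2:      # illustrative threshold
        actions.append(agents["replenishment"]("trigger replenishment order"))
    return actions

# Stub callables standing in for the domain-specific agents described above.
agents = {
    "safety_stock": lambda task: f"SafetyStockAgent: {task}",
    "replenishment": lambda task: f"ReplenishmentAgent: {task}",
}

# Hypothetical real-time signals for one store.
signals = {"stockout_risk": 0.35, "on_hand_days": 1.5}
for action in supervisor_step(signals, agents):
    print(action)
```

In a real deployment, each stub would be an agent invocation with its own tools, evaluation hooks, and audit trail rather than a lambda.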
At this point, the discerning reader would point out that this entire solution could be modeled with a combination of statistical tools and human ‘agents’ (actually, experts). That is entirely true, and the primary reason existing systems have not been able to tackle this is the complexity involved in transitioning from a model-free to a model-based learning system. It is a problem we now have a shot at solving – thanks to the underlying technologies for building production-quality, enterprise-grade Compound Systems with Agents.
And one final thing: none of this is possible without the Agents being able to access underlying enterprise data (historical transactions, supply chain network data, inventory carrying costs, to name a few), and, even more critically, without the Agents aligning with existing data and AI governance policies. Which is why I am excited about Compound Systems and the path they can create to graduate from model-free learning to model-based learning. Exciting times ahead!