What do you do when data is the problem?

I want to take a quick diversion and put down my thoughts about a specific problem that I continue to see in multiple enterprises. We are working with a mid-size bank on the classic problem of improving digital cross-sell sales. Except that, here’s the rub – there is almost no historical data – for the simple reason that the bank never used this channel in the past. Given that, how do you create meaningful propensity models for personalized targeting?

This is by no means an uncommon situation. As the Data Science mandate continues to expand within organizations, the most frustrating impediment by far is the lack of data – both in quantity (not enough data to train algorithms) and in quality (too much noise to extract meaningful signals).

So, the question then is – what is one to do in the world of small, messy data? The most common response thus far has been two-fold: invest more in building data pipelines (one more EDW project!) and/or initiate instrumentation efforts to better capture data. Both of these are worthy and necessary responses. But are they enough? What if we reframe the question: what can we do with the current state of data?

[Detour alert!] This reminds me of the debate that has been brewing in AI circles over the past few years: Symbolism vs. Connectionism. Without wading into the debate itself (which is fascinating, by the way), here’s the main idea:

Symbolic AI takes the view that we can represent a solution space with symbols and combinations of symbols (‘reasoning’). The implementations of symbolic reasoning are what we know as ‘Expert Systems’. Here’s the main point: we need human experts to establish the relationships between inputs and outputs up front, expressing them as rules in, say, if-then-else constructs.

Connectionist AI takes the opposite view and starts with the assumption that the relationships between inputs and outputs are not explicit – but can be inferred through observations. More formally, the network discovers the rules from training data (e.g. Artificial Neural Networks). Needless to say, this only works with a large and diverse set of observations.

This is just scratching the surface – there is a lot more to this. There is plenty of reading material out there – a great place to start is The Master Algorithm by Pedro Domingos, which gives an excellent tour of the different schools of thought in AI today. More on it later. [End detour!]

Back to our problem – how do we wade through this world of small, messy data? Not surprisingly, the answer could be to start with the tacit knowledge that has collectively accumulated in humans (‘experts’) and then build on it by learning from the world. All that sounds fine, you might say – but what does it mean in practice?

Here is how we are helping our bank along the journey of using data science to impact digital sales in a thoughtful, phased manner:

Step 1: Ask the ‘experts’: A good starting point is to build a rules engine, using a combination of tacit knowledge (e.g. millennials tend to do better with email campaigns – so focus on that cohort) and historical inputs from other channels (e.g. call-center campaigns tend to work better with clients who have a life event coming up, such as retirement or moving house). In other words, we are building an ‘expert system’ that creates rule-based target lists (without, of course, propensity scores). And then we are off to the races, with a clear process for refining the rules based on response rates. We are essentially helping build a Symbolic AI system to accelerate the bank’s digital journey.
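To make Step 1 a little more concrete, here is a minimal sketch of what such a rules engine could look like. It is illustrative only, not the bank's actual system: the `Customer` fields, the rule names, and the birth-year range used to approximate "millennials" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Customer:
    # Hypothetical fields; a real customer record would look different.
    customer_id: str
    birth_year: int
    upcoming_life_event: Optional[str] = None  # e.g. "retirement", "moving"


@dataclass(frozen=True)
class Rule:
    cohort: str                          # name of the target list the rule feeds
    source: str                          # where the rule came from
    applies: Callable[[Customer], bool]  # the if-then condition itself


# Expert rules expressed as plain predicates. The 1981-1996 birth-year range
# is an assumed definition of "millennial", chosen only for illustration.
RULES = [
    Rule("millennials", "expert intuition",
         lambda c: 1981 <= c.birth_year <= 1996),
    Rule("life_event", "call-center history",
         lambda c: c.upcoming_life_event is not None),
]


def build_target_lists(customers: List[Customer],
                       rules: List[Rule]) -> Dict[str, List[str]]:
    """Return one target list of customer ids per cohort.

    A customer can land in several cohorts; there are no propensity
    scores, only rule membership.
    """
    targets: Dict[str, List[str]] = {rule.cohort: [] for rule in rules}
    for customer in customers:
        for rule in rules:
            if rule.applies(customer):
                targets[rule.cohort].append(customer.customer_id)
    return targets
```

Keeping each rule as a small, named object makes the refinement loop easier: when response rates come back, a rule can be tightened, split, or retired without touching the rest of the engine.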

Step 2: ‘Learn from the world’: Take a leaf from the digital natives, who have near-perfected the art of crowdsourcing decisions through experiments. Since our bank has a cold-start problem (i.e. no historical data), we take each of the initial cohorts defined by the experts and run experiments on it. These range from straightforward A/B tests to more adaptive designs (e.g. Multi-Armed Bandit strategies) that shift traffic toward the better-performing campaigns as responses come in, so that we learn how cohort response rates change with campaigns.
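As a rough illustration of the adaptive end of that spectrum, here is a small sketch of one common Multi-Armed Bandit strategy, Thompson sampling with a Beta prior, applied to campaign variants within a single cohort. This is one possible choice, not necessarily the strategy used with the bank; the class name, the variant names, and the response rates in the simulation are all hypothetical.

```python
import random
from typing import Dict, Iterable, Optional


class ThompsonSamplingBandit:
    """Beta-Bernoulli Thompson sampling over campaign variants (a sketch)."""

    def __init__(self, variants: Iterable[str], seed: Optional[int] = None):
        self._rng = random.Random(seed)
        variants = list(variants)
        self.successes: Dict[str, int] = {v: 0 for v in variants}
        self.failures: Dict[str, int] = {v: 0 for v in variants}

    def select_variant(self) -> str:
        # Draw a plausible response rate for each variant from its
        # Beta(successes + 1, failures + 1) posterior, then pick the best draw.
        draws = {
            v: self._rng.betavariate(self.successes[v] + 1, self.failures[v] + 1)
            for v in self.successes
        }
        return max(draws, key=draws.get)

    def record(self, variant: str, responded: bool) -> None:
        # Update the counts for the variant that was actually sent.
        if responded:
            self.successes[variant] += 1
        else:
            self.failures[variant] += 1

    def observed_rate(self, variant: str) -> float:
        sent = self.successes[variant] + self.failures[variant]
        return self.successes[variant] / sent if sent else 0.0


def simulate(true_rates: Dict[str, float], rounds: int, seed: int = 0) -> Dict[str, int]:
    """Simulate sends against made-up response rates; return sends per variant."""
    rng = random.Random(seed)
    bandit = ThompsonSamplingBandit(true_rates, seed=seed + 1)
    sends = {v: 0 for v in true_rates}
    for _ in range(rounds):
        variant = bandit.select_variant()
        sends[variant] += 1
        bandit.record(variant, rng.random() < true_rates[variant])
    return sends


if __name__ == "__main__":
    # Hypothetical response rates for two email variants in one cohort.
    print(simulate({"subject_line_a": 0.05, "subject_line_b": 0.50}, rounds=2000))
```

Compared with a fixed 50/50 A/B split, a bandit sends fewer customers the weaker variant while it is still learning, which matters when each cohort is small. The trade-off is that the resulting data is less balanced across variants.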

We like to think of Step 2 as accelerated learning for the expert system. And perhaps more importantly, we expect it to build the right data foundation, which will then set the stage for ML models (enter Connectionist AI) to learn and continue to improve the targeting process. And if all goes according to plan, we have the flywheel effect going.

And so, there you are: hand-wringing about how the ML models are not working because of data issues, which in turn makes your CFO raise uncomfortable questions about your digital and data science investments. Maybe, just maybe, you are missing the human intuition and experience that can help you get going on this journey.
