Last week, we looked at how quantum physics can teach us a thing or two about problem solving in small-data environments. Some of the most path-breaking work in Physics started with a theory, which was then refined through mathematical models and finally confirmed (or refuted) by experiments. This week, let’s take a look at Biology – which, as we shall see, is a study in contrasts on two counts. First, biology has its roots in the observational world: it is, by and large, the study of inferring underlying principles from data, from the large (i.e. at the species level) to the very small (i.e. at the cell level). Second, there is no dearth of data: if anything, solving some of the most important problems in biology requires working with huge amounts of data (think gene sequencing), along with all the problems that come with big data.
The evolution of analysis in Biology
First up, a brief (and definitely simplistic) look at how Biology evolved as a science. You could trace its roots back to the Greek philosophers (them again!), who were interested in understanding the essence of organisms (i.e. what makes us human) by defining their essential characteristics and using those to describe the organism and the species. They called this ‘Natural Philosophy’, and so it remained until the 17th and 18th centuries, when the science really took off once the empirical approach came into Biology. This was the observational era, with the emphasis on observing and gathering data about the organism. The usual process was to painstakingly gather data from the natural world, use it to propose hypotheses, validate those hypotheses with more data, and then propose a theory. The most famous example, of course, was Charles Darwin.
Around the time that Darwin was thinking about the big picture at the species level, a second movement was working on going small – very small. Cellular biology took off, and with it the idea of Reductionism took root. Reductionism brought two main underpinnings of problem solving:
- The approach of linear decomposition: go all the way down to the DNA and reconstruct from the basic building blocks.
- The approach of building knowledge through experimentation: run controlled studies in labs (e.g. on fruit flies or mice) and then generalize across species (all the way to humans).
This turned out to be the playbook that lasted for many years, and it still holds up when the problem space is relatively simple. For more complex problems, however, reductionism falls short for two major reasons:
- The assumption that you can cleanly map one or more genes to a specific expression (e.g. a disease) does not hold. As data continues to show, the mapping is far, far more complex.
- Genes are not ‘independent’: they are, in some way, ‘connected’ to each other, so the effect of one gene can depend on others.
Which brings us to now: thanks to compute power, the biology world is well on its way to modeling the complex systems that organisms really are. Researchers are now able to build models, validate them (through experiments) and then make predictions. To be sure, this has opened up huge opportunities in clinical and disease research.
There is, of course, a catch that is true for all modeling: models are point-in-time representations of the underlying phenomena, and only as good as the data they learn from. In other words, most machine learning strategies are ‘passive’: given a specific problem and its accompanying dataset, you build a model and use it to make predictions.
But what if the problem itself changes? Or the underlying data changes? This has led some computational biologists to push the envelope with newer methods like Active Machine Learning (to help guide the selection of which protein combinations to test, for faster and more accurate predictions) and Reinforcement Learning (being explored for one of the most sought-after goals in Biology: protein structure prediction, often loosely called protein folding, i.e. predicting the three-dimensional structure of a protein from its amino-acid sequence).
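To make the contrast between ‘passive’ and ‘active’ learning a little more concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling, in plain Python. Everything in it is an illustrative assumption rather than anything taken from the biology work mentioned above: the `oracle` function stands in for an expensive labelling step (a lab assay, a manual review), the data is one-dimensional, and the true cut-off of 0.6 is made up. The only point is the loop itself: fit a model, ask for the label of the point the model is least sure about, and repeat.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=2000, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(w*x + b) on 1-D data by plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return w, b


def oracle(x):
    """Hypothetical stand-in for an expensive label (a lab assay, a manual review)."""
    return 1 if x > 0.6 else 0


def active_learning(pool, seed_points, budget):
    """Pool-based active learning with uncertainty sampling."""
    labeled = {x: oracle(x) for x in seed_points}
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(budget):
        w, b = fit_logistic(list(labeled), list(labeled.values()))
        # Query the point the current model is least sure about:
        # the one whose predicted probability is closest to 0.5.
        query = min(unlabeled, key=lambda x: abs(sigmoid(w * x + b) - 0.5))
        labeled[query] = oracle(query)  # pay for exactly one new label
        unlabeled.remove(query)
    w, b = fit_logistic(list(labeled), list(labeled.values()))
    return (w, b), labeled


if __name__ == "__main__":
    pool = [i / 100 for i in range(101)]
    (w, b), labeled = active_learning(pool, seed_points=[0.0, 1.0], budget=8)
    print("points labeled:", sorted(labeled))
    print("estimated decision boundary:", -b / w)
```

In a run like this, the queried points tend to cluster around the region where the label flips, which is where a new label tells the model the most. A passive approach would instead spend the same labelling budget on points chosen without regard to what the model already knows.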
What can we learn from all this?
Some have described the organization as an organism, and rightly so, I believe. Much like organisms, organizations are entities that are more than just the sum of their parts and that constantly evolve in response to their environment. If we run with this analogy, this brief tour of the evolution of analysis in biology could hold important lessons:
- The problem-solving capabilities that grow out of BI/Reporting teams tend to take a ‘reductionist’ view of the business. One typical (and myopic) approach that I see all too often is decomposing metrics in a tree-like fashion. That falls apart quickly when you look at relatively complex problems such as omni-channel behavior, where it is not uncommon to see customers hop channels multiple times. In such situations, how do you measure the impact of a multi-channel campaign over time?
- In the recent past, most organizations have invested in building or hiring ML talent to explore better solutions to some of their most important business problems. Many have made significant advances in bringing in more sophisticated and domain-specific methods, which has improved the accuracy of their models. The best and most progressive organizations have already started to ask some of the questions that the biologists have been asking: what if the problem is shifting? What if the underlying data is changing? This is where I think there is an opportunity to explore Active Machine Learning (e.g. if you are trying to re-calibrate your risk models, which you should be in this environment, how can you accelerate the process of building your training data when there is a shortage of labels?) or Reinforcement Learning (e.g. if you are trying to improve engagement in your digital channels, again a desperate necessity in these times, can you run experiments that learn and improve as they go along?).
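The idea of experiments that "learn and improve as they go along" can be sketched with a multi-armed bandit, arguably the simplest reinforcement-learning setting. The example below uses Thompson sampling to split traffic across message variants. It is only an illustration under stated assumptions: the variants and their `true_rates` (click-through rates) are hypothetical, and in a real channel those rates are unknown and each click would come from an actual customer rather than a simulated coin flip.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate a Thompson-sampling experiment over several variants.

    true_rates are hypothetical click-through rates, used here only to
    simulate customer responses; the algorithm itself never reads them.
    """
    rng = random.Random(seed)
    wins = [0] * len(true_rates)    # clicks observed per variant
    losses = [0] * len(true_rates)  # non-clicks observed per variant
    for _ in range(rounds):
        # Draw one plausible rate per variant from its Beta posterior
        # (uniform prior), then show the variant with the highest draw.
        draws = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
        arm = draws.index(max(draws))
        # Simulated customer response for the chosen variant.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


if __name__ == "__main__":
    wins, losses = thompson_sampling([0.05, 0.10, 0.30], rounds=5000)
    for i, (w, l) in enumerate(zip(wins, losses)):
        print(f"variant {i}: shown {w + l} times, {w} clicks")
```

Unlike a fixed A/B split, the allocation shifts toward the better-performing variant as evidence accumulates, while weaker variants still get occasional traffic in case the early data was misleading.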
Topic for next week: Active Learning Systems and how I think the mindset and approach they bring can be of interest to some of the more common problems that organizations will need to solve over the next few years.
- The Structure of Scientific Revolutions by Thomas Kuhn: a great read if you are even remotely interested in how science has evolved and changed over the centuries. And if you are looking for someone to blame for coining the term ‘paradigm shift’, he is your man (no, it was not some management consultant).