Bayes: What is the big deal?

It would be no exaggeration to say that Thomas Bayes was an obscure figure in the much-storied annals of mathematics and logic for the better part of the last 100 years. And then something happened in the last 10 years, and now he is everywhere – he has even entered mainstream culture, with his eponymous theorem making an appearance in The Big Bang Theory (!)

My belief: If it shows up in Big Bang Theory, it must be important!

And as I have pointed out (repeatedly) in the last few blogs, there is a strong case for looking at Bayesian approaches to problem solving in an increasingly VUCA world.

Just who was Thomas Bayes?

Some history first. Rev. Thomas Bayes was an 18th-century theologian who dabbled in mathematics on the side. This was a time when the main interest in probability came from gambling circles – given that, it is odd that a man of the cloth should be interested in the area. That distance was probably a good thing: he was not interested in the usual problem of the odds of a certain outcome given different causes. Instead, he wanted to know about the “inverse probability” of the causes given the results: when we observe some evidence, what is the likelihood of its different possible causes? Some say that this was prompted by David Hume (the celebrated skeptical philosopher of the day), who argued that reports of miracles are more likely to stem from inventive witnesses than from the actions of a benign deity (now that’s a topic for some other day). Whatever the trigger might have been, his theorem has had something of a spectacular revival in recent times – it would not be far-fetched to say that it has spawned a revolution of sorts in the Machine Learning world, almost 300 years later.

The theorem itself (most of us remember it from high school) looks fairly simple – most of us learnt it as a trivial bit of probability arithmetic. But, as often happens, that misses a more fundamental and deeper concept: the probability that a belief is true given new evidence equals the probability that the belief is true regardless of that evidence, times the probability that the evidence is true given that the belief is true, divided by the probability that the evidence is true regardless of whether the belief is true. Or: initial belief plus new evidence = new and improved belief.
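In symbols – writing B for the belief and E for the evidence, a shorthand of my own rather than anything from Bayes – that word statement is simply:

P(B | E) = P(B) × P(E | B) / P(E)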

[Figure: Calculation of probabilities in a belief network]

What is the big deal and why now?

The really important and game-changing insight that Bayes gave the world was that of inductive reasoning: start with an initial belief and then build on it. And here is the most important point: the plausibility of your belief depends on the degree to which your belief explains the evidence for it. The more alternative explanations there are for the evidence, the less plausible your belief is. In short, beware of false positives! It is this insight that has put Bayes’ Theorem front and center in a large number of applications – from autonomous cars to deciding treatment plans based on diagnostic tests.
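To make the “beware of false positives” point concrete, here is a small Python sketch with purely illustrative numbers (a rare condition and a reasonably accurate diagnostic test – none of these figures come from a real study): even a test that is right 95% of the time can leave the updated belief surprisingly low when the prior is small.

```python
# Bayes' rule with purely illustrative (made-up) numbers for a diagnostic test.
p_condition = 0.01          # prior: 1% of people have the condition (assumed)
p_pos_given_cond = 0.95     # test sensitivity: P(positive | condition) (assumed)
p_pos_given_no_cond = 0.05  # false-positive rate: P(positive | no condition) (assumed)

# Total probability of a positive test, marginalising over both possible causes
p_pos = (p_pos_given_cond * p_condition
         + p_pos_given_no_cond * (1 - p_condition))

# Posterior: the updated belief after seeing the evidence
p_cond_given_pos = p_pos_given_cond * p_condition / p_pos
print(f"P(condition | positive test) = {p_cond_given_pos:.2f}")  # ~0.16
```

The seemingly strong test result is swamped by the sheer number of people who do not have the condition – exactly the “alternative explanations for the evidence” that Bayes asks us to weigh.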

On a side note, this also turned out to be the main problem with Bayes’ theorem that got the orthodox statisticians all riled up: how can you rely on personal hunches to guide scientific reasoning? The world of ‘significance tests’ took over: one thing led to another, and the first wave of Data Science was completely dominated by the narrative of hypotheses that lived and died at the altar of observed data. There is an obvious flaw in this line of thinking: we don’t make decisions in a vacuum – we almost always start with an initial hunch and then adjust it in light of experience. One of the more storied cases of Bayesian thinking is from World War II, when Alan Turing and the other code breakers at Bletchley Park would start their code-breaking work from initial hunches and refine them as the data came in.

Bayesian techniques can be computationally intensive – especially when well-defined prior probabilities are not available. Starting in the 1980s, computer scientists developed ‘Bayesian networks’: graph-based systems for simplifying Bayesian inference. The cross-over to Machine Learning has accelerated in the last few years, and the core Bayesian ideas have increasingly become mainstream in multiple areas. Which brings us to today.

Bayesian Belief Networks

Now on to the real world. And we go back to our favorite example: in a post-Covid-19 world, a bank’s notion of risk has changed dramatically, and all the current risk prediction models are, well, useless. Once we get past the difficult idea that we have to make predictions where there is little data, we can embark on the journey of simulation. And for that, we first need to model the real world. Enter Bayesian Belief Networks (BBNs).

There are two basic ideas that we first need to appreciate:

  1. We start with a set of beliefs – our core intuition on how the real world can be modeled. We need to be able to formalize these beliefs as variables – and, more importantly, be open to updating these beliefs as data comes in. This could mean updating the nature of the variables (i.e. their probability distributions), adding new variables or, most importantly, revising the inter-connections between these variables.  
  2. The variables in a ‘belief system’ are inter-connected (remember, everything in life is a network!) and can be represented as a graph (more formally, a DAG – Directed Acyclic Graph):
    1. Each node is labeled by a random variable, with a domain for each variable
    2. A set of conditional probability distributions, P(X | parents(X)), one for each variable X – these are the inter-connections between the variables
  3. A BBN is acyclic by construction, and the way you chain the nodes gives the ordering. An important element of a belief network is causality: a node is influenced by its parents and, more importantly, that influence can be modeled with, you guessed it, the Bayes Theorem.

And so there you have it, a definition of a BBN. In our example, the terminal node will be ‘Loan Default’, and its probability is conditional on its parents, which are, in turn, conditional on their parents, and so on. How you set up the overall network is, of course, driven by business understanding – which is why these BBNs have been called expert systems.
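As a minimal sketch of how this plays out in code – the network structure, variable names and probabilities below are entirely hypothetical, chosen only to show the mechanics – a tiny three-node version of the loan-default network can be queried with nothing more than the chain rule, each node conditioned only on its parents:

```python
from itertools import product

# A toy belief network for the loan-default example. All structure and
# numbers are hypothetical, purely to illustrate the mechanics.
#
#   EconomyWeak  ->  BorrowerUnemployed  ->  LoanDefault

p_economy_weak = 0.3                        # prior belief (assumed)
p_unemployed = {True: 0.20, False: 0.05}    # P(Unemployed | EconomyWeak) (assumed)
p_default = {True: 0.40, False: 0.02}       # P(LoanDefault | Unemployed) (assumed)

def p_econ(weak):            # P(EconomyWeak = weak)
    return p_economy_weak if weak else 1 - p_economy_weak

def p_unemp(unemp, weak):    # P(Unemployed = unemp | EconomyWeak = weak)
    p = p_unemployed[weak]
    return p if unemp else 1 - p

def p_def(default, unemp):   # P(LoanDefault = default | Unemployed = unemp)
    p = p_default[unemp]
    return p if default else 1 - p

# Marginal probability of default: sum the chain rule over all parent states.
p_loan_default = sum(
    p_econ(weak) * p_unemp(unemp, weak) * p_def(True, unemp)
    for weak, unemp in product([True, False], repeat=2)
)
print(f"P(LoanDefault) = {p_loan_default:.3f}")  # ~0.056 with these assumed numbers
```

Updating a belief is then just a matter of fixing an observed node – say, we learn the economy is weak – and re-running the same sum over the remaining ones.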

So, what next?

I started this mini-series of blogs with the need to create models that can work with tail events and with small data. As we have seen, there is real value in adopting a Bayesian approach to navigating this increasingly uncertain world: it is not good enough to rely on data alone; we should start from the assertion that human intuition is more critical than ever before. This post was meant to nudge that thinking along by looking a little deeper at Bayes and the revolution his work spawned. We are just scratching the surface here – some links below.

In the next (and final) post in this mini-series, we will look at simulations – starting with Markov, who, I must say, was a far more interesting character than Thomas Bayes, though not quite as influential.

Further reading:

Excellent primer on Bayesian probability: http://yudkowsky.net/rational/bayes

Good overview of Bayesian Belief Networks: https://artint.info/html/ArtInt_148.html
