In one of the many corners of the internet, there was a lot of excitement about the newly released image of a black hole (Sagittarius A*). For anyone remotely interested in astronomy and scientific research in general (like me), this was a pretty big event – after all, this is only the 2nd time that scientists have managed to piece together an image of a black hole. What really caught my attention were the mechanisms they used to do this:
- Capture mechanism: Create a network of telescopes from around the world and study their signals together (essentially creating a telescope as big as the Earth). And this can scale out horizontally – i.e. they can add more telescopes as they go along, expanding the observability universe (poor pun, but couldn’t resist it!)
- Data Processing mechanism: The raw data ran into several petabytes, which then had to be stitched together to create a complete dataset. Factoid: apparently, the data volumes got so big that the teams had to ship hard drives at times.
- Simulation mechanism: This was the really massive achievement here. By throwing huge computational power at the problem, the research teams ran simulations to infer the black hole’s image (after all, you cannot directly observe a black hole). Apparently, it took 100 million CPU hours over 5 years, with 300-plus researchers from 80 institutes, to piece this together.
In other words, this was as much a triumph of scientific research as it was a remarkable testament to the power of data and computation. When you are looking for a needle in a haystack, you can improve the odds of finding one by increasing the size of the haystack (i.e. the scope) and by looking through the haystack faster (i.e. the speed). And once you find the first needle, the time and effort to get to the next one, and the one after that, keeps coming down. In other words, scale makes all the difference.
You might think that this is cool but just another scientific curiosity that slips past us in a world consumed by so many other real-life issues: in other words, so what? Here’s the connection: in my last post, I talked about how we would do well to leverage the power of data and computation to get a grip on the ever-increasing complexity in global Supply Chains. My belief is that the exact same mechanisms as above can (and must) be leveraged if we are to move the needle on predictability and reduce uncertainty in decision making.
Manufacturing environments constantly grapple with the issue of planning for parts – from raw materials (e.g. a resin used as an insulant) to components (e.g. a chip) to sub-assemblies (e.g. a functional processor unit). At different points in the supply chain, what is the right amount to stock so that you are neither overstocked (and invite the wrath of the CFO for locking up working capital) nor understocked (and invite the wrath of Manufacturing for holding up the assembly line)? Traditionally, ERP/APO systems looked at historical data and generated the recommended stocking levels. All that works well – until it doesn’t, when a shock hits the system. And there was always safety stock to hedge against such situations – except that of late, it is not only the magnitude but also the frequency and the variance of these shocks that seem to be going up all the time. How do we solve for that?
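To see why the traditional approach breaks down, here is the textbook safety-stock calculation that those systems typically implement. It leans entirely on the assumption of normally distributed, independent daily demand – exactly the simplifying assumption that shocks invalidate. All numbers below are purely illustrative, not from any real system:

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(service_level: float, demand_std: float, lead_time_days: float) -> float:
    """Textbook safety stock: z * sigma_demand * sqrt(lead time).

    Assumes daily demand is normally distributed and i.i.d. --
    the simplifying assumption that a shock-prone world breaks.
    """
    z = NormalDist().inv_cdf(service_level)  # z-score for the target service level
    return z * demand_std * sqrt(lead_time_days)

# Illustrative numbers: daily demand std of 40 units, 9-day lead time,
# 95% target service level
print(round(safety_stock(0.95, 40, 9)))  # → 197
```

A single formula like this is cheap to compute, which is precisely why it was attractive when computation was scarce – and why it struggles when demand stops looking normal.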
Here’s how the same mechanisms that I mentioned above can help:
- Capture mechanism: Capture all the demand variations across products from history. Use that to truly understand the statistical distributions – and now that we don’t have the computational constraints, we no longer need to make the simplifying assumption of a normal distribution, or even force a time-invariant distribution; we can accept a whole family of distributions instead.
- Data Processing Mechanism: Once you have the raw distributions, use them to generate synthetic data – and here’s where you can introduce noise in 2 ways:
  - A simple (and highly effective) way of doing this is to tweak the distribution parameters and generate streams of observations. Think of them as micro-shocks
  - Introduce external shocks. These could even be major ‘black swan events’ (e.g. what if an entire sourcing country goes completely offline?)
- Simulation mechanism: Run simulations at a massive scale. If you are running a Genetic Algorithm, the chances of getting to a better approximation of reality go up as you run more simulations and incorporate more variants (or mutations).
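To make the capture and data-processing mechanisms concrete, here is a minimal sketch: fit a non-normal distribution to historical demand, then generate synthetic streams with both micro-shocks (jittered parameters) and rare external shocks. The Gamma choice, the parameter values, and the shock magnitudes are all illustrative assumptions, not taken from any real supply chain:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Stand-in for historical daily demand; a real pipeline would pull this
# from the ERP. A Gamma distribution gives the right-skewed shape that
# demand data often has -- no normality assumption needed.
history = rng.gamma(shape=4.0, scale=25.0, size=2000)

# Capture: fit a distribution to history instead of assuming normality
shape, loc, scale = stats.gamma.fit(history, floc=0)

def synthetic_demand(days: int, micro_shock=0.10, black_swan_p=0.002) -> np.ndarray:
    """Generate one synthetic demand stream with two kinds of noise."""
    # Micro-shocks: jitter the fitted parameters for each stream
    s = shape * rng.normal(1.0, micro_shock)
    sc = scale * rng.normal(1.0, micro_shock)
    demand = stats.gamma.rvs(a=s, scale=sc, size=days, random_state=rng)
    # External shocks: rare, large demand spikes (think panic-buying
    # after a sourcing country goes offline)
    swans = rng.random(days) < black_swan_p
    demand[swans] *= rng.uniform(3.0, 6.0, swans.sum())
    return demand

# 100 one-year streams, each a slightly different "possible world"
streams = [synthetic_demand(365) for _ in range(100)]
```

Each stream is cheap to produce, so the ensemble can be scaled out exactly the way the telescope network was: just add more streams.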
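And for the simulation mechanism, here is a toy evolutionary search (selection plus mutation – a stripped-down cousin of the Genetic Algorithm mentioned above, without crossover) that hunts for a stocking level minimizing cost across many simulated demand scenarios. The cost parameters, population sizes, and demand model are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily demand: 200 scenarios x 30 days (a stand-in for the
# synthetic streams a real pipeline would feed in)
demand = rng.gamma(4.0, 25.0, size=(200, 30))

HOLD, SHORT = 1.0, 8.0  # illustrative per-unit holding / shortage costs

def cost(stock_level: float) -> float:
    """Average daily cost of a fixed stock level across all scenarios."""
    over = np.maximum(stock_level - demand, 0) * HOLD    # excess inventory
    under = np.maximum(demand - stock_level, 0) * SHORT  # unmet demand
    return float((over + under).mean())

def evolutionary_search(pop_size=40, generations=60, mut_sd=5.0) -> float:
    pop = rng.uniform(0, 400, pop_size)  # random candidate stock levels
    for _ in range(generations):
        fitness = np.array([cost(c) for c in pop])
        parents = pop[np.argsort(fitness)[: pop_size // 2]]  # keep the best half
        # Mutation: children are jittered copies of surviving parents
        pop = np.clip(rng.choice(parents, pop_size) + rng.normal(0, mut_sd, pop_size), 0, None)
    return float(pop[np.argmin([cost(c) for c in pop])])

best = evolutionary_search()
```

More scenarios and more generations mean a better approximation – which is exactly where the scale argument bites: the search is embarrassingly parallel, so throwing compute at it directly improves the answer.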
Needless to say, all this requires thinking at a scale that is possible today with the data and compute capabilities at our disposal. And once you set this infrastructure up, you will create more data. More data means better simulations, and so on – setting the virtuous cycle of the data flywheel in motion.
The scientific community pieced together the first black hole image in 2019 (M87*). The 2nd one (Sagittarius A*) took a mere 3 years after that. My guess is that the 3rd, 4th and so on will take even less time – now that we have the infrastructure in place. Scale makes all the difference.
More on the black hole discoveries: https://www.hpcwire.com/2022/05/13/supercomputing-an-image-of-our-galaxys-supermassive-black-hole/
Lots of videos on black holes. Here’s a primer: https://www.youtube.com/watch?v=i1fhtjhL_3k