Last week, I got into a discussion on Forecasting with a friend. He works with operational data flowing from server farms and in his line of work, he is interested in use-cases that go beyond the standard use-cases like load forecasting, what-if scenario analysis for failures to more nuanced (yet critical from an SLA point of view) like phase changes in workloads, contention on systems. While all of these continue to fall in the overall domain of Forecasting and Anomaly Detection, it is also obvious that these scenarios are increasingly complex from a data modeling point of view. For scenarios like load forecasting, some of the best-known supervised learning methods like ARIMA, Prophet et al. work well. Other techniques like Neural Networks (e.g. LSTM) return better results. His question was – can we get better at these edge cases which have the obvious challenge of extremely sparse data to train algorithms? Have we looked at other signals and tried to better triangulate?
This post is NOT about a listing of the relevant Machine Learning algorithms for these use-cases. This is (and continues to be) a well-studied topic. Instead, this is a (slightly) conceptual rumination about time-series in general. But before that a slight diversion.
Richard Feynman made a famous observation: Even though the scope of Physics is vast, It is possible for a physicist to get a broad knowledge of the physical world for three reasons: 1/ There are general principles that apply to many phenomena (e.g. conservation of energy and momentum) 2/ Many complex phenomena depend on a few fundamental laws (i.e. electricity and quantum mechanics) 3/ There is a remarkable coincidence: The equations for many different physical situations have exactly the same appearance. Some have called it the ‘unity of nature’ but Feynman has a beautiful explanation: one thing common to all of these phenomena is space – and once you grasp that, it comes down to understanding the rates of change of quantities with position in space. Obviously, I am not doing justice to this – he devoted an entire lecture to this topic (see link below). Definitely worth a read.
So why did we dip into Physics and Feynman? Just as space is integral to a lot of physical phenomena, is there a similar ‘unity in nature’ that can describe a variety of situations across industries? Could time be that integral factor in all these real-world phenomena?
If you think about this, it does make sense – we make inferences based on historical observations all the time. Forecasting has been around for a while and we rely on time-series models to predict a wide-variety of phenomena: product sales by channel; web traffic by time of day; equipment failure probability – the list goes on. Then we use time-series models for Anomaly Detection by predicting expected behavior and flagging any deviations. To be sure, we have gotten better and better at building learning systems that learn from historical data. And these have broadly been of two kinds: Classical time-series models (univariate/multi-variate models based on system/component level observations of the past); often supplemented by Causal models (using external, environmental factors that drive system-level behavior).
In the very recent past, we have seen wide deployment of heterogeneous networks – both low-layer (e.g. autonomous vehicles) and high-layer (e.g. social networks). Some of the fundamental characteristics of data generated from these networks are: high volume, high variety and high velocity (time, again!). And in almost all cases, the goal is to predict a behavior in the next unit of time. The assisted driving feature in your car is trying to predict the risk of a negative event by combining multiple signals: e.g. visual data of the car in front and the traffic around; speed, brake application data from the car. Similarly, your social media site is trying to describe an event based on video images and texts. In all these cases, there are three underlying factors: 1/ Data is multimodal: structured, semi and unstructured describing the same event. 2/ Data needs to be meaningfully fused across these modes to create features that can then be used to make predictions. 3/ And obviously, Time is most critical in all of these different data streams. Data will have to standardize on the time dimension – i.e. data in different formats and importance (based on source and predictive power) will have to be normalized on the time dimension to be able to make meaningful predictions.
Which leads me to this: Forecasting/inferences is ready to go through a fundamental transformation. We will move on from time-series forecasting to far more sophisticated methods that will be built using multimodal data fusion to build features that will then be used by prediction models. Underlying all these rich variety, volume of data is the fundamental, normalizing feature that runs through all these datasets: Time. Exciting times ahead for anyone who is interested in working with time-series data. A good place to start would be: have we really looked at all the organization processes and evaluated whether time is a fundamental feature? And if it is, chances are that you can use time-series data-driven decisions to make meaningful business impact.
After all this discussion (and a few beers later!) my friend is all excited about the potential of all the streaming data to not just make better predictions, but also solve for all the edge use-cases that continue to haunt SLA driven, operational managers like him.
- Feynman lectures: These get very dense, but highly recommended to read the opening and the final paragraphs for his big insight. https://www.feynmanlectures.caltech.edu/II_12.html
- Multimodal Data fusion: Yes, it is a thing and lots of interesting reading, especially academic. My prediction is that this will creep into mainstream data engineering/management very quickly. https://direct.mit.edu/neco/article/32/5/829/95591/A-Survey-on-Deep-Learning-for-Multimodal-Data