Continuing from my previous post, this is an attempt to define metrics for measuring the quality of Agentic AI Systems.
#81: How do we evaluate Agentic AI Systems? Part-1
I recently met with the Head of Data Science at one of the largest media conglomerates and, as often happens these days, the evaluation and quality of AI Agents came up. One of the most important problems he has been trying to solve is how to evaluate the quality of AI agents, and once in production,... Continue Reading →
#80: Measuring Human and AI Agents
Since 2023, we have been hearing doomsayers talk about the coming AI Apocalypse that will take our jobs, run economies, etc. They keep pointing to the impressive performance gains and the reasoning capabilities of the frontier models. Meanwhile, those of us in Enterprise AI continue to be frustrated by the temptation to treat these... Continue Reading →
#79: Demand Forecasting and Language Models
Demand Forecasting is a widespread application for Machine Learning. Almost every enterprise, across various sectors, keeps investing in this area. None more so than industries with physical Supply Chains, where it is estimated that a 10-20% improvement in forecast accuracy can translate to a 5% reduction in inventory costs and a 2-3% increase in revenues (McKinsey).... Continue Reading →
#78: Model-free vs. Model-based learning
Over the holidays, I read A Brief History of Intelligence by Max Bennett, a fantastic book that traces the evolution of intelligence from nematodes (microscopic worms) all the way to us humans. The author does an absolutely brilliant job of charting the journey by breaking it into five key breakthroughs starting from the minute organisms... Continue Reading →