Stochastic Intelligence

A random walk across AI, tech, science, medicine, and academic life.

  • As I mentioned in my previous essay, modern AI is mostly about predicting the unknown. And in virtually every interesting situation, prediction forces us to confront uncertainty. We don’t get to output a precise answer with full confidence. Humans don’t, and neither do our machines.

    Much of AI research has not taken this seriously. It’s one reason today’s models often sound so overconfident – they learned that from us. And we’re hardly paragons of calibrated reasoning (see this great video).

    Given that prediction is central to AI, let’s unpack what we mean by uncertainty.

    To start, assume a deterministic universe – Newton’s world – where randomness arises only because we lack information. This is not the whole story. Quantum mechanics shows that some phenomena really are intrinsically random. But in our daily lives, the Newtonian approximation works just fine. Most of the uncertainty we face is ignorance, not cosmic randomness.

    Probability is the tool we use to formalize this. Historically, probability emerged from games of chance like rolling dice, where repeated trials allow us to define probabilities as long-run frequencies. These are objective, at least in principle: roll the die enough times and the empirical frequencies align with the theoretical ones.

    But this “frequentist” view is narrow. Bayesian thinking introduced a deeper idea: probability as a measure of belief. This notion of probability is subjective, but rigorously updateable and practically useful. This is the interpretation that underlies most discussions of uncertainty in prediction. See this fantastic book on the topic.

    Given data X and a target Y, our uncertainty is encoded in a posterior probability: p(Y | X). If Y takes on discrete values, this is a probability vector. If Y is continuous, we talk about a probability distribution – often something parametric like a Gaussian, expressed through a mean and variance.
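
    To make this concrete, here is a minimal sketch in Python (with numbers I made up purely for illustration) of the two cases: a probability vector over discrete values, and a Gaussian density over a continuous target.

        import numpy as np
        from scipy.stats import norm

        # Discrete Y: p(Y | X) is a probability vector over the possible values.
        p_discrete = np.array([0.7, 0.2, 0.1])       # e.g., three candidate labels
        assert np.isclose(p_discrete.sum(), 1.0)

        # Continuous Y: p(Y | X) is a density, here a Gaussian described by a
        # predicted mean and standard deviation.
        mu, sigma = 53.0, 6.0                        # e.g., a predicted age and its spread
        prob_40_to_60 = norm.cdf(60, mu, sigma) - norm.cdf(40, mu, sigma)
        print(f"P(40 <= Y <= 60 | X) = {prob_40_to_60:.2f}")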

    Evaluating such predictions requires repeated observations. With a single sample (datapoint) X_i and a posterior p(Y_i | X_i), you cannot meaningfully judge whether that probability was “good” or not. You need many trials to compare predicted probabilities with actual outcomes. This is why probability evaluation gets slippery and why comparing, say, election forecasts is notoriously hard – we have very few data points.

    A common method is to add up the log-probabilities that predictors assign to events that actually occur. The predictor with the higher total “wins.” This is the essence of maximum likelihood. But maximum likelihood can reward unjustified confidence and punish well-calibrated caution.
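
    Here is a toy illustration of that scoring rule (with outcomes and predicted probabilities I invented): sum the log of the probability each predictor assigned to what actually happened, and the higher total wins.

        import numpy as np

        # 1 = the event happened, 0 = it did not (invented outcomes)
        outcomes = np.array([1, 0, 1, 1, 0, 1])

        # Each predictor's stated probability that the event would happen
        confident = np.array([0.95, 0.10, 0.90, 0.95, 0.20, 0.90])
        cautious  = np.array([0.70, 0.40, 0.65, 0.70, 0.45, 0.65])

        def total_log_prob(p_event, outcomes):
            # Probability each predictor assigned to the outcome that actually occurred
            p_actual = np.where(outcomes == 1, p_event, 1.0 - p_event)
            return np.log(p_actual).sum()

        print("confident:", total_log_prob(confident, outcomes))
        print("cautious: ", total_log_prob(cautious, outcomes))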

    Calibration matters. For example, in a binary classification task, if a well-calibrated model assigns predicted probabilities between 0 and 0.1 to a group of samples, then roughly 0-10% of those samples should actually belong to the positive class. And yes, you could achieve perfect calibration by ignoring X and always outputting the prior. For a fair coin, always predicting 50-50 is perfectly calibrated. But it’s also perfectly useless.
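
    A simple way to probe calibration, sketched below with made-up numbers, is to bin samples by predicted probability and compare each bin’s average prediction with its observed positive rate.

        import numpy as np

        # Invented predicted probabilities and true binary labels
        preds  = np.array([0.05, 0.08, 0.22, 0.35, 0.51, 0.63, 0.74, 0.88, 0.91, 0.97])
        labels = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

        edges = np.linspace(0.0, 1.0, 6)              # five equal-width bins
        bin_ids = np.digitize(preds, edges[1:-1])     # which bin each prediction falls in

        for b in range(len(edges) - 1):
            mask = bin_ids == b
            if mask.any():
                # For a well-calibrated model these two numbers should be close
                print(f"bin {b}: mean predicted {preds[mask].mean():.2f}, "
                      f"observed positive rate {labels[mask].mean():.2f}")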

    In practice, accuracy and calibration seem to trade off. This isn’t a theoretical rule as much as a recurring empirical pattern – perhaps a byproduct of our learning algorithms. And it naturally raises a question: as we build ever more accurate models, can we push uncertainty to zero while staying calibrated?

    To answer that, we need to ask what uncertainty is meant to represent. Even in Newton’s deterministic universe there are two fundamentally different sources of uncertainty.

    The first is epistemic uncertainty. This is the uncertainty that comes from limited data, imperfect models, incomplete knowledge, and constrained compute. It’s the uncertainty of ignorance, and science and engineering are, in a sense, centuries-long attempts to reduce it. Gather more data. Build better models. Expand our theoretical understanding. Push our hardware. All of these shrink epistemic uncertainty.

    But my focus in this essay is the second kind.

    Aleatoric uncertainty is the irreducible randomness of p(Y|X) itself. Not randomness in the laws of physics, but randomness because X simply does not contain enough information to pin down Y. A fair coin toss is the cleanest example. No amount of data – no amount of internet-scale context – will help you predict the outcome better than 50-50. Most real-world prediction tasks share this basic structure. Weather forecasts, hospital readmissions, sports outcomes – countless hidden factors play crucial roles, many of them unknowable in practice.

    This produces a prediction ceiling: the accuracy an oracle could achieve if given exactly the same input X. And crucially, the oracle will still often fall short of 100%.
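
    A tiny simulation (entirely synthetic, just to make the point) shows the ceiling: even an oracle that knows the exact data-generating rule cannot beat the randomness that X fails to capture.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000

        x = rng.normal(size=n)          # the input we observe
        hidden = rng.normal(size=n)     # factors we never measure
        y = (x + 2.0 * hidden > 0).astype(int)

        # The oracle knows the rule and the hidden factor's distribution, so its
        # Bayes-optimal call given x alone is to predict 1 whenever x > 0.
        oracle = (x > 0).astype(int)
        print("oracle accuracy:", (oracle == y).mean())   # roughly 0.65, far from 1.0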

    This ceiling is rarely acknowledged in AI discussions, especially in healthcare (see our recent letter on this topic). People often talk about “minimum acceptable performance” in terms of an absolute threshold – e.g., an area under the curve of 0.80 or above – as if this were a law of nature rather than a human convention. But what if the Bayes limit for a given problem, with the information available, is 0.75? What if no model – no matter how vast the dataset or how advanced the architecture – can exceed that?

    If patient readmission depends on unmeasured social support networks, personal behaviors, or random life events, then those unobserved variables impose a ceiling. Before judging prediction performance, we must understand how much of the outcome is governed by chance (our ignorance) given the input we have. If an oracle can’t do better than 75%, it’s unreasonable to expect a real model to surpass it.

    This brings me to what I think is an overlooked point. The prediction ceiling depends entirely on what X is. And the most powerful way to lift the ceiling is not through bigger models or more training data, but through richer, more informative inputs.

    Machine learning research often treats the input as fixed: an image, a sentence, a lab test. But in the real world, we get to choose what we measure. We design sensors. We invent imaging sequences. We develop new assays. We conduct new surveys. We discover new biological markers. We create entirely new data modalities. Better scientific understanding leads to better measurement, and better measurement leads to better prediction.

    The next phase of AI should not ignore this. Vision and language models benefit from the accident of abundant data, but the domains where AI could make the biggest difference – medicine, biology, climate, materials, human behavior – will require entirely new kinds of data, captured in new ways, at new scales.

    If we want AI that truly transforms our world, we must do more than build bigger models.
    We must measure and observe the world more wisely.
    Only then can we raise the very ceiling of what is predictable.

  • Imagine you’re asked to guess someone’s age from a photo.
    If you can see their face up close, you’ll have useful clues – wrinkles around the eyes, graying hair, even the expression in their gaze. Their clothes or posture might offer more hints. With a full picture, you might confidently say they’re in their 50s. But you’d rarely claim something precise like 53. You know you can’t be certain.

    Now imagine the picture is blurry or taken from far away. Suddenly, the task becomes harder. Maybe you can still tell if they’re a teenager or an older adult, but that’s about it. The less you can see, the less precise your guess. And if the photo is from another culture or era – with unfamiliar clothing or setting – your confidence drops further. Even the signs you normally rely on, like wrinkles or teeth, might mislead you. (Botox and cosmetic dentistry don’t help much with this kind of inference.)

    Now let’s make the task even harder: instead of guessing how old the person is, I ask you to guess how long they have left to live. This isn’t as hopeless as it might seem – you might take your age estimate and subtract it from the average life expectancy, adjusting based on context. Someone in a hospital bed, for example, might not have long. Someone lounging in a Hamptons living room might have decades.

    From a technical standpoint, both problems – estimating age or remaining lifespan – are prediction problems. You’re inferring an unknown quantity from patterns in the data: the pixels in the image. If you wanted to test how good your guesses are, you’d compare them to reality. Predicting “53” will almost always be off by at least a year or two. Saying “between 40 and 60” will be right more often, but less useful. Accuracy and precision always trade off.
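
    As a quick synthetic check of that trade-off (ages drawn from a distribution I invented purely for illustration), the precise guess is rarely right while the broad interval usually is:

        import numpy as np

        rng = np.random.default_rng(1)
        true_ages = np.round(rng.normal(50, 8, size=10_000))    # hypothetical subjects

        within_a_year = (np.abs(true_ages - 53) <= 1).mean()    # the precise guess "53"
        in_interval   = ((true_ages >= 40) & (true_ages <= 60)).mean()

        print(f"'53' correct to within a year: {within_a_year:.0%}")
        print(f"'between 40 and 60' correct:   {in_interval:.0%}")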

    How would you improve your guesses? You can envision a Sherlock Holmes-style approach that combs the image pixels more thoroughly for any sign. You would, of course, have to aggregate these signs into a final prediction. What type of thinking would this entail? Logical or probabilistic reasoning? A biological understanding of aging? Leaning on your knowledge of societal norms, culture, technology, or medicine?

    It turns out there is one sure way of mastering this task: experience. If you played a game where you got feedback on each guess, you’d develop an intuition – a “feel.” In radiology, we call this gestalt: the learned, experience-driven sense that tells you when something in an image looks off, even before you can articulate why. This is essentially how neural networks – both biological and artificial – get good at what they do. They learn from repeated feedback, adjusting internal patterns until they can make better predictions.
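
    As a minimal sketch of that feedback loop (a made-up guessing game, not a real neural network), repeatedly nudging a guess toward noisy feedback is enough to home in on the right answer:

        import numpy as np

        rng = np.random.default_rng(2)
        true_age = 47.0
        guess = 30.0        # start far off
        step = 0.05         # how strongly to react to each round of feedback

        for _ in range(500):
            feedback = true_age + rng.normal(0, 5)   # noisy "correct answer" each round
            guess += step * (feedback - guess)       # adjust toward the feedback

        print(round(guess, 1))   # ends up close to 47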

    That principle – learning predictive patterns from data – lies at the heart of today’s AI revolution. ChatGPT predicting the next word in a sentence, DALL-E imagining a “Van Gogh-style” portrait, a self-driving car identifying a pedestrian – each is solving a prediction problem. The models don’t have an understanding of the underlying physical reality. They’re extraordinarily skilled at anticipating what comes next, or guessing the unknown, based on vast experience.

    You might wonder about generating a “Van Gogh painting” of a new scene – there’s no ground truth, since Van Gogh never painted it. Yet after seeing hundreds of examples, one develops an intuitive feel for what a Van Gogh looks like, and that learned sense can in turn be used to train a prediction engine that generates Van Gogh-style paintings.

    Let me spell this out clearly: Virtually all the remarkable progress in AI today comes from building better prediction engines. Larger models, more data, cleverer training. But intelligence is not just about prediction.

    Life does not simply consist of a series of guessing games or chess matches. To be sure, being able to predict things – events, outcomes – precisely can be very helpful. But if you don’t understand the complex causal relationships and mechanisms involved (like the biological processes underpinning aging or diseases like cancer, the physical systems that govern the weather, or the social norms that dictate human interactions), you will be just an observer. And observers cannot be intelligent.

    True intelligence requires understanding – a model of how the world works, so you can reason about causes, effects, and goals. Prediction tells you what is likely; understanding tells you why, and what to do about it.

    We are only at the beginning of creating systems that can act purposefully – systems that can explore, reason, and discover. To move beyond prediction, AI will need something akin to the scientific method: curiosity, experimentation, and causal understanding. No amount of data or compute alone will get us there. That next step – from intuition to understanding, from prediction to purpose – is where real intelligence begins.

  • I wrote the following piece with the intention of publishing it as an opinion article in some mainstream media outlet. It would have required some more editing and development – but since I now have my own blog, I’ve decided to put it here. -mert

    I saw glimpses of genius in my early interactions with Large Language Models like ChatGPT. Until I started to notice the flaws. And our conversations, once stimulating and informative, began to irritate me. Yet I was hooked. As an academic scientist who has worked on Artificial Intelligence (AI) for over two decades, I could now imagine a future where machines excel at any cognitive task that we, as humans, can do. Not just the boring and tedious ones, like writing emails or doing arithmetic. But those that demand innovation and creativity, like writing poetry and doing science. Optimists argue that this type of artificial superintelligence, or ASI, will bring us to the promised land, allow us to unlock the mysteries of the universe, and solve all our problems. Pessimists worry that it will spell the end of humanity as we know it and that an evil ASI will enslave us for its own good.

    Importantly, though, there seems to be an emerging consensus that ASI is around the corner, and we have figured out how to get there. We need to do that quickly, so that we can make sure the ASI is good and serves our needs. That is why Big Tech is rushing to build hyper-scale datacenters to devour all our data, believing that this will, soon enough, give birth to ASI. Yet we are betting the farm on a mirage.

    The proposal is tempting: that we have figured out how to create an all-knowing artificial brain that can make the best decisions, give the wisest advice, solve the hardest problems, and do the most mind-numbing tasks, with incredible speed and efficiency. There seems to be an implicit understanding that, like the Manhattan Project or the Moon landing, we know what the finish line for ASI looks like. The problem is, we don’t. We are chasing shadows. We think we can enumerate all the different capabilities of our brains and turn them into quantifiable tests. The moment we do, though, we realize there are all these other aspects of our cognitive worlds that we are not capturing.

    Our brains don’t work in a vacuum, on a stream of clearly defined problems. Human existence happens in a biological body, perceiving and interacting with an ever-changing physical world, and, importantly, it is shaped by our societal norms. We can all agree that the human brain is proof that intelligence is achievable. Yet its capabilities are not constant, universal, or even completely measurable. Thus, we have no reason to believe that an ASI, which is supposed to represent the pinnacle of intelligence across every imaginable cognitive domain, can exist. No wonder, then, that AI researchers like myself often complain about constantly moving goalposts.

    Big Tech’s big bet has another secret. It rests on the hypothesis that we do not need to list or measure cognitive tasks one by one to reach ASI. We merely need to feed a gigantic neural network model with massive amounts of data, set some generic objectives, and, miraculously, intellectual capabilities will emerge. This so-called “scaling hypothesis” is convenient because it reduces the pursuit of intelligence to computation, i.e., crunching numbers, which we have now mastered. It also dispenses with the need to develop theories and models of the world, which is the never-ending struggle of science.

    Our supposed march toward ASI is therefore without precedent – not because, once complete, it will represent the mother of all technologies, but because our scientific and technological progress has historically relied on testability and verifiability, and the pursuit of ASI lacks that kind of rigor. Furthermore, the scaling hypothesis we are betting on rests on magical thinking and discards a universal law of nature: growth, no matter how fast at first, always slows down. There are recent signs that, on the tasks we have tracked so far, we have reached the phase of diminishing returns with our current AIs.

    To be sure, our brains are capable of incredible feats, and trying to replicate these with artificial technologies can be immensely beneficial – both for the utility of the resulting technologies and for the insight they give us into the infinite complexity of human existence. However, I believe that scaling our way to ASI is a pipe dream that is wasting too many valuable resources, harming our environment, and distracting us from real and immediate problems where we can agree on goals and measure progress. Instead of spending hundreds of billions of dollars chasing the fantasy of a God-like technology, we could be building specialized AIs to tackle important yet concrete questions, like how to prevent Alzheimer’s disease or produce sustainable energy cheaply and at scale. This type of focused AI research, I believe, is what we should be investing in.

  • OK. Let’s do this. I will try my best to keep up this blog, which will be a place for me to lay out my thoughts on technology, AI, medicine, science, academia, etc. After more than two decades in academia, I think I have something to say beyond my technical publications and in the public domain. If you are a student, researcher, academic, or anyone interested in these topics, I’m hoping you will find these pages interesting and useful. I’m sure it will take some time to develop my voice. I’m also sure that much of what I post will turn out to be wrong, incomplete, or misleading. I’ll try my best to correct course as I change my mind – that is what scientists are supposed to do! Transparency is something I strongly believe in, and I also don’t think we should all agree on everything.

    In fact, contrary to popular belief, I think that inconsistencies and shape-shifting are hallmarks of intelligence. As I wrote in a technical pre-print years ago, stochasticity (or randomness) is, in my mind, a crucial ingredient of intelligence. An animal – or an artificial system – that always behaves the same way under the same conditions cannot, therefore, be intelligent. There is irreducible (aleatoric) uncertainty about the world, and the environment is constantly changing. Adaptation requires rolling the dice every now and then.

    That’s why I decided to call this site “Stochastic Intelligence” – and also to suggest that my posts will have a lot of randomness (in timing, content, etc.), which I hope you will find interesting. Let’s see how it goes.

    -mert