The vast majority of mathematical formalisms in cognitive science, psychophysics, and neuroscience deal with probability theory in one way or another. This includes formalisms for how the brain models the world using probabilistic models, how constant stimuli elicit variable neural and behavioral responses, how words and concepts map onto the world, and more.

The thing about formalisms, though, is that they are often open to more than one interpretation. On its home turf of formal mathematics, probability theory makes statements about the behavior of sets of things (like the set of all integers), functions on those sets (like P(k)=0.34), and other things you can derive from them. What it *doesn’t* say is whether a statement like P(number\_of\_puppies > 1)=0.3 tells you anything about the real-world possibility of encountering two or more puppies. On the one hand, there is the formal mathematics stating that there is a **random variable** number\_of\_puppies with a defined **probability mass function** P which, when summed over all integers larger than 1, evaluates to 0.3. On the other hand, there either are adorable small dogs or there are not, and it seems that this should be a closed case. The mathematics gives us a **valid** statement; but **interpretation** of such a statement as it pertains to real-world events is in the realm of philosophy. The same goes for neuroscientists and cognitive scientists – we need to be sure, independently of the mathematics, that statements about probability are grounded in reality. This means that first we need a solid understanding of what probability means.

Anybody using probability theory (or information theory, decision theory, signal detection theory, rate distortion theory, etc.) ought to grapple with these philosophical questions. In fact, everybody should do this because it’s interesting and fun. In the remainder of this post, I will attempt to outline my own perspective.

## True Randomness vs Ignorance

**tl;dr** we use probabilistic models not because something is *random*, but because we’re missing some information about it.

Although it’s intuitive to think that the ideas of probability only bear on *random* events, here I’ll try to convince you that approaching a problem probabilistically is more an admission of ignorance than a claim of randomness. Or, if you like, the claim that a system is random is itself a claim of ignorance. Note that by “random” I don’t mean “completely unpredictable.” For example, if you roll and sum two standard dice, the result is both “random” and more likely to be a 7 than a 2. That is, P(die_1+die_2=7) > P(die_1+die_2=2).
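The two-dice claim is easy to check by brute-force enumeration. A quick sketch in Python (the language is my choice here, nothing in the post depends on it):

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

p_seven = sum(a + b == 7 for a, b in outcomes) / len(outcomes)
p_two = sum(a + b == 2 for a, b in outcomes) / len(outcomes)

print(p_seven)  # 6/36 ≈ 0.167 (six ways to make a 7)
print(p_two)    # 1/36 ≈ 0.028 (only 1+1 makes a 2)
```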

Speaking of dice, probability theory has its historical roots in gambling. Back in the 17th century, there was immediate practical value in the idea that, when you roll a 6-sided die, each number has a 1/6 chance of coming up, rather than attributing success to luck or to fate.

Gambling examples help reveal the **dual nature of probability** as both a *frequentist* count of the fraction of times an event happens after many, many repeats, and as a *belief* about single events that haven’t happened yet.

In the **frequentist** perspective, the statement P(die=4)=1/6 is a claim that, if we repeatedly roll the die *under the same conditions* N times, the fraction of times a 4 appears will approach 1/6 as N gets large. In this view, it makes little sense to talk about the probability of a single isolated event, just as the statement P(number\_of\_puppies > 1)=0.3 could not be interpreted as a “yes” or “no” answer to the Big Questions, “will there be puppies? here? soon?”
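The frequentist reading is easy to demonstrate in simulation, with a simulated fair die standing in for the real one (a sketch of my own, not from the post):

```python
import random

random.seed(0)

def fraction_of_fours(n_rolls):
    """Roll a simulated fair die n_rolls times; return the fraction of 4s."""
    return sum(random.randint(1, 6) == 4 for _ in range(n_rolls)) / n_rolls

# The empirical fraction drifts toward 1/6 ≈ 0.167 as N grows
for n in (10, 1000, 100000):
    print(n, fraction_of_fours(n))
```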

This brings us to the notion of probability as a **belief**. Under this interpretation, we don’t need N repeats of the same circumstances as N gets large. Instead, probability is like a graded prediction for *each* future event. Based on the event’s actual outcome, it affords us a graded level of satisfaction (if the actual event was predicted with high probability) or surprise (if the event was predicted with low probability).^{1}
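The graded satisfaction-or-surprise of footnote 1 has a standard quantitative form: the negative log probability of the event. A minimal sketch:

```python
import math

def surprise_bits(p):
    """Surprise (information content, in bits) of an event
    that was predicted with probability p."""
    return -math.log2(p)

print(surprise_bits(0.99))  # ≈ 0.01 bits: the prediction was satisfied
print(surprise_bits(0.01))  # ≈ 6.6 bits: highly surprising
```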

Frequencies and beliefs perhaps should be related, but there is no mathematical law requiring them to be. Again, the *interpretation* of formal mathematics is outside the realm of formal mathematics. There are, however, plenty of practical and epistemological arguments for why probability as a belief should be **calibrated** to probability as a frequency. You may strongly believe that my die is loaded and that 3 is the most likely number. But if it is truly a fair die, then I will always win money in the long run by having beliefs that are closer to the true frequencies (assuming bets are based on beliefs).^{2} Calibration of probabilities as beliefs is a practical concern in machine learning; it is dangerous for a virtual doctor to make poorly calibrated diagnoses, even if it typically gets the “most likely” answer correct. For example, if a patient is 55% likely to have disease A and 45% likely to have disease B, but a model spits out 90% for A and 10% for B, then doctors may proceed over-confidently with the wrong treatment. Even though the math police do not make arrests for poorly calibrated beliefs, it’s still a good idea to get them right.
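The betting intuition can be made concrete with a log scoring rule, a standard stand-in for the Dutch-book-style argument. In the toy simulation below (my own sketch; the “3 is most likely” belief is an illustrative choice of numbers), the calibrated belief accumulates less average surprise on rolls of a genuinely fair die:

```python
import math
import random

random.seed(1)

true_probs = [1 / 6] * 6                        # the die really is fair
loaded_belief = [0.1, 0.1, 0.5, 0.1, 0.1, 0.1]  # "3 is the most likely number"

rolls = [random.randint(1, 6) for _ in range(10_000)]

def avg_log_loss(belief, rolls):
    """Average surprise (negative log probability) a belief
    assigns to the rolls that actually happened."""
    return -sum(math.log(belief[r - 1]) for r in rolls) / len(rolls)

# Calibrated beliefs incur less average surprise on data from the fair die
print(avg_log_loss(true_probs, rolls))     # = log 6 ≈ 1.79
print(avg_log_loss(loaded_belief, rolls))  # ≈ 2.03, strictly worse
```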

Perhaps the best way to calibrate your beliefs would be to take the frequentist approach and repeat an experiment N times and simply count how many times each outcome occurs. Immediately you run head-first into another philosophical conundrum: what counts as a repeat? Let’s look at the dice example more closely. Can you roll a jar full of different colored dice, or do you need to roll the same die each time? Does it count if you roll it on different surfaces? If each of your N rolls is under slightly different circumstances, then are you justified in aggregating all the results together when estimating frequencies? Clearly there are many factors influencing the outcome of the die (see the next box-and-arrow diagram).

If we let all of the “causes” in this diagram vary freely, we’ll see that the results are, well, fairly unpredictable.

Next, you might (rightly) suggest that rolls of mis-shapen dice should not count as “repeats” of the experiment. Remember: the goal of the experiment is to estimate the actual frequencies of a 6-sided die. Let’s see what happens when we control for the shape.

But why stop there? There is quite a bit of variability between the surfaces still. Might that affect the results? (In the above animations, reddish = bouncy and blueish = slippery). Let’s control for surface variety.

Great. Surface variations are now accounted for. Still, our die experiment has one lurking cause. Let’s deal with the final variable.

Hmm – we get the same roll every time. Dice are the gold standard of randomness. They show up in every probability theory textbook. Probability was *invented* to make sense of them. Yet, given more and more control of the **context**, their randomness evaporates. Much of the world is, of course, deterministic in this sense. The frequentist idea of probability requires that we precisely repeat an experiment N times, just not too precisely. This is not at all a criticism of the frequentist perspective; having well-calibrated **beliefs** about a well-controlled die would, of course, mean predicting that the result is almost always the same (*conditioning* beliefs on some additional knowledge is akin to *controlling* more conditions of a frequentist’s experiment).
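One way to make this thought experiment concrete is a toy deterministic die: the outcome is a fixed but opaque function of the full physical context. Everything here is hypothetical (a hash function stands in for the real dynamics; the parameter names are mine), but the punchline is the same as in the animations:

```python
import hashlib
import random

random.seed(2)

def roll(shape, surface, speed, angle):
    """A toy deterministic 'die': the outcome is a fixed but opaque
    function of the full physical context (sha256 stands in for physics)."""
    context = f"{shape}|{surface}|{speed:.3f}|{angle:.3f}"
    return hashlib.sha256(context.encode()).digest()[0] % 6 + 1

# Uncontrolled: every throw has a different speed and angle -> looks random
uncontrolled = [roll("cube", "felt", random.uniform(1, 5), random.uniform(0, 90))
                for _ in range(10)]
print(uncontrolled)

# Fully controlled: identical context -> the "randomness" evaporates
controlled = [roll("cube", "felt", 3.0, 45.0) for _ in range(10)]
print(controlled)  # the same number, ten times
```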

So, dice are not really “random” after all, as long as we know enough about the context and have all the computing power in the world to aid us in making our predictions. This is where probability comes in. The reason we say a die has a 1/6 probability of landing on each number is because we never roll it the same way twice, and we never know the exact physical parameters of each roll. Probability theory allows us to make sense of the world despite our ignorance of (or inability to compute) the minutiae that make every event unique.

(In case it is not clear by now, I am using the term “ignorance” to simply mean missing some information about a system. Complex systems have many interacting parts; if only some of them are observed, then the unobserved parts may impart forces that appear “random” in many cases.)

##### Quick side-note on randomness in quantum physics for those who are interested…

Dice may be the gold standard of randomness to statisticians, but the quantum world is the only “true” source of randomness to a physicist. Modern physics has largely accepted the philosophy that “uncertainty” in quantum physics is random in the truest sense of the word. Take the Heisenberg uncertainty principle, which states that the better you know a particle’s location the less you can know about its momentum (or vice versa); there is no “less ignorant” observer who could, even in principle, make a more accurate prediction. This is true randomness. Einstein, for what it’s worth, always hoped that quantum physics might some day be understood in deterministic terms, where what we currently think of as “random” might be explained as the workings of yet smaller or stranger particles “behind the scenes” that we simply haven’t observed yet. That is, we see the quantum world as “random” because we’re ignorant of some other hidden aspect of the universe.^{3} When Einstein wrote

> God does not play dice with the universe.

clearly he had in mind a more random kind of die than I’ve described here. Perhaps he meant

> God does play dice with the universe, but we can’t see how the dice are rolled (yet)

## Are neurons really random? Or, what did one brain area say to the other?

**tl;dr** maybe, but given everything above, it’s a moot point regardless.

A commonplace in neuroscience textbooks is the statement that neurons are stochastic. The classic example is that the same exact image can be presented on a screen many times, but neurons in visual cortex never seem to respond the same way twice, even when reducing our measurements to something as simple as the total spike count (\mathbf{r}). Models typically assume that the stimulus – and some other factors – set the *mean firing rate* of the neuron (\lambda), but that spikes are Poisson-distributed given that mean rate:

P(\mathbf{r}) = \frac{\lambda^\mathbf{r}e^{-\lambda}}{\mathbf{r}!}
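To unpack the formula: for a fixed mean rate λ, these probabilities sum to one over r = 0, 1, 2, … and have mean λ. A quick numerical check (my own sketch; λ = 5 is an arbitrary illustrative rate):

```python
import math

def poisson_pmf(r, lam):
    """P(r) = lam**r * exp(-lam) / r! : probability of r spikes given mean rate lam."""
    return lam ** r * math.exp(-lam) / math.factorial(r)

lam = 5.0  # an illustrative mean spike count
probs = [poisson_pmf(r, lam) for r in range(50)]

print(sum(probs))                               # ≈ 1: a valid distribution
print(sum(r * p for r, p in enumerate(probs)))  # ≈ 5: the mean equals lam
```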

As with the dice above, we can ask what unobserved factors contribute to the apparent randomness of sensory neurons. A back-of-the-envelope sketch might be the following:

Sure enough, there is evidence that the “randomness” of sensory neurons begins to evaporate the better we control for eye movements, fluctuating background attention levels, etc. The more we understand and can control for these latent factors influencing sensory neurons, the less appropriate the Poisson distribution becomes for computational neuroscientists (and a few alternatives have been proposed under the banner of “sub-Poisson variability” models).
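This evaporating randomness can be sketched with a doubly stochastic toy simulation (my own illustration, not from any of the studies mentioned): when the mean rate itself fluctuates across trials, the measured variability is super-Poisson; hold the rate fixed, and the Fano factor drops back to 1.

```python
import math
import random

random.seed(4)

def poisson_sample(lam):
    """Draw one Poisson spike count with mean lam (Knuth's algorithm)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

def fano(counts):
    """Fano factor: variance / mean of the spike counts."""
    m = sum(counts) / len(counts)
    return sum((c - m) ** 2 for c in counts) / len(counts) / m

# "Uncontrolled" trials: the mean rate fluctuates trial to trial
# (attention, eye movements, ...) -> extra, super-Poisson variability
uncontrolled = [poisson_sample(random.uniform(2.0, 8.0)) for _ in range(20000)]

# "Controlled" trials: the rate is held fixed -> variability is Poisson again
controlled = [poisson_sample(5.0) for _ in range(20000)]

print(fano(uncontrolled))  # noticeably above 1
print(fano(controlled))    # close to 1
```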

The 1995 study by Mainen and Sejnowski^{4} tells this story beautifully in a single figure:

Just like the dice above, given enough control of its inputs, the randomness of a single neuron seems to evaporate. Does this mean that probability is the wrong tool for understanding neural coding? **Absolutely not!** In the dice example, perhaps we should have stopped after controlling for the shapes of the dice. Let the rest be random.

Similarly, there is a “sweet spot” to the amount of control we should exact in our models of neural coding. Too little, and we underestimate their information-coding capabilities. Neurons with super-Poisson variability may be an indication that we are in this regime. Similarly (and perhaps surprisingly), too much precision may over-estimate the information content of a neural population. Fitting a model of individual neurons’ spike times and how they depend on the stimulus may be an indication that we are in this regime. Between these extremes is a balance between **controlling** for things that should be controlled and **summarizing **things where details are irrelevant.

If you are not convinced that having “too good” a model is a problem, consider this: how much of the information *in* one brain area ever makes it *out of* that brain area? Just as we needed to know extremely precise details of how the die was thrown (or of the inputs to a single neuron) to make precise predictions, for one brain area to “use” all of the information in another brain area would mean that an incredible amount of detail would need to be communicated between the two. Experimental evidence suggests, however, that cortical brain areas typically communicate via the *average firing rate* of cells, glossing over details like individual cells’ timing or the synaptic states of local circuits. These are all irrelevant details from the perspective of a downstream area. If we, as neuroscientists, want to understand how the brain encodes and transmits information, then using probabilistic models is not simply a matter of laziness or imprecision. Probabilistic models let the experimenter play the role of a homunculus “looking at” the output of another brain area, because what one brain area tells the other is only ever a summary of the complex inner-workings of neural circuits.

## Recapitulation

- we use probabilistic models not because something is *random*, but because we’re missing some information about it. Or, if you like, this is really what we mean when we say something is random.
- some things are just too complex to model even with all necessary information, so we fall back on probabilistic models.
- maybe individual neurons are somewhat stochastic, but it’s a moot point regardless, since…
- what brain area A tells brain area B is limited. What neuroscientists measure about brain area A is analogously limited. How neuroscientists “decode” this limited information is a better model of how brain areas communicate than more detailed encoding models (in certain situations).

## Footnotes

1. **Information Theory** adopts this philosophy and formalizes *surprise* as the negative log probability of an event.
2. This betting argument is based on the **Dutch Book Argument**.
3. Disclaimer: I am certainly **not** a physicist, so I apologize for any blatant misrepresentations here.
4. Mainen, Z. and Sejnowski, T. (1995) “Reliability of spike timing in neocortical neurons.” *Science* 268(5216):1503-6.