Image: “Rain Room” by Victoria Pickering on flickr, licensed under CC BY-NC-ND 2.0.
In the last post, we discussed a suggestion for what probability means. In that framework, when you say a coin has a 50% chance of landing heads, you mean that if you flip the coin a lot of times, you expect it to turn up heads on about 50% of those flips.
That explanation of coin-flipping is an example of a frequentist interpretation of probability. In frequentism, a probability, loosely, describes how frequently something happens. This makes sense for things that repeat many times. It makes intuitive sense to say, “When I draw a card from a deck, there is a 1 in 52 chance that I will draw the king of hearts,” if I’m talking about how frequently the king of hearts comes up. The same thing goes for flipping coins: it seems sensible to use probability to describe how frequently a coin turns up heads.
In both of those cases, the interpretation leans on the fact that these events are things that can happen many, many times – I can flip a coin thousands of times, or draw millions of cards, and then talk about how frequently heads or kings come up in those trials. (In the abstract, those probabilities would be defined in terms of the limit of the frequency as the number of flips or draws approaches infinity.)
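Here's a quick Python sketch of that idea – a simulated fair coin, with the observed fraction of heads settling toward 0.5 as the number of flips grows:

```python
import random

def observed_frequency(num_flips):
    """Flip a simulated fair coin num_flips times; return the fraction of heads."""
    heads = sum(random.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The frequentist probability is the limit of this fraction as the
# number of flips grows without bound; watch it settle near 0.5.
for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9,} flips: fraction of heads = {observed_frequency(n):.4f}")
```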
But extending that interpretation to an election was a stretch. Maybe you could fit the facts into a frequentist interpretation somehow, but it does clash with common sense. Elections happen once; they’re not experiments that you can repeat many times and get different results.
In the context of elections, it helps to talk about probability from a Bayesian perspective.
According to the Bayesian interpretation, probabilities have to do with uncertainty: how sure the speaker is that an event will occur. Hopefully, these probabilities are backed up by data and careful analysis – otherwise, they’re not very useful – but the probabilities are tied into the speaker’s knowledge, and are not just a description of what happens over many repeated events.
This interpretation ends up being more intuitively applicable in a lot of real-life situations. Let’s see how it plays out for a meteorologist in a lab, predicting the weather. I’m going to significantly oversimplify how weather prediction works in my example, because real meteorology is complicated!
Today, our fictional meteorologist looks at all of the information available to him, runs some calculations, and announces that there is an 80% chance of rain in Queens tomorrow. In the Bayesian framework, an 80% chance of rain can translate, roughly, to “we are pretty sure that it will rain.” That estimate is backed up by facts – including, no doubt, how frequently it’s rained in Queens before – but as we’ll see, the phrase “80% chance” captures how certain he is about rain according to all of the data he has.
Suddenly, the meteorologist’s intern rushes into the room. It turns out that the intern had forgotten to hand over the latest humidity readings. Now, the meteorologist has some new facts.
Using that new information, the meteorologist runs some more calculations and comes to a new conclusion. He announces a new probability: a 99% chance of rain, by which he roughly means, “we are almost completely certain that it will rain.”
Did something change? Did the rain clouds actually move closer to Queens as the meteorologist read the humidity data? Clearly, they didn’t. The only thing that changed was how much the meteorologist knew, and that’s what changed his probability estimate.
Bayesian analysis lets you quantify how certain you are. And then, critically, it lets you update that estimate by pulling in new facts. I’m leaving out the details of how this kind of analysis works, but there are some nice references at the bottom of this post.
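Still, for a flavor of the mechanics, here's a minimal sketch of one such update, using Bayes’ theorem on our rain story. The likelihood numbers are invented purely for illustration; they don’t come from any real weather model:

```python
def bayes_update(prior, likelihood_if_rain, likelihood_if_dry):
    """Apply Bayes' theorem: return P(rain | evidence), given P(rain)
    and the probability of seeing the evidence under each hypothesis."""
    numerator = likelihood_if_rain * prior
    return numerator / (numerator + likelihood_if_dry * (1 - prior))

prior = 0.80  # the forecast before the humidity readings arrived

# Invented likelihoods: readings this humid are common on rainy days
# and rare on dry ones.
posterior = bayes_update(prior, likelihood_if_rain=0.95, likelihood_if_dry=0.05)
print(f"Updated chance of rain: {posterior:.2f}")  # prints 0.99
```

Because the new evidence is much more probable if it’s going to rain than if it isn’t, the update pushes the already-confident 80% up to about 99% – the same jump the meteorologist made.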
Predictions about an election can be viewed in the same way as predictions about the rain. FiveThirtyEight put the chance of a Democratic victory at about 65%. They thought that Hillary Clinton would win, based on their analysis of the known facts. But “65%” means they thought that prediction was far from airtight: there was data to support the possibility of either outcome, and a lot of important information was still unknown. (I’m glossing over the way they got this number, which was a lot more complex than just slapping a number onto how sure they felt. They do, in fact, run simulations of the election, with some randomness thrown in to account for uncertainty – there’s a toy version below. But thinking about uncertainty, not simulations, is the useful frame for interpreting their conclusion.)
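Here is that toy version – a minimal Monte Carlo sketch with entirely made-up vote shares and noise (this is nothing like FiveThirtyEight’s actual model), just to show how simulating an uncertain outcome many times turns into a win probability:

```python
import random

def estimated_win_probability(num_trials=100_000):
    """Toy Monte Carlo: draw an uncertain national vote share many
    times and count how often it clears 50%."""
    wins = 0
    for _ in range(num_trials):
        # Made-up inputs: polls suggesting about 51% support, with
        # Gaussian noise standing in for polling error and unknowns.
        vote_share = random.gauss(0.51, 0.026)
        if vote_share > 0.5:
            wins += 1
    return wins / num_trials

print(f"Estimated win probability: {estimated_win_probability():.2f}")
# With these invented parameters, the result lands near 0.65.
```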
The Huffington Post, meanwhile, also thought Hillary Clinton would win. The difference is that based on their model, they thought that she was pretty much a sure win: they put the odds of her victory at about 98%. They were saying that, according to their analysis of the data, the U.S. was almost certain to have a Democratic president.
A high probability like this isn’t necessarily a sign of overconfidence, just as a weather forecaster who predicts a 99% chance of rain isn’t being presumptuous. It was, however, a stronger assertion about how obvious it seemed that Hillary Clinton would win. In essence, the two predictors were arguing about how clearly the facts, at the time, pointed to a Democratic victory.
Of course, the Democratic candidate didn’t actually win. When a prediction turns out wrong, a good modeler will try to learn from their mistakes, so they can analyze things better next time.
Epilogue
These posts focused on a philosophical distinction, and talked about what probability means conceptually, according to my understanding. So, they left out a ton of stuff. Most notably, I didn’t talk about how a person comes up with the actual numbers for probabilities. I also didn’t talk about Bayes’s theorem, which is a big missing piece.
My goal is to address some of these things in a later post at some point. In the meantime, I hope this gave you some intuition about what probability means!
References
[1] An article about how frequentist and Bayesian approaches can actually lead you to solve problems differently: Frequentism and Bayesianism: A Python-driven Primer. There’s also a corresponding blog post.
[2] These topics are also covered in Chapter 3 of this online Deep Learning book.
[3] For hard-copy books that talk about Bayes’ rule, which I didn’t cover here, try Thinking, Fast and Slow by Daniel Kahneman and The Signal and the Noise by Nate Silver.