# A 3.8-Sigma Anomaly

By Sean Carroll | February 4, 2012 9:33 am

Every professional football game begins with the flip of a coin, to determine who gets the ball first. In the case of the Super Bowl, the teams represent the National Football Conference (NFC) or American Football Conference (AFC). Interestingly, the last 14 coin flips have been won by the NFC.

Working out the numbers, the chance of 14 coin flips in a row coming out the same is 1 in 8,192. (The linked article says 1 in 16,000, which comes from 2^14; but that first coin flip has to be something, so the chances of 14 in a row are really 1 in 2^13. The anomaly would be just as strange if the AFC had won every time.) That’s a better than 3.8-sigma effect! Enough to call a press conference, if this were particle physics.
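As a rough check on that arithmetic, here is a minimal Python sketch that converts the 1-in-2^13 probability into a number of sigma. It assumes the two-sided Gaussian convention (probability in both tails), which is what reproduces a figure a little above 3.8; the function name `two_sided_sigma` is made up for illustration.

```python
from statistics import NormalDist

def two_sided_sigma(p_value):
    """Number of standard deviations whose two-tailed Gaussian probability is p_value."""
    # Half of the probability sits in each tail of the standard normal.
    return NormalDist().inv_cdf(1 - p_value / 2)

# The first flip can be anything; the next 13 must match it.
p_streak = 1 / 2 ** 13
sigma = two_sided_sigma(p_streak)

print(f"p = 1 in {round(1 / p_streak)}, about {sigma:.2f} sigma (two-sided)")
```

With a one-sided convention the same probability would map to a somewhat smaller number of sigma, so the choice of convention matters when quoting the figure.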

The question is … is this really a signal, or did we just get lucky? Is it a fair coin and the NFC has just been the happy recipient of a statistical fluctuation, or is there something fishy about the coin? Remember Barry Greenstein’s parable about how different people compute probabilities.

And let it be a lesson the next time you’re excited about 3-sigma anomalies.

• http://www.flisser.com Bob F.

Do they use the same coin every year, or is it a different one?

• http://www.savory.de/blog.htm Eunoia

Let me tell y’all how to make a fair toss even with a biased coin

Toss it twice. Possible outcomes are HH,TT,TH and HT.
Toss twice again if HH or TT come up. Repeat as necessary.

One party chooses HT and the other gets TH before tossing starts.
HT and TH have the same probability, even if the coin is biased.
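The procedure in this comment can be sketched in a few lines of Python. This is only an illustration under the assumption that each toss of the biased coin is independent and has the same (unknown) chance of heads; the helper names and the 0.8 bias are hypothetical.

```python
import random

def biased_flip(rng, p_heads):
    """One toss of a coin that lands heads with probability p_heads."""
    return "H" if rng.random() < p_heads else "T"

def fair_result(rng, p_heads):
    """Toss twice, repeat on HH or TT, and return 'HT' or 'TH'."""
    while True:
        first = biased_flip(rng, p_heads)
        second = biased_flip(rng, p_heads)
        if first != second:
            return first + second

def fraction_ht(p_heads=0.8, trials=20_000, seed=1):
    """Fraction of decided double-tosses that come out 'HT'."""
    rng = random.Random(seed)
    wins = sum(fair_result(rng, p_heads) == "HT" for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print(fraction_ht())
```

The reason it works is that HT has probability p(1-p) and TH has probability (1-p)p, so the two are equal whatever p is, as long as p is neither 0 nor 1.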

• S.J. Esposito

This is the only aspect of football I’ve ever really understood.

• Marvin Gardens

Replace the coin toss with which league has sold the most beer or TV advertising. Tie the kick-off to something relevant.

I’ve found that SuperSunday is a perfect time to visit the state parks – you have the place to yourself.

• http://cards.devonyoung.com Devon Young

This is really interesting to me — ’cause I can win a Papa John’s pizza if I get it right LOL

Anyway, the question we should be asking isn’t about the AFC/NFC, but about how often heads/tails comes up. I notice they didn’t even touch that point. So the real question isn’t “Will the NFC win the flip again?” but “Will the coin flip result in tails?”. (especially since the Giants captain already told the media he’ll pick tails)

If the NFC was picking the result every year, then maybe that would be interesting, but they don’t. Also the pick is usually called while the coin is IN THE AIR — which means there’s really no way to make it biased (as the article asks) by the ref or the player. If heads/tails came up significantly more often, then I could see an argument for biasedness. Streaks happen in probability though, so it still wouldn’t be evidence (unless maybe 12 out of 14 were heads AND the AFC chose tails every time they picked)

So it really looks like an anomaly to me that the NFC has won 14 straight coin flips (including 7 times the AFC picked the result!). So… heads or tails is the question.

• Ian Liberman

I may be mistaken, but from what I have read in this article, you cannot bias a coin. Therefore it will always be a fair coin. http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf
You Can Load a Die, But You Can’t Bias a Coin
Andrew Gelman and Deborah Nolan

• J. R. Staton

What about starting parameters, is the coin always heads up or tails up to start? Is there any correlation with the results?

• Kevin Anthoney

Actually, the first coin flip in the run has to be different from the one before, otherwise it’s a run of more than 14. Unless we’re talking about a run of at least 14…

• fraac

Devon, I bet you I can make the jack of spades rise out of this deck of cards and squirt water in your ear.

• http://somewhatabnormal.blogspot.com/ Robert Oerter

Clearly the NFC is on a roll with coin flips and we can expect them to win the flip again this year.

I don’t think the probability calculation is quite right. There are two possible questions:
1) What’s the probability of one team winning 14 tosses out of 14 tosses.
2) What’s the probability of one team winning a string of 14 tosses out of 45 tosses (one toss for each Super Bowl).

The first is (1/2)^13 as you’ve stated in the article. The second is different and much harder to calculate. Rather than finding an exact solution, I scripted a quick simulation and ran it 100,000 times. I came up with the number 0.2%.
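The commenter's script is not shown, so the following is only a guess at what such a simulation might look like: a Monte Carlo sketch in Python that counts how often 45 fair tosses contain a run of at least 14 identical results. The function names, the seed, and the default trial count are assumptions.

```python
import random

def longest_run(flips):
    """Length of the longest run of identical consecutive values."""
    best = current = 1
    for prev, nxt in zip(flips, flips[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best

def estimate_run_probability(n_flips=45, run_length=14, trials=100_000, seed=0):
    """Monte Carlo estimate of P(some run of at least run_length in n_flips fair tosses)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        if longest_run(flips) >= run_length:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(f"{estimate_run_probability():.4%}")
```

With only a couple of hundred hits expected in 100,000 trials, the estimate carries a statistical uncertainty of several percent, so it should be read as "roughly 0.2%" rather than as a precise value.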

• Kevin Anthoney

Alex @ 12.

Total number of possibilities: 2^45.

The sequence of 14 coin flips can occur a) at the beginning; b) at the end; or c) in the middle. If at the beginning, we can have either 14 heads followed by a tail, or 14 tails followed by a head, and the remaining 30 can be anything we like, so that’s 2*2^30 combinations. Likewise, at the end we have 2*2^30 combinations.

If the sequence occurs in the middle, we can have a string of 14 tails sandwiched by two heads, or a string of 14 heads sandwiched by two tails, with the other 29 flips being anything we like (2^29 possibilities). This sequence of 16 flips can occur anywhere from starting at position 1 to position 30, so that’s 2*30*2^29 combinations.

The total number of combinations is about 2*2*2^30 + 2*30*2^29 = 3.651 × 10^10. I say “about” because sequences with more than one string of 14 coin flips will have been double counted, and I’m too lazy to allow for that. So I make the final probability of a run of 14 identical coin flips occurring somewhere in a sequence of 45 to be about 3.651 × 10^10 / 2^45 = 0.001038. Which is about half what you get.
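The arithmetic in this estimate is easy to check with exact integers. The Python sketch below simply re-adds the three cases described in the comment (run at the beginning, at the end, or in the middle); the variable names are illustrative and the double counting the commenter mentions is deliberately left in.

```python
from fractions import Fraction

n_flips = 45

# Run of exactly 14 at the start or at the end: 2 choices of side,
# one fixed terminating flip, 30 free flips.
at_start = 2 * 2 ** 30
at_end = 2 * 2 ** 30

# Run in the middle: 16 fixed flips (14 plus two terminators),
# 30 possible starting positions, 29 free flips.
in_middle = 2 * 30 * 2 ** 29

total = at_start + at_end + in_middle
probability = Fraction(total, 2 ** n_flips)

print(total, probability, float(probability))
```

The fraction reduces to a small ratio of integers, which makes it straightforward to compare against other estimates of the same quantity.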

Kevin @ 13

Very nice! Thanks for that.

I’ve thought of one thing that might account for part of the discrepancy. If I toss a coin 45 times, and get a string of 20 heads, it’s also true that that sequence contained a string of 14 heads. The difference is between getting *exactly* 14 heads and getting *at least* 14 heads. My simulation looks for at least 1 string of at least 14 heads, and I think you’ve calculated the probability of getting at least 1 string of exactly 14 heads. The former is more likely. Is it twice as likely? Maybe. Or maybe my simulation is broken, but it pretty closely agrees with some reference values I’ve pulled out of a textbook.

• http://math-frolic.blogspot.com Shecky R

seems pretty obvious that the NFC employs a psychic to sit in the stadium and will the coin to fall in their favor… anomaly solved! (anyone spied Uri Geller in Indianapolis this weekend?)

• joshua

The question isn’t of how the coin landed but how the teams called it.

• http://sacrilicio.us Matunos

@16 this

• http://www.rationalpastime.com J-Doug

@16 & 17: Unless the caller has information about the outcome more than its base probability, this doesn’t matter.

• Kevin Anthoney

Alex @14

Yes, that would be the reason. It’s double because one of the terminating coin flips would have to be fixed to make it exactly 14, but could be either for at least 14, doubling the number of combinations.
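One way to test this explanation is to count the "at least 14" case exactly rather than by simulation. The sketch below is not from the thread: it is a small dynamic program, written under the assumption of 45 independent fair tosses, that counts sequences with no run of 14 or more and compares the result with the 17/16384 estimate for "exactly 14" given earlier. The function name is hypothetical.

```python
from fractions import Fraction

def prob_run_at_least(n_flips, run_length):
    """Exact P(some run of >= run_length identical results in n_flips fair tosses)."""
    # counts[j] = number of sequences so far with no long run,
    # whose final run has length j + 1 (final run lengths 1 .. run_length - 1).
    counts = [0] * (run_length - 1)
    counts[0] = 2  # a single flip, heads or tails
    for _ in range(n_flips - 1):
        new = [0] * (run_length - 1)
        new[0] = sum(counts)           # next flip differs: run restarts at 1
        for j in range(run_length - 2):
            new[j + 1] = counts[j]     # next flip matches: run grows by 1
        counts = new
    return 1 - Fraction(sum(counts), 2 ** n_flips)

p_at_least = prob_run_at_least(45, 14)
p_exactly_estimate = Fraction(17, 16384)  # the hand count from the earlier comment

print(float(p_at_least), float(p_at_least / p_exactly_estimate))
```

The ratio should come out a little under 2, consistent with the doubling argument, and the exact probability agrees with the simulated figure of roughly 0.2%.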

• Flip

Winning a coin toss is not the same thing as whether the coin comes up heads or tails.
I am sure there is info out there on whether the coin came up heads or tails, but being able to call a coin toss is an art. In most cases it’s a question of how many rotations you would expect the coin to have from the initial state to its peak height and then back to the ground. If you take into account the fact that most people have a fairly decent muscle memory, an analysis of coin tosses by officials should reveal a bias if the official is tossing the coin under identical conditions. One would expect that the coin will rotate the same number of times after the toss under identical conditions by the same official. It’s the same reason why the world record for free throws is so high.

Although there is nothing in this run of coin flips that violates the concept of randomness, I suspect that randomness is really more complicated than we currently understand it to be, especially when we are dealing with scientific truth.

Jonah Lehrer in a New Yorker article, “The Truth Wears Off”, talks about how a host of scientific results meeting standard statistical tests seem to become less and less significant over time, to the point that some of the results appear to vanish. Although we might immediately think that the initial study that generated the results was poorly designed and the results were bogus to start with, that doesn’t always appear to be the case. There was a study of a memory phenomenon called “verbal overshadowing” cited over 400 times and extended to a variety of other tasks. The original discoverer Jonathan Schooler himself concluded that, although there was nothing wrong with his design, he had been unlucky in his choice of original subjects and the phenomenon he himself discovered apparently didn’t exist or was vanishing with each new round of testing. A similar behavior can be seen in the J.B. Rhine ESP tests, where some subjects appeared to pull off amazing runs in guessing cards. One subject did a run of nine correct guesses (2 million to 1 odds) three times. The Amazing Randi would probably suspect cheating but, if so, the subject apparently got more and more honest over time and eventually could do no better than chance.

Of course, there are poorly designed studies, selective reporting of results, a bias in publishing towards studies with positive results, plain dishonesty, and other explanations. Regression to mean doesn’t seem particularly satisfying as an explanation when the results meet the standard statistical tests.

Lehrer talks about the Crabbe study involving cocaine and mice in which “same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of litter mates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.” The end result was that three separate labs reported three different results on the behavior of the mice and the differences were not trivial. Lehrer adds: “The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise.”

Randomness itself is something of an abstraction from reality. It is a concept and I suspect that it does not capture the whole reality of the coin flip. Randomness, of course, is usually counter-posed against causality/predictability, but there may be some twilight ground between the two when the conscious observer becomes involved.

• Flip

Just as a note, when I was a kid I could hit heads or tails with about a 90% accuracy when I was asked to toss a coin, much to the consternation of my siblings.

There are two objections to this analysis. The classical one has been covered above, but there’s also the all-important quantum objection, which I will explain via this manual TrackBack. Ping.

• NFC

3.8 sigma could be wrong sometimes…

No no no no no you people have it all wrong.

There may be 16 correct calls in a row, but there is only one lucky streak!

• http://www.astro.multivax.de:8000/helbig/helbig.html Phillip Helbig

I found this gem at http://www.thewaythefutureblogs.com/2012/02/bright-sayings-of-bright-people-no-25/ . It seems appropriate here.

“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.”

—Aaron Levenstein

The coins are not identical. There is a lot of literature on how loaded coins from different mints are. But this is beside the point.

The 14 coins did not produce the same outcome year after year, they just resulted in 14 times one winning the outcome of the flip whatever it may have been by virtue of either guessing or of the other team guessing incorrectly. Which brings me to my point:
There is the overlooked random event: which team gets to make the call for the outcome.

• Andrew Cleland

The Patriots won the coin toss this year, so the NFC winning streak has been broken.

• Childermass

To continue on the thread of Alex @ 12,

Not only might one ask what are the odds that it will happen in the history of the Superbowl instead of the last 14 years (prior to this year), but what are the odds that such a string of coin flips would happen in a series of high-profile games. It would be just as noticed if it happened at the national-championship game for college football. It would also be noticed if a single team won (or lost) 14 coin tosses in a row. Of course we also pay attention to many other sports stats as well.

So the odds that such a coincidence will astound us are actually quite good — indeed all but certain.

I assume that the particle physics guys at the LHC are not just looking for any old statistical anomaly in their millions of data points, as they WILL find one. Rather the theory must say what it expects to find. When they calculate the sigma for the proposed Higgs observations, are they calculating the odds of finding it anywhere in the proposed range of energies or just the odds it will be found where it was found? I hope it is the former. But even if it is not, the vast majority of statistical quirks that one might dig out of the LHC data are not what the Higgs hypothesis predicts. There is a big difference between noticing a statistical quirk and predicting one in advance.

• Zwirko

32,766 throws, on average, to have that 14-sequence appear.
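That figure matches the standard waiting-time result for a run of 14 of one specified side with a fair coin. The Python sketch below reproduces it from the usual recurrence; it is an illustration, not necessarily how the commenter arrived at the number, and both function names are made up. The second function covers the case where a run of either side counts, which is the situation in the post.

```python
def expected_flips_for_run(k):
    """Expected fair-coin flips until k heads in a row first appear."""
    expected = 0
    for _ in range(k):
        # To extend a run by one: flip once more; half the time it fails
        # and the whole wait starts over.
        # E_j = E_{j-1} + 1 + E_j / 2, which rearranges to E_j = 2 * E_{j-1} + 2.
        expected = 2 * expected + 2
    return expected

def expected_flips_for_run_either_side(k):
    """Expected flips until k identical results in a row, heads or tails."""
    # The first flip sets the side; after that we need k - 1 consecutive
    # flips that each match the one before, an event of probability 1/2 each.
    return 1 + expected_flips_for_run(k - 1)

print(expected_flips_for_run(14), expected_flips_for_run_either_side(14))
```

The closed forms are 2^(k+1) - 2 for a specified side and 2^k - 1 for either side.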

• Rich Townsend

That 3.8-sigma signal is a ‘local’ figure. The analysis should take account of the ‘look elsewhere effect’ – that’s what the particle guys do. It’s exactly as Childermass explains @32

• http://slackwire.blogspot.com/ JW Mason

Three-sigma results will be observed in about 3 draws out of 1,000 from a normal distribution. So if you form a null hypothesis about a particular data-generating process (DGP) (e.g. a fair coin is being flipped), draw a sample from it, and observe a three-sigma departure from the predicted value, then if your prior probability for the alternative to the null (that it is not a fair coin toss, in this case) was not substantially less than one in 300, then you should seriously consider rejecting the null and accepting the alternative. Most of the comments here are various suggestions about what rejecting the null here would mean concretely.

But! This is not the situation described by this post. We did NOT first form a hypothesis about the DGP and then draw a sample from it. Rather, we are only talking about coin flips because we *already* observed the anomalous result. So the relevant question is, given the universe of comparably salient DGPs (in the Super Bowl, in professional sports, among stuff going on this past weekend, among stuff Sean might plausibly blog about — whatever we decide the relevant universe is) what is the probability of a three-sigma result being observed among at least one of them? And given that, for any reasonable definition of the potential universe of DGPs, there are far more than 300 of them, the answer is going to be close to 100%.

Sean gets this, obviously — the post was clearly tongue in cheek. But it seems that many commenters here don’t.

As for the application to physics, that’s way above my pay grade. But it depends whether (or to what extent) the sample with the three-sigma anomaly is drawn from a DGP identified as of interest on prior theoretical grounds, or whether it’s the result of a fishing expedition. Rich T.’s link suggests it’s often the latter; I don’t have any idea if that’s right, but it’s the right question.
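The "universe of DGPs" argument in this comment can be made concrete with a one-line formula. The Python sketch below assumes, purely for illustration, that the candidate processes are independent and that each has the same chance of throwing up an anomaly; neither assumption comes from the comment, and the function name is hypothetical.

```python
def prob_at_least_one(p_single, n_processes):
    """P(at least one of n independent processes shows an anomaly of probability p_single)."""
    return 1 - (1 - p_single) ** n_processes

# Roughly 3-sigma anomalies (about 0.27% each, two-sided).
for n in (300, 3_000):
    print(n, round(prob_at_least_one(0.0027, n), 4))

# The coin-toss streak: 1 in 8,192 per process.
print(8_192, round(prob_at_least_one(1 / 8192, 8_192), 4))
```

Under these assumptions the chance of seeing at least one such anomaly is already better than even at around 300 processes and approaches certainty as the number of processes grows into the thousands.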

• http://slackwire.blogspot.com/ JW Mason

(Sorry, I know we’re talking about 3.8 sigma, not 3 sigma. But I don’t think there’s any question that the universe of possibly salient DGPs — even just within the Superbowl — is considerably larger than 8,000. Or, what Childermass said.)

• Charlie

I probably observed several 5-sigma results just walking to work today, though I wasn’t aware of it.

• Doc C

As a street scientist, I would like to point out that this analysis would all benefit from using some Bayesian analysis. What is the pre-DGP chance of the NFC winning 14 times in a row? One would need to know the odds of one team winning a coin toss each game. Theoretically 50:50, but not necessarily, based on the unmeasurable variables that go into calling tosses and biased coins, and the performance of each team within that set of variables. Add to that the occurrence of statistical anomalies in real life, and I suspect one gets a high chance of the 14 wins in a row. One knows this analysis is likely correct, because no one has asked for a new kickoff determining methodology.

The sigma for the Higgs, taken in the context of a Bayesian analysis of the pre-test expectations for the experiment, and accounting for two separate experiments coinciding, makes the chance of the phenomenon being real much higher than simply saying that they are likely to be wrong 1/1000 times, and that kind of error is pretty common.

By the way, same goes for the superluminal neutrinos. There is more than one set of experiments revealing the same result, which raises the Bayesian chance of it being a true finding. That does not mean that they beat the speed of light, only that they traverse known space faster than photons can. There may be other reasons besides speed that they do that (like unknown space shortcuts only they have access to). Either way, the street would say the odds of the findings being real are higher than a single result would indicate.

• https://www.cfa.harvard.edu/~rkirshner/ Bob Kirshner

Uh.. Sean. I hate to admit I know more about football than another person, but since it is you: statistics aside, it is simply not true that “Every professional football game begins with the flip of a coin, to determine who gets the ball first.”

The winner of the toss gets to DECIDE whether to kick or receive.

• Valdis Kletnieks

@36 JW Mason: Richard Feynman skewered this problem as well:

“You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!” — Six Easy Pieces

### Cosmic Variance

Random samplings from a universe of ideas.