Famous Statistical Blunders in History

Literary Digest, 1936

In 1936, the Literary Digest, a national magazine of the time, sent out 10 million "straw" ballots asking people whom they planned to vote for in that year's presidential election. It received about 2.4 million ballots back (not bad in an era when polling was still in its infancy, but still only about a 25% return rate, which leaves the door open for voluntary response bias). The magazine predicted that Alf Landon would beat Franklin Delano Roosevelt 57% to 43%. As it turned out, Roosevelt won 62% to 37%.

There were two large problems:

First, the sample chosen was not representative of voters. The Literary Digest drew its sample from lists of telephone numbers, car registrations, and country club memberships. But 1936 was the middle of the Great Depression, and at that time phones, cars, and country club memberships were largely available only to the rich. The economy was the central campaign issue, with Roosevelt advocating the New Deal, which appealed to lower-income voters, so the opinions of the rich and the poor were bound to differ on this issue. As a result, the sample was badly biased and, ultimately, useless.
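
The effect of a biased sampling frame can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the 25% share of well-off voters and the 60%/30% Landon support rates are hypothetical numbers chosen only to show the mechanism, not historical estimates.

```python
import random


def simulate_frame_bias(seed: int = 1936) -> tuple[float, float]:
    """Return (true Landon share, estimate from a wealth-biased frame).

    All rates are hypothetical and chosen only to illustrate the mechanism.
    """
    rng = random.Random(seed)
    population = []  # (is_well_off, votes_for_landon) pairs
    for _ in range(100_000):
        well_off = rng.random() < 0.25         # assumed: 25% own a phone or car
        p_landon = 0.60 if well_off else 0.30  # assumed: opinion differs by income
        population.append((well_off, rng.random() < p_landon))

    true_share = sum(vote for _, vote in population) / len(population)

    # The frame (phone books, car registrations, club lists) only
    # reaches the well-off voters, so the sample is drawn from them alone.
    frame = [vote for well_off, vote in population if well_off]
    sample = rng.sample(frame, 10_000)
    frame_estimate = sum(sample) / len(sample)
    return true_share, frame_estimate


true_share, frame_estimate = simulate_frame_bias()
print(f"true Landon share:    {true_share:.3f}")
print(f"frame-based estimate: {frame_estimate:.3f}")
```

Because every ballot comes from the same skewed frame, a larger sample only tightens the estimate around the wrong value; even 2.4 million responses cannot repair a biased frame.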

Second, the survey did indeed suffer from voluntary response bias. In Chicago, the Literary Digest used systematic sampling, mailing a ballot to every third registered voter. Because this sample came from the full voter rolls rather than from phone books or club lists, it avoided the first problem, yet the returned ballots still predicted a Landon victory, even though Roosevelt carried Chicago by a wide margin when the actual votes were counted. This suggests that Roosevelt supporters were less likely to respond to the survey, creating a nonresponse bias in Landon's favor.
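
The Chicago case can be sketched the same way. Here the systematic sample (every third name, from a random starting point) is drawn from the full voter list, so the frame itself is fine; the bias enters only through who mails the ballot back. The 40% Landon support and the 40%/15% response rates are hypothetical assumptions, not figures from 1936.

```python
import random


def simulate_nonresponse(seed: int = 1936) -> tuple[float, float]:
    """Return (true Landon share, Landon share among returned ballots).

    The support level and response rates are hypothetical.
    """
    rng = random.Random(seed)
    # Registered-voter list; True means the voter supports Landon (assumed 40%).
    voters = [rng.random() < 0.40 for _ in range(300_000)]
    true_share = sum(voters) / len(voters)

    # Systematic sample: every 3rd name, starting at a random offset.
    mailed = voters[rng.randrange(3)::3]

    # Assumed response rates: Landon supporters mail the ballot back more often.
    returned = [vote for vote in mailed
                if rng.random() < (0.40 if vote else 0.15)]
    return true_share, sum(returned) / len(returned)


true_share, returned_share = simulate_nonresponse()
print(f"true Landon share:          {true_share:.3f}")
print(f"Landon share among returns: {returned_share:.3f}")
```

With these assumed rates, roughly 0.4 × 0.40 / (0.4 × 0.40 + 0.6 × 0.15) ≈ 64% of returned ballots favor Landon, even though only 40% of the list supports him.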

The Literary Digest "lost face" and later went out of business.

Chicago Tribune, 1948

In the 1948 presidential election, Thomas Dewey, then governor of New York, ran against the incumbent, Harry S. Truman. Three major polls (Gallup included) predicted that Dewey would win, and the Chicago Tribune went to press on election night with the now-famous headline "DEWEY DEFEATS TRUMAN." Dewey did not win. The Gallup poll came closest, with 49.5% Dewey, 44.5% Truman, 5.5% other, and 0.5% undecided.

There were several problems:

First, the pollsters stopped polling too soon, and Truman was notably successful at energizing voters in the last days before the election. The final prediction was made two weeks before the election, when 15% of respondents still said they were undecided. It was assumed that the undecided voters would split in the same proportions as those who had already decided, so nearly all of that 15% was allocated to the candidates, leaving only 0.5% reported as undecided. (That 0.5% was Gallup's error factor.)
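
The proportional allocation of the undecided vote can be written as a short function. The raw percentages below are invented for illustration and are not Gallup's actual 1948 figures; the helper name `allocate_undecided` is likewise hypothetical.

```python
def allocate_undecided(decided: dict[str, float], undecided: float,
                       residual: float = 0.5) -> dict[str, float]:
    """Split all but `residual` points of the undecided vote among the
    candidates in proportion to their decided support (hypothetical helper)."""
    to_allocate = undecided - residual
    total_decided = sum(decided.values())
    forecast = {name: share + to_allocate * share / total_decided
                for name, share in decided.items()}
    forecast["undecided"] = residual
    return forecast


# Invented raw poll (percentages), NOT Gallup's actual 1948 numbers.
raw = {"Dewey": 46.0, "Truman": 34.0, "other": 5.0}
forecast = allocate_undecided(raw, undecided=15.0)
for name, share in forecast.items():
    print(f"{name:>9}: {share:.1f}%")
```

This allocation preserves the ratio between the candidates, so it cannot capture a late swing: if the late deciders broke heavily for Truman, the forecast would be off no matter how accurate the early interviews were.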

Second, the telephone polls tended to favor Dewey because in 1948 telephones were generally limited to more well-to-do households, and Truman was less popular among elite voters.

Third, one of the methods used, known as quota sampling, sent interviewers into an area and told them to interview a certain number of people who met given demographic criteria. For example, an interviewer might be told to find and interview 10 men and 11 women, including 8 African Americans, 2 Asians, and 11 Caucasians, with the number chosen for each group reflecting its share of the overall population.

Unfortunately for the pollsters, a sample that is demographically representative of the population is not necessarily representative with regard to the issue at hand.

Even worse, as long as the interviewers filled their demographic quotas, they were free to choose whomever they liked, which can introduce substantial bias when they avoid people who are, for whatever reason, harder to approach.
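
A toy simulation shows how a quota sample can match the population's demographics exactly and still miss badly on the question being asked. The assumptions are invented for illustration: interviewers pick only people who seem easy to approach, about 60% of people qualify, and those people lean toward Dewey more than everyone else. The `group`, `approachable`, and `dewey` fields are hypothetical.

```python
import random


def quota_sample(population, quotas, rng):
    """Fill each demographic quota, but (hypothetically) only with people
    the interviewer finds easy to approach."""
    filled = {group: 0 for group in quotas}
    chosen = []
    for person in rng.sample(population, len(population)):  # random walk order
        group = person["group"]
        if filled[group] < quotas[group] and person["approachable"]:
            chosen.append(person)
            filled[group] += 1
    return chosen


rng = random.Random(1948)
population = []
for _ in range(50_000):
    approachable = rng.random() < 0.6          # assumed share easy to approach
    p_dewey = 0.60 if approachable else 0.30   # assumed link to the vote
    population.append({
        "group": "men" if rng.random() < 0.5 else "women",
        "approachable": approachable,
        "dewey": rng.random() < p_dewey,
    })

sample = quota_sample(population, {"men": 500, "women": 500}, rng)
true_share = sum(p["dewey"] for p in population) / len(population)
sample_share = sum(p["dewey"] for p in sample) / len(sample)
print(f"true Dewey share:   {true_share:.3f}")
print(f"quota-sample share: {sample_share:.3f}")
```

The sample hits its 500/500 gender quota exactly, yet it overstates Dewey's support, because the quota controls who is counted, not how the interviewer chooses within each group.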

Draft Lottery, 1970

The days of the year, from 1 to 366, were written on slips of paper, and the slips were placed in plastic capsules. The capsules for each month, starting with January and ending with December, were mixed in a shoebox and then dumped into a deep glass jar. Capsules were then drawn from the jar one at a time.

The first number drawn was 258, which corresponds to September 14. Men of draft age (those born between 1944 and 1950) whose birthday was September 14 therefore formed the first group to be drafted. Men whose birthday matched the second capsule drawn formed the second group, and so on.

It was quickly noticed that men with December birthdays seemed to be drafted earlier, on average, than men with birthdays in other months. With hindsight being 20/20, it's easy to see how this happened. The capsules were put into the jar month by month, so the January capsules probably stayed near the bottom, the February capsules sat on top of them, and so on, until the December capsules ended up on top. Whatever mixing was done afterward was not enough to overcome this ordering, so capsules from later months tended to be drawn first, producing a draft order biased toward birthdays late in the year.
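
The layering effect can be mimicked with a simple model: each capsule's height in the jar is its month number plus some random displacement, and capsules are drawn from the top down. The `mixing` parameter and the uniform displacement are hypothetical modeling choices, not a reconstruction of the actual drawing. Day numbers are mapped to months using a leap year (1968), since the lottery included February 29.

```python
import random
from datetime import date, timedelta


def day_to_month(day_of_year: int) -> int:
    """Map 1..366 to a calendar month using the leap year 1968."""
    return (date(1968, 1, 1) + timedelta(days=day_of_year - 1)).month


def poorly_mixed_lottery(mixing: float, seed: int = 1969) -> list[int]:
    """Return days in the order they are drawn.

    Capsules are layered month by month (January at the bottom); `mixing`
    is a hypothetical amount of random displacement, measured in months.
    """
    rng = random.Random(seed)
    height = {day: day_to_month(day) + rng.uniform(-mixing, mixing)
              for day in range(1, 367)}
    # The capsule nearest the top (largest height) is drawn first.
    return sorted(height, key=height.get, reverse=True)


def mean_draft_number_by_month(draw_order: list[int]) -> dict[int, float]:
    """Average draft number (1 = drafted first) for each birth month."""
    totals: dict[int, int] = {}
    counts: dict[int, int] = {}
    for draft_number, day in enumerate(draw_order, start=1):
        month = day_to_month(day)
        totals[month] = totals.get(month, 0) + draft_number
        counts[month] = counts.get(month, 0) + 1
    return {m: totals[m] / counts[m] for m in sorted(totals)}


means = mean_draft_number_by_month(poorly_mixed_lottery(mixing=3.0))
for month, mean in means.items():
    print(f"month {month:2d}: mean draft number {mean:6.1f}")
```

In a fair lottery every month's expected mean draft number is (1 + 366) / 2 = 183.5; with weak mixing, December falls well below that and January well above it. Raising `mixing` far beyond the 12-month spread pushes the monthly averages back toward 183.5.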