Friday, October 24, 2008

What's up with the polling?

One of the interesting things about the election coverage is the polls. Like most people, I am intrigued by the daily movements, and love to speculate about why the polls are moving. But if you take a step back and contemplate the polls a bit more rationally, you begin to realize that something is seriously wrong.

Most of these polls are supposed to have a margin of error of +/- 3.5%. The 3.5% is the half-width of the 95% confidence interval. That means there is supposed to be only a 5% chance that the actual result is outside the predicted range. Except that, when I look at the polls, outliers are far more frequent, and the spread is far wider, than would be expected. So it would appear that there is something seriously wrong.
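To make the arithmetic behind that claim concrete, here is a minimal sketch. It assumes a simple random sample, a candidate near 50%, and polls that are independent of each other; the sample size of 784 is a hypothetical figure chosen only because it gives a margin of about 3.5%, and the function names are my own. Real polls use weighting and other adjustments, so this is a rough baseline rather than how any pollster actually computes its margin.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the 95% interval for one candidate's share,
    assuming a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def prob_at_least(k, num_polls, miss_rate=0.05):
    """Chance that k or more of num_polls independent polls land outside
    their own 95% interval purely by chance (binomial tail)."""
    return sum(
        math.comb(num_polls, i) * miss_rate**i * (1 - miss_rate) ** (num_polls - i)
        for i in range(k, num_polls + 1)
    )

# Hypothetical sample size, chosen to match a +/- 3.5% margin.
moe = margin_of_error(784)
print(f"margin of error: {moe:.3f}")

# With 20 polls, about 1 miss is expected; 5 or more would be very unusual.
print(f"expected misses in 20 polls: {20 * 0.05:.1f}")
print(f"chance of 5+ misses in 20 polls: {prob_at_least(5, 20):.4f}")
```

If sampling error were the only source of variation, seeing many polls outside the range would be very unlikely, which is why a wide spread points to differences in method rather than bad luck.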


I don't have a precise answer for why this is happening, but I do have three potential hypotheses:


  • The first is that the polls try to forecast "likely voters". A closer examination of the polls suggests that the ones that have Obama up by large margins tend to skew in favor of people who say in the survey that they intend to vote. In contrast, the polls that have Obama down in the dumps appear to be overweighting prior election turnout ratios. Essentially, this suggests that if the same people who voted last time were to vote this time, this race would be a toss-up and Obama would have a high chance of losing. On the other hand, if all the voters who say they will vote turn out in large numbers, this could be an Obama landslide. The key uncertainty, therefore, is voter turnout, and the polls frankly are just guessing at what will happen on this front.


  • The second reason has to do with demographics. The historical polling data used a segmentation scheme that was appropriate for a white man running against a white man. Having an African American and a woman on the tickets has completely changed which demographics are relevant. For instance, do pollsters know which groups tend to be racist and which tend to be sexist? Pollsters have gotten better at this, since they can extrapolate from state and local elections. However, it seems that different agencies are using different weights and samples for the various groups, thereby skewing the results differently.

  • Finally, there is a systematic methodological error that may be hugely significant this time. Most polls are telephone polls, conducted over land lines. Unfortunately, land line use in the US has dropped dramatically in the last four years, particularly among young people. Pollsters were still confident in their results as they assumed that there were no statistically significant systematic differences between those with land lines and those without. That may not be true this time. If the people who use land lines behave differently from people who don't, then the polls could be wrong. Different pollsters have been oversampling different demographics to adjust for this, which may explain some of the variance.
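All three hypotheses come down to how each pollster weights the groups in its sample. The sketch below shows how the same raw responses can produce different toplines under two weighting schemes. Every number in it is made up for illustration, and the group names, the two schemes, and the function name are my own assumptions, not any pollster's actual method.

```python
def weighted_share(support, weights):
    """Topline share for a candidate: per-group support averaged
    using each group's assumed share of the electorate."""
    total = sum(weights.values())
    return sum(support[g] * weights[g] for g in support) / total

# Made-up support for one candidate within each age group.
support = {"under_30": 0.65, "30_to_64": 0.50, "65_plus": 0.45}

# Scheme A (hypothetical): electorate looks like the previous election.
prior_turnout = {"under_30": 0.15, "30_to_64": 0.60, "65_plus": 0.25}

# Scheme B (hypothetical): electorate looks like stated intent to vote.
stated_intent = {"under_30": 0.25, "30_to_64": 0.55, "65_plus": 0.20}

a = weighted_share(support, prior_turnout)
b = weighted_share(support, stated_intent)
print(f"prior-turnout weighting: {a:.4f}")
print(f"stated-intent weighting: {b:.4f}")
print(f"gap from weighting alone: {b - a:.4f}")
```

With these invented inputs the two schemes differ by under two points on one candidate's share, which roughly doubles when expressed as a lead. That is enough to move a race from a toss-up to a comfortable margin without a single respondent changing their answer.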

Overall, all I can say is that the polls currently suggest that this race is anything from a toss-up to an Obama landslide. It all depends on who votes. Beyond this, any inference from any poll is pretty much meaningless.


One interesting side note is this article, which shows that the polls actually mirror Google Trends. Here is how McCain, Palin, Obama and Biden track on Google Trends in October:



(Obama - Yellow, McCain - Blue, Palin - Red, Biden - Green)
