Sunday, March 3, 2019

On the representativeness of exit polls I: the 2016 general presidential election

If exit polls are perfectly representative of the electorate, then the percent of the vote each candidate received in a given state, as calculated from the exit polls conducted there, should exactly match the percent of the vote each candidate actually received in that state. So I'll try to do exactly that here with the 2016 exit polls conducted by CNN.

I will start with my own state, Georgia. In the actual results, Trump got 50.4% of the vote there, while Clinton got 45.3%. The gender exit poll for Georgia shows the following:

  • 55% of voters were female, 45% were male.
  • Of the female voters, 54% voted for Clinton, while 43% chose Trump.
  • Conversely, of the male voters, 60% chose Trump while only 37% chose Clinton.
So, focusing only on Georgia, let M be the event that a voter is male, F the event that a voter is female, C the event that they voted for Clinton, and T the event that they voted for Trump. The exit polls thus indicate that: 
  1. P(M) = 0.45, 
  2. P(C|M) = 0.37, 
  3. P(T|M) = 0.6,
  4. P(F) = 0.55,
  5. P(C|F) = 0.54, and
  6. P(T|F) = 0.43.

So we can estimate Trump's total vote share from these data by multiplying the % of voters of each sex by the % of that sex who voted for him, then adding the two products. Doing this gives (45%*60%)+(55%*43%)=27.0%+23.7%=50.7%. This matches up quite nicely with his actual result in Georgia (it's only 0.3% higher than the 50.4% he actually got), so the poll appears to be quite representative. 
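Written with the notation defined above, this calculation is the law of total probability. It applies because the exit poll splits voters into two non-overlapping groups, M and F, whose shares sum to 1:

```latex
\begin{aligned}
P(T) &= P(M)\,P(T \mid M) + P(F)\,P(T \mid F) \\
     &= (0.45)(0.60) + (0.55)(0.43) \\
     &= 0.2700 + 0.2365 = 0.5065 \approx 50.7\%
\end{aligned}
```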

But what about Clinton? Let's do the same thing for her:
(45%*37%)+(55%*54%)=16.7%+29.7%=46.4%. This is 1.1% higher than the 45.3% of the vote Clinton actually got, so it's a little further off than with Trump.

What about race? 60% of those in the exit poll were white, and 30% were black. But let's split everyone into just two categories, white and non-white, as CNN does for some of their exit polls. So we have 60% white voters and 40% non-white voters. Needless to say, Trump did much better among white voters than among non-white voters: he got 75% of the white vote but only 14% of the non-white vote. 

This indicates that Trump would receive (60%*75%)+(40%*14%)=45%+5.6%=50.6%. Again, we are very close (only 0.2% away) to Trump's actual result of 50.4%. Clinton, meanwhile, got only 21% of the white vote but 83% of the non-white vote. This points to (60%*21%)+(40%*83%)=12.6%+33.2%=45.8% of the vote, which is also close to (0.5% more than) the 45.3% she actually got.

Lastly, age. Let's again split people into two categories: 18-44 and 45 and older. Among 18-44 year olds, Clinton beat Trump 55% to 40%, but among those 45 and older, Trump beat Clinton 60% to 38%. Of all voters, 46% were 18-44 and the remaining 54% were 45 and older. 

So this indicates that Clinton would get (46%*55%)+(54%*38%)=25.3%+20.5%=45.8% of the vote, again 0.5% more than the actual amount. Trump would be predicted to get (46%*40%)+(54%*60%)=18.4%+32.4%=50.8% of the vote, or 0.4% more than he actually got. 
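The same weighted-sum calculation can be repeated for every two-category breakdown. Here is a minimal Python sketch that reproduces the Georgia sex, race (2), and age (2) estimates from the CNN percentages quoted above; the names `GEORGIA_BREAKDOWNS` and `estimate_shares` are my own, not anything from CNN.

```python
# Each breakdown maps group -> (share of electorate, Clinton share, Trump share),
# using the Georgia CNN exit-poll percentages quoted in the text, as fractions.
GEORGIA_BREAKDOWNS = {
    "Sex": {
        "male": (0.45, 0.37, 0.60),
        "female": (0.55, 0.54, 0.43),
    },
    "Race (2)": {
        "white": (0.60, 0.21, 0.75),
        "non-white": (0.40, 0.83, 0.14),
    },
    "Age (2)": {
        "18-44": (0.46, 0.55, 0.40),
        "45+": (0.54, 0.38, 0.60),
    },
}

# Actual Georgia results: (Clinton %, Trump %).
ACTUAL = (45.3, 50.4)


def estimate_shares(groups):
    """Return (Clinton %, Trump %) as the share-weighted sum over groups."""
    clinton = sum(share * c for share, c, _ in groups.values())
    trump = sum(share * t for share, _, t in groups.values())
    return 100 * clinton, 100 * trump


for name, groups in GEORGIA_BREAKDOWNS.items():
    c, t = estimate_shares(groups)
    # Signed difference from the actual result, in percentage points.
    print(f"{name}: Clinton {c:.2f}% ({c - ACTUAL[0]:+.2f}), "
          f"Trump {t:.2f}% ({t - ACTUAL[1]:+.2f})")
```

Because this keeps full precision instead of rounding each product first, some values land on a half-point boundary (e.g. 46.35% for Clinton by sex), which is why the text's rounded figures can differ by 0.1.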

If you use the data broken down by six different age groups instead, you get a Clinton prediction of 46% and a Trump prediction of 50.68%. 

All of my results for Georgia are shown in the table below. The number in parentheses in the left column is the number of categories each breakdown uses (e.g. Race (2) = just "white" and "non-white"). The "How far off" columns give the estimate minus the actual result, in percentage points.


Georgia       | Clinton | Trump | How far off (Clinton)? | How far off (Trump)?
Actual        | 45.3%   | 50.4% |                        |
Sex           | 46.4%   | 50.7% |  1.1%                  |  0.3%
Race (2)      | 45.8%   | 50.6% |  0.5%                  |  0.2%
Age (2)       | 45.8%   | 50.8% |  0.5%                  |  0.4%
Age (6)       | 46.0%   | 50.7% |  0.7%                  |  0.3%
Age (4)       | 46.2%   | 50.6% |  0.9%                  |  0.2%
Race (5)*     | 42.0%   | 48.8% | -3.3%                  | -1.6%
Race & gender | 43.8%   | 50.3% | -1.5%                  | -0.2%
*I should note here that this was broken down into 5 categories, but 2 of them ("Asian" and "Other race") have no results given for how they voted. This explains why both estimates from this breakdown are much lower than the actual results. 

Next I chose Arizona solely because it's the first state listed on CNN's exit poll page (they're listed alphabetically, and for some reason they didn't do AL or AK). Note that many of these poll breakdowns also include significant percentages of voters for whom there is no estimate of how they voted (these breakdowns are all marked with an asterisk). Specifically, for AZ, there were no voting data for 14% of the voters in the Age (6) poll, 9% of those in the Race (5) poll, and 9% of those in the Race & gender poll. This leads to estimates of the results (% of all votes for each candidate) that are consistently lower than the actual values. That being said, here are my results:

Arizona        | Clinton | Trump | How far off (Clinton)? | How far off (Trump)?
Actual         | 44.6%   | 48.1% |                        |
Sex            | 44.6%   | 48.9% |  0.0%                  |  0.8%
Race (2)       | 44.8%   | 48.5% |  0.2%                  |  0.4%
Age (2)        | 44.7%   | 48.9% |  0.1%                  |  0.8%
Age (6)*       | 37.6%   | 44.3% | -7.0%                  | -3.8%
Age (4)        | 44.4%   | 48.5% | -0.2%                  |  0.4%
Race (5)*      | 39.2%   | 45.2% | -5.4%                  | -3.0%
Race & gender* | 39.7%   | 44.6% | -4.9%                  | -3.5%
Education (4)  | 44.7%   | 48.8% |  0.1%                  |  0.7%
Education (2)  | 44.5%   | 48.5% | -0.1%                  |  0.4%

From this we see that these polls seem to be quite representative of the entire electorate in these two states. In almost all cases without a lot of missing data, the errors are less than 1 percentage point! And in every case where no entire group's vote is missing, the errors are less than 2 points. 
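As a rough check on those claims, here is a minimal Python sketch that recomputes the Arizona "How far off" columns (estimate minus actual) from the rounded table values, then tests that every breakdown without missing data is within 1 point and that every asterisked breakdown underestimates both candidates. The names are my own. Since it starts from rounded values, a recomputed error can differ from the table by 0.1 (Race (5) gives -2.9 for Trump rather than -3.0).

```python
# Arizona estimates from the CNN exit-poll table: breakdown -> (Clinton %, Trump %).
# A trailing "*" marks breakdowns with groups that have no vote data.
ARIZONA_ESTIMATES = {
    "Sex": (44.6, 48.9),
    "Race (2)": (44.8, 48.5),
    "Age (2)": (44.7, 48.9),
    "Age (6)*": (37.6, 44.3),
    "Age (4)": (44.4, 48.5),
    "Race (5)*": (39.2, 45.2),
    "Race & gender*": (39.7, 44.6),
    "Education (4)": (44.7, 48.8),
    "Education (2)": (44.5, 48.5),
}
ARIZONA_ACTUAL = (44.6, 48.1)


def errors(estimates, actual):
    """Return {breakdown: (Clinton error, Trump error)} in percentage points."""
    return {
        name: (round(c - actual[0], 1), round(t - actual[1], 1))
        for name, (c, t) in estimates.items()
    }


err = errors(ARIZONA_ESTIMATES, ARIZONA_ACTUAL)
complete = {name: e for name, e in err.items() if not name.endswith("*")}
missing = {name: e for name, e in err.items() if name.endswith("*")}

# Breakdowns with full vote data: all errors under 1 point.
print(all(abs(x) < 1 for e in complete.values() for x in e))  # prints True
# Breakdowns with missing groups: every estimate is too low.
print(all(x < 0 for e in missing.values() for x in e))  # prints True
```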

Let's look at the entire country now. Overall, Trump received about 45.9% of the popular vote in the US as a whole, and Clinton received 48.0%. Conveniently, we can include not just CNN's national exit poll results, but also the New York Times'.

I also included California exit poll results (again from CNN) because it is the most populous state, so surely the polls there should be especially accurate.

My results for the entire country, as well as for AZ, CA, and GA, are shown below. Note that these results include only the margin of victory (MOV, Clinton's share minus Trump's, so a negative value means Trump led) as estimated from each exit poll category (sex, race (2), etc.), not the % estimated for either candidate. Overall, the exit polls seem to be very representative. Excluding the categories with missing data (the "Average (excl. miss.)" row) makes both the AZ and CA exit polls more accurate. It has no effect on the national polls, for the simple reason that they had no missing data, and for GA the exclusion actually makes the estimated MOV less accurate.

Lastly, CNN's national exit poll matched the actual results much more closely than did the Times'. Why? CNN's national poll was based on 24,558 respondents, while the Times' was apparently based on 24,537. It seems unlikely that those 21 extra voters made such a big difference in accuracy between the two polls. Additionally, at the bottom of the page for the Times' poll, it says: "Data for 2016 were collected by Edison Research for the National Election Pool, a consortium of ABC News, The Associated Press, CBSNews, CNN, Fox News and NBC News." This seems to imply that the source for CNN's and the Times' exit poll data is actually exactly the same. Why the results are slightly different, then, is not clear (e.g. CNN says Trump got 52% of the male vote, Times says 53%). 
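To put the 21-respondent difference in perspective, here is a rough sketch of the standard error of a proportion at each sample size. It assumes simple random sampling and the worst case p = 0.5; real exit polls sample voters in clusters at selected precincts, so their true error is larger than this, but the comparison between the two sample sizes still holds.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


P = 0.5  # assumption: worst-case proportion, which maximizes the standard error
se_cnn = standard_error(P, 24_558)  # CNN's reported respondents
se_nyt = standard_error(P, 24_537)  # the Times' reported respondents

print(f"CNN: {100 * se_cnn:.3f} pts, NYT: {100 * se_nyt:.3f} pts")
print(f"difference: {100 * (se_nyt - se_cnn):.5f} pts")
```

Both standard errors come out to roughly 0.32 points, and they differ by a tiny fraction of a point, so sample size alone can't explain the gap between the two polls.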



MOV (Clinton minus Trump) | AZ    | CA    | GA    | National (NYT) | National (CNN)
Sex                       | -4.3% | 28.7% | -4.3% | 0.5%           | 1.7%
Race (2)                  | -3.8% | 29.4% | -4.8% | 0.0%           | 1.2%
Age (2)                   | -4.3% | 29.7% | -5.0% | 0.0%           | 1.7%
Age (6)                   | -6.7% | 29.6% | -4.7% | 0.0%           | 1.6%
Age (4)                   | -4.1% | 29.5% | -4.5% | 0.6%           | 1.8%
Race (5)                  | -6.0% | 28.1% | -6.8% | 0.9%           | 1.8%
Race & gender             | -4.9% | 24.7% | -6.4% | 0.0%           | 1.9%
Education (4)             | -4.1% | 28.2% | -4.5% | 1.1%           | 1.9%
Education (2)             | -3.9% | 28.6% | -5.0% | 0.0%           | 1.5%
Average                   | -4.7% | 28.5% | -5.1% | 0.3%           | 1.7%
Average (excl. miss.)     | -4.1% | 29.1% | -4.7% | 0.3%           | 1.7%
Actual MOV                | -3.5% | 30.0% | -5.1% | 2.1%           | 2.1%
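The "Average" and "Average (excl. miss.)" rows are plain means over the breakdowns. Here is a minimal Python sketch for the AZ column, taking MOV as Clinton's share minus Trump's (which matches the actual row, 44.6% - 48.1% = -3.5%) and flagging as missing the three Arizona breakdowns noted earlier as lacking vote data (Age (6), Race (5), Race & gender). The list and function names are my own.

```python
# AZ column of the MOV table: (breakdown, MOV in points, has missing vote data?).
ARIZONA_MOV = [
    ("Sex", -4.3, False),
    ("Race (2)", -3.8, False),
    ("Age (2)", -4.3, False),
    ("Age (6)", -6.7, True),
    ("Age (4)", -4.1, False),
    ("Race (5)", -6.0, True),
    ("Race & gender", -4.9, True),
    ("Education (4)", -4.1, False),
    ("Education (2)", -3.9, False),
]


def mov(clinton, trump):
    """Margin of victory: Clinton % minus Trump % (negative = Trump ahead)."""
    return clinton - trump


def average(values):
    return sum(values) / len(values)


avg_all = average([m for _, m, _ in ARIZONA_MOV])
avg_complete = average([m for _, m, has_missing in ARIZONA_MOV if not has_missing])
actual = mov(44.6, 48.1)  # actual Arizona result

print(f"{avg_all:.1f} {avg_complete:.1f} {actual:.1f}")  # prints -4.7 -4.1 -3.5
```

Dropping the breakdowns with missing data moves the AZ average from -4.7 to -4.1, closer to the actual -3.5.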
