Friday, October 28, 2016

USA2016: Is the race tightening? It's all about mode


In this post, I examine whether the current race between Clinton and Trump is really tightening, as some have suggested in recent days.

First, a word for French speakers: sorry, this post will be in English only, as is always the case when I analyse polls conducted in an English-speaking country. I would like to have the time to translate, but I do not.

I first show our estimate of how the race has progressed. The methodology used here differs from that used by other analysts. I explain these differences at the end of this post.

Some pollsters ask two questions related to voting intentions, one listing the four candidates and one asking preference between Clinton and Trump only. All the data analysed here is based on the first question. The first graph shows the change in voting intention since August 1st 2016. The vertical lines indicate the three debates. The graph shows that, since the beginning of October, there are almost no polls where Trump has more support than Clinton. Such polls would appear as red dots (support for Trump in a poll) positioned higher than blue dots. There is indeed some variation in the polls, and some polls between the two debates showed Clinton very high (three blue dots between 52% and 55%). This may have led some to believe that the gap between Clinton and Trump was widening substantially. But these polls seem to be outliers -- or related to specific news about Trump during that period -- as can be seen from all the other polls, which are rather close in their estimates.

Therefore, the estimation from all the polls is now Clinton at 49%, Trump at 42% and others at 10%. This estimation differs somewhat from other estimations probably because of methodological features (see methodology at the end). Notice however that the lines illustrating the estimation are regression lines that give more importance to data that are close to one another and less to outliers.
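To illustrate what such a regression line does, here is a minimal sketch of a local linear fit with tricube weights, written in plain Python. It is not the code used to produce the graphs: the function names, the `frac` setting and the poll values are all assumptions chosen for illustration, and the sketch only weights polls by how close they are in time (a full Loess routine can also downweight outliers through extra robustness passes).

```python
def tricube(u):
    """Tricube kernel: 1 at distance 0, falling smoothly to 0 at distance 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_fit(days, shares, day0, frac=0.5):
    """Weighted straight-line fit around day0, evaluated at day0.

    frac is the fraction of polls used in each local fit (an assumed setting).
    """
    k = min(len(days), max(2, round(frac * len(days))))
    # Bandwidth: slightly beyond the k-th nearest poll, so at least k polls
    # receive a weight greater than zero.
    h = sorted(abs(d - day0) for d in days)[k - 1] * 1.001 + 1e-9
    w = [tricube((d - day0) / h) for d in days]
    sw = sum(w)
    mx = sum(wi * d for wi, d in zip(w, days)) / sw
    my = sum(wi * s for wi, s in zip(w, shares)) / sw
    sxx = sum(wi * (d - mx) ** 2 for wi, d in zip(w, days))
    sxy = sum(wi * (d - mx) * (s - my) for wi, d, s in zip(w, days, shares))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
    return my + slope * (day0 - mx)


# Hypothetical poll results (day of fieldwork midpoint, Clinton share in %).
days = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
clinton = [47.0, 48.5, 46.5, 48.0, 54.0, 48.5, 49.0, 49.5, 48.5, 49.0]

trend = [round(local_fit(days, clinton, d), 1) for d in days]
print(trend)
```

The poll at 54.0 pulls the line up only near day 12, because its weight falls to zero for fits centred far away from it.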

If we use the same data to figure out support for Clinton vs Trump only, i.e., each candidate's support as a proportion of the sum of their support, we get the following graph. This gives us Clinton at 54% and Trump at 46% of the total support for the two of them.
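The conversion can be checked with the four-candidate estimates given earlier (Clinton 49%, Trump 42%). The short sketch below uses a hypothetical helper name, `two_party_share`; it only restates the arithmetic.

```python
def two_party_share(clinton, trump):
    """Each candidate's support as a percentage of their combined support."""
    total = clinton + trump
    return 100 * clinton / total, 100 * trump / total


clinton_share, trump_share = two_party_share(49, 42)
print(round(clinton_share), round(trump_share))  # 54 46
```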

We clearly see, after mid-October, three red dots that are positioned higher than the 50% line, i.e. that show the support for Trump higher than the support for Clinton. And we have two other red dots that show support for Trump at 49%. Do these polls have specific characteristics that would explain their estimation? This is what we examine in the next section.

It's all about modes

The next two graphs show, for Clinton and for Trump, the estimate of their support (as a share of the sum of their support) traced by the polls according to the mode of administration, i.e., Web, live phone or IVR/online. We focus on support for Clinton, Trump's being the exact opposite. As we can see, the estimation lines for Clinton traced by Web and live phone polls (in blue) are almost identical. The green line represents the estimate traced by polls that combine IVR (interactive voice response) calls to households with landline phones and web polls of respondents who do not have access to a landline phone and who are members of an opt-in internet panel. These polls give a quite different estimate of support for Clinton, systematically lower than the other polls. Three pollsters use this technology: Survey USA, Public Policy Polling (PPP) and Rasmussen. They all use a likely voter model. However, only Rasmussen estimates that Clinton's support is lower than Trump's. Rasmussen conducts a tracking poll and publishes a 3-day estimate every day, so analysts who include all of its estimates pull the average of the polls downward. In our case, we include tracking polls only once every "period" so that only independent data enter the analysis (see methodology).
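A small numerical illustration of this last point, using made-up figures rather than actual poll results: when a tracker's overlapping daily releases are all counted, a simple average moves toward the tracker's estimate.

```python
from statistics import mean

# Hypothetical Clinton two-party shares, for illustration only.
other_polls = [55, 54, 56, 55]
tracker_releases = [49, 50, 49]  # three daily releases with overlapping samples

average_all = mean(other_polls + tracker_releases)       # every release counted
average_once = mean(other_polls + tracker_releases[:1])  # tracker counted once

print(round(average_all, 1), round(average_once, 1))
```

In this example the average is lower when every release is counted, even though the three releases share most of their respondents.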

In short, without the Rasmussen polls, the trend for Clinton is still going up, though it has probably reached a plateau. Her share of the support for the two main candidates is estimated at close to 55% by the Web and live phone polls but at only 50% by the IVR/online polls; this low figure is due solely to Rasmussen's estimates.


The race does not seem to be tightening right now. There are some outliers that put Clinton somewhat higher, but the main "problem" is with the Rasmussen estimates, which depart seriously from those of other pollsters, including those using the same methodology.

Acknowledgements: Luis Pena Ibarra is responsible for entering the data, conducting the analyses that produce the graphs and editing the graphs.

Methodology for this analysis.

1) The estimate produced is not an average, weighted or not. It is produced by a local regression (Loess). This analysis gives more importance to data that are close together and less to outliers. It is, however, rather dependent on where the series starts. For example, I started this series with the polls conducted after August 1st. This means that all the polls conducted since then influence the trend. My first graphs started on June 1st. If I still started at that date, the trend would be different because of the influence of the polls conducted in June and July. I try to balance the need to have enough polls for analysis against the importance of not letting old information influence the trend too much.

2) Every poll is positioned at the midpoint of its fieldwork, not at the end of it. This seems more appropriate since the information was gathered over a period of time and reflects the variation in opinion during that period.

3) Tracking polls are included only once for each period. A poll conducted over three days is included once every three days. In this way, we only include independent data, which is appropriate in statistical terms.

4) The data used comes from the answer to the question about voting intention for the four candidates. Undecideds (non-disclosers) are attributed proportionally -- for now.

5) For Canadians, note that, in the USA, IVR cannot be used to call cell phones. This is why pollsters use Web opt-in for a part of their sample (20% in the case of Rasmussen).
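Points 2 to 4 above describe simple data-preparation steps. The sketch below shows one way they could be coded; it is not the procedure actually used for the graphs. The function names, the dates and the poll figures are hypothetical, and "attributed proportionally" is read here as giving each candidate a share of the undecided in proportion to their declared support.

```python
from datetime import date, timedelta


def fieldwork_midpoint(start, end):
    """Point 2: date halfway between the first and last day of fieldwork."""
    return start + timedelta(days=(end - start).days // 2)


def independent_releases(releases):
    """Point 3: keep only releases whose fieldwork windows do not overlap.

    releases: list of (start, end) dates for one tracking poll.
    """
    kept = []
    for start, end in sorted(releases):
        if not kept or start > kept[-1][1]:
            kept.append((start, end))
    return kept


def allocate_undecided(shares, undecided):
    """Point 4: spread undecided points in proportion to declared support."""
    decided = sum(shares.values())
    return {name: value + undecided * value / decided
            for name, value in shares.items()}


# Hypothetical tracker: a 3-day window released every day for a week.
first = date(2016, 10, 1)
daily = [(first + timedelta(days=i), first + timedelta(days=i + 2))
         for i in range(7)]
for start, end in independent_releases(daily):
    print(start, end, "plotted at", fieldwork_midpoint(start, end))

# Hypothetical poll with 10% undecided.
print(allocate_undecided({"Clinton": 45, "Trump": 40, "Others": 5}, 10))
```

With daily releases of a 3-day window, only every third release is kept, which matches the "once every three days" rule in point 3.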
