Friday, June 24, 2016

Brexit: we should have known better


Well, Quebec voted no to leaving Canada, Scotland voted no to leaving the UK, and then the UK leaves the EU. It is a bit ironic when seen from Quebec, although Europe is not a country per se.

So what happened?

I am not sure that pollsters failed that much. However, I think analysts, myself included, failed. And I know how and why I failed. At the beginning of the campaign, a colleague of mine, Henry Milner, told me that in this referendum the status quo side might be the Leave side. His argument was that older people tended to vote Leave, that they had been raised in a country that was outside the EU, and that they might want to go back to "normality".

So what are the consequences for analysis? The "Law of even polls" -- which states that when the polls are even, the status quo or more conservative side is likely to prevail -- still applies... but you need to know which side is the status quo! I should have known better. I amend my law by adding this by-law: if you want to know which side is the status quo, look at how older people vote. They are the ones who win elections. People between 18 and 34 years old form less than 20% of the population and an even lower proportion of voters.
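Why a group that is under 20% of the population becomes an even smaller share of voters is simple arithmetic on turnout. The sketch below uses hypothetical turnout rates (50% for 18-34 year olds, 75% for everyone else) chosen only for illustration; they are not actual UK figures.

```python
def share_of_voters(pop_share: float, turnout_group: float, turnout_rest: float) -> float:
    """Share of actual voters who belong to a group, given the group's share
    of the population and its turnout compared with everyone else's."""
    group_votes = pop_share * turnout_group
    rest_votes = (1.0 - pop_share) * turnout_rest
    return group_votes / (group_votes + rest_votes)

# Hypothetical inputs: 19% of the population is aged 18-34,
# with 50% turnout among them versus 75% among older people.
share = share_of_voters(0.19, 0.50, 0.75)
print(f"{share:.1%}")  # 13.5%
```

Whenever the group's turnout is below everyone else's, its share of voters is smaller than its share of the population.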

So here is the graph I get when I attribute 67% of non-disclosers to the Leave side instead of the Remain side (the reverse of what I had done so far). It gives a perfect prediction of the results.

In conclusion

A number of analysts, journalists and pollsters noticed during the campaign that older people clearly favored the Leave side. This should have rung alarm bells and led us to conclude that, in a very close race, the Leave side was likely to win. In my case, I should have listened and attributed two thirds of the non-disclosers to Leave instead of Remain. With this procedure, the prediction is perfect.

Thursday, June 23, 2016

Brexit, an update that changes things a bit


I was not going to post an update unless something substantial changed. Since yesterday, we have added 10 new polls, i.e., those published since Tuesday and the Survey Monkey polls that appeared on the lists we had consulted.

With these new polls, the situation is somewhat different.

The first graph shows change over time, with non-disclosers. It shows a downward trend in the proportion of non-disclosers (mostly undecideds). It now also clearly shows an upward trend in support for Remain.

The second graph shows the estimates when non-disclosers are attributed proportionally. Even with this type of allocation, support for Remain is now ahead of Leave.

The final graph shows even more clearly that the Remain side is ahead of Leave. With this allocation, all the polls give a majority to Remain except one that puts the two sides at par. And the gap between the two is now estimated at 4.5 points.


With these new results, it is possible to conclude that the fatal shooting of Jo Cox probably had an impact on the campaign. It is rather clear from the second graph that most of the polls conducted before the shooting gave an advantage to the Leave side and, on the contrary, most polls conducted after it gave an advantage to Remain. In the last graph, which uses non-proportional allocation of non-disclosers, it is less clear, but the only estimates that gave a majority to Leave were nonetheless all before the shooting. More sophisticated statistical analysis will be needed to confirm, or not, this conclusion.
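The before/after comparison described above can be sketched as a split of polls by the midpoint of their fieldwork relative to the date of the shooting (16 June 2016). The poll records below are hypothetical placeholders, not the actual data, and the rule that a midpoint falling on the day itself counts as "after" is an assumption; a proper test would also control for mode and for the overall trend.

```python
from datetime import date

SHOOTING = date(2016, 6, 16)  # Jo Cox was attacked on 16 June 2016

def fieldwork_midpoint(start: date, end: date) -> date:
    """Middle of the fieldwork period; date + timedelta drops the fractional day."""
    return start + (end - start) / 2

def remain_lead_share(polls: list[dict], before: bool) -> float:
    """Fraction of polls, before or after the shooting, that put Remain ahead.
    A poll whose midpoint falls on the day of the shooting counts as 'after'."""
    group = [p for p in polls
             if (fieldwork_midpoint(p["start"], p["end"]) < SHOOTING) == before]
    if not group:
        return float("nan")
    return sum(p["remain"] > p["leave"] for p in group) / len(group)

# Hypothetical polls, for illustration only.
polls = [
    {"start": date(2016, 6, 10), "end": date(2016, 6, 12), "remain": 44, "leave": 47},
    {"start": date(2016, 6, 11), "end": date(2016, 6, 14), "remain": 46, "leave": 45},
    {"start": date(2016, 6, 13), "end": date(2016, 6, 15), "remain": 43, "leave": 46},
    {"start": date(2016, 6, 17), "end": date(2016, 6, 19), "remain": 46, "leave": 44},
    {"start": date(2016, 6, 18), "end": date(2016, 6, 21), "remain": 45, "leave": 45},
    {"start": date(2016, 6, 20), "end": date(2016, 6, 22), "remain": 48, "leave": 45},
]
print(f"{remain_lead_share(polls, before=True):.2f}")   # 0.33
print(f"{remain_lead_share(polls, before=False):.2f}")  # 0.67
```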

With these new results, we may conclude that Remain is likely to end up with a clear advantage of at least four points. Like everybody, I am eager to see the final results.

P.S. Thanks to Luis Pena Ibarra, who collected the data and did most of the graphs for this campaign.

Wednesday, June 22, 2016

Brexit, the day before


In this last analysis before election day, I use only the polls conducted during the campaign, i.e., from April 15 to June 20. Polls published after that are unlikely to change the picture much; however, if there are such polls, I may update this message during the day. I first look at overall change in support and then at the different portraits traced by the two modes of administration, i.e., telephone and web opt-in.

Change in support

The first graph shows the estimates of the different pollsters. It shows that the two sides are very close to each other. It also shows that the proportion of non-disclosers -- including undecideds and, for pollsters who keep them in their samples, those who say they will not vote -- is quite stable. However, this proportion varies a lot between pollsters (from 3% to 26%), so it is not appropriate to compare the estimates of Remain and Leave without first attributing these non-disclosers so that Remain and Leave add up to 100%.
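Proportional attribution, the procedure most pollsters use, simply rescales the two sides so that they sum to 100%, which amounts to distributing non-disclosers in proportion to each side's declared support. A minimal sketch with a made-up poll:

```python
def allocate_proportional(remain: float, leave: float) -> tuple[float, float]:
    """Rescale Remain and Leave so they sum to 100, i.e. distribute
    non-disclosers in proportion to each side's declared support."""
    decided = remain + leave
    if decided <= 0:
        raise ValueError("no declared support to rescale")
    return 100 * remain / decided, 100 * leave / decided

# Hypothetical poll: 44% Remain, 42% Leave, 14% non-disclosers.
r, l = allocate_proportional(44, 42)
print(round(r, 1), round(l, 1))  # 51.2 48.8
```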

The next graph shows change over time when non-disclosers are attributed proportionally to each side for each poll. This is the procedure used by all pollsters except for one recent BMG telephone poll; I will come back to this question later. The portrait that emerges is that positions have "crystallized" since the end of May. Since then, support for Leave appears to be somewhat higher than support for Remain. Note also that this ceiling was reached well before the shooting of MP Jo Cox, not after it.

Interestingly, the same situation occurred in Scotland for the referendum on independence. As you can see in my last post of that campaign, support for both sides had also reached a ceiling close to 50% in the last weeks of the campaign. In Scotland, however, the situation was slightly more favourable to the status quo.

However, what Scotland, and Quebec in 1995, also show is that a proportional attribution of non-disclosers is likely to overestimate support for change. For example, in Scotland, a non-proportional attribution of 67% of non-disclosers to the No side still gave an estimate of No support a few points lower than the referendum result. You can see this analysis in my post Scotland, the day after.

I used the same non-proportional attribution of non-disclosers for the Brexit referendum. One pollster, BMG Research, used the same attribution for its telephone polls (not its web polls). The pollster states that it asked a number of questions (which ones, we don't know) that led it to conclude that this allocation was the appropriate one. You may look at the BMG report here. In addition, this post by Elections, etc. shows that polls almost always overestimate change.

The following graph shows the likely change in support over time using non-proportional attribution. Remain appears to be about two points ahead of Leave. In fact, with this allocation, there was only a short period last week where Leave was ahead of Remain. The last polls tend to show Remain ahead, at least when we use non-proportional attribution of non-disclosers.

By mode

Is the portrait traced by the two modes of administration the same? Not exactly. The next two graphs show change over time in support for Remain, using either proportional or non-proportional attribution of non-disclosers. Both graphs show that the portrait differs by mode. Both also show that, at the beginning of the campaign, telephone polls tended to estimate support for Remain five points higher than web opt-in polls, but that this discrepancy shrank to two points by the end. With proportional attribution, telephone polls estimate support for Remain at 50% and opt-in web polls at 48%. With non-proportional attribution, the respective estimates are 52% and 50%. This means that the overall estimates depend in part on the proportion of web versus telephone polls conducted, so weighting by mode of administration, as Number Cruncher does, is not a bad idea.
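Weighting by mode means giving each mode a fixed weight in the aggregate instead of letting the number of polls of each type drive it. A sketch under stated assumptions: equal weights for telephone and web (the weights Number Cruncher actually uses are not described here) and hypothetical Remain estimates.

```python
from statistics import mean

def mode_weighted_average(estimates: dict[str, list[float]],
                          weights: dict[str, float]) -> float:
    """Average within each mode first, then combine the mode means with
    fixed weights, so that a mode with many polls does not dominate."""
    total_weight = sum(weights[m] for m in estimates)
    return sum(weights[m] * mean(vals) for m, vals in estimates.items()) / total_weight

# Hypothetical Remain estimates (proportional attribution): more web polls than phone polls.
polls = {
    "telephone": [50.0, 51.0],
    "web": [48.0, 47.0, 49.0, 48.0],
}
naive = mean(polls["telephone"] + polls["web"])
weighted = mode_weighted_average(polls, {"telephone": 0.5, "web": 0.5})
print(round(naive, 2), round(weighted, 2))  # 48.83 49.25
```

The simple average is pulled toward the web estimates only because there are twice as many web polls; the mode-weighted average removes that dependence.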


It is interesting to notice that, as in the Scottish 2014 and Quebec 1995 referendums, the Change side had momentum during the campaign but reached a ceiling in the last two weeks (or the last few days in the Quebec case). It seems clear that referendum campaigns do make a difference. I leave it to political scientists to analyse why and how.

The fact that the two modes of administration do not give the same estimates, not only of the level of support but also of change over time, is problematic. It is even more problematic because, in small markets, the only polls conducted are often web opt-in polls. We will see tomorrow which mode led to better estimates. Nonetheless, there is an urgent need for research on ways to improve poll samples and estimates if we do not want polls to mislead voters.

Will Remain win tomorrow? Like many others (see Elections, etc., for example), I think it will. First, I think a non-proportional attribution of non-disclosers is more realistic and appropriate than a proportional attribution. Second, my own analysis is that the "Law of even polls" applies, i.e., when polls show the two sides at par, the status quo side is likely to win, as was the case in recent elections (Israel and the UK, for example). If Remain does not win, I will have to modify the Law to take exceptions into account and figure out why this campaign turned out to be an exception.

I will have a last post on Friday to compare estimates and results.

Until next time

Tuesday, June 14, 2016

Brexit: It's all about modes?


We are about to enter the last week of the campaign. I will first update last week's analysis, but in this post I will mainly focus on the major differences between modes.

First an update

As we can see in this first graph, Leave continued to progress last week. However, it is important to note that Stay remained stable: its support did not decrease. What happened is that Leave's progression seems to come almost entirely from a decrease in the proportion of respondents who say that they don't know how they will vote or that they will not vote. This proportion, as noted in a previous post, varies substantially, from 3% to 15% during the last week.

If we attribute non-disclosers -- i.e., don't knows and will not vote -- proportionally, as all the pollsters do, we get the following graph. We see that support for Leave has now passed support for Stay, as others have shown.

However, as I explained in previous posts, it is empirically much sounder to attribute non-disclosers non-proportionally, giving more of them to the status quo side, in this case Stay. If I keep the same non-proportional attribution that I used before (67% to Stay and 33% to Leave), I get the following portrait of the situation. Support for Stay is slightly ahead of support for Leave, by about three points (it was five points last week).
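The non-proportional attribution gives a fixed share of the non-disclosers to the status quo side (here 67% to Stay, 33% to Leave), whatever each side's declared support. A minimal sketch with a hypothetical poll; the function name is illustrative.

```python
def allocate_nonproportional(status_quo: float, change: float,
                             share_to_status_quo: float = 0.67) -> tuple[float, float]:
    """Give a fixed share of non-disclosers to the status quo side.

    Inputs are percentages of the full sample; non-disclosers are
    whatever is left to reach 100."""
    non_disclosers = 100 - status_quo - change
    if non_disclosers < 0:
        raise ValueError("shares exceed 100%")
    sq = status_quo + share_to_status_quo * non_disclosers
    ch = change + (1 - share_to_status_quo) * non_disclosers
    return sq, ch

# Hypothetical poll: Stay 43%, Leave 45%, 12% non-disclosers.
stay, leave = allocate_nonproportional(43, 45)
print(round(stay, 2), round(leave, 2))  # 51.04 48.96
```

With proportional attribution the same poll would give Stay 48.9% (43/88), so the choice of method can reverse which side appears to be ahead.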

However, it is very relevant to ask whether the portrait traced by polls is the same for telephone and Web opt-in polls.

It's all about modes

The first question is whether there is a difference between modes, controlling for change over time. To check this, I ran a regression. The conclusions are:
  • As the shape of the curves in the preceding graphs makes clear, change over time takes the form of an inverted U (quadratic).
  • For Stay with proportional attribution of non-disclosers, after taking into account change over time, web polls give on average 5.1 points less to Stay than telephone polls. The mode, by itself, explains 45% of the variation between polls (which is huge!).
  • Using non-proportional attribution, the difference between modes is partly corrected. Support for Stay in web opt-in polls is 3.66 points lower on average than in telephone polls, and mode explains 26% of the variation between polls.
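The regression described in the list can be sketched as ordinary least squares of each poll's Stay estimate on time, time squared and a dummy for web opt-in mode. This sketch uses only the standard library (normal equations solved by Gaussian elimination) and synthetic, noise-free data built with a 5.1-point mode gap, only to show that the model recovers it; it is not the author's data or software.

```python
def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Synthetic polls: day of the year and mode (1 = web opt-in, 0 = telephone).
days = [5, 20, 35, 50, 65, 80, 95, 110, 125, 140, 155, 170]
web = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
X, y = [], []
for d, w in zip(days, web):
    t = d / 30.0  # time in months
    X.append([1.0, t, t * t, float(w)])
    # Inverted U over time, minus 5.1 points for web polls (no noise).
    y.append(50.0 + 2.0 * t - 0.4 * t * t - 5.1 * w)

intercept, slope, curvature, web_effect = ols(X, y)
print(round(web_effect, 2))  # -5.1
```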
The second question is whether the two modes trace the same portrait of change over time. The simple answer is no. The first graph shows change in support for Stay with proportional attribution of non-disclosers, by mode. It shows that, while telephone polls have estimated a steady decrease in support for Stay since the beginning of 2016, web opt-in polls trace quite a different portrait: it is only recently that they show a small decrease. These quite different portraits nonetheless converge on a similar estimate, close to 50%, in the last few days. Something similar happened in the Scottish referendum, where the difference between modes disappeared in the last weeks before election day.

If we use non-proportional attribution of non-disclosers, the portrait is similar but the endpoint estimate is slightly different, at 51.5%. And since non-proportional attribution corrects for some of the differences between modes, no difference between modes remains in the final estimates.


Although polls using different modes of administration do not trace the same portrait of change in support over time, they seem to agree in the end. So, for now, there is no need to start a battle over who is right.

It seems to me that referendums on "independence" look somewhat alike, if one compares Quebec 1995, Scotland 2014 and the current Brexit campaign. During the campaign, the "change" side always gains support and, in the days before the vote, comes close to 50 percent. In Quebec and in Scotland, the "Law of even polls" held; that is, when the two sides are at par, the status quo side is likely to win. Why is that? We may speculate. It is possible that people who favour the status quo are less inclined to reveal their preferences to pollsters or are less present in the samples. It is also possible that some people who favour change are afraid of what could happen if change wins by a tiny margin; they may therefore change their minds at the last minute. In any case, it is easier and less consequential to tell a pollster that you are going to vote for change than to actually do it. And "the message" is sent to leaders nonetheless.

Nine more days to see whether what happened in Quebec and Scotland will happen with Brexit. We know, however, that the situation is somewhat different, in particular in the sociodemographic profile of the supporters of the two sides. Support for change in the current campaign comes more from older people, who tend to turn out in larger proportions.

Note on methodology: In the graphs, each point represents a poll estimate positioned at the middle of its fieldwork; lines represent the likely change in support estimated using Loess (Epanechnikov, 0.65).
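For readers who want to reproduce this kind of curve, here is a minimal local linear regression (degree-1 Loess) with an Epanechnikov kernel, assuming that 0.65 is the fraction of polls used in each local fit. It is a sketch of the technique, not the exact routine of the software used for these graphs, and the poll values are hypothetical.

```python
import math

def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def loess_point(x0: float, xs: list[float], ys: list[float],
                frac: float = 0.65) -> float:
    """Local linear fit at x0, using the nearest frac * n points
    weighted by an Epanechnikov kernel of their distance to x0."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth set just above the k-th nearest distance so that
    # the k nearest points all receive a positive weight.
    h = dists[k - 1] * 1.0001
    if h == 0.0:
        h = 1.0
    w = [epanechnikov((x - x0) / h) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)

# Hypothetical polls: campaign day (fieldwork midpoint) and % Stay.
days = [1, 4, 8, 11, 15, 18, 22, 25, 29, 32]
stay = [52.0, 51.5, 50.8, 51.2, 50.1, 49.6, 50.3, 49.4, 49.9, 50.2]
curve = [round(loess_point(d, days, stay), 1) for d in days]
print(curve)
```

A larger fraction gives a smoother curve that reacts more slowly to late movements; a smaller one follows individual polls more closely.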

For methodologists and other interested people

A question to ask is whether variation differs by mode and whether there is variation within each mode. The next graph shows a box-and-whisker plot of support for Stay, with proportional attribution of non-disclosers, by mode. It again illustrates that support for Stay is estimated higher by telephone polls. However, there is not much difference between modes in the level of variation, and not many polls differ significantly from other polls using the same mode. Two Survation estimates are significantly higher than the other web opt-in polls, and one YouGov poll is somewhat lower. Among telephone polls, ICM and ORB each have two polls that are somewhat low.
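A box-and-whisker plot rests on a few summary statistics per mode: the quartiles (the box), the median (the line) and, in the usual Tukey convention, points beyond 1.5 times the interquartile range flagged as outliers. A sketch with hypothetical Stay estimates; quartile conventions differ between packages (this uses the default `exclusive` method of Python's `statistics.quantiles`), so box edges may not match a given plotting tool exactly.

```python
from statistics import quantiles

def box_summary(values: list[float]) -> dict:
    """Quartiles, IQR and Tukey-style outliers (beyond 1.5 * IQR) for one mode."""
    q1, med, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical Stay estimates (proportional attribution) by mode.
by_mode = {
    "telephone": [53.0, 54.5, 52.0, 55.0, 53.5, 48.0, 54.0],
    "web": [48.5, 49.0, 47.5, 50.0, 48.0, 49.5, 53.5, 48.5],
}
for mode, vals in by_mode.items():
    s = box_summary(vals)
    print(mode, round(s["median"], 2), round(s["iqr"], 2), s["outliers"])
```

Comparing medians across modes captures the level difference; comparing IQRs captures the difference in variation.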

A similar graph using estimates with non-proportional attribution of non-disclosers shows a similar portrait. However, this procedure reduces the variance among telephone polls, and the graph now shows two Ipsos-Mori polls and one ComRes poll somewhat higher than the other telephone polls.

A general conclusion from this analysis would be that the major difference between modes is in the estimates -- in this case the median estimate -- not in variation. And there is not much difference within modes either.

Thursday, June 9, 2016

To Brexit or not to Brexit,...

Hi everybody,

Welcome to my first analysis of the polls regarding the Brexit. I will perform the same analysis as for the Scottish referendum, using graphs of local regressions. I will look at the likely change in support for the Brexit and at the differences between modes.

First, here is the graph that takes into account all the polls conducted since January 2016. The dots represent poll estimates. The lines represent the estimated change using local regressions (Epanechnikov, 0.65, for the specialists).

The graph shows that the two sides are now practically at the same level according to the published polls. It also shows that the proportion of non-disclosers -- including the undecideds and those who say they will not vote -- has decreased since March, from around 17% to 11%. It is the Leave side that has gained most from the decrease of the non-disclosers. The proportion of supporters for Stay has remained the same over the period.

However, the graphs also show that the proportion of non-disclosers (the dots in the graph) varies a lot, from 4% to 30%. This proportion varies by pollster, from an average of 4.7% for ORB to 27.8% for TNS, and by mode: 16.8% for web polls, 10.2% for telephone polls. Note that the proportion of non-disclosers was not published for three ORB polls. Since this would have biased the analyses, I attributed a proportion of 5% undecideds to these ORB polls and adjusted the proportions of Stay and Leave accordingly.
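The adjustment made for the three ORB polls can be written as a rescaling: set non-disclosers to 5% and shrink Stay and Leave so that the three figures still add up to 100. The exact rescaling is not detailed in the post; this sketch assumes a proportional one, and the poll figures are hypothetical.

```python
def impute_non_disclosers(stay: float, leave: float,
                          nd: float = 5.0) -> tuple[float, float, float]:
    """Assign a fixed non-disclosure share and rescale Stay and Leave
    proportionally so that stay + leave + nd = 100."""
    remaining = 100.0 - nd
    total = stay + leave
    return stay * remaining / total, leave * remaining / total, nd

# Hypothetical ORB poll published as 52% Stay, 48% Leave, with no undecideds reported.
print(impute_non_disclosers(52, 48))  # (49.4, 45.6, 5.0)
```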

The following graph illustrates the change in support when non-disclosers are allocated proportionally to each side, the usual procedure used by all pollsters. The portrait is much the same as in the preceding graph, i.e., the two sides are at par, with a possible tiny advantage for Stay.

For the Scottish referendum, I had suggested using a non-proportional attribution of non-disclosers, as had been done for the Quebec 1995 referendum. I had proposed attributing 67% of the non-disclosers to the No side and 33% to the Yes side. This procedure produced a very good prediction: I had predicted a difference of at least 7 points between the two sides, and it ended up at 10 percentage points.

The argument here is not that non-disclosers really split in these proportions. The procedure is a way to correct for a number of phenomena. Supporters of the status quo are probably less likely to be in the samples, since they are generally older and harder to contact, and even more so in web polls. They are also probably less prone to answer polls and, when they do, to reveal their vote. In addition, the fact that the proportion of non-disclosers varies between pollsters means that it reflects the methods used more than the real proportion in the population. Using a non-proportional attribution means that the higher the proportion of non-disclosers, the more points are added to the status quo compared with proportional attribution. Empirically, for the polls conducted in 2016, there is a positive correlation between the proportion of non-disclosers and the proportion of supporters for Leave. This tends to justify the non-proportional attribution.
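Two claims in this reasoning can be checked numerically: the correlation between the proportion of non-disclosers and Leave support, and the fact that a fixed 67/33 split hands the status quo more extra points as non-disclosure rises. A sketch with hypothetical polls; the Pearson correlation is written out by hand, and measuring Leave as its share of decided respondents is an assumption, since the base used is not stated.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def status_quo_bonus(sq: float, ch: float, share: float = 0.67) -> float:
    """Points the status quo gains under a fixed-share split of non-disclosers,
    compared with proportional attribution."""
    nd = 100 - sq - ch
    return share * nd - nd * sq / (sq + ch)

# Hypothetical polls: (non-disclosers %, Stay %, Leave %).
polls = [(4, 50, 46), (8, 47, 45), (12, 44, 44), (16, 41, 43), (20, 39, 41)]
nd = [p[0] for p in polls]
# Assumption: Leave measured as its share of decided respondents.
leave_share = [100 * lv / (st + lv) for _, st, lv in polls]
print(round(pearson(nd, leave_share), 2))
for n, st, lv in polls:
    print(n, round(status_quo_bonus(st, lv), 2))
```

In this made-up example, the status quo's bonus rises from about 0.6 to about 3.7 points as non-disclosure goes from 4% to 20%.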

One could argue that the situation is different from the Scottish referendum since, for instance, older people were more likely to support the No side in Scotland while it is the opposite for Brexit: older people seem more likely to support the Leave side. However, this may be partly due to a paradox whereby older people who favour Leave are more likely to answer polls.

Since I do not have a theoretical or empirical justification to change the attribution that I used in the Scottish referendum, I decided to use the same. Here is the graph that I get using this procedure. The two sides are now about five points apart, which is -- I think -- more realistic.

In conclusion, it will be very interesting to follow the campaign in the next two weeks. My next post will deal with the substantial differences in the portraits traced by web polls compared to telephone polls.

Wednesday, October 21, 2015

Canada 2015: Review: It's all about modes

Hi,

In this last message of the Canada 2015 campaign, I examine whether some modes of administration performed better than others, depending on the context and the parties involved.

If the estimates produced by a particular method vary too much, it is difficult to rely on any single published poll. This is a random bias, which means that we cannot be sure, from one poll to the next, of the quality of the estimate. If the estimates produced by a particular mode always put a given party higher or lower than the other modes do, it is a systematic bias. In that case, each estimate has to be adjusted to take this into account.
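The distinction between the two kinds of bias can be made concrete by comparing each poll's estimate with the final result: the mean error of a mode measures its systematic bias, and the standard deviation of its errors measures the random part. A sketch with hypothetical estimates for one party; the result figure is also a placeholder.

```python
from statistics import mean, stdev

def bias_components(estimates: list[float], result: float) -> tuple[float, float]:
    """Return (systematic bias, random variability) of a set of poll estimates:
    the mean error against the result and the standard deviation of the errors."""
    errors = [e - result for e in estimates]
    return mean(errors), stdev(errors)

result = 32.0  # hypothetical vote share for one party
by_mode = {
    "IVR": [29.0, 35.0, 31.0, 34.0, 28.0, 33.0],
    "telephone": [33.0, 34.0, 33.5, 34.5, 33.0],
    "web": [32.0, 31.0, 33.0, 32.5, 31.5],
}
for mode, est in by_mode.items():
    sys_bias, spread = bias_components(est, result)
    print(f"{mode}: systematic {sys_bias:+.1f}, random {spread:.1f}")
```

Here IVR shows little systematic bias but the largest spread, while telephone polls consistently overestimate by about 1.6 points with little spread: the first calls for caution with each poll, the second for a correction.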

Do some methods produce more variable or different estimates?

I assume that the various modes of administration were distributed in the same way throughout the campaign. I should also stress that almost all the telephone polls were conducted by a single pollster, Nanos, which uses a specific and constant methodology. It may therefore be "normal" that its estimates vary less, given the absence of variation in methodology.

Here is the situation for Canada:

As shown in the table, the different modes do not differ much in their average estimates. However, they differ substantially in variation. As illustrated by the variance, the estimates produced by IVR (interactive voice response, i.e., automated telephone) polls vary more than those produced by the two other modes, and this is true for all political parties.

The box-and-whisker plot presents, for each mode, the variation between the estimates for each party. Each box represents the middle 50% of the estimates; the taller the box, the more the estimates varied. The black line in the middle of each box shows the median, i.e., the point where 50% of the estimates are higher and 50% lower, and reveals differences between modes. For IVR polls, the boxes are larger than for the other modes, which means that their estimates varied more. In addition, the median of IVR polls is lower for the NDP and the Liberal Party than for the other modes, which means that these polls produced more low estimates for those parties.

Where do the differences between modes come from? Are they the same everywhere?

To answer this question, I examined the same data for the two largest provinces, Ontario and Quebec.


Ontario first. As shown in the table, the difference between modes is even larger in Ontario than for Canada as a whole. More variation is expected, since provincial samples are smaller. However, IVR samples are generally larger than those of classical telephone polls, so they should show less variation. This is not the case: the variance is clearly higher for IVR polls for all parties, and even more so for the NDP.

There are also differences in means, and therefore systematic biases. Telephone polls tended to give higher estimates for the Conservatives; since the Conservatives were underestimated by two points in Ontario, these polls gave a more accurate portrait of that party. For the Liberal Party, the difference between modes is too small to dwell on. For the NDP, on the contrary, WEB polls gave it three points more than telephone polls and thus contributed to the overestimation of its support.
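The expectation that larger samples should vary less comes from the sampling variance of a proportion, p(1 - p)/n under simple random sampling. Dividing the observed variance of a mode's estimates by this benchmark shows how much variability the method adds beyond sampling error. A sketch; the sample sizes and estimates are hypothetical, all polls of a mode are assumed to share one sample size, and real design effects make the benchmark only approximate.

```python
from statistics import mean, variance

def expected_sampling_variance(p_percent: float, n: int) -> float:
    """Variance (in percentage points squared) of an estimated proportion
    under simple random sampling: 100^2 * p(1 - p) / n."""
    p = p_percent / 100
    return 10_000 * p * (1 - p) / n

def excess_variance_ratio(estimates: list[float], n: int) -> float:
    """Observed variance of the estimates divided by the sampling benchmark;
    values well above 1 suggest variability beyond sampling error."""
    return variance(estimates) / expected_sampling_variance(mean(estimates), n)

# Hypothetical NDP estimates in Ontario.
ivr = [22.0, 27.0, 19.0, 25.0, 30.0, 21.0]    # assumed n = 1000 per poll
phone = [22.0, 26.0, 24.5, 21.5, 25.0, 25.0]  # assumed n = 400 per poll
print(round(excess_variance_ratio(ivr, 1000), 1))
print(round(excess_variance_ratio(phone, 400), 1))
```

With these made-up numbers, the IVR ratio is about 9 despite the larger samples, while the telephone ratio stays below 1.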

The graph illustrates the difference between modes. In particular, it shows that the median of IVR estimates was clearly lower for the Liberal Party, and that, for the NDP, the median was higher for WEB polls and lower for telephone polls.

And Quebec?

Surprisingly, Quebec does not show the same problems as Ontario. As shown in the table, there is little difference in variance between modes, except for the Liberal Party.

However, the differences in averages are larger. IVR polls give nearly five points more to the Conservative Party and therefore contributed to its overall overestimation in Quebec. Telephone polls tended to give higher estimates for the Liberal Party, which was underestimated overall, so they estimated support for that party better. However, they tended to overestimate support for the NDP. Finally, WEB polls had a systematic bias of about three points in favour of the Bloc Québécois and therefore contributed to its overestimation.

Finally, these findings are reflected in the graph. There are no notable differences between modes in the variation of the estimates. However, the medians differ in the same way as the means: the median for the Conservatives is higher for IVR polls, the median for both the Liberals and the NDP is higher for telephone polls, and the median for the Bloc Québécois is higher for WEB polls.

What can we conclude? How can this be explained? How can it be corrected?

In summary, polls differed according to mode both in the variability of their estimates and in their estimates of the level of support for each party. The differences in variability come mainly from Ontario, while the differences in estimated levels of support (average and median) are present in both Quebec and Ontario.

Since IVR polls produce more variable results in Ontario than in Quebec, one may hypothesize that the cause lies in the way samples are constructed. Do pollsters include the same proportion of cell phone numbers in their samples in the two provinces? Are results more variable among respondents reached by cell phone than among those reached by landline? These are questions that pollsters using this method need to tackle. In addition, one pollster combined two modes, IVR and classical telephone, but only at the end of the campaign, and it changed other features of its methodology at the same time (using three-day moving averages). It is therefore difficult to know whether these changes improved its performance.
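One of the changes that pollster made was to report three-day moving averages, i.e., each published estimate pools the last three days of fieldwork. A minimal sketch of the idea with hypothetical daily figures; it averages daily estimates with equal weights, whereas a real rolling sample would typically pool respondents, so days with more interviews would count more.

```python
from collections import deque

def moving_average(daily: list[float], window: int = 3) -> list[float]:
    """Trailing moving average: one value per day once `window` days are available."""
    out, buf = [], deque(maxlen=window)
    for value in daily:
        buf.append(value)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out

# Hypothetical daily estimates of one party's support (%).
daily = [30.0, 33.0, 27.0, 31.0, 35.0, 29.0]
print([round(x, 2) for x in moving_average(daily)])  # [30.0, 30.33, 31.0, 31.67]
```

Smoothing damps day-to-day variability at the cost of reacting more slowly to real shifts in opinion.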

As for WEB polls, the problem is mostly one of systematic bias, in favour of the NDP in Ontario and of the Bloc in Quebec. Improved access to the Web may eventually reduce this bias, but it would nonetheless be appropriate for pollsters to investigate the origin of the bias so that they can adjust what needs to be adjusted.

Best, see you next election!

Acknowledgements: I wish to thank Luis Pena Ibarra, my research assistant, for his diligent, meticulous and competent work throughout the campaign. This work was made possible by SSHRC grant no. 430-2015-01208 "Pour une analyse historique des données d'enquête" ("For an historical analysis of survey data").

Tuesday, October 20, 2015

Canada2015: The day after


I start this message with apologies to all those I worried by alluding, in my last post before the election, to a possible, and even probable, underestimation of the Conservative Party. There was no underestimation of the Conservatives overall this time, and I still have to figure out why. Part of the explanation is that while the Conservative Party was indeed underestimated in some regions, namely Ontario and Alberta, it was overestimated elsewhere, particularly in Quebec.

The following graphs simply present the final curves, which include two polls, from EKOS and FORUM, published late on Sunday night (they were not included in my last post before the election). The election results are shown in each graph, so that one can see how well the polls estimated the support for each party.

I will post a short final message soon on the performance of the polls according to their mode of administration.

The performance of the polls

In Canada as a whole, the analysis of the polls gives a very good estimate of the election results. The LPC is slightly underestimated and the NDP slightly overestimated, but the last polls pointed to this possibility: the last FORUM poll suggests that support was still moving toward the LPC on Sunday.

In Ontario, the forecast for the LPC is perfect. The Conservative Party is underestimated by close to two points, and the NDP is overestimated by a similar amount.

Quebec is a different case altogether. Overall, the polls clearly underestimated the LPC, by at least five points. Last-minute movement cannot be ruled out, however, since the last FORUM poll gave the LPC exactly 35%, very close to the final result. The polls did correctly predict the NDP's 25%, whereas FORUM gave it only 21%. Finally, the polls correctly predicted support for the Bloc (20%) but overestimated support for the Conservative Party by three points.

In British Columbia, if there were last-minute movements, they occurred between the Green Party and the Liberal Party. Support for the Greens is overestimated by about two points, while the Liberals, projected at 33%, obtained 35%. Support for the other two parties is predicted perfectly.

Surprisingly, support for the Conservative Party was underestimated by close to five points in Alberta: it reached 59.5%, whereas it was estimated at 55%. Conversely, support for both the LPC and the NDP was overestimated by two to three points.

In the Prairies, there is little to say except that the forecast based on the published polls is perfect.

Finally, in the Atlantic provinces, the LPC is slightly underestimated and the NDP slightly overestimated. However, the last polls show last-minute trends that may explain this gap.
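The regional comparisons above all rest on the same arithmetic: forecast minus result, in percentage points, where a negative value means the party was underestimated. The sketch below applies it to the three cases for which this post quotes both numbers explicitly; the function name is illustrative.

```python
# Forecast vs. result figures quoted in this post (percent).
cases = {
    "Alberta, Conservative (forecast)": (55.0, 59.5),
    "British Columbia, Liberal (forecast)": (33.0, 35.0),
    "Quebec, NDP (last FORUM poll)": (21.0, 25.0),
}


def signed_error(forecast, result):
    """Positive = overestimate, negative = underestimate (in points)."""
    return forecast - result


for label, (forecast, result) in cases.items():
    print(f"{label}: {signed_error(forecast, result):+.1f} points")
# Alberta, Conservative (forecast): -4.5 points
# British Columbia, Liberal (forecast): -2.0 points
# Quebec, NDP (last FORUM poll): -4.0 points
```

The Alberta gap of 4.5 points is what the text rounds to "close to five points".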

In conclusion
I was afraid that the method of analysis I used, local regression smoothing, would not be adequate to detect a plateau reached in the last days of the campaign. That was not the case: the method performed well all the way. This is good news, since it is very simple and easy to apply, and it gives reliable results from the estimates published by the pollsters.
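The post names the method, local regression smoothing, but not its settings. The sketch below is a LOWESS-style smoother under stated assumptions: a local linear fit at each point, tricube weights, a window covering the nearest `frac` share of polls, and no robustness iterations. The span, function names and data layout are assumptions, not the author's actual procedure.

```python
def loess_at(x0, xs, ys, frac=0.5):
    """Local linear regression estimate at x0 (LOWESS-style, no robustness step).

    Uses the nearest `frac` share of points, weighted by the tricube
    function of their distance to x0.
    """
    n = len(xs)
    k = max(3, round(frac * n))  # number of neighbours in the window
    distances = sorted(abs(x - x0) for x in xs)
    h = max(distances[min(k, n) - 1], 1e-12)  # bandwidth: distance to k-th neighbour
    weights = [
        (1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0
        for x in xs
    ]
    total = sum(weights)
    if total == 0:  # only possible when all neighbours sit exactly at distance h
        raise ValueError("no point strictly inside the window; increase frac")
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a weighted mean
    return y_bar + slope * (x0 - x_bar)


def loess_curve(xs, ys, frac=0.5):
    """Smoothed value at every observed x (e.g. day of fieldwork)."""
    return [loess_at(x, xs, ys, frac) for x in xs]


# Hypothetical polls: day of campaign and support for one party (%).
days = [0, 2, 4, 6, 8, 10, 12, 14]
share = [30, 31, 33, 34, 35, 36, 36, 36]
print(round(loess_at(14, days, share), 2))
```

At the last day, the local fit only sees earlier polls, so a late plateau bends the curve only once enough late polls have accumulated; that is exactly the situation the author feared.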

I asked myself whether it was justified to assume that the biases of the individual polls, and particularly of the different modes of administration, would cancel each other out to give an accurate picture. Again, the bet paid off. Would it be the same if the polls were not spread fairly evenly across the different modes? We will see in a future election.
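Why a balanced mix of modes can make biases cancel: if each poll's error is its mode's bias plus random noise, the expected error of a simple average of polls is the share-weighted mean of the mode biases. A sketch, assuming an unweighted average of polls; the mode biases and shares below are hypothetical, not estimates from this campaign.

```python
def pooled_bias(shares, biases):
    """Expected error of an unweighted average of polls, in points.

    shares: fraction of polls from each mode (should sum to 1).
    biases: average error of each mode, in points.
    """
    return sum(shares[mode] * biases[mode] for mode in shares)


# Hypothetical mode biases for one party (illustration only).
biases = {"WEB": 2.0, "IVR": -1.0, "phone": -1.0}

balanced = {"WEB": 1 / 3, "IVR": 1 / 3, "phone": 1 / 3}
web_heavy = {"WEB": 0.6, "IVR": 0.2, "phone": 0.2}
print(pooled_bias(balanced, biases))   # close to zero: biases offset
print(pooled_bias(web_heavy, biases))  # about +0.8: WEB bias dominates
```

The same biases that cancel under an even mix leave a net error once one mode dominates, which is the open question raised above.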

Like other researchers, I based my estimates on all the polls and was careful not to focus on any one poll in particular. This is an essential practice, particularly when looking at poll estimates at the provincial or regional level: given the sample sizes, these estimates vary a great deal from one poll to the next, as I pointed out in previous posts.
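A rough way to see why provincial estimates from a single poll are so unstable is the textbook 95% margin of error for a proportion, 1.96 × sqrt(p(1 − p)/n). The sketch assumes simple random sampling; real polls use weighting and other design features that usually make the true uncertainty larger. The sample sizes are hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error, in points, for a share p and sample size n.

    Assumes simple random sampling (no weighting or design effect).
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Hypothetical sizes: a 1,000-respondent national poll, a 100-respondent
# provincial subsample, and ten such subsamples pooled across polls.
for label, n in [("national", 1000), ("provincial subsample", 100), ("10 subsamples pooled", 10 * 100)]:
    print(label, round(margin_of_error(0.5, n), 1))
```

A 100-respondent subsample carries a margin of roughly ±10 points, against about ±3 for 1,000 respondents; pooling many polls, if they are independent, brings the provincial picture back toward the national level of precision.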
