# Mark the Ballot

Psephology by the numbers

## Monday, March 2, 2015

Since the spill motion on 9 February, the polls have improved for the Coalition. If an election were held now, it would still be a thumping loss, just not as thumping as in early February.

I should point out that LOESS regressions can be overly influenced by outliers, especially at the end-points. We will need to see more polls to know whether the recent change in voting intention is as dramatic as the previous chart suggests.

## Monday, February 9, 2015

### Newspoll 43-57

The Australian's Newspoll of 6-8 February 2015 continues the challenging numbers for the Coalition, with a two-party preferred (TPP) voting intention of 43 to 57 in Labor's favour.

Dropping these numbers into the Bayesian model yields the following charts and a headline, aggregate TPP voting intention of 44.2 for the Coalition and 55.8 for Labor.

At this point in the blog, it is my normal practice to remind people that I anchor the above Bayesian aggregation with the assumption that the net bias across all of the polling houses sums to zero.
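By way of illustration, that sum-to-zero anchor amounts to re-centring the estimated house effects. A minimal sketch with made-up numbers (these are not my actual estimates):

```python
import numpy as np

# made-up house effects (percentage points) for five hypothetical pollsters
raw_effects = np.array([0.9, -0.4, 0.2, -0.1, 0.6])

# anchor the aggregation: re-centre so the net bias across all houses sums to zero
anchored = raw_effects - raw_effects.mean()
print(anchored.sum())  # effectively zero
```

The anchor does not tell us which pollster is right; it only fixes the level of the aggregate by assuming the houses are collectively unbiased.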

The LOESS model yields 43.7 per cent for the Coalition and 56.3 per cent for Labor.

Both models are suggesting a sizable decline in voting intention for the Coalition since the New Year.


Labels: aggregated polls, Newspoll

## Sunday, February 8, 2015

### Galaxy 43-57

Another published poll ahead of (now) Monday's spill motion. Today's Galaxy poll is no change on last week's poll: 43 to the Coalition, 57 to Labor.

I have made a pre-processing change to the Bayesian model. Before I explain that change, let me provide some context. Rather than resurrect my code base from the 2013 Election (I have about 40 or 50 model and processing files in that directory), I decided to code for the 2016 Election from scratch. Which is what I have done.

I had been thinking about the model output, and in particular how choppy that output appeared. To address the choppiness, I added a Henderson moving average. It worked. But it was not the most comfortable solution.
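For reference, a Henderson moving average is just a symmetric weighted filter. A minimal sketch using the standard rounded 7-term weights (the flat input series is illustrative only):

```python
import numpy as np

# rounded 7-term Henderson weights; they sum to one
H7 = np.array([-0.059, 0.059, 0.294, 0.412, 0.294, 0.059, -0.059])

def henderson7(series):
    """Smooth a series with the 7-term Henderson filter (loses 3 points at each end)."""
    return np.convolve(series, H7, mode='valid')

print(henderson7(np.array([45.0] * 10)))  # a flat series stays flat at 45
```

The negative outer weights are what let the Henderson filter track turning points better than a simple moving average, but the lost end-points are exactly where a poll aggregator most wants an estimate, which is part of why it was not a comfortable solution.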

Last night, I noticed that in the lead-up to the 2013 Election, I reduced the sample size I use for the Morgan polls to 1000, as an adjustment for the over-dispersion observed in the Morgan multi-mode polls relative to their large stated sample sizes. I have made the same adjustment for the 2016 models. It has had the effect of reducing the choppiness in the model. But it also means the outlier poll from the middle of 2014 no longer has as much influence on the model. Consequently, the middle of 2014 is no longer the Bayesian nadir for the Coalition.
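That adjustment is mechanically simple: cap the sample size the model is told about, which widens the standard error attached to each poll. A hedged sketch (the 3000-case figure below is illustrative, not an actual Morgan sample size):

```python
import math

def poll_sd(share, n):
    """Sampling standard error of a TPP estimate, in percentage points."""
    return 100.0 * math.sqrt(share * (1.0 - share) / n)

reported_n = 3000                     # illustrative multi-mode sample size
effective_n = min(reported_n, 1000)   # the cap applied as an over-dispersion adjustment

print(round(poll_sd(0.5, reported_n), 2))   # 0.91
print(round(poll_sd(0.5, effective_n), 2))  # 1.58 - the model now trusts each poll less
```

A wider standard error on each Morgan observation means each poll pulls the aggregate around less, which is exactly the reduction in choppiness described above.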

With today's Galaxy poll factored into the mix, the Bayesian model is now reporting the national voting intention at 44.6 to the Coalition, and 55.4 to Labor. This is pretty much the same result as Kevin Bonham, who puts it at 44.4 to 55.6.

Because I have made a change to the model, I will provide a fairly comprehensive set of charts.


Labels: aggregated polls, Galaxy


## Sunday, February 1, 2015

### Bayesian updates for Galaxy and IPSOS

Over the weekend we have two new polls. Both cast more light on the political impact of awarding a Knight of the Order of Australia to the Prince Consort: IPSOS (46-54; -2 on the first week of December for the Coalition) and Galaxy (43-57; also -2 for the Coalition on the first week of December). Individually, both polls suggest a movement of voting intention away from the government of two percentage points since the first week of December.

However, the Bayesian model was not convinced. In part, given the small number of polls we have from these two houses, it has responded by adjusting their house biases. But, to be fair, the 27 January ReachTEL poll only had a one-point movement away from the government between 20 November and 27 January. And the December Morgan poll and the latest Morgan poll show a one-point movement in favour of the Coalition. Consequently, the Morgan and ReachTEL results would have moderated the latest IPSOS and Galaxy movements in the Bayesian model.

While the Bayesian model might not have moved against the government as sharply as the two most recent polls suggest, it is still moving in the wrong direction for the government. If an election were held now, it would be a thumping win for Labor.

As a very rough rule of thumb, we can use the cube rule to estimate the number of seats Labor would win with 55 per cent of the two-party preferred vote. That estimate is 97 seats.
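For the record, a minimal sketch of that cube-rule arithmetic, assuming the 150-seat House of Representatives of the time:

```python
def cube_rule_seats(tpp_share, total_seats=150):
    """Cube rule: the ratio of seats won approximates the cube of the ratio of votes.
    total_seats=150 reflects the House of Representatives in 2015."""
    ratio = (tpp_share / (1.0 - tpp_share)) ** 3
    return round(total_seats * ratio / (1.0 + ratio))

print(cube_rule_seats(0.55))  # → 97 seats for Labor
```

With 55 per cent of the TPP vote, the vote ratio is 55/45; cubing it gives a seat ratio of about 1.83, or roughly 65 per cent of the 150 seats.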


### Updated

- 5 February 2015: added maxima, minima and end-point statistics to the charts (at the request of Andrew Catsaras). Also added an observation based on the cube rule.

Labels: aggregated polls


## Saturday, January 10, 2015

### Polling accuracy

Leigh and Wolfers observed in 2006 that "the 'margin of error' reported by pollsters substantially over-states the precision of poll-based forecasts. Furthermore, the time-series volatility of the polls (relative to the betting markets) suggest that poll movements are often noise rather than signal" (p326). They went on to suggest, "for forecasting purposes the pollsters' published margins of error should at least be doubled" (p334).


Leigh and Wolfers are not alone. Walsh, Dolfin and DiNardo wrote in 2009, "Our chief argument is that pre-election presidential polling is an activity more akin to forecasting next year's GDP or the winner of a sporting match than to scientific probability sampling" (p316).

In this post I will examine these claims a little further. We start with the theory of scientific probability sampling.

### Polling theory

Opinion polls tell us how the nation might have voted if an election had been held at the time of the poll. To achieve this magic, opinion polls depend on the central limit theorem. According to this theorem, the arithmetic means of a sufficiently large number of random samples from the entire population will be normally distributed around the population mean (regardless of the distribution in the population).

We can use computer-generated pseudo-random numbers to simulate the taking of many samples from a population, and we can plot the distribution of arithmetic means for those samples. A python code snippet to this effect follows.

```python
# --- initial
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# --- parameters
sample_size = 400     # we will work with multiples of this
num_samples = 200000  # number of samples to be drawn in each simulation
threshold = 0.5       # proportion of the population that "vote" in a particular way

# --- model
fig = plt.figure(figsize=(8, 4))
ax = fig.add_subplot(111)
for i in [1, 2, 3, 4, 6, 8]:
    # get sample size for this simulation
    s = i * sample_size
    # draw num_samples, each of size s
    m = np.random.random((s, num_samples))
    # get the sample proportion (as a percent) that is less than the threshold
    m = np.where(m < threshold, 1.0, 0.0).sum(axis=0) / (s * 1.0) * 100.0
    # perform the kernel density estimation
    kde = sm.nonparametric.KDEUnivariate(m)
    kde.fit()
    # plot
    ax.plot(kde.support, kde.density, lw=1.5,
            label='Sample size: {0:} SD: {1:.2f} TSD: {2:.2f}'.format(
                s, np.std(m), 100.0 * np.sqrt(threshold * (1 - threshold) / s)))
ax.legend(loc='best', fontsize=10)
ax.grid(True)
ax.set_ylabel(r'Density', fontsize=12)
ax.set_xlabel(r'Mean Vote Percentage for Party', fontsize=12)
fig.suptitle('The Central Limit Theorem: Probability Densities for Different Sample Sizes')
fig.tight_layout(pad=1)
fig.savefig('./graphs/model0', dpi=125)
```

In each simulation we draw 200,000 samples from our imaginary population. In the first simulation each sample was 400 cases in size. In the subsequent simulations the sample sizes were 800, 1200, 1600, 2400 and finally 3200 cases. For each simulation, we assume (randomly) that half the individuals in the population vote for one party and half vote for the other party. We can plot these simulations as a set of probability densities for each sample size, where the area under the curve is one unit in size. I have also reported the standard deviation (in percentage points) from the simulation (SD) against the theoretical standard deviation you would expect (TSD) for a particular sample size and vote share.

As the sample gets larger, the width of the bell curve narrows. The mean of a larger sample, when randomly selected, is more likely to be close to the population mean than the mean of a smaller sample. And so, with a sample of 1200, we can assume that there is a 95 per cent probability that the mean of the population is within plus or minus 1.96 standard deviations (ie. plus or minus 2.8 percentage points) of the mean of our sample. This is the oft-cited "margin of error", which derives from sampling error (the error that occurs from observing a sample, rather than observing the entire population).

So far so good.
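As a quick arithmetic check, the oft-quoted 95 per cent margin of error for a 1200-case sample can be computed directly (a minimal sketch):

```python
import math

def margin_of_error(p=0.5, n=1200, z=1.96):
    """95 per cent sampling margin of error, in percentage points."""
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

print(round(margin_of_error(), 1))  # → 2.8 percentage points
```

Note the square root in the formula: to halve the margin of error a pollster must quadruple the sample size, which is why published samples rarely grow much beyond a thousand or so cases.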

### Polling practice

But sampling error is not the only problem with which opinion polls must contend. The impression of precision from a margin of error is (at least in part) misleading, as it "does not include an estimate of the effect of the many sources of non-sampling error" (Miller 2002, p225).

Terhanian (2008) notes that telephone polling "sometimes require telephone calls to more than 25,000 different numbers to complete 1,000 15-minute interviews over a five-day period (at least in the US)". Face-to-face polling typically excludes those living in high-rise apartments and gated communities, as well as those people who are intensely private. Terhanian argues that inadequate coverage and low response rates are the most likely culprits when polls produce wildly inaccurate results. The reason for the inaccuracy is that the sampling frame or approach that has been adopted does not randomly select people from the entire population: segments of the population are excluded.

Other issues that affect poll outcomes include question design and the order in which questions are asked (McDermott and Frankovic 2003), both of which can shift poll results markedly, and house effects (the tendency for a pollster's methodology to produce results that lean to one side of politics or the other) (Jackman 2005; Curtice and Sparrow 1997).

Manski (1990) observed that while some people hold firm opinions, others do not. For some people, their voting preference is soft: what they say they would do and what they actually do differ. Manski's (2000) solution to this problem was to encourage pollsters to ask people about the firmness of their voting intention. In related research, Hoek and Gendall (1997) found that strategies to reduce the proportion of undecided responses in a poll may actually reduce poll accuracy.

A final point worth noting is that opinion polls tell us an historical fact. On the date people were polled, they claim they would have voted in a particular way. Typically, the major opinion polls do not seek to forecast how people will vote at the next election (Walsh, Dolfin and DiNardo 2009, p317). Notwithstanding this limitation, opinion polls are often reported in the media in a way that suggests a prediction on how people will vote at the next election (based on what they said last weekend when they were polled). In this context, I should note another Wolfers and Leigh (2002) finding:

> Not surprisingly, the election-eve polls appear to be the most accurate, although polls taken one month prior to the election also have substantial predictive power. Polls taken more than a month before the election fare substantially worse, suggesting that the act of calling the election leads voters to clarify their voting intentions. Those taken three months prior to the election do not perform much better than those taken a year prior. By contrast, polls taken two years before the election, or immediately following the preceding election, have a very poor record. Indeed, we cannot reject a null hypothesis that they have no explanatory power at all... These results suggest that there is little reason to conduct polls in the year following an election.

### Conclusion

The central limit theorem allows us to take a relatively small but randomly selected sample and make statements about the whole population. These statements have a mathematically quantified reliability, known as the margin of error.

Nonetheless, the margins of error that are often reported with opinion polls overstate the accuracy of those polls. Such statements refer to only one of the many sources of error that affect accuracy. While the other sources of error are rarely as clearly identified and quantified as sampling error, their impact on poll accuracy is no less real.

There are further complications when you want to take opinion polls and predict voter behaviour at the next election. Only polls taken immediately prior to an election are truly effective for this purpose.

All-in-all, it is not hard to see why Leigh and Wolfers (2006) said, "for forecasting purposes the pollsters' published margins of error should at least be doubled" (p334).

### Bibliography

**John Curtice and Nick Sparrow** (1997), "How accurate are traditional quota opinion polls?", *Journal of the Market Research Society*, 39:3, pp433-448.

**Janet Hoek and Philip Gendall** (1997), "Factors Affecting Political Poll Accuracy: An Analysis of Undecided Respondents", *Marketing Bulletin*, 8, pp1-14.

**Simon Jackman** (2005), "Pooling the polls over an election campaign", *Australian Journal of Political Science*, 40:4, pp499-51.

**Andrew Leigh and Justin Wolfers** (2006), "Competing Approaches to Forecasting Elections: Economic Models, Opinion Polling and Prediction Markets", *Economic Record*, Vol. 82, No. 258, pp325-340.

**Monika L McDermott and Kathleen A Frankovic** (2003), "Horserace Polling and Survey Method Effects: An Analysis of the 2000 Campaign", *The Public Opinion Quarterly*, Vol. 67, No. 2, pp244-26.

**Charles F Manski** (1990), "The Use of Intentions Data to Predict Behavior: A Best-Case Analysis", *Journal of the American Statistical Association*, Vol. 85, No. 412, pp934-40.

**Charles F Manski** (2000), "Why Polls are Fickle", Op-Ed article, *The New York Times*, 16 October 2000.

**Peter V Miller** (2002), "The Authority and Limitations of Polls", in Jeff Manza, Fay Lomax Cook and Benjamin J Page (eds), *Navigating Public Opinion: Polls, Policy and the Future of American Democracy*, Oxford University Press, New York.

**George Terhanian** (2008), "Changing Times, Changing Modes: The Future of Public Opinion Polling?", *Journal of Elections, Public Opinion and Parties*, Vol. 18, No. 4, pp331-342.

**Elias Walsh, Sarah Dolfin and John DiNardo** (2009), "Lies, Damn Lies and Pre-Election Polling", *American Economic Review: Papers & Proceedings*, 99:2, pp316-322.

**Justin Wolfers and Andrew Leigh** (2002), "Three Tools for Forecasting Federal Elections: Lessons from 2001", *Australian Journal of Political Science*, Vol. 37, No. 2, pp223-240.

Labels: wonkish
