## Thursday, December 13, 2012

### How I convert national TPP estimates into likely election outcomes

This is a short methodology discussion on how I generate a possible House of Representatives outcome from a national two-party preferred (TPP) poll estimate. This is not the most sophisticated approach possible (and it will not be the ultimate way I generate these predictions as I refine my models). But, it is what I do at the moment.

The first thing I did was to secure an estimate of the TPP vote for each seat in the 2010 federal election adjusted for boundary changes and redistributions since the 2010 election. I have shamelessly used Antony Green's pendulum for this purpose.

Secondly, building on some of the assumptions Antony made, I thought about how I should handle the treatment of independents (which sit a little outside of the mechanics of a TPP estimate). My current approach is based on the following assumptions (which are not dissimilar to Poliquant's approach):
1. Bob Katter will win Kennedy
2. Andrew Wilkie will win Denison (Labor polling in the Oz 26/06/12, ReachTEL 29/6/12)
3. Adam Bandt will lose Melbourne. [Note: this assumption is a little speculative. It rests on the Liberals changing their preference strategy from their 2010 approach (which they have said they will do). This would  see Liberal preferences flow to Labor:Greens at 67:33 in 2013 rather than the 20:80 flow in  2010. At the 2010 election Labor won 38.1 and the Greens 36.2 per cent of the primary vote. I am not aware of any subsequent polling in the Federal seat of Melbourne; but the Greens lost the 2012 by-election in the related State seat of Melbourne (where Liberals preferenced Labor ahead of Greens)].
4. Rob Oakshott will lose Lyne (Newspoll 24/10/11; ReachTEL 25/8/2011, 20/6/2012)
5. Tony Windsor will lose New England (Newspoll 24/10/11; ReachTEL 19/6/2012)
6. Peter Slipper's seat of Fisher will be a normal Coalition/Labor contest next election
7. Craig Thomson's seat of Dobell will be a normal Coalition/Labor contest next election
8. Tony Crook will re-contest O'Connor for the Coalition in a normal Coalition/Labor contest
I made some tweak's to the pendulum to give effect to these assumptions.

Next I made a quick estimate of the number of seats by calculating the swing from the previous election and summing the probabilities of a win for each of the 150 seats if that swing was applied. The R-code for this function follows. As you can see, it is a short piece of code. The heavy lifting is done by the sum(pnorm(...)) functions in the middle of this code.

seatCountFromTPPbyProbabilitySum <- function(pendulumFile='./files/AntonyGreenTPP.csv',
pendulum, LaborTPP) {

ALP.Outcome.2010 <- 50.12
swing <- LaborTPP - ALP.Outcome.2010

if(missing(pendulum)) {
pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
}

# Note: sd in next line comes from analysis of federal elections since 1996 ...
ALP = round( sum( pnorm(pendulum$ALP_TPP + swing, mean=50, sd=3.27459) ) ) pc <- pendulum[pendulum$OTHER == 'OTHER', ]
OTHER = round( sum( pnorm(100 - pc$ALP_TPP - swing, mean=50, sd=3.27459) ) ) COALITION = 150 - ALP - OTHER # Just to ensure it all adds to 150. # return a data frame - makes it easier to ggplot later results <- data.frame(Party='Other', Seats=OTHER) results <- rbind(results, data.frame(Party='Coalition', Seats=COALITION)) results <- rbind(results, data.frame(Party=factor('Labor'), Seats=ALP)) return(results) }  Update: I have updated the model to better manage how I treat Denison. seatCountFromTPPbyProbabilitySum <- function(pendulumFile='./files/AntonyGreenTPP.csv', pendulum, LaborTPP) { ALP.Outcome.2010 <- 50.12 swing <- LaborTPP - ALP.Outcome.2010 if(missing(pendulum)) { pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE) pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP) } # Note: sd in next few lines comes from analysis of federal elections since 1996 ... pc <- pendulum[pendulum$OTHER == 'OTHER', ]
other.raw <- sum( pnorm(100 - pc$ALP_TPP - swing, mean=50, sd=3.27459) ) OTHER <- round( other.raw ) carry <- other.raw - OTHER # this approach typically favours Labor (probably the right way to go) ALP <- round( carry + sum( pnorm(pendulum$ALP_TPP + swing, mean=50, sd=3.27459) ) )

COALITION <- 150 - ALP - OTHER # Just to ensure it all adds to 150.

# return a data frame - makes it easier to ggplot later
results <-                data.frame(Party='Other',         Seats=OTHER)
results <- rbind(results, data.frame(Party='Coalition',     Seats=COALITION))
results <- rbind(results, data.frame(Party=factor('Labor'), Seats=ALP))

return(results)
}


From this function we can plot a likely election outcome for a given a swing.

To get a more nuanced understanding of a potential election outcome, I undertake a simple Monte Carlo simulation (typically with 100,000 iterations). This is not a Bayesian MCMC approach. It's just a plain old fashioned MC simulation. The R-code for this procedure is more substantial.

storeResult <- function(N, pendulum, individualSeats=FALSE) {
# Use of R's lexical scoping

# entry sanity checks ...
stopifnot(is.numeric(N))
stopifnot(is.data.frame(pendulum))
stopifnot(N > 0)
seatCount <- nrow(pendulum)
stopifnot(seatCount > 0)

# sanity checking variables
count <- 0
finalised <- FALSE

# where I store the house wins ...
ALP <- rep(0, length=seatCount)
COALITION <- rep(0, length=seatCount)
OTHER <- rep(0, length=seatCount)

CUM_ALP <- rep(0, length=seatCount)
CUM_COALITION <- rep(0, length=seatCount)

# where I keep the seat-by-seat wins
seats <- data.frame(seat=pendulum$SEAT, state=pendulum$STATE, Labor=ALP,
Coalition=COALITION, Other=OTHER)

rememberSim <- function(simResult) {

# - sanity checker
stopifnot(!finalised)
stopifnot(count < N)
count <<- count + 1
stopifnot(length(simResult) == seatCount)

# - overall result
a <- table(simResult)
ALP[ a[names(a)=='ALP'] ] <<- ALP[ a[names(a)=='ALP'] ] + 1
COALITION[ a[names(a)=='COALITION'] ] <<-
COALITION[ a[names(a)=='COALITION'] ] + 1
OTHER[ a[names(a)=='OTHER'] ] <<- OTHER[ a[names(a)=='OTHER'] ] + 1

# - seat by seat result
if(individualSeats) {
seats$Labor <<- ifelse(simResult == 'ALP', seats$Labor + 1,
seats$Labor) seats$Coalition <<- ifelse(simResult == 'COALITION',
seats$Coalition + 1, seats$Coalition)
seats$Other <<- ifelse(simResult == 'OTHER', seats$Other + 1,
seats$Other) } } finalise <- function() { # sanity checker stopifnot(!finalised) stopifnot(count == N) ALP <<- ALP / N COALITION <<- COALITION / N OTHER <<- OTHER / N if(individualSeats) { seats$Labor <<- seats$Labor / N seats$Coalition <<- seats$Coalition / N seats$Other <<- seats$Other / N } for(i in 1:seatCount) { CUM_ALP[i] <<- 1 - sum(ALP[1:i]) CUM_COALITION[i] <<- 1 - sum(COALITION[1:i]) } finalised <<- TRUE } results <- function() { stopifnot(finalised) data.frame(seatsWon=1:nrow(pendulum), Labor=ALP, Coalition=COALITION, Other=OTHER) } cumResults <- function() { stopifnot(finalised) data.frame(seatsWon=1:nrow(pendulum), Labor=CUM_ALP, Coalition=CUM_COALITION) } winProbabilities <- function() { stopifnot(finalised) win <- (floor(seatCount/2) + 1):seatCount list(Labor = sum(ALP[win]), Coalition = sum(COALITION[win])) } seatResults <- function() { stopifnot(finalised) stopifnot(individualSeats) seats } list(rememberSim=rememberSim, finalise=finalise, results=results, cumResults=cumResults, seatResults=seatResults, winProbabilities=winProbabilities) } ## -- similate one Federal election simulateNationaLResult <- function(pendulum, swing) { rawPrediction <- pendulum$ALP_TPP + swing
probabilisticPrediction <- rawPrediction + rnorm(nrow(pendulum), mean=0, sd=3.27459)
ifelse(probabilisticPrediction >= 50, 'ALP', pendulum$OTHER) } ## -- run N simulations of one Federal election outcome simulateOneOutcome <- function(N=100000, pendulumFile='./files/AntonyGreenTPP.csv', pendulum, LaborTPP, individualSeats=FALSE) { ALP.Outcome.2010 <- 50.12 swing <- LaborTPP - ALP.Outcome.2010 if(missing(pendulum)) { pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE) pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP) } r <- storeResult(N, pendulum, individualSeats) for(i in 1:N) r$rememberSim ( simulateNationaLResult(pendulum, swing) )
r\$finalise()

invisible(r)
}


From this simulation, there are a few plots I can make:

I am currently working on a state-level frame for converting a series of state TPP estimates to a national outcome for the House of Representatives.