Thursday, December 13, 2012

How I convert national TPP estimates into likely election outcomes

This is a short methodology discussion on how I generate a possible House of Representatives outcome from a national two-party preferred (TPP) poll estimate. This is not the most sophisticated approach possible (and it will not be the ultimate way I generate these predictions as I refine my models). But, it is what I do at the moment.

The first thing I did was to secure an estimate of the TPP vote for each seat in the 2010 federal election adjusted for boundary changes and redistributions since the 2010 election. I have shamelessly used Antony Green's pendulum for this purpose.

Secondly, building on some of the assumptions Antony made, I thought about how I should handle the treatment of independents (which sit a little outside of the mechanics of a TPP estimate). My current approach is based on the following assumptions (which are not dissimilar to Poliquant's approach):
  1. Bob Katter will win Kennedy
  2. Andrew Wilkie will win Denison (Labor polling in the Oz 26/06/12, ReachTEL 29/6/12)
  3. Adam Bandt will lose Melbourne. [Note: this assumption is a little speculative. It rests on the Liberals changing their preference strategy from their 2010 approach (which they have said they will do). This would  see Liberal preferences flow to Labor:Greens at 67:33 in 2013 rather than the 20:80 flow in  2010. At the 2010 election Labor won 38.1 and the Greens 36.2 per cent of the primary vote. I am not aware of any subsequent polling in the Federal seat of Melbourne; but the Greens lost the 2012 by-election in the related State seat of Melbourne (where Liberals preferenced Labor ahead of Greens)].
  4. Rob Oakshott will lose Lyne (Newspoll 24/10/11; ReachTEL 25/8/2011, 20/6/2012)
  5. Tony Windsor will lose New England (Newspoll 24/10/11; ReachTEL 19/6/2012) 
  6. Peter Slipper's seat of Fisher will be a normal Coalition/Labor contest next election 
  7. Craig Thomson's seat of Dobell will be a normal Coalition/Labor contest next election
  8. Tony Crook will re-contest O'Connor for the Coalition in a normal Coalition/Labor contest 
I made some tweak's to the pendulum to give effect to these assumptions.

Next I made a quick estimate of the number of seats by calculating the swing from the previous election and summing the probabilities of a win for each of the 150 seats if that swing was applied. The R-code for this function follows. As you can see, it is a short piece of code. The heavy lifting is done by the sum(pnorm(...)) functions in the middle of this code.

seatCountFromTPPbyProbabilitySum <- function(pendulumFile='./files/AntonyGreenTPP.csv', 
    pendulum, LaborTPP) {

    ALP.Outcome.2010 <- 50.12
    swing <- LaborTPP - ALP.Outcome.2010

    if(missing(pendulum)) {
        pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE)
        pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
    }
    
    # Note: sd in next line comes from analysis of federal elections since 1996 ...
    ALP = round( sum( pnorm(pendulum$ALP_TPP + swing, mean=50, sd=3.27459) ) )
    
    pc <- pendulum[pendulum$OTHER == 'OTHER', ]
    OTHER = round( sum( pnorm(100 - pc$ALP_TPP - swing, mean=50, sd=3.27459) ) )

    COALITION = 150 - ALP - OTHER # Just to ensure it all adds to 150.

    # return a data frame - makes it easier to ggplot later
    results <-                data.frame(Party='Other',         Seats=OTHER)
    results <- rbind(results, data.frame(Party='Coalition',     Seats=COALITION))
    results <- rbind(results, data.frame(Party=factor('Labor'), Seats=ALP))
    
    return(results)
}

Update: I have updated the model to better manage how I treat Denison.

seatCountFromTPPbyProbabilitySum <- function(pendulumFile='./files/AntonyGreenTPP.csv', 
    pendulum, LaborTPP) {

    ALP.Outcome.2010 <- 50.12
    swing <- LaborTPP - ALP.Outcome.2010

    if(missing(pendulum)) {
        pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE)
        pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
    }
    
    # Note: sd in next few lines comes from analysis of federal elections since 1996 ...
    pc <- pendulum[pendulum$OTHER == 'OTHER', ]
    other.raw <- sum( pnorm(100 - pc$ALP_TPP - swing, mean=50, sd=3.27459) )
    OTHER <- round( other.raw )
    carry <- other.raw - OTHER

    # this approach typically favours Labor (probably the right way to go)
    ALP <- round( carry + sum( pnorm(pendulum$ALP_TPP + swing, mean=50, sd=3.27459) ) )
    
    COALITION <- 150 - ALP - OTHER # Just to ensure it all adds to 150.

    # return a data frame - makes it easier to ggplot later
    results <-                data.frame(Party='Other',         Seats=OTHER)
    results <- rbind(results, data.frame(Party='Coalition',     Seats=COALITION))
    results <- rbind(results, data.frame(Party=factor('Labor'), Seats=ALP))
    
    return(results)
}

From this function we can plot a likely election outcome for a given a swing.


To get a more nuanced understanding of a potential election outcome, I undertake a simple Monte Carlo simulation (typically with 100,000 iterations). This is not a Bayesian MCMC approach. It's just a plain old fashioned MC simulation. The R-code for this procedure is more substantial.

storeResult <- function(N, pendulum, individualSeats=FALSE) {
    # Use of R's lexical scoping

    # entry sanity checks ...
    stopifnot(is.numeric(N))
    stopifnot(is.data.frame(pendulum))
    stopifnot(N > 0)
    seatCount <- nrow(pendulum)
    stopifnot(seatCount > 0)

    # sanity checking variables
    count <- 0
    finalised <- FALSE

    # where I store the house wins ...
    ALP <- rep(0, length=seatCount)
    COALITION <- rep(0, length=seatCount)
    OTHER <- rep(0, length=seatCount)
    
    CUM_ALP <- rep(0, length=seatCount)
    CUM_COALITION <- rep(0, length=seatCount)
    
    # where I keep the seat-by-seat wins
    seats <- data.frame(seat=pendulum$SEAT, state=pendulum$STATE, Labor=ALP, 
        Coalition=COALITION, Other=OTHER)

    rememberSim <- function(simResult) {

        # - sanity checker
        stopifnot(!finalised)
        stopifnot(count < N)
        count <<- count + 1
        stopifnot(length(simResult) == seatCount)
    
        # - overall result
        a <- table(simResult)
        ALP[ a[names(a)=='ALP'] ] <<- ALP[ a[names(a)=='ALP'] ] + 1
        COALITION[ a[names(a)=='COALITION'] ] <<- 
            COALITION[ a[names(a)=='COALITION'] ] + 1
        OTHER[ a[names(a)=='OTHER'] ] <<- OTHER[ a[names(a)=='OTHER'] ] + 1
        
        # - seat by seat result
        if(individualSeats) {
            seats$Labor <<- ifelse(simResult == 'ALP', seats$Labor + 1,
                seats$Labor)
            seats$Coalition <<- ifelse(simResult == 'COALITION', 
                seats$Coalition + 1, seats$Coalition)
            seats$Other <<- ifelse(simResult == 'OTHER', seats$Other + 1, 
                seats$Other)
        }
    }
    
    finalise <- function() {
        # sanity checker
        stopifnot(!finalised)
        stopifnot(count == N)
        
        ALP <<- ALP / N
        COALITION <<- COALITION / N
        OTHER <<- OTHER / N
        
        if(individualSeats) {
            seats$Labor <<- seats$Labor / N
            seats$Coalition <<- seats$Coalition / N
            seats$Other <<- seats$Other / N
        }
        
        for(i in 1:seatCount) {
            CUM_ALP[i] <<- 1 - sum(ALP[1:i])
            CUM_COALITION[i] <<- 1 - sum(COALITION[1:i])
        }
                
        finalised <<- TRUE
    }        
    
    results <- function() {
        stopifnot(finalised)
        data.frame(seatsWon=1:nrow(pendulum), Labor=ALP, Coalition=COALITION, 
            Other=OTHER)
    }
        
    cumResults <- function() {
        stopifnot(finalised)
        data.frame(seatsWon=1:nrow(pendulum), Labor=CUM_ALP, Coalition=CUM_COALITION)
    }

    winProbabilities <- function() {
        stopifnot(finalised)
        win <- (floor(seatCount/2) + 1):seatCount
        list(Labor = sum(ALP[win]), Coalition = sum(COALITION[win]))
    }

    seatResults <- function() {
        stopifnot(finalised)
        stopifnot(individualSeats)
        seats
    }
    
    list(rememberSim=rememberSim, finalise=finalise, results=results, cumResults=cumResults,
        seatResults=seatResults, winProbabilities=winProbabilities)
}

## -- similate one Federal election
simulateNationaLResult <- function(pendulum, swing) {
    rawPrediction <- pendulum$ALP_TPP + swing
    probabilisticPrediction <- rawPrediction + rnorm(nrow(pendulum), mean=0, sd=3.27459)
    ifelse(probabilisticPrediction >= 50, 'ALP', pendulum$OTHER)
}

## -- run N simulations of one Federal election outcome
simulateOneOutcome <- function(N=100000, pendulumFile='./files/AntonyGreenTPP.csv',
    pendulum, LaborTPP, individualSeats=FALSE) {

    ALP.Outcome.2010 <- 50.12
    swing <- LaborTPP - ALP.Outcome.2010

    if(missing(pendulum)) {
        pendulum <- read.csv(pendulumFile, stringsAsFactors=FALSE)
        pendulum$ALP_TPP <- as.numeric(pendulum$ALP_TPP)
    }
    
    r <- storeResult(N, pendulum, individualSeats)
    for(i in 1:N) r$rememberSim ( simulateNationaLResult(pendulum, swing) )
    r$finalise()

    invisible(r)
}

From this simulation, there are a few plots I can make:




I am currently working on a state-level frame for converting a series of state TPP estimates to a national outcome for the House of Representatives.

No comments:

Post a Comment