Sunday, July 28, 2013

Exploring a non-linear house effects model

I have spent a couple of hours today exploring a non-linear house effects model. I have been troubled by the "fizziness" of individual polling houses. When the population voting intention moves to Labor, some polling houses are much more sensitive to this movement than others. This tendency appears to be consistent for each polling house (i.e. some polling houses appear consistently over-sensitive to these movements, and other houses appear consistently under-sensitive).

One of the reasons I limit my weekly aggregation data to six months is to avoid problems from this non-linearity. If I can better model this non-linearity, I might be able to model longer time-sequences of data.

In today's exploration, I have modeled the non-linearity with a degree-1 polynomial for each pollster. In simplified terms, the house-effect for each pollster is given by the following equation.

house-effect-for-pollster = alpha-for-pollster + beta-for-pollster * (poll-result - minimum-poll-result)

In this model, the constant alpha is analogous to the house effect in my previous model. The value for beta indicates the extent to which a polling house is more or less sensitive to a population shift in voting intention. Returning to the term I introduced above, beta is my measure of fizziness.
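As a minimal sketch of the equation above, the following Python function evaluates the house effect for one poll. The function name and the numbers (an alpha of 0.01, a beta of 0.1, and a series minimum of 0.44) are hypothetical and chosen only for illustration; they are not estimates from the model.

```python
def house_effect(alpha, beta, poll_result, minimum_poll_result):
    """House effect = constant bias (alpha) + sensitivity (beta) * distance above the series minimum."""
    return alpha + beta * (poll_result - minimum_poll_result)

# Hypothetical pollster, with poll results expressed as proportions.
low = house_effect(alpha=0.01, beta=0.1, poll_result=0.44, minimum_poll_result=0.44)
high = house_effect(alpha=0.01, beta=0.1, poll_result=0.50, minimum_poll_result=0.44)

# At the series minimum only alpha applies; six points higher, beta adds 0.1 * 0.06.
print(round(low, 4), round(high, 4))
```

With a positive beta the house effect grows as the poll result rises, which is the over-sensitive ("fizzy") case; a negative beta gives the under-sensitive case.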

I have used a sum-to-zero constraint across polling houses to anchor the alpha values. I have set beta to zero for Newspoll to anchor the beta values. The part of this approach I am least comfortable with is the selection of priors for the beta values. These are neither uninformed nor vague. They significantly influence the final result. Clearly, some more thinking is needed here.
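The need for an anchor can be sketched in a few lines of Python. With beta set aside, each poll only sees the sum of the underlying walk and the house's alpha, so moving the walk up by some amount and every alpha down by the same amount gives identical expected poll readings. The helper names and numbers below are hypothetical.

```python
def poll_means(walk_value, alphas):
    """Expected poll reading for each house when beta is zero: walk + alpha."""
    return [walk_value + a for a in alphas]

def sum_to_zero(free_alphas):
    """Put the benchmark house first; its alpha is minus the sum of the others."""
    return [-sum(free_alphas)] + list(free_alphas)

alphas = sum_to_zero([0.012, -0.004, 0.001])   # hypothetical house effects
shift = 0.02

original = poll_means(0.47, alphas)
shifted_alphas = [a - shift for a in alphas]
shifted = poll_means(0.47 + shift, shifted_alphas)

# Same expected readings, but only the first set of alphas sums to zero.
print(original == [round(v, 12) for v in shifted] or "equal up to rounding")
print(round(sum(alphas), 12), round(sum(shifted_alphas), 12))
```

The constraint picks one of the many equivalent placements of the line; it does not by itself say that the chosen placement is the correct one.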

The initial experimental results follow (using all of the polling data since the previous election).

The code for the non-linear model follows.

model {
    ## Developed on the base of Simon Jackman's original model
    ## -- observational model
    for(poll in 1:NUMPOLLS) {
        # note: x and y are the original polling series
        houseEffect[poll] <- alpha[house[poll]] + beta[house[poll]]*(x[poll]-min(x))
        mu[poll] <- walk[day[poll]] + houseEffect[poll]
        y[poll] ~ dnorm(mu[poll], samplePrecision[poll]) ## dnorm takes a precision
    }

    ## -- temporal model
    for(i in 2:PERIOD) {
        day2DayAdj[i] <- ifelse(i==DISCOUNTINUITYDAY,
            walk[i-1]+discontinuityValue, walk[i-1])
        walk[i] ~ dnorm(day2DayAdj[i], walkPrecision)
    }
    sigmaWalk ~ dunif(0, 0.01)            ## uniform prior on std. dev.
    walkPrecision <- pow(sigmaWalk, -2)   ##   for the day-to-day random walk
    walk[1] ~ dunif(0.4, 0.6)             ## vague prior
    discontinuityValue ~ dunif(-0.2, 0.2) ## uninformative prior

    ## -- house effects model (the loop from 2 assumes NEWSPOLL is house 1)
    for(i in 2:HOUSECOUNT) {
        alpha[i] ~ dunif(-0.1,0.1) ## vague prior
        beta[i] ~ dunif(-0.1,0.1)  ## could be problematic!!
    }
    alpha[NEWSPOLL] <- -sum(alpha[2:HOUSECOUNT]) ## sum to zero
    beta[NEWSPOLL] <- 0 ## Newspoll as benchmark on non-linearity
}
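
JAGS parameterises dnorm by precision (one over the variance) rather than by standard deviation, which is why the model converts sigmaWalk with pow(sigmaWalk, -2). The Python sketch below shows that conversion. It also shows one way samplePrecision might be supplied as data; the simple-random-sampling formula used there is an assumption on my part, as the post does not say how samplePrecision was calculated.

```python
def precision_from_sd(sd):
    """Precision is the reciprocal of the variance, matching pow(sd, -2) in JAGS."""
    return sd ** -2

def sampling_precision(p, n):
    """ASSUMPTION: simple random sampling, so the variance of a proportion is p(1-p)/n."""
    return n / (p * (1.0 - p))

# The upper bound of the dunif(0, 0.01) prior on sigmaWalk, as a precision.
walk_precision = precision_from_sd(0.01)

# Hypothetical poll of 1000 respondents reading 50 per cent.
poll_precision = sampling_precision(0.5, 1000)
poll_sd = poll_precision ** -0.5

print(round(walk_precision), poll_precision, round(poll_sd, 4))
```

On this assumption a poll of 1000 respondents has a standard deviation of roughly 1.6 percentage points, which is large relative to the day-to-day movement allowed by the walk prior.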


  1. This is a particularly interesting idea; I have been experimenting with something myself.

    What is your reasoning for a prior of [-0.1, 0.1] on the beta parameter? This is not an uninformative prior given the intrinsic scale of the problem, and this is evident from your graphs where the right side of the confidence limits pile up against 0.1.

    It looks like the Morgan F2F value wants to go much higher than this.

    Have you tried something truly uninformative, like [-1,1]? Beta = -1 corresponds to the case where the response is completely static i.e. changes in general sentiment have no effect on the panel. Perhaps a Gaussian with a standard deviation of 1.0 is reasonable.

    There is nothing wrong with having very uninformative priors. Remember that your priors should be based on your understanding of the problem before seeing the data - you should not be using the effects you are seeing in the data before you to select the prior on beta.

    It doesn't surprise me at all that the betas are poorly constrained. You can get an idea of the potential magnitude of the effect by plotting, for a particular poll series, the difference between the poll values and your model as a function of the model value (or the poll value, whichever you prefer). The 2PP vote has not really strayed outside the range ~44 to ~50, and the scatter of the poll values from the model is of the order of the statistical uncertainty, so there is not a good baseline from which to determine the betas. If 2PP was much more volatile then you'd be able to determine beta better, or if you had lots more poll data, or if the poll data had much higher sample sizes.

    An interesting alternative is to consider the case where beta relates not to the value from the poll under consideration, but the value of the model at that point in time. Unfortunately, this (I think) leads to a circularity problem, where the value of the model at any point depends on beta but the value of beta depends on the model.
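
The residual check suggested in this comment can be sketched in Python as a least-squares slope of (poll minus model) against the model value, together with the standard error of that slope. Everything numeric here is hypothetical: 40 polls spread evenly over a model range of 0.44 to 0.50, each with the sampling noise of a 1000-person poll under simple random sampling.

```python
import math

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def slope_standard_error(xs, noise_sd):
    """Standard error of the OLS slope when each y has independent noise of noise_sd."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return noise_sd / math.sqrt(sxx)

n_polls = 40
model_values = [0.44 + 0.06 * i / (n_polls - 1) for i in range(n_polls)]
noise_sd = math.sqrt(0.5 * 0.5 / 1000)   # assumed sampling noise per poll

# Noise-free residuals with a true slope (beta) of 0.05 are recovered exactly ...
residuals = [0.01 + 0.05 * (m - 0.44) for m in model_values]
print(round(ols_slope(model_values, residuals), 6))

# ... but with realistic noise the slope is only known to within about this much.
se = slope_standard_error(model_values, noise_sd)
print(round(se, 2))
```

Under these assumptions the standard error of the slope is about 0.14, wider than the whole [-0.1, 0.1] prior, which is consistent with the point that a narrow 2PP range leaves the betas poorly constrained.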

  2. Incidentally, the effects being talked about here demonstrate why a "sum-to-zero" constraint is pretty dangerous. In somewhere like the US, with loads of polls and lots of organisations, I can find it plausible that there is no net bias.

    On the other hand, Australia has a small number of pollsters. I am not convinced that there is any good reason that the net bias should be zero.

    The better approach is to calibrate the biases off previous elections. The problems then are: 1) few data points, and 2) the pollsters may change their methods periodically. #1 is what it is - you can't get around that. #2 is potentially tractable analytically - you could extend the model to allow for the biases of the pollsters to follow a random walk with time, or alternatively perhaps have the jumps follow a Poisson model whereby they periodically jump in either direction with some amplitude.

    The most realistic model is a compound Poisson-Gaussian process (periodic jumps, the amplitude of which is Gaussian distributed). The techniques for dealing with this situation are well known for stochastic calculus etc (the jump-diffusion model is a very popular options pricing technique).
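
As a rough illustration of the process described in this comment, the Python sketch below simulates a house bias that stays constant for long stretches and occasionally jumps by a Gaussian amount. It is a discrete-time approximation (at most one jump per day, with a fixed daily probability) rather than a full continuous-time compound Poisson process, and the function name, jump rate, and jump size are all hypothetical.

```python
import random

def simulate_house_bias(days, jump_rate, jump_sd, start=0.0, seed=None):
    """Daily bias path: with probability jump_rate per day, add a N(0, jump_sd^2) jump."""
    rng = random.Random(seed)
    bias = start
    path = []
    for _ in range(days):
        if rng.random() < jump_rate:
            bias += rng.gauss(0.0, jump_sd)   # Gaussian jump amplitude
        path.append(bias)
    return path

# Hypothetical settings: a methodology change about once every 200 days,
# each shifting the bias by around one percentage point.
path = simulate_house_bias(days=1000, jump_rate=0.005, jump_sd=0.01, seed=42)
changes = sum(1 for a, b in zip(path, path[1:]) if a != b)
print(len(path), changes)
```

Fitting such a process inside the polling model is a separate and harder problem; this only shows what the assumed bias paths would look like.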

  3. Julian - you ask a heap of questions, which I will seek to answer.

    Simon Jackman's Bayesian model needs a constraint. Without a constraint, it yields the same shaped line, but randomly (up or down) placed on the chart with each run. I am pretty sure that Simon mentioned this in his text (or one of his journal articles), but being on the road, I can't give you the page reference. The sum to zero constraint is easy to implement, but (you are right) it offers no guarantee of correctness. So yes, you need to think about where the line should be placed, and therefore how it is constrained. For my own personal rule-of-thumb, I sometimes use the mid point between Nielsen and Newspoll as an approximation, but it is problematic.

    I have had a lengthy look at calibration from the previous elections (without compelling success). At the last election, the sum to zero was a point in Labor's favour. But problems arise when applying this to subsequent elections: too few elections, new polling houses, changed methodologies, non-linearity in polling response to changes in population voting intentions, etc.

    On to my non-linear model. I chose constraints to yield something that looked reasonable (bad practice, I know; which is why I flagged it as experimental and problematic). It was a test of concept, with a huge problem. If I released the constraints entirely (non-informative prior) the model went to something close to a flat line. Thinking about this some more, I think the problem was in this statement: << beta-for-pollster * (poll-result - minimum-poll-result) >>. Rather than the minimum, it should be a central tendency of some type (mid-point, mean, median, etc.). I may also need a degree-2 polynomial to model this better. Anyway, something to think about some more.

    I wrestled with the circularity problem you mentioned (given the previous paragraph, some would say unsuccessfully). It is a limitation of JAGS' directed acyclic graph (DAG) approach.

    My expertise is not quite up to a "compound Poisson-Gaussian process (periodic jumps, the amplitude of which is Gaussian distributed)". I look forward to seeing your work on this.
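
The difference between anchoring on the minimum and anchoring on a central tendency, as raised in this reply, can be sketched in Python. With the minimum as the reference point, a positive beta pushes every house effect in the same direction, so beta also moves the pollster's average level. With the mean as the reference point, the beta term averages to zero and alpha alone carries the level. The poll readings and parameter values are hypothetical, and whether this accounts for the flat-line behaviour is not something the sketch can show.

```python
from statistics import mean

def house_effects(alpha, beta, polls, reference):
    """House effect for each poll, measured relative to a chosen reference value."""
    return [alpha + beta * (p - reference) for p in polls]

polls = [0.44, 0.46, 0.47, 0.48, 0.50]   # hypothetical 2PP readings
alpha, beta = 0.0, 0.5                    # hypothetical parameter values

min_anchored = house_effects(alpha, beta, polls, min(polls))
mean_centred = house_effects(alpha, beta, polls, mean(polls))

# Average house effect under each choice of reference point.
print(round(mean(min_anchored), 4), round(mean(mean_centred), 4))
```

Here the minimum-anchored version has an average house effect of 0.015 even though alpha is zero, while the mean-centred version averages to alpha.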