Sunday, November 27, 2016

Update on US Presidential Election

The vote count continues. Turnout in 2016 is clearly up on 2012.


Dem_2012              65918507
Rep_2012              60934407
Other_2012             2384728
Total_2012           129237642
Dem_2016              64641091
Rep_2016              62438763
Other_2016             7673054
Total_2016           134752908 

For me the big story is the growth in the other party vote. In the US system of first-past-the-post voting, these are largely wasted votes. My contention is that these votes were largely lost from Democrat voters, and they cost Clinton the election.
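As a rough check on the totals above, the following sketch computes the change in total votes counted and each grouping's share of the vote in the two years. The figures are copied from the table; the variable names are just for illustration.

```python
# National totals copied from the table above.
votes = {
    2012: {"Dem": 65918507, "Rep": 60934407, "Other": 2384728},
    2016: {"Dem": 64641091, "Rep": 62438763, "Other": 7673054},
}

totals = {year: sum(v.values()) for year, v in votes.items()}

# Percentage change in total votes counted, 2012 -> 2016.
turnout_change = (totals[2016] / totals[2012] - 1) * 100

# Vote share (per cent of all votes) for each grouping in each year.
shares = {
    year: {party: n / totals[year] * 100 for party, n in v.items()}
    for year, v in votes.items()
}

print(f"Change in total votes: {turnout_change:+.2f}%")
for party in ("Dem", "Rep", "Other"):
    change = shares[2016][party] - shares[2012][party]
    print(f"{party:5s} {shares[2012][party]:6.2f}% -> {shares[2016][party]:6.2f}% "
          f"({change:+.2f} points)")
```

On these totals the other party share roughly triples, from under 2 per cent to more than 5 per cent of all votes.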

The first chart following is the percentage point change in the other parties' vote share (without Utah, because it was a special case). The second chart is the percentage change in the number of votes cast for other parties (without Oklahoma, because there were no other party votes in 2012). The final chart is the vote share for other parties.




The Democrat count is clearly down in the industrial mid-west. The first chart following is the vote share for the Democrats in each state. The second chart is the change in vote share (in percentage points) from 2012 to 2016. The final chart is the change in raw vote count from 2012 to 2016 (expressed as a percent of the 2012 vote count).




For the Republicans, the story is a mixed bag: votes were up in the mid-west (although the results there were close), but down in Texas and the west of the country.




While the Michigan count is still not declared (and is now subject to a recount), the indicative results give Michigan to the Republicans.



A quick acknowledgement: I sourced the data for this analysis from http://uselectionatlas.org/RESULTS/.

Saturday, November 19, 2016

Update on US Presidential election count

Last week some 128.5 million votes had been counted. The count is now at 132.3 million votes (and we are still counting). It continues to look like six states will flip from their 2012 party preference.


The two-party swings are now posting as follows (in percentage points) ...


                          Swing  
State                            
Alabama                5.662658  
Alaska                 0.818982  
Arizona               -5.405065  
Arkansas               3.274192  
California            -5.570583  
Colorado               0.528426  
Connecticut            3.772887  
Delaware               7.196230  
District of Columbia  -3.141525  
Florida                2.082540  
Georgia               -2.640453  
Hawaii                10.522691  
Idaho                  0.075144  
Illinois               0.199778  
Indiana                8.825116  
Iowa                  15.247820  
Kansas                -0.561142  
Kentucky               7.168176  
Louisiana              2.432006  
Maine                 12.587298  
Maryland               0.833312  
Massachusetts         -4.135420  
Michigan               9.736549  
Minnesota              6.174261  
Mississippi            7.042194  
Missouri               9.761118  
Montana                6.889904  
Nebraska               4.276905  
Nevada                 4.327039  
New Hampshire          5.212831  
New Jersey             4.152788  
New Mexico             1.898950  
New York               6.955795  
North Carolina         1.695493  
North Dakota          16.097549  
Ohio                  11.430740  
Oklahoma               2.846545  
Oregon                 1.103355  
Pennsylvania           6.471411  
Rhode Island          12.127632  
South Carolina         3.792815  
South Dakota          11.772774  
Tennessee              5.791648  
Texas                 -6.648189  
Utah                 -30.227091  
Vermont                9.189564  
Virginia              -1.450352  
Washington            -1.025402  
West Virginia         15.004805  
Wisconsin              7.751092  
Wyoming                5.471542   

The changes in the raw vote counts by party between 2012 and 2016 are as follows:





These are expressed as percentage changes on the 2012 base year. Because there were no other party votes in Oklahoma in 2012, it records an infinite change.

                      Delta_turnout  Delta_Rep  Delta_Dem  Delta_Other
State                                                                    
Alabama                    1.910248   4.658399  -8.796324   224.990096   
Alaska                   -11.234463 -16.700066 -20.354697   141.930344   
Arizona                   10.195707   1.081340  12.628556   193.732301   
Arkansas                   5.355466   5.411706  -3.905844   137.748490   
California               -12.143508 -22.101411 -10.144488    77.727811   
Colorado                   7.426883   0.923017   0.497241   273.206721   
Connecticut                5.345566   6.008672  -1.035234   287.374243   
Delaware                   6.684609  11.870030  -2.877766   256.398428   
District of Columbia       5.958525 -40.493896   5.901075   195.783926   
Florida                   11.615576  10.867510   6.222609   297.072726   
Georgia                    4.705211   0.499065   5.867765   124.322340   
Hawaii                    -1.350366   6.446308 -12.988411   372.423121   
Idaho                      5.401512  -2.528325 -10.528839   297.344211   
Illinois                   4.753142  -0.275195   0.833446   238.168018   
Indiana                    3.737928   9.365213 -10.209785   148.258804   
Iowa                      -1.121048   9.560413 -20.634033   283.045591   
Kansas                    -0.897513  -4.899907  -5.710285   174.477800   
Kentucky                   6.968557  10.646897  -7.438656   190.809197   
Louisiana                  1.753554   2.289063  -3.582441   115.051130   
Maine                      3.292296  13.793469 -12.165530   163.210532   
Maryland                  -1.041027  -4.829046  -4.565800   165.506648   
Massachusetts              2.012901  -8.867863   2.237895   219.212456   
Michigan                   1.297279   7.717884 -11.656618   301.184896   
Minnesota                  0.279953   0.206328 -11.542220   262.165629   
Mississippi               -9.565847  -4.637944 -17.887766    89.881403   
Missouri                   0.464235   6.969118 -13.801892   136.501140   
Montana                    2.072514   4.222030 -11.955073   155.323775   
Nebraska                   2.677437   3.286084  -7.913440   171.538819   
Nevada                    10.648348  10.300992   1.200663   269.996997   
New Hampshire              4.639845   4.808771  -5.703253   332.376229   
New Jersey                 5.278954   7.086192  -0.970606   237.207575   
New Mexico                 1.093579  -5.565118  -7.912408   184.228106   
New York                   0.445977   6.002740  -7.498578   207.735610   
North Carolina             4.362881   3.296343  -0.408329   230.832715   
North Dakota               6.736262  15.216063 -24.889647   250.814569   
Ohio                      -3.771069   4.153658 -18.060840   185.997367   
Oklahoma                   8.848788   6.485962  -5.224249          inf   
Oregon                    10.541226   2.595684   2.118007   229.820298   
Pennsylvania               4.886111   9.001266  -4.489789   205.165348   
Rhode Island               3.504324  14.493906 -10.338355   237.347295   
South Carolina             7.072335   7.814528  -1.220406   247.749887   
South Dakota               1.712959   8.114999 -19.027296   204.971834   
Tennessee                  1.214797   4.023169  -9.526298   165.279282   
Texas                     11.669525   2.483871  16.933071   213.531104   
Utah                       6.737450 -33.180934  20.132797   927.379781   
Vermont                    5.271476   2.881400 -10.372467   459.295526   
Virginia                   3.327627  -2.912393   0.489548   285.448983   
Washington                 2.659095  -7.181619  -2.689820   223.803708   
West Virginia              5.995962  15.750081 -21.770352   162.957703   
Wisconsin                 -2.970212  -0.066621 -14.684775   374.432034   
Wyoming                    2.725437   2.022087 -19.214560   188.857370    
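The infinite value for Oklahoma falls out of the arithmetic: a percentage change on a base of zero is a division by zero, which pandas reports as inf for floating point data. A minimal sketch follows, using made-up vote counts (not the real state figures) and hypothetical column names.

```python
import numpy as np
import pandas as pd

# Hypothetical vote counts for illustration only (not the real state figures).
counts = pd.DataFrame(
    {
        "Other_2012": [20000.0, 0.0],
        "Other_2016": [65000.0, 83000.0],
    },
    index=["Some State", "Oklahoma"],
)

# Percentage change on the 2012 base year; a zero base yields inf.
delta_other = (counts["Other_2016"] - counts["Other_2012"]) / counts["Other_2012"] * 100
print(delta_other)

# Mask infinite values before charting or averaging.
finite = delta_other.replace([np.inf, -np.inf], np.nan).dropna()
print(finite)
```

Dropping the infinite value before plotting is one way to handle it; it is the same effect as leaving Oklahoma out of the chart.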

A quick acknowledgement: I sourced the data for this analysis from http://uselectionatlas.org/RESULTS/.

Update

In addition to the change in the raw vote count (above), we can look at the change in vote share for Democrats, Others and Republicans in percentage points in 2016 compared with 2012. In these charts I have not reported Utah (which was a special case). We can see that the Democrat vote share was mostly down across the country, the Other vote share was up and it's an up-and-down mixed-bag for the Republicans.





                      Dem_2012_ppt  Rep_2012_ppt  Other_2012_ppt
State                                                           
Alabama                  -4.029955      1.632702        2.397253
Alaska                   -4.193305     -3.374323        7.567629
Arizona                   0.981314     -4.423750        3.442436
Arkansas                 -3.241861      0.032331        3.209530
California                1.368819     -4.201764        2.832944
Colorado                 -3.318534     -2.790108        6.108642
Connecticut              -3.516540      0.256346        3.260194
Delaware                 -5.253016      1.943215        3.309801
District of Columbia     -0.049292     -3.190817        3.240109
Florida                  -2.411126     -0.328586        2.739712
Georgia                   0.503919     -2.136534        1.632615
Hawaii                   -8.322469      2.200222        6.122247
Idaho                    -4.896985     -4.821842        9.718827
Illinois                 -2.151515     -1.951737        4.103252
Indiana                  -5.893867      2.931249        2.962617
Iowa                    -10.259427      4.988393        5.271034
Kansas                   -1.845608     -2.406751        4.252359
Kentucky                 -5.088959      2.079217        3.009741
Louisiana                -2.127898      0.304109        1.823789
Maine                    -8.420870      4.166428        4.254442
Maryland                 -2.207430     -1.374118        3.581547
Massachusetts             0.133802     -4.001618        3.867815
Michigan                 -6.911175      2.825374        4.085801
Minnesota                -6.207269     -0.033008        6.240277
Mississippi              -4.029578      3.012615        1.016963
Missouri                 -6.288028      3.473090        2.814938
Montana                  -5.725321      1.164583        4.560738
Nebraska                 -3.922406      0.354498        3.567908
Nevada                   -4.470427     -0.143387        4.613814
New Hampshire            -5.137918      0.074912        5.063006
New Jersey               -3.457540      0.695248        2.762292
New Mexico               -4.720900     -2.821951        7.542851
New York                 -5.010221      1.945574        3.064648
North Carolina           -2.210485     -0.514993        2.725478
North Dakota            -11.464071      4.633479        6.830592
Ohio                     -7.510519      3.920221        3.590298
Oklahoma                 -4.296000     -1.449455        5.745455
Oregon                   -4.133026     -3.029671        7.162697
Pennsylvania             -4.644232      1.827179        2.817052
Rhode Island             -8.385632      3.741999        4.643633
South Carolina           -3.414614      0.378201        3.036412
South Dakota             -8.129091      3.643683        4.485409
Tennessee                -4.142874      1.648774        2.494100
Texas                     1.949190     -4.698999        2.749809
Utah                      3.095627    -27.131464       24.035837
Vermont                  -9.892764     -0.703200       10.595965
Virginia                 -1.405104     -2.855456        4.260560
Washington               -2.907304     -3.932706        6.840010
West Virginia            -9.286462      5.718343        3.568118
Wisconsin                -6.377981      1.373111        5.004869
Wyoming                  -5.941530     -0.469989        6.411519  
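The swing table earlier in this post appears to be consistent with this table: for the states I checked, the published swing equals the change in the Republican share minus the change in the Democrat share. That relationship is inferred from the numbers rather than stated anywhere, so treat the sketch below as a consistency check, not a definition.

```python
# Percentage-point changes in vote share, copied from the table above.
ppt_change = {
    "Alabama": {"Dem": -4.029955, "Rep": 1.632702},
    "Alaska": {"Dem": -4.193305, "Rep": -3.374323},
    "Iowa": {"Dem": -10.259427, "Rep": 4.988393},
}

# Swings as published in the earlier swing table (positive = to Republicans).
published_swing = {"Alabama": 5.662658, "Alaska": 0.818982, "Iowa": 15.247820}

# Candidate formula: change in the Republican share less change in the Democrat share.
computed_swing = {
    state: change["Rep"] - change["Dem"] for state, change in ppt_change.items()
}

for state in ppt_change:
    print(f"{state:8s} computed {computed_swing[state]:9.6f}  "
          f"published {published_swing[state]:9.6f}")
```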

Sunday, November 13, 2016

A quick look at the US election

I have read a lot of garbage about what the 2016 US election means and signifies. In this post I want to look at what the raw numbers suggest.

While the counting has not finished, it looks like five and possibly six states will flip from Democratic in the 2012 presidential election to Republican in 2016: Florida, Pennsylvania, Ohio, Michigan, Wisconsin and Iowa. The states where the count is still in doubt are New Hampshire and Michigan.

In this first chart, I have plotted the likely state winners according to American sensibilities: the states which Trump/Republicans won are red and the states Clinton/Democrats won are blue. The states that have flipped from one side of politics to the other are in a darker hue.


The biggest swings to the Republicans (on a two-party basis) were in the industrial upper mid-west. This suggests the economy (and the challenges of managing economic change in those states previously heavily dependent on the industrial/manufacturing sector) may have driven the Trump win. There is an irony here: Bill Clinton won the 1992 election with the catch-phrase "it's the economy, stupid".


The bigger swings by state follow. In this table, a positive swing is to the Republicans, a negative swing is to the Democrats. In Utah, the measurement of a two-party swing was complicated by Evan McMullin, an independent and former Republican, who took votes from Trump.

Hawaii                10.402887  
Indiana                8.836585  
Iowa                  15.197928  
Maine                 12.587298  
Michigan               9.742464  
Missouri               9.761118  
North Dakota          16.080305  
Ohio                  11.492718  
Rhode Island          12.140835  
South Dakota          11.772774  
Utah                 -29.320763  
Vermont                9.187492  
West Virginia         15.004805  
Wisconsin              7.751092   

If the economy was the distal cause, the immediate factor I find most compelling in explaining the election outcome was the decline in raw Democratic votes in 2016. Put simply, Clinton was not as attractive to voters in 2016 as Obama was in 2008 or 2012. This outcome is driven less by a decline in turn-out and more by an increase in votes for the Other parties. In the US first-past-the-post voting system, these third party votes are effectively wasted. Of note: Maine voted on a referendum to introduce ranked-choice voting for the next Presidential election. (Ranked-choice voting is what we have in Australia.)
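To illustrate why third party votes count for nothing under first-past-the-post but can matter under ranked-choice voting, here is a toy count. The ballots are entirely hypothetical, and the instant-runoff function is a simplified sketch (no tie-breaking rules), not the counting procedure used in Maine or Australia.

```python
from collections import Counter

# Hypothetical ranked ballots: each tuple lists candidates in preference order.
ballots = (
    [("Rep",)] * 46
    + [("Dem",)] * 44
    + [("Other", "Dem")] * 7
    + [("Other", "Rep")] * 3
)

def plurality(ballots):
    """First-past-the-post: only first preferences count."""
    tally = Counter(b[0] for b in ballots)
    return tally.most_common(1)[0][0]

def instant_runoff(ballots):
    """Ranked-choice: eliminate the last-placed candidate and transfer ballots."""
    live = {c for b in ballots for c in b}
    while True:
        # Count each ballot for its highest-ranked candidate still in the race.
        tally = Counter()
        for b in ballots:
            for choice in b:
                if choice in live:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader
        live.remove(min(tally, key=tally.get))

print("First-past-the-post winner:", plurality(ballots))
print("Ranked-choice winner:", instant_runoff(ballots))
```

With these made-up ballots the Republican leads on first preferences, but once the Other ballots are transferred to their second choices the Democrat passes 50 per cent.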

Before looking at voting patterns by party, let's look at the change in overall turnout between 2012 and 2016. At this point in the count some 128.5 million votes have been counted. In 2012, there were 129.2 million votes counted in total. I expect the final 2016 vote count will exceed the 2012 count.


In the next three charts, we will look at percentage changes in the raw vote numbers by state for Republicans, Democrats and Others. The most significant thing to notice here is the dramatic increase in votes for other parties. But also of note, the decline of the Democratic vote in many states, compared with the neutral or slight growth in Republican votes.




The table of percentage changes in vote count, by state, as set out in the above charts follows.

                      Delta_turnout  Delta_Rep  Delta_Dem  Delta_Other
State                                                                    
Alabama                    1.910248   4.658399  -8.796324   224.990096   
Alaska                   -15.597930 -20.805096 -24.162590   129.167615   
Arizona                   -5.535995 -12.783973  -3.453852   137.245401   
Arkansas                   5.213714   5.252229  -4.000162   137.342120   
California               -27.072075 -34.879166 -25.381954    40.718861   
Colorado                   3.527116  -1.778285  -3.800161   255.222752   
Connecticut                1.609885   2.175307  -4.469738   272.546747   
Delaware                   6.671321  11.855527  -2.886835   256.244661   
District of Columbia      -2.549325 -45.966045  -2.563747   172.896669   
Florida                   10.707080  10.657011   5.897012   237.066350   
Georgia                    4.378169   0.306924   5.403909   123.319726   
Hawaii                    -6.232617   1.119696 -17.158528   344.105923   
Idaho                      5.401512  -2.528325 -10.528839   297.344211   
Illinois                   3.193796  -0.797905  -1.391417   234.500124   
Indiana                    3.674430   9.364862 -10.209785   145.281806   
Iowa                      -1.581868   8.996916 -20.956447   281.246769   
Kansas                    -0.897513  -4.899907  -5.710285   174.477800   
Kentucky                   6.968557  10.646897  -7.438656   190.809197   
Louisiana                  1.711629   2.261031  -3.641763   114.950095   
Maine                      3.292296  13.793469 -12.165530   163.210532   
Maryland                  -5.847539  -8.262225  -9.363087   137.265248   
Massachusetts              2.012901  -8.867863   2.237895   219.212456   
Michigan                   0.840977   7.689755 -11.688592   270.282940   
Minnesota                  0.349763   0.288436 -11.490609   262.405051   
Mississippi               -9.565847  -4.637944 -17.887766    89.881403   
Missouri                   0.464235   6.969118 -13.801892   136.501140   
Montana                    0.116206   2.310695 -13.564276   147.788272   
Nebraska                   1.259726   2.155078  -9.568626   166.380411   
Nevada                    10.648348  10.300992   1.200663   269.996997   
New Hampshire              2.934855   4.816955  -5.699736   226.555295   
New Jersey                 3.286754   5.435946  -3.089847   230.071427   
New Mexico                 1.093579  -5.565118  -7.912408   184.228106   
New York                   0.445977   6.002740  -7.498578   207.735610   
North Carolina             4.071784   3.058278  -0.714748   229.003640   
North Dakota               6.415458  14.864772 -25.075504   249.341081   
Ohio                      -4.471185   4.153658 -18.060840   147.541950   
Oklahoma                   8.848788   6.485962  -5.224249          inf   
Oregon                     7.923902   0.593894  -0.472443   219.613974   
Pennsylvania               4.822087   9.001266  -4.489789   200.825561   
Rhode Island               3.164675  14.132592 -10.646210   236.409250   
South Carolina             6.294428   6.970219  -1.867564   245.386703   
South Dakota               1.712959   8.114999 -19.027296   204.971834   
Tennessee                  0.689381   3.938646  -9.561272   135.283243   
Texas                     11.603998   2.445314  16.918713   211.061714   
Utah                     -11.301832 -43.420065  -0.330404   727.731299   
Vermont                    5.270808   2.873848 -10.372969   459.377125   
Virginia                   2.607012  -3.299823  -0.501009   283.482135   
Washington               -13.282854 -21.421277 -16.438627   147.327113   
West Virginia              5.995962  15.750081 -21.770352   162.957703   
Wisconsin                 -2.970212  -0.066621 -14.684775   374.432034   
Wyoming                    2.702149   2.001029 -19.227550   188.709860   

This table tells a fairly consistent story of votes leaking from Democrats to the Other parties. This raises the interesting conjecture on whether Bernie Sanders would have done a better job at holding the flow of votes. My suspicion (without supporting evidence) is that he would have held more votes lost to others on the left, but may have lost more votes to Trump on the right.

Some have contested that Trump doesn't have a real mandate because he did not get more than 50 per cent of the vote. Arguments can be made about the fairness of the US voting system: particularly as it looks like Clinton won more of the popular vote but not the electoral college vote. However, these arguments are not resolvable. Fairness, like beauty, is in the eye of the beholder. The American founders decided to weight their voting system to those who are engaged (through voluntary voting) and to those who live in the less populous states (by giving all states two electoral college votes, and then one or more votes weighted to the population of the state). They also decided on a first-past-the-post system for counting votes. While compelling arguments can be made for and against each of these design elements, ultimately the Presidential election was conducted under the rules accepted by the American people.

Finally, a quick acknowledgement: I sourced the data for this analysis from http://uselectionatlas.org/RESULTS/.

Wednesday, November 9, 2016

Another polling fail

We have had a few polling fails recently in the Anglo-sphere. Two United Kingdom examples quickly come to mind: the General Election in 2015, where the polls predicted a hung parliament, and Brexit in 2016, in which remaining in the EU was the predicted winner. Closer to home we had the Queensland state election in 2015, in which the polls foreshadowed a narrow Liberal National Party win.

Today's election of Donald Trump in the United States will be added to the list of historic polling fails.

  • The New York Times had the average of the polls with Clinton on 45.9 per cent to Trump's 42.8 per cent (+3.1 percentage points).  The NYT gave Clinton an 84 per cent chance of winning the Electoral College vote.
  • FiveThirtyEight.com had the average of the polls with Clinton on 48.5 to Trump's 44.9 per cent (+3.6). FiveThirtyEight gave Clinton a 71.4 per cent chance of winning the Electoral College vote.
  • The Princeton Election Consortium had Clinton ahead of Trump with +4.0 ± 0.6 percentage points. PEC gave Clinton a 93 per cent chance of winning the Electoral College vote.

While the count is not over, the current tally has Clinton ahead in the national popular vote by +1.1 percentage points, but losing the Electoral College vote. The most likely Electoral College tally looks like Trump with 305 Electoral College votes to Clinton's 233.
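The size of the miss can be put in numbers by comparing each aggregator's final lead for Clinton with her lead in the count so far. The sketch below only uses figures quoted above; note the actual lead is provisional while counting continues.

```python
# Final poll-average leads for Clinton (percentage points), from the list above.
poll_lead = {
    "New York Times": 45.9 - 42.8,
    "FiveThirtyEight": 48.5 - 44.9,
    "Princeton Election Consortium": 4.0,
}

# Clinton's lead in the national popular vote in the count so far.
actual_lead = 1.1

# Miss = predicted lead minus the lead in the count so far.
miss = {source: lead - actual_lead for source, lead in poll_lead.items()}

for source, lead in poll_lead.items():
    print(f"{source}: predicted {lead:+.1f}, miss {miss[source]:+.1f} points")
```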

So today's big question: Why such a massive polling fail?

It will take some time to answer this question with certainty. However, I have a couple of guesses.

My first guess would be the social desirability bias. This is sometimes referred to as the "shy voter problem" or the Bradley effect. At the core of this polling problem, some voters will not admit their actual voting preference to the pollster because they fear the pollster will negatively judge that preference. It is not surprising that such a controversial figure as Donald Trump would prompt issues of social desirability in polling. Elite opinion was against Trump. Clinton labeled Trump supporters as "deplorable". No-one wants to be in that basket. Pollsters might also look at Latino voters in Florida, who appear to have voted for Trump in larger numbers than expected.

The second area where I suspect pollsters will look is their voter turn-out models: who actually voted, compared with who told pollsters they would vote. This was a very different election to the previous two Presidential elections. Turn-out models based on previous elections may have misdirected the polling results (particularly on the basis of race and particularly in the industrial mid-west).

A final thing that might be worth looking at is herding. The final polls were close, perhaps remarkably close. This may have been natural, or it may have resulted from pollsters modulating their final outputs to be similar to each other.

Sunday, November 6, 2016

How are seat swings distributed around the national swing?

If we take all nine Federal elections since 1993 (that is the Federal elections in 1993, 1996, 1998, 2001, 2004, 2007, 2010, 2013 and 2016) we can look at the distribution of individual seat swings around the national swing for each election. This information is useful for Monte Carlo simulations of future elections. The summary statistics for this analysis follow.

    count    1342.000000
    mean       -0.014416
    std         3.224886
    min       -14.250000
    25%        -2.067500
    50%        -0.080000
    75%         1.990000
    max        17.350000

Let's start with a normal distribution with a mean of -0.014 and a standard deviation of 3.22 percentage points (as suggested by the raw data). It yields a plot as follows (where the normal curve is fitted to the data in red).


However, there is a potential problem with this normal distribution. There are too many outliers for a normal distribution. In the area between -3 and +3 standard deviations from the mean we would expect to see 99.73 per cent of all observations if the data was normally distributed. We would only expect to see 0.27 per cent of the observations outside of this range. We actually have 0.82 per cent of our observations outside of this range.

We also have more observations bunched in the middle of the distribution. For a normal distribution, you would expect to find 68.27 per cent of the observations between -1 and +1 standard deviations from the mean. We have 70.19 per cent of our observations in the middle of the distribution.
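The theoretical percentages quoted above can be reproduced from the normal cumulative distribution function. In this sketch the observed outlier share uses the 11 seats in the outlier list further down this post and the 1342 observations in the summary statistics; the helper name is my own.

```python
from scipy import stats

def within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return stats.norm.cdf(k) - stats.norm.cdf(-k)

expected_within_1 = within(1)        # roughly 68.27 per cent
expected_outside_3 = 1 - within(3)   # roughly 0.27 per cent

# Observed: 11 seats beyond three standard deviations out of 1342 observations.
observed_outside_3 = 11 / 1342

print(f"Expected within 1 sd:  {expected_within_1:.4%}")
print(f"Expected outside 3 sd: {expected_outside_3:.4%}")
print(f"Observed outside 3 sd: {observed_outside_3:.4%}")
```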

Not surprisingly therefore, the excess kurtosis statistic for the observations is positive: 1.15 (using Fisher’s definition of kurtosis, where the normal curve has a kurtosis of zero). From our sample, it would appear that the distribution of seat swings around the national swing is a touch leptokurtic. More observations than normal are clustered around the mean, but there is also a higher probability than normal of substantial outliers occurring from time to time (which is sometimes referred to as a fat tail risk distortion).
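For readers unfamiliar with the statistic, the sketch below computes excess kurtosis by hand on simulated data and compares it with scipy.stats.kurtosis. The samples are synthetic (a normal and a fatter-tailed t-distribution), not the seat swing data. Note that pandas' kurtosis method applies a bias correction, so its result can differ slightly from this simple version.

```python
import numpy as np
from scipy import stats

def excess_kurtosis(values):
    """Fisher (excess) kurtosis: fourth standardised moment minus 3."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.mean(z ** 4) - 3.0

# Synthetic samples for illustration only.
rng = np.random.default_rng(42)
normal_sample = rng.normal(size=200_000)
heavy_sample = rng.standard_t(df=10, size=200_000)

print("Normal sample:  ", excess_kurtosis(normal_sample))
print("t(10) sample:   ", excess_kurtosis(heavy_sample))
print("scipy on t(10): ", stats.kurtosis(heavy_sample, fisher=True))
```

The normal sample should come out close to zero, while the fatter-tailed sample should be clearly positive.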

Given the fat tails, I tried a couple of other distributions to see if they would provide a better fit than the Gaussian normal distribution. My attempt to fit a Cauchy distribution - see next chart - yielded a poorer fit (as measured by the sum of squared errors). 


However, Student's t-distribution (next chart) was an improvement on the normal distribution in terms of fit. When it comes to a Monte Carlo simulation of election outcomes for individual seats around a national swing, the t-distribution looks the most promising. The parameters for this t-distribution are: df=10.5318772642; loc= -0.0531408340653; and scale=2.89669817523. I suspect it is simply an artifact of the raw data that the location parameter is not zero.
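As a rough sketch of how the fitted t-distribution might feed a Monte Carlo simulation, the code below draws seat-level deviations around an assumed national swing. The fitted parameters are those reported above; the seat margins, the national swing and the number of simulations are hypothetical values chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Fitted t-distribution parameters reported above.
DF, LOC, SCALE = 10.5318772642, -0.0531408340653, 2.89669817523

# Hypothetical inputs: Labor's two-party-preferred margin in each seat
# (negative = not held by Labor) and an assumed national swing to Labor.
margins = np.array([-6.0, -3.5, -1.2, 0.8, 2.4, 7.5])
national_swing = 1.5
N_SIMS = 10_000

rng = np.random.default_rng(2016)

# One seat-level deviation from the national swing per seat per simulation.
deviations = stats.t.rvs(DF, loc=LOC, scale=SCALE,
                         size=(N_SIMS, len(margins)), random_state=rng)
new_margins = margins + national_swing + deviations

labor_wins = (new_margins > 0).sum(axis=1)        # seats won in each simulation
win_probability = (new_margins > 0).mean(axis=0)  # win probability for each seat

print("Mean seats won:", labor_wins.mean())
print("Win probability by seat:", win_probability.round(3))
```

This treats seat deviations as independent of each other, which is a simplifying assumption; state-level effects would correlate them in practice.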


The relationship between the t and normal distributions can be seen in the next chart.


It is worth considering the outliers that lie beyond three standard deviations from the mean. The complete list follows. The critical column is the Adjusted Swing column, which is the swing for the seats minus the national swing (to Labor). There are interesting stories with many of these unusually large swings.

             Division StateAb   Swing  AdjSwing
1993-56       Calwell     Vic   11.37      9.83
1993-80   Maribyrnong     Vic   11.51      9.97
1993-87         Wills     Vic   12.27     10.73
1996-108        Oxley     Qld  -19.31    -14.25
1996-86         Wills     Vic   12.29     17.35
1998-112     Wide Bay     Qld   15.32     10.71
2010-53        Fowler     NSW  -13.81    -11.23
2010-82      Kingston      SA    9.49     12.07
2013-53        Fowler     NSW    8.04     11.65
2013-92         Lyons     TAS  -13.51     -9.90
2016-21          Burt      WA    13.2     10.07
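The adjusted swing is simple arithmetic: the seat swing to Labor less the national swing to Labor for that election. The sketch below reproduces a few rows of the table using the national swings listed in the code further down; the DataFrame layout here is just for illustration.

```python
import pandas as pd

# National two-party swings to Labor (percentage points), as used in the code below.
national = {"1993": 1.54, "1996": -5.06, "2016": 3.13}

# A few rows copied from the outlier table above; swings are to Labor.
seats = pd.DataFrame(
    {
        "Year": ["1993", "1996", "1996", "2016"],
        "Division": ["Calwell", "Oxley", "Wills", "Burt"],
        "Swing": [11.37, -19.31, 12.29, 13.2],
    }
)

# Adjusted swing = seat swing minus that year's national swing.
seats["AdjSwing"] = (seats["Swing"] - seats["Year"].map(national)).round(2)
print(seats)
```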

The Python code for this analysis is pretty rough. A fair bit of effort went into munging the raw data from the Australian Electoral Commission (AEC) into something that I could work with. The data format changed a number of times over the years. Swings prior to the 2004 election were always reported from the perspective of the Labor party. From the 2004 election, they are reported from the perspective of the government of the day (going into the election). State abbreviations were not used in the earlier data files from the AEC. There is also quite a bit of arcane code dedicated to statistical tests.

import pandas as pd
import numpy as np
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
from scipy import stats  

plt.style.use('../bin/markgraph.mplstyle')

# --- get data

# - a quick and dirty labeling function
def state_label(df):
    states = ['New South Wales', 'Victoria', 'Queensland', 'Western Australia', 
        'South Australia', 'Tasmania', 'Australian Capital Territory', 
        'Northern Territory'] # note: state order is important
    state_ab = ['NSW', 'Vic', 'Qld', 'WA', 'SA', 'Tas', 'ACT', 'NT']

    df['StateAb'] = None
    df.iloc[0, 0] = 'New South Wales'
    for name, ab in zip(states, state_ab):
        position = df[df['Division'] == name].index[0]
        df.loc[position:, 'StateAb'] = ab

    return(df)

# - get the very old data - from ugly tab separated files
# Note: some seriously ugly munging occurring here
# Note: edited the 2001 data file to remove trailing tabs
# Note: Swings in this series are from the perspective of the Labor Party
# Note: the National Swing in the 2001 data file appears incorrect
d = {} # dictionary of divisions
old_years =      ['1993', '1996', '1998', '2001']
old_nat_swings = [1.54,   -5.06,  4.61,   -1.93] # positive = to Labor
start_rows = [17, 17, 19, 19]
previous = None

for year, start, swing in zip(old_years, start_rows, old_nat_swings):
    print('Loading: ' + year + ' ================')

    # get raw data
    d[year] = pd.read_table('./Data/' + year + '-V1_9.TXT', header=start, 
        index_col=None, quotechar='"', sep='\t', na_values = ['na', '-', '.', ''])
    
    # munge the data into shape
    d[year] = state_label(d[year])
    d[year] = d[year].dropna(axis=1, how='all') # drop empty columns
    d[year] = d[year].dropna() # drop empty rows
    d[year] = d[year][~d[year]['Division'].str.contains(' Total')] # drop state totals
    
    # some swing checks ...
    current = (d[year]['ALP'].astype(float).sum() /  
        d[year]['Total'].astype(float).sum() * 100.0)
    if previous :
        print('Swing: ', current - previous)
    previous = current
    
    # get relative national swing for each seat
    d[year]['AdjSwing'] = d[year]['Swing'].astype(float) - swing # positive Labor
    d[year] = d[year][['Division', 'StateAb', 'Swing', 'AdjSwing']]
    d[year].index = year + '-' + pd.Series(range(len(d[year]))).astype(str)     

# - get the more recent data - in CSV files
# Note: Swings in this series are from the perspective of the govt. of the day
new_years =      ['2004',  '2007',  '2010',  '2013',  '2016']
new_nat_swings = [-1.79,   5.44,    -2.58,   -3.61,   3.13] # positive = to Labor
new_suffix =     ['12246', '13745', '15508', '17496', '20499']
reversables =    [True,    True,    False,   False,   True]

for year, suffix, swing, r in zip(new_years, new_suffix, new_nat_swings, reversables):
    print('Loading: ' + year + ' ================')
    
    # get the raw data
    d[year] = pd.read_csv('./Data/HouseTppByDivisionDownload-'+suffix+'.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])
    
    # munge the data into shape
    d[year].rename(columns={'DivisionNm':'Division'}, inplace=True)
    if r:
        d[year]['Swing'] = -d[year]['Swing'] # from the perspective of swing to Labor
        
    # some swing checks ...
    current = (d[year]['Australian Labor Party Votes'].astype(float).sum() /  
        d[year]['TotalVotes'].astype(float).sum() * 100.0)
    if previous :
        print('Swing: ', current - previous)
    previous = current
    
    # get relative national swing for each seat
    d[year]['AdjSwing'] = d[year]['Swing'].astype(float) - swing # positive = Labor bias
    d[year] = d[year][['Division', 'StateAb', 'Swing', 'AdjSwing']]
    d[year].index = year + '-' + pd.Series(range(len(d[year]))).astype(str) # re-index

# - combine the DataFrames
# Note: DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
combined = pd.concat([d[year] for year in sorted(d)])

# --- some summary statistics
print(combined['AdjSwing'].describe())
print('Kurtosis: ', combined['AdjSwing'].kurtosis())

# - fit a normal distribution
print('Fit a distribution ================')
# let's histogram the original data
BINS = 100
y, x = np.histogram(combined['AdjSwing'], bins=BINS, density=True)
x = (x + np.roll(x, -1))[:-1] / 2.0

# let's try the normal distribution
mean, stdev = stats.norm.fit(combined['AdjSwing'])
pdf_gauss = stats.norm.pdf(x, mean, stdev)
sse_gauss = np.sum(np.power(y - pdf_gauss, 2.0)) # sum squared error
print('SSE for normal PDF: ', sse_gauss)

# let's try the Cauchy distribution
loc, scale = stats.cauchy.fit(combined['AdjSwing'])
pdf_cauchy = stats.cauchy.pdf(x, loc, scale)
sse_cauchy = np.sum(np.power(y - pdf_cauchy, 2.0)) # sum squared error
print('SSE for Cauchy PDF: ', sse_cauchy)

# let's try the t distribution
shape, loc, scale = stats.t.fit(combined['AdjSwing'])
pdf_t = stats.t.pdf(x, shape, loc, scale)
sse_t = np.sum(np.power(y - pdf_t, 2.0)) # sum squared error
print('SSE for t PDF: ', sse_t)
print('t parameters df, loc, scale: ', shape, loc, scale)

# - a quick look at the outliers
print('A quick look at outliers ================')
outliers = combined[combined['AdjSwing'].abs() > stdev * 3]
proportion = len(outliers) / len(combined) 
print(outliers)
print("3 * stdev outlier proportion: ", proportion, 
    '; which compares with a theoretical 0.0027')
outliers = combined[combined['AdjSwing'].abs() > stdev * 2]
proportion = len(outliers) / len(combined) 
print("2 * stdev outlier proportion: ", proportion, 
    '; which compares with a theoretical 0.0455')
outliers = combined[combined['AdjSwing'].abs() > stdev]
proportion = len(outliers) / len(combined) 
print("1 * stdev outlier proportion: ", proportion, 
    '; which compares with a theoretical 0.3173')

# --- and plot ...

# - plot data fitted to normal 
ax = combined['AdjSwing'].hist(bins=BINS, density=True, color='cornflowerblue')
ax.plot(x, pdf_gauss, color='r')
ax.set_title("Seat swings distributed around the national TPP swing (93-16)")
ax.set_xlabel('Seat Swing minus National Swing (percentage points to Labor)') 
ax.set_ylabel('Probability')
ax.axvline(0, color='#999999', linewidth=0.5)

fig = ax.figure
fig.text(0.55, 0.75, 'Normal distribution', fontsize='small', color='red')
fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
    fontsize='x-small', fontstyle='italic', color='#999999')

fig.savefig('./graphs/national_swing_hist_normal.png', dpi=125)
plt.close() 

# - plot data fitted to Cauchy 
ax = combined['AdjSwing'].hist(bins=BINS, density=True, color='cornflowerblue')
ax.plot(x, pdf_cauchy, color='r')
ax.set_title("Seat swings distributed around the national TPP swing (93-16)")
ax.set_xlabel('Seat Swing minus National Swing (percentage points to Labor)') 
ax.set_ylabel('Probability')
ax.axvline(0, color='#999999', linewidth=0.5)

fig = ax.figure
fig.tight_layout(pad=1)
fig.text(0.55, 0.75, 'Cauchy distribution', fontsize='small', color='red')
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
    fontsize='x-small', fontstyle='italic', color='#999999')

fig.savefig('./graphs/national_swing_hist_cauchy.png', dpi=125)
plt.close() 

# - plot data fitted to t 
ax = combined['AdjSwing'].hist(bins=BINS, density=True, color='cornflowerblue')
ax.plot(x, pdf_t, color='r')
ax.set_title("Seat swings distributed around the national TPP swing (93-16)")
ax.set_xlabel('Seat Swing minus National Swing (percentage points to Labor)') 
ax.set_ylabel('Probability')
ax.axvline(0, color='#999999', linewidth=0.5)

fig = ax.figure
fig.tight_layout(pad=1)
fig.text(0.55, 0.75, "Student's t distribution", fontsize='small', color='red')
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
    fontsize='x-small', fontstyle='italic', color='#999999')

fig.savefig('./graphs/national_swing_hist_t.png', dpi=125)
plt.close() 

# - plot data fitted to normal and t 
ax = combined['AdjSwing'].hist(bins=BINS, density=True, color='cornflowerblue')
ax.plot(x, pdf_gauss, color='blue')
ax.plot(x, pdf_t, color='red')
ax.set_title("Seat swings distributed around the national TPP swing (93-16)")
ax.set_xlabel('Seat Swing minus National Swing (percentage points to Labor)') 
ax.set_ylabel('Probability')
ax.axvline(0, color='#999999', linewidth=0.5)

fig = ax.figure
fig.tight_layout(pad=1)
fig.text(0.55, 0.70, "Normal distribution", fontsize='small', color='blue')
fig.text(0.55, 0.75, "Student's t distribution", fontsize='small', color='red')
fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
    fontsize='x-small', fontstyle='italic', color='#999999')

fig.savefig('./graphs/national_swing_hist_normal+t.png', dpi=125)
plt.close() 
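
As a cross-check on the theoretical proportions quoted in the outlier comparison above, the two-sided tail of a normal distribution can be computed directly. This is a minimal sketch using only the standard library; the helper name two_sided_tail is my own and is not part of the script above.

```python
import math

def two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    # erfc(k / sqrt(2)) equals 2 * (1 - Phi(k)), the mass beyond +/- k stdev
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, 'stdev:', round(two_sided_tail(k), 4))
```

If the observed proportions sit well above these values, the data has fatter tails than a normal distribution, which is the case the Student's t fit is meant to capture.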

Saturday, November 5, 2016

The post election count favours the Coalition

At the 2016 election, for the first time, the Australian Electoral Commission provided a two-party preferred (TPP) count for each seat by vote type. This affords another window on the contention that the final count is more favourable to the Coalition than the count of ordinary votes, which is completed on election night.

You can see the earlier analysis here and here, which came to the same conclusion over the past five Federal elections, based on the two candidate preferred (TCP) counts for each seat by vote type.

If we look at the Coalition's TPP percentages (summed across all seats, for each vote type) it received:

    Ordinary Votes:             49.952 %
    Absent Votes:               46.090 %
    Provisional Votes:          40.739 %
    Declaration Pre-Poll Votes: 51.580 %
    Postal Votes:               56.251 %
    Total Votes:                50.356 %

In the 2016 election, the Coalition lost the TPP count on election night (ordinary votes in the above list). But by the final count it had improved its position by 0.404 percentage points to win the final TPP count. The distribution of the Coalition's bias (compared with ordinary votes) across seats can be seen in the following charts.
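
The arithmetic behind that improvement can be reproduced from the percentages listed above. This is a small sketch, not part of the analysis code: the dictionary simply restates the published figures, and the gaps are national aggregates, so they will not match the seat-by-seat values in the charts.

```python
# Coalition TPP percentages by vote type, copied from the list above
coalition_tpp = {
    'Ordinary Votes': 49.952,
    'Absent Votes': 46.090,
    'Provisional Votes': 40.739,
    'Declaration Pre-Poll Votes': 51.580,
    'Postal Votes': 56.251,
    'Total Votes': 50.356,
}

# gap (in percentage points) between each vote type and ordinary votes
ordinary = coalition_tpp['Ordinary Votes']
gaps = {name: round(pct - ordinary, 3)
        for name, pct in coalition_tpp.items()
        if name != 'Ordinary Votes'}

for name, gap in gaps.items():
    print('{:<28}{:+.3f}'.format(name + ':', gap))
```

The 'Total Votes' entry is the 0.404 percentage point improvement quoted above; postal votes show the largest national gap in the Coalition's favour and provisional votes the largest gap against.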






The python code for the above analysis follows.

import pandas as pd
import numpy as np
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
plt.style.use('../bin/markgraph.mplstyle')

# --- get data
e2016 = pd.read_csv('./Data/HouseTppByDivisionByVoteTypeDownload-20499.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])

# --- some useful frames
vote_types = ['OrdinaryVotes', 'AbsentVotes', 'ProvisionalVotes', 
    'DeclarationPrePollVotes', 'PostalVotes','TotalVotes']
vote_names = ['Ordinary Votes', 'Absent Votes', 'Provisional Votes', 
    'Pre-poll Declaration Votes', 'Postal Votes','Total Votes']

Coalition = 'Liberal/National Coalition'
Labor = 'Australian Labor Party'
Percent = 'Percentage'

# --- now let's calculate and plot the comparisons
votes = e2016.copy()
year = 2016
print(year)
    
# check vote sums
Laborlist = (Labor + ' ' + pd.Series(vote_types[:-1])).tolist()
Coalitionlist = (Coalition + ' ' + pd.Series(vote_types[:-1])).tolist()
assert((votes[Laborlist + Coalitionlist].sum(axis=1) == votes['TotalVotes']).all())

# let's focus on Coalition TPP only (and we will recalculate)
votes[Coalition + ' TotalVotes'] = votes[Coalitionlist].sum(axis=1)
for x in vote_types[:-1]:
    total = votes[Coalition + ' ' + x] + votes[Labor + ' ' + x]
    votes[Coalition + ' ' + x + Percent] = votes[Coalition + ' '+ x] / total * 100.0
    print(x + ': ', votes[Coalition + ' '+ x].sum() / total.sum() * 100.0, '%')
votes[Coalition + ' TotalVotes' + Percent] = (votes[Coalition + ' TotalVotes'] /
    votes['TotalVotes'] * 100.0)
print('Total Votes: ', 
    votes[Coalition + ' TotalVotes'].sum() / votes['TotalVotes'].sum() * 100.0, '%')
    
# and plot ...
types = (Coalition + ' ' + pd.Series(vote_types) + Percent).tolist()
ordinary = votes[Coalition + ' ' + 'OrdinaryVotes' + Percent]
for type, name in zip(types[1:],vote_names[1:]):
    votes[type+'-Ordinary'] = votes[type] - ordinary
    ax = votes[type+'-Ordinary'].hist(bins=25)
    ax.set_title(str(year)+' Coalition TPP Bias in '+name+' cf Ordinary Votes')
    ax.set_xlabel('Coalition bias in TPP percentage points') 
    ax.set_ylabel('Number of Seats')
    ax.axvline(0, color='#999999', linewidth=0.5)
        
    fig = ax.figure
    fig.tight_layout(pad=1)
    fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
        fontsize='x-small', fontstyle='italic', color='#999999')

    fig.savefig("./graphs/TCP3_Coalition_"+str(year)+'_hist_'+name+'-ordinary.png', 
        dpi=125)
    plt.close()

Sunday, October 30, 2016

Early voters favour the Coalition (redux)

In response to my post last weekend, @damonism identified another way of looking at early voting for the elections in 2010, 2013 and 2016.

Since the 2010 election, pre-poll voting has been made easier. A new category of pre-poll vote was created for the 2010 election: the ordinary pre-poll vote, which is made within a voter's normal electorate. These ordinary pre-poll votes are counted on election night. Declaration pre-poll votes, which since 2010 are cast outside of a person's home electorate, are counted after election night.

Prior to 2010, all pre-poll votes were treated as declaration pre-poll votes. Pre-poll voters had to provide a good reason to pre-poll (such as living too far from a polling booth). Since 2010, pre-poll voters simply need to declare that they won't be able to make it to a polling booth on election day.

In this analysis, I compare the vote outcomes for each seat and each vote-type with the on-election-day ordinary votes. Similar conclusions hold ...
  • Absent voters in a seat typically do not favour the Coalition (when compared with on-election-day ordinary voters)
  • Postal voters in a seat typically favour the Coalition (when compared with on-election-day ordinary voters)
  • Pre-poll ordinary voters in a seat typically favour the Coalition (when compared with on-election-day ordinary voters)
  • Pre-poll declaration voters in a seat typically favour the Coalition (when compared with on-election-day ordinary voters)
  • Provisional voters in a seat do not typically favour the Coalition (when compared with on-election-day ordinary voters)
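
The comparison behind these findings can be sketched for a single seat. The counts below are made up purely for illustration (they are not AEC data), and the helper name coalition_share is hypothetical. The key step is removing ordinary pre-poll votes from the election-night ordinary count, so that the baseline is on-election-day votes only.

```python
# Hypothetical two-candidate counts for one seat (made-up numbers)
seat = {
    'Coalition': {'OrdinaryVotes': 40000, 'PrepollOrdinaryVotes': 10000,
                  'PostalVotes': 6000},
    'Labor':     {'OrdinaryVotes': 42000, 'PrepollOrdinaryVotes': 9000,
                  'PostalVotes': 4000},
}

# on-election-day ordinary votes = all ordinary votes less ordinary pre-polls
for counts in seat.values():
    counts['OrdinaryVotesAdj'] = (counts['OrdinaryVotes']
                                  - counts['PrepollOrdinaryVotes'])

def coalition_share(seat, vote_type):
    """Coalition percentage of the two-candidate vote for one vote type."""
    total = sum(counts[vote_type] for counts in seat.values())
    return seat['Coalition'][vote_type] / total * 100.0

# bias = share in a vote type minus share among on-the-day ordinary votes
baseline = coalition_share(seat, 'OrdinaryVotesAdj')
bias = {vt: coalition_share(seat, vt) - baseline
        for vt in ('PrepollOrdinaryVotes', 'PostalVotes')}

for vt, value in bias.items():
    print(vt, round(value, 2))
```

A positive value means the Coalition did better in that vote type than among on-election-day ordinary voters in the same seat; the histograms plot this difference across all seats.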

The charts, this time sorted by vote type, follow ...

Absent Votes




Postal Votes




Pre-poll Ordinary Votes




Pre-poll Declaration Votes




Provisional Votes




Total Votes




Counts and Proportions by Vote Type








Code

For the curious (and to help ensure I have not made a doozy of an error), my python code for the above charts follows. The data comes straight from the Australian Electoral Commission.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('../bin/markgraph.mplstyle')

# --- get data
e2016 = pd.read_csv('./Data/HouseTcpByCandidateByVoteTypeDownload-20499.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])
e2013 = pd.read_csv('./Data/HouseTcpByCandidateByVoteTypeDownload-17496.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])
e2010 = pd.read_csv('./Data/HouseTcpByCandidateByVoteTypeDownload-15508.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])

o2016 = pd.read_csv('./Data/HouseTcpByCandidateByPollingPlaceDownload-20499.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])
o2013 = pd.read_csv('./Data/HouseTcpByCandidateByPollingPlaceDownload-17496.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])
o2010 = pd.read_csv('./Data/HouseTcpByCandidateByPollingPlaceDownload-15508.csv', 
    header=1, index_col=None, quotechar='"',sep=',', na_values = ['na', '-', '.', ''])


# --- some useful frames
years = [2010,  2013,  2016]
data_year_all = zip([e2010, e2013, e2016], [o2010, o2013, o2016], years)

vote_types = ['OrdinaryVotesAdj', 'AbsentVotes', 'ProvisionalVotes', 
    'PrepollOrdinaryVotes', 'PrePollVotes', 'PostalVotes','TotalVotes']
vote_names = ['Ordinary Votes', 'Absent Votes', 'Provisional Votes', 
    'Pre-poll Ordinary Votes', 'Pre-poll Declaration Votes', 'Postal Votes','Total Votes']
type_names = zip(vote_types, vote_names)

Coalition = ['LP', 'NP', 'CLP', 'LNQ', 'LNP']
Labor = ['ALP']
Other = ['IND', 'GRN', 'KAP', 'ON', 'XEN', 'PUP']


# --- now let's calculate and plot the comparisons
totals = pd.DataFrame()
for all, ords, year in data_year_all:

    all = all.copy() # let's be non-destructive
    ords = ords.copy() # let's be non-destructive
    print(year)
    
    # re-label parties
    def grouper(df):
        df['PartyGroup'] = None
        df['PartyGroup'] = df['PartyGroup'].where(~df['PartyAb'].isin(Coalition), 'Coalition')
        df['PartyGroup'] = df['PartyGroup'].where(~df['PartyAb'].isin(Labor), 'Labor')
        df['PartyGroup'] = df['PartyGroup'].where(~df['PartyAb'].isin(Other), 'Other')
        return(df)
    all = grouper(all)
    ords = grouper(ords)

    # check ordinary vote sums
    assert(all['OrdinaryVotes'].sum() == ords['OrdinaryVotes'].sum())

    # find the ordinary pre-poll vote totals
    indexer = ['DivisionNm', 'PartyGroup']
    prepoll = ords[ords['PollingPlace'].str.contains('PPVC|PREPOLL')]
    prepoll = prepoll.groupby(indexer).sum()[['OrdinaryVotes']] # return df
    prepoll.columns = ['PrepollOrdinaryVotes']
    prepoll.sort_index(inplace=True)
    
    # index all to match the ordinary prepoll votes DataFrame
    all = all.set_index(indexer)
    all.sort_index(inplace=True)
    
    # and join them up on index ...
    all['PrepollOrdinaryVotes'] = prepoll['PrepollOrdinaryVotes']
    
    # and correct ordinary votes to account for ordinary pre-poll votes
    all['OrdinaryVotesAdj'] = all['OrdinaryVotes'] - all['PrepollOrdinaryVotes']
    all = all[vote_types]

    # check row additions
    assert((all['TotalVotes'] == all[vote_types[:-1]].sum(axis=1)).all())
    
    # calculate vote counts
    total_votes = pd.DataFrame(all.sum()).T
    total_votes.index = [year]
    totals = pd.concat([totals, total_votes])
    
    # convert to percent
    allPercent = all / all.groupby(level=[0]).sum() * 100.0

    # let's focus on Coalition seats only
    allPercent = allPercent[allPercent.index.get_level_values('PartyGroup') == 'Coalition']
    
    # weed out Nat vs Lib contests - as these Coalition v Coalition contests confound
    allPercent['index'] = allPercent.index
    allPercent= allPercent.drop_duplicates(subset='index', keep=False)
    
    # and plot ...
    for type, name in zip(vote_types[1:],vote_names[1:]):
        allPercent[type+'-Ordinary'] = allPercent[type] - allPercent['OrdinaryVotesAdj']
        ax = allPercent[type+'-Ordinary'].hist(bins=25)
        ax.set_title(str(year)+' Coalition Bias in '+name+' cf Ordinary Votes')
        ax.set_xlabel('Coalition bias in percentage points') 
        ax.set_ylabel('Number of Seats')
        ax.axvline(0, color='#999999', linewidth=0.5)
        
        fig = ax.figure
        fig.tight_layout(pad=1)
        fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
            fontsize='x-small', fontstyle='italic', color='#999999')

        fig.savefig("./graphs/TCP2_Coalition_"+str(year)+'_hist_'+type+'-ordinary.png', dpi=125)
        plt.close()
        
    # identify any unusual outliers
    strange = allPercent[allPercent['TotalVotes-Ordinary'].abs() > 4.0]
    if len(strange):
        print('Outliers:')
        print(strange)

# plot counts
totals = totals / 1000.0 # work in Thousands - easier to read
for col, name in zip(totals.columns, vote_names):
    ax = totals[col].plot(marker='s')
    ax.set_title('Vote count by year for '+name)
    ax.set_xlabel('Election Year') 
    ax.set_ylabel('Thousand Formal Votes')
    ax.set_xlim([2009.5,2016.5])
    ax.set_xticks([2010,  2013,  2016]) 
    ax.set_xticklabels(['2010', '2013', '2016'])
    
    yr = ax.get_ylim()
    expansion = (yr[1] - yr[0]) * 0.02
    ax.set_ylim([yr[0]-expansion,yr[1]+expansion])
    
    fig = ax.figure
    fig.tight_layout(pad=1)
    fig.text(0.99, 0.01, 'marktheballot.blogspot.com.au', ha='right', va='bottom',
            fontsize='x-small', fontstyle='italic', color='#999999')

    fig.savefig('./graphs/TCP2_Vote_Count'+name+'.png', dpi=125)
    plt.close()

Updated 2 November 2016

Additional text added to explain the ordinary pre-poll vote, which was introduced at the 2010 election.