As the drama of the US presidential race winds down for another four years, attention has turned to the electoral forecasts as the results have trickled in from the last of the states. Our Principal Data Scientist Dr James McKeone shares his knowledge and thoughts on the subject.

Two notable forecast models featured in the 2020 election race: the FiveThirtyEight 2020 election forecast, led by Nate Silver, which gave Biden an 89% probability of winning; and The Economist's US presidential election model, built by a team led by Andrew Gelman and Elliot Morris, which put Biden's probability of winning at 97%.

For me, having never followed a US election before and not realising all that comes into play – the electoral college votes, the state-by-state differences in practice and the media circus – it didn't feel like such a certain outcome for Joe Biden as the votes came in.

Because the US election race has only two possible outcomes, it is tempting to apply a pass/fail mark to a forecast based on whether the more probable result occurred. That binary framing isn't so readily available for other predictions we widely rely on, such as weather or economic forecasts. For anyone seeking a lesson in interpreting probability and understanding uncertainty in forecasts, Andrew Gelman's blog is a masterclass in forecast calibration and statistical reasoning from the Bayesian perspective.
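To make the point concrete, here is a toy illustration (not drawn from either team's codebase) of why a proper scoring rule such as the Brier score is a better yardstick than a pass/fail mark: it rewards well-calibrated probabilities rather than merely picking the winner. The probabilities below are the headline figures quoted above; scoring a single event, of course, says little about which model is better calibrated overall.

```python
# Toy illustration: scoring a probabilistic forecast of a binary outcome.
# The Brier score is the squared error of the forecast probability against
# the realised 0/1 outcome; lower is better, and it rewards calibration
# in a way a simple right/wrong mark cannot.

def brier_score(forecast_prob: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (forecast_prob - outcome) ** 2

# Biden won, so outcome = 1. The higher headline probability scores better
# on this one event -- but one event is far too little to judge a model.
print(brier_score(0.89, 1))  # FiveThirtyEight's headline probability
print(brier_score(0.97, 1))  # The Economist's headline probability
```

Averaging this score over many forecast events (all fifty states, say, or many elections) is what lets you compare the calibration of two models, which is precisely what a single pass/fail verdict obscures.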

It's interesting to see, in both the FiveThirtyEight and The Economist models, the post-election analysis by each team now that the majority of the results have come in. For instance, it seems that, as was the case in 2016, the pre-election polling data was potentially biased, as can be seen in the 'messed up polls' discussion, Biden's predicted win and a post-election update from Gelman's team, together with comments on their final election update, Biden's victory and the exit polls from Silver's team.

This questioning of results, made even before the dust has settled on vote counts for all states, is a critical part of true forecasting practice that is so often glossed over in industry applications of data science: the model is not only tested out-of-sample but assessed from its most fundamental elements:

  • Model specification;

  • Data input; and

  • Interpretability by the end-user.

In my experience, the careful review of these three areas is what separates an out-of-the-box, elementary model-to-get-an-answer build from a model built to be built upon: a living forecast model that is built and re-built, torn down and re-fit, with results presented in ways that the user cares about and understands.
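The out-of-sample discipline described above can be sketched in a few lines. This is a minimal, hypothetical example (made-up data, a deliberately elementary stand-in model) of the habit of holding back recent observations, scoring the model only on data it never saw, and re-fitting as new data arrives:

```python
# Minimal sketch (hypothetical data and names): fit on history, score on a
# held-out tail the model never saw, and repeat as new observations arrive.

def fit_mean_model(train):
    """Elementary stand-in 'model': forecast the historical mean."""
    mu = sum(train) / len(train)
    return lambda: mu

def out_of_sample_error(series, holdout: int) -> float:
    """Fit on all but the last `holdout` points; mean squared error on the rest."""
    train, test = series[:-holdout], series[-holdout:]
    model = fit_mean_model(train)
    return sum((model() - y) ** 2 for y in test) / holdout

vote_share = [0.48, 0.51, 0.49, 0.52, 0.50, 0.53]  # made-up series
print(out_of_sample_error(vote_share, holdout=2))
```

The stand-in model is trivial on purpose: the point is the loop around it. Swapping in a richer model changes `fit_mean_model`, but the hold-out-and-score discipline, and the re-fitting as results come in, stays the same.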