How to predict the premiership
With the football season nearly upon us we thought it’d be a good idea to have a go at predicting what we could see over the coming season. There’s a whole host of potential outcomes: last-minute goals, penalties and red cards, to name a few. These things all happen every season with apparent randomness, so how can we predict anything? It is very difficult to predict individual events and know when they may happen, but by using information from past seasons, we can have a go at nailing down some outcomes.
Goals win games, so first off, let’s have a go at guessing how many goals will be scored in the whole season. The best way to do this is to look at the past. Figure 1 shows how many goals have been scored in total in each of the past five seasons, or in mathematical terms ‘the sum of all goals scored in every Premier League game’, season by season.
As you can see, a pattern begins to emerge. The number of goals in a season has stayed fairly consistent over the past five, which means we can predict the number of goals next season fairly accurately, using the ‘mean’ of these totals. The ‘mean’ is what is often called the ‘average’; in fact it is one type of average, others being the ‘median’ and the ‘mode’. To find the mean, we take the sum of all goals scored over the last five seasons, and then divide that number by the total number of seasons, in this case five. (If we used six years’ worth of data we would divide by six, if we used ten we would divide by ten, and so on.) This gives us a value of 1059.4, which is our estimate for how many goals we expect to see in the coming season.
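For anyone who wants to check the sum, here is a minimal Python sketch of the calculation. The individual season totals are in figure 1 and are not repeated here, so the sketch works from their combined total of 5297, which is simply the number implied by the mean quoted above (1059.4 × 5). The function name mean_goals is our own.

```python
def mean_goals(total_goals, seasons):
    """Mean goals per season: the sum of all goals divided by the number of seasons."""
    return total_goals / seasons

# 5297 is the combined five-season total implied by the article's mean
# (1059.4 x 5); it is not read directly from the original chart.
estimate = mean_goals(5297, 5)
print(estimate)
```

Swapping in six or ten seasons of data would only mean changing the two numbers passed to the function.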
Obviously it's impossible to score 0.4 goals, so we round our estimate. To round to an ‘integer’, or whole number, we take the digit after the decimal point and see whether it is closer to 0 or to 10. If it is closer to 0, we ‘round down’ to 1059, and if it is closer to 10 we ‘round up’ to 1060. If the digit after the decimal point is 5 it is exactly half way, so the general rule is to ‘round up’. As our digit is 4, we ‘round down’ and settle on our prediction: there will be 1059 goals in the Premier League next season.
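The rounding rule can be written as a short Python sketch. The helper name round_half_up is our own, and it is spelled out by hand because Python's built-in round() handles exact halves differently: it sends them to the nearest even number rather than always up.

```python
import math

def round_half_up(value):
    """Round to the nearest whole number, with exact halves going up."""
    return math.floor(value + 0.5)

print(round_half_up(1059.4))  # the digit after the point is 4, so we round down
print(round_half_up(1059.5))  # exactly half way, so we round up
print(round(1058.5))          # built-in round() sends this half to the even number 1058
```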
The total goals in a season is interesting to work out, but it doesn’t tell us much about individual games, so it'd be even better if we could predict how many goals we should expect to see in each game.
We can do this by finding the ‘mean’ number of goals per game. We know that we expect 1059.4 goals in total (it is best practice to use the un-rounded version and only round when we get to our final answer, otherwise the calculation can drift away from the most accurate estimate). So, to find the ‘mean’ number per game, we divide this total by the number of games, which is 380. This gives us a value of 2.7878… which we can round to 2.79 expected goals per game. As this number is quite small, rounding it to a whole number would change it by a large amount relative to its size. For this reason, we won't round it any further. What we can tell is that we should expect a typical game to have around 3 goals, with some games having fewer.
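As a quick check, the division looks like this in Python. The variable names are our own; the numbers are the ones used above.

```python
season_estimate = 1059.4   # un-rounded estimate of goals in the season
games_per_season = 380     # number of Premier League games in a season

goals_per_game = season_estimate / games_per_season
print(goals_per_game)            # the un-rounded mean, 2.7878...
print(round(goals_per_game, 2))  # rounded to two decimal places
```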
Some will of course have more, but these should be less frequent. The mean number of goals per game can give us a bit of information about what to expect, ‘some games with 3 goals, some with fewer, some with more’, but it is all very vague, so we need to find a way to be more accurate to get any useful information. For this, we can use the Poisson distribution. The next section is a little more complicated, but keep going and you could end up with some pretty good Premiership predictions!
The Poisson distribution is useful because counts of many random events follow it, at least approximately.
If a random event happens independently and at a steady average rate, so that it has a mean number of occurrences in a given time period, then the number of occurrences within that time period will follow a Poisson distribution. For example, the occurrence of earthquakes could be considered to be a random event. If there are on average 5 major earthquakes each year, then the number of earthquakes in any given year will have a Poisson distribution with ‘mean’ 5. So in our case, we treat the number of goals in any given Premier League game as having a Poisson distribution with ‘mean’ 2.79. Games with more than 7 goals are very rare, so we will just predict the number of games next season with 0-7 goals. The way we calculate these totals is by using the Poisson distribution definition, shown in figure 2: P(x) = (λ^x × e^(−λ)) / x!
Here P(x) is our prediction as a fraction of the total (e.g. if P(3)=0.5 we would expect half the games to have 3 goals), x is the number of goals in the game (0 to 7), and lambda (λ) is our mean, which stays constant at 2.79. We will need to compute this 8 times, for x = 0, 1, 2, all the way up to 7. Multiplying each P(x) by the 380 games in a season gives the expected number of games with that many goals, shown in figure 3:
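If you would rather let a computer do the eight calculations, here is a minimal Python sketch. It assumes the standard Poisson formula, P(x) = λ^x × e^(−λ) / x!, with the mean of 2.79 and the 380 games used above. The function and variable names are our own.

```python
import math

def poisson_probability(goals, mean):
    """P(x) = mean**x * e**(-mean) / x!, the Poisson probability of exactly x goals."""
    return mean ** goals * math.exp(-mean) / math.factorial(goals)

MEAN_GOALS = 2.79   # mean goals per game
GAMES = 380         # games in a season

expected_games = {}
for goals in range(8):  # 0 to 7 goals
    share = poisson_probability(goals, MEAN_GOALS)
    # Turn the fraction into a number of games, rounded to the nearest whole game
    expected_games[goals] = math.floor(GAMES * share + 0.5)
    print(goals, round(share, 4), expected_games[goals])
```

Each printed row gives the number of goals, the fraction of games expected to have that many, and the same thing as a count of games. The eight fractions add up to a little under 1, because the rare games with more than 7 goals are left out.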
So from the table we see that we can expect 23 games next season with no goals at all, 65 games with one goal, continuing up to 6 games with seven goals. To get a sense of whether this is at all realistic when put into practice, we can compare these estimates with what happened last season. The image to the left shows the number of goals scored in a game (x-axis) and the frequency (y-axis) with which each occurred, both last season and under our Poisson model.
As you can see, although not perfect, the Poisson estimation provides a good idea of what we can expect next season. After all, if we knew everything that was going to happen, it wouldn’t be nearly as exciting.