Using quantitative methods to examine the effects of small samples

What do you do when the season has only just begun and there is not enough data to draw reliable conclusions? In this article Dominic explains, with two examples, how bootstrapping can be used to minimize the effect of parametric error caused by small samples.


A key method used to develop match expectancies for football is the Poisson distribution, as explained in a previous Pinnacle article on how to predict a soccer betting winner using the Poisson distribution.

This basically assigns an expected scoring average to the home team, based on its attacking strength and the away team's defensive capabilities. It assigns an expected scoring average to the away team in the same way.
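As a rough illustration of how an expected scoring average turns into match probabilities, the sketch below evaluates the Poisson formula for a few goal counts. The rate of 1.5 goals and the names `poisson_pmf` and `home_rate` are assumptions made purely for illustration; they do not come from any real fixture.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals when the expected scoring average is rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)

# Hypothetical expected home goals, chosen only for illustration.
home_rate = 1.5

# Probability of the home team scoring 0, 1, 2, 3 or 4 goals.
for goals in range(5):
    print(goals, round(poisson_pmf(goals, home_rate), 3))
```

The same function could be applied to the away team's expected scoring average to build up probabilities for each scoreline.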

At the start of the season, however, this method runs into a problem: there are not enough games to sample. In addition, a single extreme result, say one high-scoring match or a run of goalless matches, will greatly affect our estimate.

In other words, we would have a high parametric error. (See the related article on how to build a sports betting model.)

One way to measure the amount of parametric error is to use bootstrapping. Bootstrapping generates additional samples by resampling from the data we already have.

At the time of writing, most Premier League teams have played fewer than five home and five away matches each.

As an example, I can recommend two methods.

Method 1: The straightforward approach

This method involves sampling with replacement, i.e. creating new samples of the same size as the original in which the same value may be picked more than once.

Take Leicester City’s home matches: to date they have scored 3, 2, 2 and 1 against Aston Villa, West Ham, Arsenal and Crystal Palace respectively. This sample has a mean of 2 home goals per match.

Now let’s produce another random sample of four goal counts using these values. This is similar to generating random values in a Monte Carlo simulation. Extra samples could therefore be:

  •       Sample 1: 2,2,2,1
  •       Sample 2: 1,1,3,2
  •       Sample 3: 3,3,2,2
  •       Sample 4: 1,2,1,1

Note that at every draw two goals is twice as likely to be picked as one or three goals, because it appears twice among the four observed values, and that each sample may have a different mean; it is not always two.

In this case the averages per sample are 1.75, 1.75, 2.5 and 1.25 respectively. We think that the average is 2, but the resampled values show that it can range from 1.25 to 2.5.

We can extend this by generating a large number of bootstrapped samples and looking at the standard deviation of their averages.
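Method 1 can be sketched in a few lines of Python. This is only a minimal illustration: the function name `bootstrap_means`, the 10,000 resamples and the fixed seed are assumptions, not part of the original method.

```python
import random
import statistics

# Leicester City's home goals to date, as listed above.
home_goals = [3, 2, 2, 1]

def bootstrap_means(values, n_resamples=10_000, seed=0):
    """Return the average of each resample drawn with replacement from values."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    size = len(values)
    return [
        statistics.mean(rng.choices(values, k=size))  # one bootstrapped sample
        for _ in range(n_resamples)
    ]

means = bootstrap_means(home_goals)
print("average of bootstrapped averages:", round(statistics.mean(means), 3))
print("standard deviation of averages:", round(statistics.stdev(means), 3))
print("lowest and highest average:", min(means), max(means))
```

The standard deviation printed here gives a feel for how uncertain the estimate of 2 goals per match is when it rests on only four games.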

Method 2: Let’s get crazy

For Leicester’s matches we could instead have generated an ‘expected score’ for each game. This can be done in the same way as in the Poisson method, but using the previous season’s data.

Let’s go through the match versus Aston Villa as an example. The average number of goals scored at home in the Premier League during 2014/15 was 1.474. Leicester scored 28 goals in 19 home matches, while Aston Villa conceded 32 in 19 away matches.

Leicester’s home average of 28/19 = 1.474 goals, divided by the league average of 1.474, gives them an ‘Attack Strength’ of 1, meaning they were just like the typical team at home. Aston Villa, on the other hand, conceded an average of 32/19 = 1.684 goals per away match.

If we divide this by 1.474, we get 114.29%, meaning that Aston Villa conceded 14% more goals than usual when playing away. Therefore, Leicester would be expected to score an average of 1 × 1.1429 × 1.474 ≈ 1.684 goals at home to Aston Villa.
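The arithmetic above can be checked with a short script. The variable names, including `defence_weakness` for Aston Villa's ratio, are labels chosen here for illustration; the figures are the ones quoted in the text. Because the script divides by the rounded league average of 1.474, its intermediate ratios may differ from the quoted percentages in the last decimal place.

```python
# Figures quoted in the text for the 2014/15 season.
league_avg_home_goals = 1.474  # average goals scored at home per match
leicester_home_goals = 28      # scored by Leicester in their home matches
villa_away_conceded = 32       # conceded by Aston Villa in their away matches
matches = 19                   # home (or away) matches per team

# Leicester's home scoring rate relative to the league average.
attack_strength = (leicester_home_goals / matches) / league_avg_home_goals

# Aston Villa's away conceding rate relative to the league average.
defence_weakness = (villa_away_conceded / matches) / league_avg_home_goals

# Expected Leicester goals at home to Aston Villa.
expected_goals = attack_strength * defence_weakness * league_avg_home_goals

print(round(attack_strength, 3), round(defence_weakness, 3), round(expected_goals, 3))
```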

By repeating the same process for all their matches, we get the expected number of goals per match shown in the table below. Here we see that Leicester have been overperforming, scoring more than expected in every match except the one against Crystal Palace.

The gaps between actual and expected goals are shown in the row named Difference; the technical term for them is residuals.

Team             Aston Villa   West Ham   Arsenal   Crystal Palace
Expected goals   1.684         1.526      1.158     1.263
Actual goals     3             2          2         1
Difference       1.316         0.474      0.842     -0.263

In a similar fashion to Method 1, we now sample the residuals with replacement. Some possible sampled residuals are:

  •       Sample 1: 1.316, 1.316, 0.474, 0.474
  •       Sample 2: 0.474, -0.263, -0.263, 0.474

We now add these sampled residuals to the expected scores, match by match, to get new samples of home goals scored:

  •       Sample 1: 3.000, 2.842, 1.632, 1.737
  •       Sample 2: 2.158, 1.263, 0.895, 1.737

Each sample has its own average, and we can use these averages to see how the home team’s average number of goals scored varies across the different plausible parameter values.
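The residual resampling of Method 2 can be sketched as follows, using the expected and actual goals from the table. The function name `residual_bootstrap_means`, the number of resamples and the fixed seed are assumptions made for this illustration only.

```python
import random
import statistics

# Expected and actual home goals from the table, in the order
# Aston Villa, West Ham, Arsenal, Crystal Palace.
expected = [1.684, 1.526, 1.158, 1.263]
actual = [3, 2, 2, 1]

# Residual = actual goals minus expected goals (the Difference row).
residuals = [a - e for a, e in zip(actual, expected)]

def residual_bootstrap_means(expected, residuals, n_resamples=10_000, seed=0):
    """Resample residuals with replacement, add them to the expected goals,
    and return the average of each resulting sample of home goals."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    means = []
    for _ in range(n_resamples):
        drawn = rng.choices(residuals, k=len(expected))
        resampled_goals = [e + r for e, r in zip(expected, drawn)]
        means.append(statistics.mean(resampled_goals))
    return means

means = residual_bootstrap_means(expected, residuals)
print("average of sample averages:", round(statistics.mean(means), 3))
print("standard deviation of averages:", round(statistics.stdev(means), 3))
```

Note that the resampled "goals" here are not whole numbers, exactly as in the two samples listed above; they are only used to study how much the average could move.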

Conclusion

This is not exactly a back-of-the-envelope calculation, but there is no need for extensive programming knowledge. Fire up your spreadsheet and you can test the range of possible parameters. Keep in mind, though, that if you use the second method described above you will also need to analyse the residuals for the expected number of goals scored by the away team.

Source: pinnacle.com