What is the goal of SEW Soccer Analytics?
Since the 2015-16 season, the Sports Economics research group of the Swiss Institute of Empirical Economic Research has presented «SEW Soccer Analytics». The main aim of the project is to predict the final table of the ongoing Bundesliga season. The first prediction of each team's performance is made before the season begins and is based on all information about the teams available at that point in time. This pre-season forecast serves as a benchmark for identifying under- and overperforming teams. During the season, the forecast model is updated with new results and developments after each game. We also estimate and publish predictions for the outcome of every game in the upcoming round.
How does the forecast work?
Our predictions are based on an extensive database that includes the results of previous games, the value of the squads, the schedule of the games, etc. Machine learning methods are used to select the information with the highest predictive power for match outcomes. The model that predicted match outcomes most accurately in the past is then used to form our prediction. It takes current information into account and predicts the probabilities of a home win, a draw and an away win for every game still to be played in the current season.
These probabilities are used to compute the expected points that each team will earn in each match. For example, the expected points for a team with a 50% chance of a win, 30% of a draw and 20% of a defeat are 0.5*3 + 0.3*1 + 0.2*0 = 1.8. No team can actually earn 1.8 points in a single match, but summing the expected points over the remaining matches, and adding any points already earned, gives a forecast of each team's points at the end of the season. Ranking the teams by these expected points yields a prediction of the final table.
If some teams are very close, unexpected wins or defeats can change the ranking dramatically. We account for this uncertainty by simulating the rest of the season 50,000 times. In each simulation, we draw an outcome for every match at random according to the probabilities predicted by the model. With the 50-30-20 percent probabilities for home win, draw and away win from above, 100 simulations of that match would on average produce about 50 home wins, 30 draws and 20 away wins. Depending on which outcomes are drawn, each team ends the simulated season on a different rank. Repeating this 50,000 times shows how often each team finishes on a specific rank, and these frequencies give the corresponding probabilities.
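As a rough illustration of the two steps described above, the following Python sketch computes expected points from match probabilities and simulates the remaining fixtures to estimate rank probabilities. It is not the model used by SEW Soccer Analytics: the function names, the three teams, their current points and the match probabilities are all invented for illustration, and teams level on points are ordered at random rather than by the league's actual tie-breaking rules.

```python
import random
from collections import Counter


def expected_points(p_win, p_draw, p_loss):
    """Expected points from one match: 3 for a win, 1 for a draw, 0 for a defeat."""
    return 3 * p_win + 1 * p_draw + 0 * p_loss


def simulate_season(current_points, fixtures, n_sims=50_000, seed=0):
    """Estimate the probability of each final rank for every team.

    current_points: dict mapping team -> points earned so far
    fixtures: list of (home, away, (p_home_win, p_draw, p_away_win))
    Returns a dict mapping team -> {rank: probability}.
    """
    rng = random.Random(seed)
    teams = list(current_points)
    rank_counts = {team: Counter() for team in teams}

    for _ in range(n_sims):
        points = dict(current_points)
        for home, away, probs in fixtures:
            # Draw one outcome according to the predicted probabilities.
            outcome = rng.choices(("H", "D", "A"), weights=probs)[0]
            if outcome == "H":
                points[home] += 3
            elif outcome == "D":
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
        # Assumption: ties on points are broken at random (no goal data here).
        order = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
        for rank, team in enumerate(order, start=1):
            rank_counts[team][rank] += 1

    return {
        team: {rank: count / n_sims for rank, count in sorted(counts.items())}
        for team, counts in rank_counts.items()
    }


if __name__ == "__main__":
    # The worked example from the text: 50% win, 30% draw, 20% defeat.
    print(f"{expected_points(0.5, 0.3, 0.2):.1f}")  # 1.8

    # Hypothetical standings and probabilities, for illustration only.
    standings = {"Team A": 60, "Team B": 59, "Team C": 57}
    remaining = [
        ("Team A", "Team B", (0.5, 0.3, 0.2)),
        ("Team C", "Team A", (0.3, 0.3, 0.4)),
        ("Team B", "Team C", (0.45, 0.25, 0.30)),
    ]
    for team, ranks in simulate_season(standings, remaining).items():
        print(team, ranks)
```

Fixing the seed makes a run reproducible. With fewer simulations the estimated rank probabilities become noisier, which is one reason to use a large number of repetitions such as the 50,000 mentioned above.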
This working paper provides further details about the underlying statistical methods.