The elephant in the analytics room: Uncertainty

Posted on July 16th, 2019   

Analytics in football is no longer something very new. The poster child of Expected Goals has crept out of blogs and into usage within clubs and as part of post-game analysis on television. More generally clubs use data in their performance analysis and recruitment activities on a routine basis. For now, the adoption of data is going in only one direction. We at Dectech would certainly like to see this continue so that the usage of objective information becomes the norm when judgements are made at football clubs.

That this trajectory continues requires that clubs continue to see the value of using analytics in their decision making. This in turn means being honest about what the analytics are providing. To that end, in this blog piece we are going to discuss uncertainty.

We do not typically see uncertainty quoted alongside analytics. This is perhaps understandable, but it does hide some potentially crucial information from the decision maker. If for example you tell someone that a player is in the 70th percentile in terms of some attribute and leave it at that, then the assumption the person will make is that you are sure that’s where they are. The reality, depending on what is being measured, could be that they are anywhere from the 50th to the 90th percentile on that attribute. This is important information. Without it a decision maker could put too much weight on your analysis of that attribute. It is quite possible that highly uncertain analytics could lead to worse decisions than no analytics at all. Ultimately, over a long enough time period, this could lead to a distrust of analytics. That’s a bold statement, so we need to do some analysis to get a flavour of the uncertainty that is out there.

 

Detail vs Noise

We are going to focus on a very simple metric here, and we’re going to use that to investigate where uncertainty can come from. However, before doing that, let’s consider the wider landscape.

Let’s imagine we are concerned with understanding a player’s contribution to his team’s efforts. This can have an overall positive or negative impact based on the various models and metrics used to evaluate it. But you can go a bit deeper and refer separately to its defensive and attacking impacts. Or go even deeper and evaluate passing, shooting, tackling, etc. This could go on and on, as all these splits help a manager identify the strengths and weaknesses of his players in detail. But there is a pitfall to overdoing this: uncertainty.

You don’t need to be a statistician or data analyst to understand that the more information you gather on something, the more confident you can be about the results of any analysis applied. It may be desirable to be able to rank a player in as many categories as possible, but the observed data might be limited, so that some rarely observed categories need to be grouped together in order to contain enough data to construct meaningful metrics. In other words, there is a trade-off between the precision of the analytics and their interpretability.

 

Passing: A Case Study

We’ll take passing, the most common action in a football match, as our case study. Not all passes are of the same difficulty or importance. Their success rate and value can vary a lot depending on the situation. In order to demonstrate our point, we look at a single season of Premier League matches, focusing on midfielders because they typically attempt the widest range of pass types. We require that a player has attempted at least 500 passes to be included in the analysis.

Our metric is a simple one: the pass success rate. The experiment idea is also simple. We start by calculating an overall pass success rate for each player. This results in a certain ranking (which we convert to a percentile ranking) of the players. Then we start decomposing our metric by partitioning our data according to various pass types: Passes Within vs Outside the Final Third of the pitch, Long vs Short, and finally Open Play vs Set Piece. Each different data partition naturally leads to a different ranking of the players. For example, while initially we have a single metric (overall pass success rate), after partitioning the data into Within and Outside the Final Third, we have Final Third pass success rate and Outside Final Third pass success rate. After the introduction of the next partitioning layer (Long vs Short) we have four categories: Final Third-Long, Final Third-Short, Outside Final Third-Long, and Outside Final Third-Short, etc.
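The decomposition described above can be sketched in a few lines. This is only an illustration: the pass records, the player names and the `success_rates` helper are hypothetical, and the real analysis works on a full season of event data.

```python
from collections import defaultdict

# Hypothetical pass records: (player, in_final_third, is_long, completed).
passes = [
    ("A", True, False, True),
    ("A", True, True, False),
    ("A", False, False, True),
    ("A", False, False, True),
    ("B", True, False, False),
    ("B", False, True, True),
    ("B", False, False, True),
    ("B", False, False, False),
]

def success_rates(records, category):
    """Return {(player, category label): pass success rate}."""
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for rec in records:
        group = (rec[0], category(rec))
        attempts[group] += 1
        completed[group] += rec[3]  # True counts as 1, False as 0
    return {group: completed[group] / attempts[group] for group in attempts}

def third(rec):
    return "Final Third" if rec[1] else "Outside Final Third"

def third_and_length(rec):
    return third(rec) + "-" + ("Long" if rec[2] else "Short")

# No decomposition, then one partitioning layer, then two.
overall = success_rates(passes, lambda rec: "All")
by_third = success_rates(passes, third)
by_third_and_length = success_rates(passes, third_and_length)
```

Each extra layer splits the same passes into more groups, so every group's rate rests on fewer attempts.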

The final step of our experiment is the uncertainty calculation. The method we use to determine uncertainty is called the bootstrap, and it involves repeated re-sampling with replacement from the original data set. For every sample, we estimate the pass success rates and calculate the percentile rankings of the players in each category. We can then use the samples we have collected to measure the uncertainty. We do this by seeing how the percentiles collected for each player vary across the samples.

Example Percentile Distribution

We will call this uncertainty the Percentile Uncertainty Range (PUR), which is based on the standard error of the sample estimates. The higher the PUR, the higher the uncertainty of the percentiles.
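A minimal sketch of the bootstrap step is shown here, under several assumptions that are ours rather than taken from the analysis itself: each player's passes are resampled separately, the PUR is taken as 1.96 times the standard deviation of the bootstrap percentiles, and the players are synthetic.

```python
import random
import statistics

def percentile_ranks(rates):
    """Map each player to a percentile rank (0 = lowest rate, 100 = highest)."""
    ordered = sorted(rates, key=rates.get)  # ties keep their input order
    n = len(ordered)
    return {player: 100.0 * i / (n - 1) for i, player in enumerate(ordered)}

def bootstrap_pur(outcomes_by_player, n_boot=200, z=1.96, seed=0):
    """Return {player: PUR} from bootstrap resamples of each player's passes.

    outcomes_by_player maps a player to a list of 1 (completed) / 0 (failed).
    PUR is taken as z times the standard deviation of the bootstrap
    percentiles; both that formula and z = 1.96 are assumptions.
    """
    rng = random.Random(seed)
    draws = {player: [] for player in outcomes_by_player}
    for _ in range(n_boot):
        rates = {}
        for player, outcomes in outcomes_by_player.items():
            resample = rng.choices(outcomes, k=len(outcomes))  # with replacement
            rates[player] = sum(resample) / len(resample)
        for player, pct in percentile_ranks(rates).items():
            draws[player].append(pct)
    return {player: z * statistics.pstdev(v) for player, v in draws.items()}

def synthetic_players(n_passes):
    """Ten hypothetical players with success rates from 70% to 88%."""
    players = {}
    for i in range(10):
        completed = round(n_passes * (0.70 + 0.02 * i))
        players[f"player_{i}"] = [1] * completed + [0] * (n_passes - completed)
    return players

# Same players, but with many passes versus few passes per player.
pur_many = bootstrap_pur(synthetic_players(1000))
pur_few = bootstrap_pur(synthetic_players(50))
print(statistics.mean(pur_many.values()), statistics.mean(pur_few.values()))
```

With fewer passes per player the percentiles move around far more between resamples, which is the effect the decomposition experiment measures.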

 

Results

Given an estimated percentile of a player (say 80th) and the associated PUR (say ±10), one can expect his actual percentile to fluctuate within this range (i.e. to be between 70th and 90th percentile) with about 95% probability.

The following dendrogram displays the overall PUR for the various categories of the decomposed metric as described in the previous section.

According to the above dendrogram, the minimum percentile rank variability is achieved when there is no decomposition at all. With each added decomposition layer, the PUR increases, and this pattern is seen consistently.

In the “No Decomposition” case, the average PUR is ±10, that is, if a player falls, e.g. in the 70th percentile, we can expect him with high certainty to be between the 60th and 80th percentiles. Already with the first split, the average PUR rises from 10 to 14 (In Final Third) and 18 (Out of Final Third). In some cases the difference is extreme: when Out of Final Third passes are split into Long and Short, the PUR jumps from 18 to 22 (Long) and 39 (Short). A PUR of 39 means the metric is essentially as good as no information at all.

 

Closing Remarks

It is up to the analyst who designs the metrics to make the best compromise between uncertainty and interpretability. We have seen examples within the analytics industry of performance metrics having very fine splits. While it’s understandably tempting to do this, we don’t believe that the metrics are necessarily meaningful at that level. The precision is very important. As the adage says: “Just because you can doesn’t mean you should”. The question you can therefore consider when presented with a metric, or when you drill into your data yourself, is, have I ended up with too little data to draw reliable conclusions? Or to put it another way, am I simply tossing a coin without realising it?

2018/2019 PL manager of the year

Posted on May 15th, 2019   

Assessing managers’ impact on a team’s performance isn’t as straightforward as with players. Unlike on-pitch actions that can be analysed in greater depth, details about the management of a team remain mainly behind closed dressing room doors and the only concrete in-game interventions we get to observe are largely limited to starting line-ups, formations and substitutions. Thus, in order to compare a manager’s influence on his squad’s performance, we need to consider higher level information; that is, budgets and expectations at the beginning of the season.

There are two principal ways of assessing manager performance, and we will provide an overview of these below in the context of benchmarking managers’ performance for the Premier League this season.

 

Team wage bill

A team is expected to perform better if it consists of better players. Good players are expected to be more expensive (though not necessarily linearly), so we can use the team wage bills as a proxy for team strength. Of course, a player’s wage is not only a function of his quality, but on average, that relationship should hold true. Two clubs spending the same on player wages, but one of them performing better is an indication that it is being managed better.

The differences in player wages among PL teams are very large. They range from £26MM for Cardiff to as high as £149MM for Man United (source https://www.spotrac.com/epl/payroll/). The graph below ranks the clubs by player wage bill.

Apart from Tottenham who manage to perform well despite a smaller wage bill, the difference of the big clubs from the rest is very evident. Within the block of high spenders, the Manchester teams seem to have formed their own sub-block of ultra-high spending. On the other end we have teams such as Cardiff, Huddersfield and Brighton whose player wages bills are less than a third of the top teams’ wage bills. Below is a graph showing the relationship between wages and points earned. Since we expect an element of diminishing returns with total team wage bill and league points (an infinite amount of money cannot bring an infinite number of league points), we fit a logarithmic curve to the data to quantify the trend.

Teams above the line are over-performing and teams below the line are under-performing with respect to their wage bills. The deviations from expectancy (residuals) can be more accurately summarised in the plot below (grey bars correspond to clubs that have changed their manager at least once during the season).
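The curve fit and residuals can be sketched as follows. The functional form points = a + b·ln(wage bill), the helper names and the wage and points figures are all assumptions for illustration; they are not the data behind the graphs.

```python
import math

def fit_log_curve(wages, points):
    """Least-squares fit of points = a + b * ln(wage). Returns (a, b)."""
    xs = [math.log(w) for w in wages]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(points) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, points))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(wages, points):
    """Points earned minus points expected from the fitted curve."""
    a, b = fit_log_curve(wages, points)
    return [y - (a + b * math.log(w)) for w, y in zip(wages, points)]

# Hypothetical wage bills (in £M) and league points for five clubs.
wages = [30, 50, 80, 120, 150]
points = [35, 45, 60, 70, 66]
for wage, resid in zip(wages, residuals(wages, points)):
    label = "over-performing" if resid > 0 else "under-performing"
    print(f"£{wage}M wage bill: {resid:+.1f} points ({label})")
```

A positive residual corresponds to a team above the line, a negative one to a team below it.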

Jürgen Klopp is the clear winner here, with his team having obtained 24 points more than expected based on his players’ wages. He is followed by Wolves’ manager Nuno Espírito Santo and Manchester City’s Guardiola. Man United’s Mourinho/Solskjær fall towards the other end of the plot, having earned about 14 fewer points than expected based on squad wages.

 

Outperforming expectations from the start of the season

A team is expected to perform approximately as well as it has been performing in the recent past. If they have been consistently finishing near the top of the table in the most recent seasons, we expect them to match that performance in the current/next season. Any deviation from this self-set standard is an indication of a managerial under- or over-performance.

Based on results before the start of the season and using the well-known Dixon-Coles model for match predictions we can conduct simulations and get an estimate of the number of points teams are expected to earn against their 19 opponents in a season. Comparing these results with the final standings yields an alternate perspective on the teams that performed better or worse than expected. The chart below shows the percentage of excess points the teams managed to earn relative to the start of season expectations. Since there isn’t enough data to benchmark managers that have only been managing for part of the season, we are going to focus on clubs with a single manager throughout the season.

Once again, Klopp and Nuno Espírito Santo take up the spots at the top of the table, but this measure shows that Pellegrini has performed much better than expected. This is in contrast to his apparently poor performance using the player wages approach. So West Ham have indeed improved, but the improvement was not quite in line with their spending. The converse is true for Neil Warnock at Cardiff: Although they over-performed given their finances, they under-performed compared to past seasons, enough to get relegated.

From an analytics perspective, Jürgen Klopp is the manager of the year. He has had a fantastic season by any measure (including the 2 above), made more impressive by the fact that Liverpool’s wage bill is 20% smaller than Man City’s. To finish the season on 97 points, with just a single loss and 1 point behind Man City, is a testament to his great year. He may not have won the Premier League, but he may yet win a Champions League medal to decorate and commemorate his performance this season.


Champions League Quarter Finals Predictions

Posted on March 15th, 2019   

The Champions League Quarter Finals draw took place this morning. We now know which teams can face each other for all remaining fixtures in the competition. We have updated our Champions League predictions to include the fixtures and progression tree from the draw.

The match predictions for individual fixtures are shown below. The most one-sided match is expected to be Liverpool vs Porto and the most evenly matched fixture is expected to be Tottenham vs Man City.


 

Interestingly, none of the 4 strongest teams are facing each other in the quarter finals, so it is possible for all 4 of them to make it to the Semi-Finals. The chances of this happening are not particularly high, though: we estimate the probability of all 4 favourites making it through at 29%.
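The arithmetic behind a figure like this is a product of the individual progression probabilities, assuming the four ties are independent. The values below are illustrative placeholders, not our model's outputs.

```python
import math

# Hypothetical progression probabilities for the four favourites
# (illustrative values only; the ties are assumed to be independent).
favourite_progress = {
    "Favourite 1": 0.85,
    "Favourite 2": 0.70,
    "Favourite 3": 0.65,
    "Favourite 4": 0.60,
}

# Probability that all four go through is the product of the four values.
p_all_four = math.prod(favourite_progress.values())
print(f"Chance that all four favourites progress: {p_all_four:.1%}")
```

Even when every favourite is individually likely to go through, the combined probability drops quickly.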

Below is the updated progression likelihood for each team. Barcelona are favourites to win the competition, followed by Liverpool, Man City and Juventus.


 

This is the first time in 10 years that the Quarter Finals have 4 English clubs in them, and the aggregated chance of an English team winning the Champions League this season is just over 50%. The chance of seeing at least 1 English club in the final is 82% and the chance of an all-English final is 22%. The UK may be keen to leave European institutions, but their football clubs are doing their best to remain in this one!

 

 

Increasing disparity in team strength in leagues

Posted on February 12th, 2019   

“The rich get richer and the poor get poorer”. This particular adage by Percy Shelley is most often used to describe the economic inequality caused by free market capitalism. However, it is strangely appropriate to describe what is happening to team performance in leagues in Europe. The strong are getting stronger as the weak are getting weaker.

In the 2000s, an apparent gap in team strength between the top 4 Premier League teams and the rest led people to start using the term “the top 4” to refer to the teams that people considered consistently and clearly ahead of the rest in terms of performance. Just last summer, we saw Man City win the Premier League with a record breaking 100 points, indicating that there is likely a widening gap in team quality in the Premier League. This is better illustrated in the animation below, which shows the distribution of points in the Premier League from 1995 onwards (when the format changed to a 20-team competition).


 

 

The existence of this phenomenon shouldn’t come as a complete surprise, since teams that perform well end up getting more money from commercial and sponsorship deals, matchday revenue, etc. and also build up a history and reputation of success. These then allow them to recruit better players and staff, and this positive feedback loop results in a widening of the gap in team strength in the league. This isn’t something restricted to the Premier League either. In fact, we expect it to happen to a greater extent in some other leagues, since some allow teams to negotiate their own deals for televised matches independently (e.g. Spain) rather than having those deals negotiated centrally and the revenues distributed evenly across teams, as occurs in the Premier League.

We wanted to measure this effect in a more rigorous manner to quantify the change in team strength inequality, and to see how it compares across leagues. To do this, we can look at the distribution of points in the league table over time. More specifically, we are looking for inequality in the points distribution. Fortunately, there’s a well-established measure used in economics called the Gini coefficient that helps us to do this. It is commonly used to measure income inequality or wealth inequality, but we can use the underlying concepts behind the Gini coefficient to measure points inequality in leagues and track changes over time.

The Gini inequality measure works as follows. For any given season, starting with the weakest team (bottom of the league table), we measure the share of league points that the team accounts for and gradually include more teams until we consider all 20 teams in the league (at which point they account for a 100% share of the points accumulated in the league). When these cumulative shares are plotted, they create a curve whose shape encodes the distribution of points in the league. In the case of a perfectly equal league where all teams get the same number of points, this curve coincides with the line of equality (a straight diagonal with a perfect triangle beneath it). Any deviation from this is a sign of inequality in the league, and the extent of the inequality can be measured by comparing the area under the curve with the area under the line of equality. The distribution of points in the Premier League and the corresponding Gini curve are shown below for reference.
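As a rough sketch of the calculation described above, the cumulative-share curve and the resulting coefficient can be computed like this. The function names and the example points totals are hypothetical, and the area is approximated with the trapezoid rule.

```python
def cumulative_shares(points):
    """Cumulative share of league points, weakest team first (0.0 up to 1.0)."""
    ordered = sorted(points)
    total = sum(ordered)
    shares, running = [0.0], 0
    for team_points in ordered:
        running += team_points
        shares.append(running / total)
    return shares

def gini(points):
    """1 minus the ratio of the area under the curve to the area under equality."""
    shares = cumulative_shares(points)
    n = len(points)
    # Trapezoid rule: each team spans 1/n of the horizontal axis.
    area_under_curve = sum((shares[i] + shares[i + 1]) / 2 for i in range(n)) / n
    area_under_equality = 0.5
    return 1 - area_under_curve / area_under_equality

# Hypothetical final points totals for a 20-team league.
table = [100, 81, 77, 75, 70, 63, 54, 49, 47, 44,
         44, 42, 41, 40, 37, 36, 36, 33, 33, 31]
print(round(gini(table), 3))
```

A league where every team earns the same number of points gives a coefficient of 0, and the value rises as points concentrate in fewer teams.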


The results below show how the league inequality has changed for the top 5 leagues in Europe since each of them changed format to their current league structure (18 teams for Bundesliga and 20 teams for the rest).






On average, across the past 10 years, Ligue 1 has been the most evenly matched league of the top 5 in Europe, followed by the Bundesliga. La Liga, Premier League and Serie A all have greater disparities in team strength in the league and are roughly similar to each other in that respect. A similar pattern is observed even if we just look at the inequality in league points last season. The Bundesliga had the most even distribution of league points, followed by Ligue 1, then La Liga, Premier League and Serie A.

Key results are summarised in the table below:

League | 2017-18 season inequality (level of disparity in points between teams) | Average inequality over last 10 seasons | Inequality trend (change in inequality coefficient per year)
Premier League | 0.14 | 0.13 | +0.002
Serie A | 0.16 | 0.13 | +0.006
La Liga | 0.13 | 0.13 | +0.004
Bundesliga | 0.10 | 0.11 | +0.002
Ligue 1 | 0.13 | 0.10 | +0.002

 

Crucially, all 5 leagues exhibit a trend of increasing points inequality over time, and the increase is statistically significant in all 5 cases (p<0.01). The inequality in team strength is increasing most rapidly in Serie A, followed by La Liga, while the Premier League, Bundesliga and Ligue 1 appear to be comparatively protected from the team strength positive feedback phenomenon.

This may be a slight problem for the future of football, since the sport may lose some appeal if the league positions become increasingly predictable over time. We wouldn’t be surprised if more rules were put in place to limit team spending in the near future to try to rein this problem in, particularly in Italy.

We should enjoy this season’s competitive title race between Man City and Liverpool while it lasts. Leagues are likely to get increasingly less competitive over time.

 

Relegated teams’ squad renewal and performance

Posted on December 3rd, 2018   

What happens to teams when they get relegated? How much do their squads end up changing as a result of relegation, and how does it impact on-pitch performance?

We have performed some analysis on the last five completed PL and Championship seasons (i.e. 13/14 to 17/18) to try to answer these questions.

We first investigate the minutes played by players that start in a team’s squad, and look at how those minutes change in the following seasons, counting only minutes played at the same team. We do this to see how changes in squads affect the players actually fielded. Specifically, we measure the overlap in minutes played across the team relative to the starting season. If a team uses the exact same mixture of players in its matches we would observe a 100% similarity/overlap, whereas if a team has replaced all its players we would see a 0% similarity/overlap. The chart below shows the monthly evolution of these overlaps, averaged for teams that remained in the PL, teams that were promoted to the PL, teams that were relegated to the Championship and teams that were already in the second tier league.
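One plausible way to compute such an overlap is sketched here. The exact formula is an assumption on our part: each player's share of the team's minutes is compared between the two periods and the smaller of the two shares is summed over all players. The player names and minutes are made up.

```python
def minutes_overlap(baseline, later):
    """Overlap in [0, 1] between two {player: minutes} mappings for one team.

    Assumed formula: sum over players of the smaller of their two minute shares.
    """
    total_baseline = sum(baseline.values())
    total_later = sum(later.values())
    players = set(baseline) | set(later)
    return sum(
        min(baseline.get(p, 0) / total_baseline, later.get(p, 0) / total_later)
        for p in players
    )

# Hypothetical minutes for a small squad in the starting season and a later month.
starting_season = {"Player A": 3000, "Player B": 2500, "Player C": 2500}
later_month = {"Player A": 270, "Player B": 90, "Player D": 360}
print(f"{minutes_overlap(starting_season, later_month):.0%}")
```

Under this definition an unchanged mixture of players gives 100% and a completely new set of players gives 0%, matching the two extremes described above.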

We see that relegated teams change the most: on average they use only about 40% of the previous season’s squad in the new season. Teams participating in the PL tend to have a more similar set of players, starting at around 60% and declining with time. This is not a surprising observation. Relegated teams face a financial challenge, with considerably reduced revenues despite parachute payments. At the same time, many players will not want to compete in the Championship after being in the PL. What is interesting here is the difference between the PL and Championship for teams that were already there. Teams in the PL have more stability in their fielded players.

The graph above shows us that PL teams lose more players when they get relegated. But are the players they lose key first team players or mainly reserve players? To find out, we can look at the minutes played (as a proportion of possible minutes played) over the course of the season by players who eventually leave at the end of the season. To get a good understanding of what’s happening, we can do the same for players that end up leaving the subsequent season too.

 

This graph reveals that teams relegated from the PL not only lose more players than teams that stay in the PL or teams already in the Championship, but that the players who leave tend to be key first team players who played a fair amount of the season. These changes come alongside an average of 10 new players for the relegated teams, versus 11 for teams already in the Championship. For both sets of teams, the percentage of new players coming from a team that participated in the previous season’s Championship is similar, at around 25%.

We now turn to the net spending of relegated teams, compared to teams already in the Championship. On average the relegated teams have bought around £23M worth of players in the season after relegation, compared to only £7M by teams already in the second tier league. Not surprisingly, relegated teams also sold on average about £29M worth of players, which gives an average net spend of -£6M. For the existing Championship teams, average net spending was around zero, i.e. they were spending as much buying players as they were getting from selling them. This highlights the degree of the financial challenge relegated PL teams face.

Turning now to performance: Relegated teams have a 1 in 3 chance of being promoted straight back into the PL. According to our league table simulations this proportion was expected to be 27%, so the newly relegated teams seem to slightly over-perform in that sense. If we instead compare their finishing position in that first season to what we’d expect of them, then they actually under-perform: the average relegated team’s expected position in their first Championship season was 6th, the observed was only 9th. This is an interesting apparent contradiction.  It suggests that while many teams do well, others do very badly – thereby dragging the average league position down.

In terms of team strength, according to our team strength model, the overall strength of newly relegated teams tends to decline after a season by an average of 4.3%, compared to the teams’ strengths at the time of relegation.  This is a relatively modest change against the backdrop of such large changes in playing staff.  It should be noted however that for a team to be relegated in the first place their strength cannot have been especially impressive to start with.

In general we think the main message of the above analysis is that there seems to be an indisputable decline in strength after relegation to the Championship for a typical team. But at the same time there is a larger than expected number of teams that make it straight back to the PL and the actual team strength decline does not appear to be especially severe.  For the team itself the question of interest is whether they will be one of the third that makes it straight back and, if not, will they be one of the teams that drags down the average?

We attempted to answer this question by looking for correlations between the relegated teams and their subsequent performance.  We found that teams with higher spends managed to temper the team strength decline the most.  So, money does help.  We also found that teams that are able to keep more of their playing staff tend to finish in higher positions.  These findings are of course related.  The message is that keeping your squad as close to intact as possible is what the evidence suggests is the best move.

 

Champions League forecasts

Posted on September 17th, 2018   

The group stage of the Champions League is set to kick off tomorrow.

A summary of our Champions League forecasts is shown below. The team most likely to win the competition is Barcelona (20.3%), with Bayern Munich (18.4%) and Real Madrid (16.1%) being the next most likely teams to win the competition.

Looking at the progression chances split by each group shows that all groups have at least one clear progression favourite, with >80% chance of progression to the round of 16. The fight for the second progression spot is toughest in Group E, where Benfica, AEK and Ajax are closely matched. Group F is also relatively competitive. Porto got very lucky with the group stage draw since they’re very likely to progress to the round of 16 (80.9% chance) despite only being in the bottom half of teams in terms of team strength. Napoli, on the other hand, got the short end of the stick since they’re the 15th best team in the competition but only have a 29.8% chance of going through to the round of 16.

Finally, forecasts for individual matches can be found below. Real Madrid vs Viktoria Plzen is the most one-sided fixture in the group stage. Of the big teams in the tournament, the Liverpool vs PSG fixtures are the most evenly matched.

It will be interesting to see how far Man City and Liverpool can carry the banner of English football on the European stage.

The world cup as seen through the lens of Twitter

Posted on July 16th, 2018   

The World Cup is over. It’s been a thrilling month of exciting matches, great goals and surprises. Events like the World Cup get discussed and mentioned a lot on Twitter. This allows us to determine which events sparked the most discussion, as well as establish which teams got the most positive response from the Twitterverse.

Let’s start by taking a look at the top moments of the World Cup. We measure this by looking at the moments that led to the most discussion on Twitter, so many will be heavily context dependent rather than noteworthy standalone moments. The top moments as far as the Twitter community was concerned were England getting knocked out of the World Cup, Cristiano Ronaldo’s hat trick against Spain and England finally winning a match on penalties. Other key moments are shown below.

We can also use sentiment analysis to find out which teams impressed people the most and which teams people were less pleased with.

France, Belgium and England lead the way for teams that people responded the most positively to. Croatia were 7th in comparison. The teams that people responded most negatively to were Saudi Arabia, Colombia, Argentina, Egypt, Poland and Germany. Spain and Portugal also received relatively poor responses overall.

There’s not much left to say on the World Cup, except congratulations to France! Not only did they win the World Cup, but they did so in a manner that impressed the Twitterverse!

World Cup Final and third place play-off forecasts

Posted on July 12th, 2018   

The World Cup semi-finals are over. France and Croatia have progressed to the final of this year’s World Cup.

Below are our predictions for the World Cup final and third place play-offs:

 

 

 

A World Cup of late goals, penalties, own goals and unexpected results?

Posted on July 11th, 2018   

The 2018 World Cup is drawing to a close. It’s been a very interesting and dramatic tournament to follow.

There has been a lot of discussion on how this World Cup stands out in terms of the nature of the goals scored. In particular, there is a lot of discussion around the increased number of penalty goals, own goals and late goals in the game. This raises the question: are there really more of these types of goals, and how do the own goal, penalty and late goal rates compare to the Premier League?

The graph below shows the distribution of goal times in a game. This World Cup, 14.3% of all goals scored were scored after the 90th minute! To provide some context, in the Premier League, only 5.1% of all goals are scored after the 90th minute.

The table below compares the goal rate this World Cup as well as the number and proportion of penalties, own goals and late goals.

 

The overall number of goals per match this World Cup is slightly lower than the 2014 World Cup, and around 10% lower than the average number of goals in a Premier League match. The distribution of types of goals is quite different to the Premier League, however. 16% of all goals this World Cup have been penalties (compared to the 6.8% observed in the Premier League), 20.5% of all goals have been own goals (compared to the 3.5% seen in the Premier League) and as stated earlier, there are 2.8 times as many goals scored after the 90th minute as we observe in the Premier League.

Several people have also commented on the number of apparent surprising results. For example, Germany getting knocked out of the Group Stage, Russia beating Spain, Belgium beating Brazil, etc. Has this World Cup really had more surprising results than previous ones? The graph below compares how unpredictable the results of the competition have been.

The number of unexpected results in this World Cup has been in line with the levels observed in previous World Cups, but more surprising than the Premier League. The World Cup with the most surprising results in the recent past is 2010 and the World Cup with the least surprising results has been in 2006.

It looks like this really is a World Cup of late goals, own goals and penalties, but the number of own goals is particularly remarkable. It wouldn’t surprise us if the England – Croatia match tonight ends up being won by a 90th+ minute own goal or penalty!

Should England lose to Belgium?

Posted on June 28th, 2018   

England are going into their third and final match against Belgium this evening as favourites to win the match (38% chance of winning), but should they win it?

 

There has been much discussion in the past couple of days on the potential benefits of England finishing second in group G to have an easier run of fixtures in the knock-out stages of the competition.

If England finish second in the group, they have a 7.7% chance of winning the World Cup compared to only a 6.2% chance if they finish first (the calculations were run before the last set of group H fixtures were played, so the teams that finished first and second in group H are not known at the time of writing). This is largely due to the likely event of having to face Brazil in the Quarter finals if they progress to the round of 16 as top of group G rather than Sweden or Switzerland if they finish second in the group.

The full list of likely opponents for each stage of the competition can be found below.

Regardless of the result tonight, England fans can take comfort in knowing that they are guaranteed to go through to the round of 16. It will be interesting to see if they choose to do that by giving it all they’ve got or prefer to game the system and maximise their chances of winning the World Cup.
