Recruitment by numbers: the tale of Adam and Bobby

One of the charges against analytics is that it hasn’t really demonstrated its utility, particularly in relation to recruitment. This is an argument I have some sympathy with. Having followed football analytics for over three years, I’m well-versed in the metrics that could aid decision making in football but I can appreciate that the body of work isn’t readily accessible without investing a lot of time.

Furthermore, clubs are understandably reticent about sharing the methods and processes that they follow, so successes and failures attributable to analytics are difficult to unpick from the outside.

Rather than add to the pile of analytics in football think-pieces that have sprung up recently, I thought I would try and work through how analysing and interpreting data might work in practice from the point of view of recruitment. Show, rather than tell.

While I haven’t directly worked with football clubs, I have spoken with several people who do use numbers to aid recruitment decisions within them, so I have some idea of how the process works. Data analysis is a huge part of my job as a research scientist, so I have a pretty good understanding of the utility and limits of data (my office doesn’t have air-conditioning though and I rarely use spreadsheets).

As a broad rule of thumb, public analytics (and possibly work done in private also) is generally ‘better’ at assessing attacking players, with central defenders and goalkeepers being a particular blind-spot currently. With that in mind, I’m going to focus on two attacking midfielders that Liverpool signed over the past two summers, Adam Lallana and Roberto Firmino.

The following is how I might employ some analytical tools to aid recruitment.

Initial analysis

To start with I’m going to take a broad look at their skill sets and playing style using the tools that I developed for my OptaPro Forum presentation, which can be watched here. The method uses a variety of metrics to identify different player types, which can give a quick overview of playing style and skill set. The midfielder groups isolated by the analysis are shown below.


Midfield sub-groups identified using the playing style tool. Each coloured circle corresponds to an individual player. Data via Opta.

I think this is a useful starting point for data analysis as it can give a quick snapshot of a player and can also be used for filtering transfer requirements. The utility of such a tool is likely dependent on how well scouted a particular league is by an individual club.

A manager, sporting director or scout could feed into the use of such a tool by providing their requirements for a new signing, which an analyst could then use to provide a short-list of different players. I know that this is one way numbers are used within clubs as the number of leagues and matches that they take an interest in outstrips the number of ‘traditional’ scouts that they employ.

As far as our examples are concerned, Lallana profiles as an attacking midfielder (no great shock) and Firmino belongs in the ‘direct’ attackers class as a result of his dribbling and shooting style (again no great shock). Broadly speaking, both players would be seen as attacking midfielders but the analysis is picking up their differing styles which are evident from watching them play.

Comparing statistical profiles

Going one step further, fairer comparisons between players can be made based upon their identified style, e.g. marking down a creative midfielder for taking fewer shots than a direct attacker would be unfair, given their respective roles and playing styles.

Below I’ve compared their statistical output during the 2013/14 season, the season before Lallana signed for Liverpool; I’m going to make the possibly incorrect assumption that Firmino was also someone Liverpool were interested in that summer. Some of the numbers (shots, chances created, throughballs, dribbles, tackles and interceptions) were included in the initial player style analysis above, while others (pass completion percentage and assists) are included for some additional context and information.

The aim here is to give an idea of the strengths, weaknesses and playing style of each player by ranking them against their peers. Whether ranking low or high on a particular metric is a ‘good’ thing depends on the statistic, e.g. taking shots from outside the box isn’t necessarily a bad thing to do but you might not want to be top of the list (Andros Townsend in case you hadn’t guessed). Many metrics will also depend on the tactical system of a player’s team and their role within it.

The plots below are to varying degrees inspired by Ted Knutson, Steve Fenn and Florence Nightingale (Steve wrote about his ‘gauge’ graph here). There are more details on these figures at the bottom of the post*.


Data via Opta.

Lallana profiles as a player who is good/average at several things, with chances created seemingly being his stand-out skill here (note this is from open-play only). Firmino on the other hand is strong and even elite at several of these measures. Importantly, these are metrics that have been identified as important for attacking midfielders and they can also be linked to winning football matches.


Data via Opta.

Based on these initial findings, Firmino looks like an excellent addition, while Lallana is quite underwhelming. Clearly this analysis doesn’t capture many things that are better suited to video and live scouting e.g. their defensive work off the ball, how they strike a ball, their first touch etc.

At this stage of the analysis, we’ve got a reasonable idea of their playing style and how they compare to their peers. However, we’re currently lacking further context for some of these measures, so it would be prudent to examine them further using some other techniques.

Diving deeper

So far, I’ve only considered one analytical method to evaluate these players. An important thing to remember is that all methods will have their flaws and biases, so it would be wise to consider some alternatives.

For example, I’m not massively keen on ‘chances created’ as a statistic, as I can imagine multiple ways that it could be misleading. Maybe it would be a good idea then to look at some numbers that provide more context and depth to ‘creativity’, especially as this should be a primary skill of an attacking midfielder for Liverpool.

Over the past year or so, I’ve been looking at various ways of measuring the contribution and quality of player involvement in attacking situations. The most basic of these looks at the ability of a player to find his team mates in ‘dangerous’ areas, which broadly equates to the central region of the penalty area and just outside it.
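As a rough illustration of what the most basic version involves, here is a minimal sketch in Python. The zone boundaries, column names and the 0–100 coordinate grid are illustrative assumptions rather than the exact definitions used in my model.

```python
import pandas as pd

# Illustrative 'danger zone': the central region of the penalty area and just
# outside it, on an assumed 0-100 coordinate grid attacking left to right.
DANGER_X_MIN = 78.0                      # a little in front of the box edge
DANGER_Y_MIN, DANGER_Y_MAX = 21.1, 78.9  # width of the penalty area

def dangerous_passes_per90(passes: pd.DataFrame, minutes: pd.Series) -> pd.Series:
    """Completed passes ending in the danger zone per 90 minutes, per player.

    `passes` needs columns: player, end_x, end_y, completed (bool).
    `minutes` is total minutes played, indexed by player.
    """
    in_zone = (
        passes["completed"]
        & (passes["end_x"] >= DANGER_X_MIN)
        & passes["end_y"].between(DANGER_Y_MIN, DANGER_Y_MAX)
    )
    counts = passes[in_zone].groupby("player").size()
    return (counts / minutes * 90).sort_values(ascending=False)
```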

Without wishing to go into too much detail, Lallana is pretty average for an attacking midfielder on these metrics, while Firmino was one of the top players in the Bundesliga.

I’m wary of writing Lallana off here, as these measures focus on ‘direct’ contributions and maybe his game is about facilitating his team mates. Perhaps he is the player who makes the pass before the assist. I can examine this with data too, by looking at the attacks he is involved in. Lallana doesn’t rise up the standings here either; again, the quality and level of his contribution is basically average. Unfortunately, I’ve not worked up these figures for the Bundesliga, so I can’t comment on how Firmino shapes up here (I suspect he would rate highly also).


Based on the methods outlined above, I would have been strongly in favour of signing Firmino as he mixes high quality creative skills with a goal threat. Obviously it is early days for Firmino at Liverpool (a grand total of 239 minutes in the league so far), so assessing whether the signing has been successful or not would be premature.

Lallana’s statistical profile is rather average, so factoring in his age and price tag, it would have seemed a stretch to consider him a worthwhile signing based on his 2013/14 season. Intriguingly, when comparing Lallana’s metrics from Southampton and those at Liverpool, there is relatively little difference between them; Liverpool seemingly got the player they purchased when examining his statistical output based on these measures.

These are my honest recommendations for these players, based on the analytical methods that I’ve developed. Ideally I would have published something along these lines in the summer of 2014, but you’ll just have to take my word that I wasn’t keen on Lallana based on a prototype version of the comparison tool outlined above, and nothing that I have worked on since has changed that view. Similarly, Firmino stood out as an exciting player who Liverpool could reasonably obtain.

There are many ways I would like to improve and validate these techniques and they might bear little relation to the tools used by clubs. Methods can always be developed, improved and even scrapped!

Hopefully the above has given some insight into how analytics could be a part of the recruitment process.


If analytics is to play an increasing role in football, then it will need to build up sufficient cachet to justify its implementation. That is a perfectly normal sequence for new methods as they have to ‘prove’ themselves before seeing more widespread use. Analytics shouldn’t be framed as a magic bullet that will dramatically improve recruitment but if it is used well, then it could potentially help to minimise mistakes.

Nothing that I’ve outlined above is designed to supplant or reduce the role of traditional scouting methods. The idea is just to provide an additional and complementary perspective to aid decision making. I suspect that more often than not, analytical methods will come to similar conclusions regarding the relative merits of a player, which is fine as that can provide greater confidence in your decision making. If methods disagree, then they can be examined accordingly as a part of the process.

Evaluating players is not easy, whatever the method, so being able to weigh several assessments that all have their own strengths, flaws, biases and weaknesses seems prudent to me. The goal of analytics isn’t to create some perfect and objective representation of football; it is just another piece of the puzzle.

truth … is much too complicated to allow anything but approximations – John von Neumann

*I’ve done this by calculating percentile figures to give an indication of how a player compares with their peers. Values closer to 100 indicate that a player ranks highly in a particular statistic, while values closer to zero indicate they attempt or complete few of these actions compared to their peers. In these examples, Lallana and Firmino are compared with other players in the attacking midfielder, direct attacker and through-ball merchant groups. The white curved lines are spaced every ten percentiles to give a visual indication of how the player compares, with the solid shading in each segment corresponding to their percentile rank.
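For anyone wanting to reproduce the percentile calculation, a minimal sketch is below; the peer-group table and its layout are assumed rather than taken from my actual workflow.

```python
import pandas as pd
from scipy.stats import percentileofscore

def percentile_profile(peers: pd.DataFrame, player: str) -> pd.Series:
    """Percentile rank (0-100) of one player against their peer group.

    `peers` is a players x metrics table for a single style group
    (e.g. attacking midfielders, direct attackers, through-ball merchants).
    Values near 100 mean the player ranks highly on that statistic.
    """
    return peers.apply(lambda col: percentileofscore(col, col[player]))
```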

OptaPro Analytics Forum 2016 accepting abstract proposals

OptaPro are inviting proposals to present at their Analytics Forum, which according to their announcement:

aims to connect football clubs with analytical communities and experts working outside of the professional game

This will be the third year that the forum has taken place and an impressive number of clubs and other football organisations are represented at the forum, along with plenty of laptop gurus with no relevant playing experience.

I was lucky/skillful enough to have my proposal accepted last year, so I thought it might be useful if I posted my abstract as an example. I’m told that the judges liked it as it was tailored to the audience i.e. club analysts.

When I wrote it, my aim was to define a clear and (hopefully) relevant question and give some idea of how feasible it was and how it could be used. I posted the slides and video of my presentation here if you want to check it out.

If you’re thinking of submitting, then I would highly recommend it. The forum is a great way to meet others working in football analytics and as a member of the online analytics community, it was great to properly meet people I had ‘known’ via Twitter. Presenting was a valuable experience also and led to interesting discussions with people during and after the event.

The closing date for submissions is midnight Sunday 18th October. My abstract is below and good luck with your submissions.

Finding square pegs for square holes: identifying player types for scouting

Proposed area of study: player evaluation

Proposed method: Principal component analysis and cluster analysis of on-ball player data

One consideration when scouting potential player signings is how well they will fit into their new team environment. A common criticism of a perceived failed player transfer is that the player was a “square peg for a round hole”. This study will aim to identify certain player types based on their statistical output to aid finding the “right fit” when scouting players.

I propose using Principal Component Analysis (PCA) to distinguish players based on their underlying performance data (specifically Opta’s on-ball data). PCA is an ideal method for exploring datasets with multiple variables in order to discern patterns in the underlying data. This study builds on my previous analysis that used a similar method to study playing styles at the team level¹. I will further extend this by applying cluster analysis to the data to group the players into certain types based on their attributes.

I have already explored the feasibility of this method using publicly available Opta data, and the results are promising. In order to extend the analysis for the forum, I would look to apply the method to more granular data, with a focus on player actions in open-play; the current dataset I have used groups all on-field actions together, which is not ideal. Furthermore, inclusion of location data would provide additional context for the analysis and aid differentiation of players and styles.

The persistence of player traits and classification will be assessed. Providing the dataset is large enough, it should be possible to test this persistence for players staying at the same team and for those who transfer to a new one. This will be a crucial aspect of the analysis and its utility.

The output from the analysis can serve as an additional tool when identifying potential transfer signings by categorising players according to their team role and providing statistical baselines for their performance compared to their peers. For example, the method separates different styles of central midfielders, such as deep-lying playmakers and defensive midfield “destroyers”. Players can then be compared against their peers in that style category based on the important traits of those player types.

By applying these techniques, this study aims to provide a more robust “apples-to-apples” comparison technique and find the appropriate square peg for the square hole in question.

¹Relevant blog posts available here:
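The abstract deliberately leaves out implementation detail, but a minimal sketch of the proposed PCA-plus-clustering pipeline might look something like the below; the component and cluster counts are arbitrary placeholders that would need tuning in practice.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def player_types(per90: pd.DataFrame, n_components: int = 5, n_clusters: int = 8) -> pd.Series:
    """Group players into types from standardised per-90 on-ball numbers.

    `per90` is a players x metrics table. PCA reduces the correlated
    metrics to a few components; k-means then groups players in that
    reduced space.
    """
    scaled = StandardScaler().fit_transform(per90)
    components = PCA(n_components=n_components).fit_transform(scaled)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(components)
    return pd.Series(labels, index=per90.index, name="player_type")
```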

Networking for success

In my previous post, I described my possession danger rating model, which classifies attacks according to their proximity to goal and their relative occurrence compared to other areas of the pitch. Each possession sequence in open-play is assigned a value depending on where it ends. The figure below outlines the model, with possession sequences ending closer to goal given more credit than those that break down further away.

Map of the pass weighting model based on data from the English Premier League. Data via Opta.

Instead of just looking at this metric at the team level, there are numerous ways of breaking it down to the player level.

For each possession, a player could be involved in numerous ways e.g. winning the ball back via a tackle, a successful pass or cross, a dribble past an opponent or a shot at goal. Players that are involved in more dangerous possessions may be more valuable, particularly when we compare them to their peers. When viewing teams, we may identify weak links who reduce the effectiveness of an attack. Conversely, we can pick out the stars in a team or indeed the league.


One popular method of analysing the influence of players on a team is network analysis. This is something I’ve used in the past to examine how a team plays and who the crucial members of a team are. It looks at who a player passes the ball to and who they receive passes from, with players with many links to their teammates usually rated more highly. For example, a midfield playmaker who provides the link between a defence and attack will often score more highly than a centre back who mainly receives passes from their goalkeeper and then plays a simple pass to their central defensive partner.

In order to assess the influence of players on attacking possessions, I’ve combined the possession danger rating model with network analysis. This adjusts the network analysis to give more credit to players involved in more dangerous attacks, while also allowing us to identify the most influential members of a team.
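A rough sketch of how such a weighting might be wired up with networkx is below. Inverting the danger rating into an edge ‘distance’ (so dangerous links pull players closer together) is a simplifying assumption on my part, not necessarily the exact scaling used for the figures here.

```python
import networkx as nx

def influence_scores(links):
    """Danger-weighted closeness centrality for each player.

    `links` is an iterable of (passer, receiver, danger) tuples, where
    `danger` is the mean danger rating of the possessions that link
    contributed to. High-danger links are given a small 'distance' so
    they count for more in the centrality calculation.
    """
    G = nx.DiGraph()
    for passer, receiver, danger in links:
        G.add_edge(passer, receiver, distance=1.0 / max(danger, 1e-6))
    return nx.closeness_centrality(G, distance="distance")
```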

Below is an example network for Liverpool last season during a 10 match period where they mainly played in a 3-4-3 formation. The eleven most used players during this period are shown according to their average position, with the links between each player coloured according to the danger of the possessions those links contributed to.

Possession network for Liverpool for the ten matches from Swansea City (home) to Burnley (home) during the 2014/15 season. Lines are coloured according to the relative danger rating per each possession between each player. Player markers are sized by their adjusted closeness centrality score (see below). Data via Opta.

Philippe Coutinho (10) was often a crucial cog in the network as he linked up with many of his team mates and the possessions he was involved with were often dangerous. His links with Sakho (17) and Moreno (18) appear to have been a fruitful avenue for attacks – this is an area we could examine in more detail via both data and video analysis if we were scouting Liverpool’s play. Over the whole season, Coutinho was easily the most crucial link in the team, which will come as no surprise to anyone who watched Liverpool last season.

Making the play

We can go further than players on a single team and compare across the entire league last season. To do this, I’ve calculated each player’s ‘closeness centrality’ score, or player influence score, but scaled it according to the danger of the possessions they were involved in over the season. The rating is predominantly determined by how many possessions a player is involved in, how well they link with team mates and the danger rating of the possessions they contribute to.

Yaya Touré leads the league by some distance due to him essentially being the crucial cog in the best attack in the league last season. Many of the players on the list aren’t too surprising, with a collection of Arsenal and Manchester City players high on the list plus the likes of Coutinho and Hazard also featuring.

The ability to effectively dictate play and provide a link for your team mates is likely desirable but the level of involvement a player has may be strongly governed by team tactics and their position on the field. One way around this is to control for the number of possessions a player is involved in to separate this out from the rating; Devin Pleuler made a similar adjustment in this Central Winger post.
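One simple way to make that adjustment (not necessarily Pleuler’s) is to regress the influence rating on involvement and keep the residual, as sketched below; names are mine.

```python
import numpy as np

def involvement_adjusted(influence: np.ndarray, possessions: np.ndarray) -> np.ndarray:
    """Residual influence after controlling for volume of involvement.

    Fit influence as a linear function of possessions involved in and
    subtract the fit: positive values indicate more influence than a
    player's level of involvement alone would predict.
    """
    slope, intercept = np.polyfit(possessions, influence, 1)
    return influence - (slope * possessions + intercept)
```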

Below are the top twenty players from last season according to this adjusted rating, which I’m going to refer to as an ‘influence rating’.

Top twenty players (minimum 1800 minutes) per the adjusted influence rating for the 2014/15 Premier League season. The number of completed passes each player made per 90 minutes is shown on the left. Data via Opta.

When accounting for their level of involvement, Mesut Özil rises to the top, narrowly ahead of Santi Cazorla and Yaya Touré. While players such as these don’t lead the league in terms of the most dangerous passes in open-play, they appear to be crucial conduits for their respective attacks. That might entail choosing the best options to facilitate attacks, making space for their team mates or playing a crucial line-breaking pass to open up a defence or all of the above and more.

There are some surprising names on the list, not least the Burnley duo of Danny Ings and George Boyd! Their level of involvement was very low (the lowest of those in the chart above) but when they were involved, Burnley created quite dangerous attacks and they linked well with the rest of the team. Burnley had a reasonably decent attack last season based on their underlying numbers but they massively under-performed when it came to actual goals scored. The question here is would this level of influence be maintained in a different setup and with greater involvement?

Ross Barkley is perhaps another surprising inclusion, at least going by his reputation among those who don’t depict him as the latest saviour of English football. Looking at his passing chart and links, this possibly points to the model not accounting for crossing being a generally less effective method of attack; his passing in the final third is biased towards wide areas, which often then results in a cross into the box. Something for version 2.0 to explore. He was Everton’s attacking hub player, which perhaps helps to explain their lack of penetration in attack last season.


The above is just one example of breaking down my dangerous possession metric to the player level. As with all metrics, it could certainly be improved e.g. additional measures of quality of possession could be included and I’m aware that there are likely issues with team effects inflating or deflating certain players. Rating across all players isn’t completely fair, as there is an obvious bias towards attack-minded players, so I will look to break it down across player positions and roles.

Stay tuned for future developments.

Valuing Possession

Regular visitors will know that I’ve been working on some metrics in relation to possession and territory based on the difficulty of completing passes into various areas of the pitch. To recap, passes into dangerous areas are harder to complete, which isn’t particularly revelatory but by building some metrics around this we can assess how well teams move the ball into dangerous areas as well as how well they prevent their opponents from doing so. These metrics can also be broken down to the player level to see which players complete the most ‘dangerous’ passes.

Below is the current iteration of the pass danger rating model based on data from the 2014/15 Premier League season; working the ball into positions closer to the goal is rewarded with a larger rating, while passes made within a team’s own half carry very little weight.

Map of the pass weighting model based on data from the English Premier League. Data via Opta.

One particular issue with the Territorial-Possession Dominance (TPD) metric that I devised was that as well as having a crap name, the relationship with points and goal difference could have been better. The metric tended to over-rate teams who make a lot of passes in reasonably dangerous areas around the edge of the box but infrequently complete passes into the central zone of the penalty area. On the other side of the coin, it tended to under-rate more direct teams who don’t attack with sustained possession.

In order to account for this, I’ve calculated the danger rating by looking at attacks on a ‘possession’ basis i.e. by tracking individual chains of possession in open-play and looking at where they end. The end of the chain could be due to a number of reasons including shots, unsuccessful passes or a tackle by an opponent. Each possession is then assigned a danger rating based on the model in the figure above. Possessions which end deep into opponent territory will score more highly, while those that break down close to a team’s own goal are given little weight.
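In code, the core of the calculation is small; the sketch below scores each possession chain by where it ends, using an exponential decay in angle-adjusted distance as a stand-in for the weighting map above. The decay constant is borrowed from my shot model elsewhere on the blog and is purely illustrative, as are the column names.

```python
import numpy as np
import pandas as pd

def possession_danger(events: pd.DataFrame, alpha: float = 6.65) -> pd.Series:
    """Danger rating for each open-play possession chain.

    `events` needs columns: possession_id, end_distance (angle-adjusted
    distance from goal at each event). The last event of each chain marks
    where the possession ended; chains ending close to goal score near
    one, chains ending deep in a team's own half score near zero.
    """
    end_distances = events.groupby("possession_id")["end_distance"].last()
    return np.exp(-end_distances / alpha)
```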

Conceptually, the model is similar to Daniel Altman’s non-shot based model (I think), although he views things through expected goals, whereas I started out looking at passing. You can find some of the details regarding the model here, plus a video of his presentation at the Opta Pro Analytics Forum is available here, which is well worth watching.

Danger Zone

The ratings for last season’s Premier League teams are shown below, with positive values meaning a team had more dangerous possessions than their opponents over the course of the season and vice versa for the negative values. Overall, the correlation between the metric and goal difference and points is pretty good (r-squared values of 0.76 and 0.77 respectively). This is considering open-play only, so it ignores set pieces and penalties, plus I omitted possessions made up of just one shot. The correlation with open-play goal difference is a little larger, so it appears to be an encouraging indicator of team strength.

Open-Play Possession Danger Rating for the 2014/15 English Premier League season. Zero corresponds to a rating of 50%. Data via Opta.

The rating only takes into account the location of where the possession ends so there is plenty of scope for improvement e.g. throughball passes could carry more weight, counter-attacks could receive an increased danger rating, while moves featuring a cross might be down-weighted. Regardless of such improvements, these initial results are encouraging and are at a similar descriptive level to traditional shot ratios and expected goal models.

Arsenal are narrowly ahead of Manchester City here, as they make up a clear top-two which is strongly driven by their attacking play. Intriguingly, Manchester City’s rating was much greater (+7%) for possessions ending with a shot, while Arsenal’s was almost unchanged (-1%). Similarly to City, Chelsea’s rating for possessions ending with a shot was also greater (+4%) than their rating for all possessions. I don’t know yet if this is a repeatable trait but it suggests Chelsea and City were more efficient at generating quality shots and limiting their opponents.

Manchester United sit narrowly ahead of Liverpool and Southampton and round out the top four, which was mainly driven by their league-leading defensive performance; few teams were able to get the ball into dangerous positions near their goal. Manchester United’s ability to keep their opponents at arm’s length has been a consistent trend in the territory-based numbers I’ve looked at.

Analytics anti-heroes Sunderland and a West Brom team managed by Tony Pulis for a large chunk of last season reside at the bottom of the table. Sunderland comfortably allowed the most dangerous possessions in the league last season.


So, we’re left with yet another team strength metric to add to the analytics pile. The question is what does this add to our knowledge and how might we use it?

Analytics has generally based metrics around shots, which is sometimes not reflective of how we often experience the game from a chance creation point of view. The concept of a non-shot chance isn’t a new one – the well-worn cliché about a striker ‘fluffing a chance’ tells us that much – but what analytics is striving to do is quantify these opportunities and hopefully do something useful with them. Basing the metric on all open-play possessions rather than just focusing on shots potentially opens up some interesting avenues for future research in terms of examining how teams attack and defend. Furthermore, using all possessions rather than just those ending with a shot increases our sample size and opens up the potential for new ways of assessing player contributions.

Looking at player contributions to these possessions will be the subject of my next post.

Liverpool Looking Up? EPL 2015/16 Preview

Originally published on StatsBomb.

After the sordid love affair that culminated in a strong title challenge in 2013/14, Liverpool barely cast a furtive glance at the Champions League places in 2014/15. Their underlying numbers over the whole season provided scant consolation either, with performance levels in line with a decent team lacking the quality usually associated with a top-four contender. Improvements in results and underlying performance will therefore be required to meet the club’s stated aim of Champions League football.

Progress before a fall

Before looking forward to the coming season, let’s start with a look back at Liverpool’s performance over recent seasons. Below is a graphic showing Liverpool’s underlying numbers over the past five seasons, courtesy of Paul Riley’s Expected Goal numbers.

Expected goal rank over the past 5 seasons of the English Premier League. Liverpool seasons highlighted in red.

From 2010/11 to 2012/13, there was steady progress with an impressive jump in 2013/14 to the third highest rating over the past five years. Paul’s model only evaluates shots on target, so Liverpool’s 2013/14 rating is potentially biased a little high given their unusual/unsustainable proportion of shots on target that year. However, the quality was clear, particularly in attack. Not to be outdone, 2014/15 saw another impressive jump but unfortunately the trajectory was in the opposite direction. Other metrics such as total shots ratio and shots on target ratio tell a similar story, although 2013/14 isn’t quite as impressive.

The less charitable among you may ascribe Liverpool’s trajectory to the presence and performance of one Luis Suárez; when joining in January 2011, Suárez was an erratic yet gifted performer who went on to become a genuine superstar before departing in the summer of 2014. Suárez’s attacking wizardry in 13/14 was remarkable and he served as a vital multiplier in the side’s pinball style of play. Clearly he was a major loss but there were already reasons to suspect that some regression was due with or without him: Andrew Beasley wrote about the major and likely unsustainable role of set piece goals, while James Grayson and Colin Trainor highlighted the unusually favourable proportions of shots on target and blocked shots respectively during their title challenge. I wrote about how Liverpool’s penchant for early goals had led to an incredible amount of time spent winning over the season (a handy circumstance for a team so adept at counter-attacking), which may well have helped to explain some of their unusual numbers and why they were unlikely to be repeated.

These mitigating and potentially unsustainable factors notwithstanding, the dramatic fall in underlying performance, points (22 in all) and goals scored (an incredible 49 goal decline) is where Liverpool find themselves ahead of the coming season. Such a decline sees Brendan Rodgers go into this season under pressure to justify FSG’s backing of him over the summer, particularly with a fairly nightmarish run of away fixtures to start the season and the spectre of Jürgen Klopp on the horizon.

So, where do Liverpool need to improve this season?

Case for the defence

With the concession of six goals away at Stoke fresh in the memory, the narrative surrounding Liverpool’s defence is strong i.e. the defence is pretty horrible. Numbers paint a somewhat different story with Liverpool’s shots conceded (10.9 per game) standing as the joint-fifth lowest in the league last year according to statistics compiled by the Objective-Football website (rising to fourth lowest in open play). Shots on target were less good (3.8 per game and a rank of joint-seventh) although the margins are fairly small here. By Michael Caley’s and Paul Riley’s expected goal numbers, Liverpool ranked fourth and sixth respectively in expected goals against. Looking at how effective teams were at preventing their opponents from getting the ball into dangerous areas in open-play, my own numbers ranked Liverpool fifth best in the league.

It should be noted that analytics often has something of a blind spot when it comes to analysing defensive performances; metrics which typically work very well on the offensive side often work less well on the defensive side. Liverpool also tend to be a fairly dominant team and their opponents typically favour a deep defence and counter strategy against them, which will limit the number of chances they create.

One area where their numbers (courtesy of Objective-Football again) were noticeably poor was set-pieces, where they conceded on 11.6% of the shots by their opponents, the third worst rate in the league, compared to a league average conversion of 8.7%. Set-piece conversion rates are notoriously unsustainable year-on-year though, so some regression towards more normal rates could potentially bring down Liverpool’s goals conceded per game compared to last season.

While Liverpool’s headline numbers were reasonable, their tendency to shoot themselves in the foot and concede some daft goals was impressive in its ineptitude at times. Culprits typically included combinations of Rodgers’ tactics, Dejan Lovren’s ‘whack a mole’ approach to defending and the embers of Steven Gerrard’s Liverpool career. The defensive structure of the team should be improved now that Gerrard no longer needs to be accommodated at the heart of midfield, while Glen Johnson’s prolonged audition for an extra role in the Walking Dead will continue at Stoke. Nathaniel Clyne should be a significant upgrade at full back, with youngsters Ilori and Gomez presently with the squad and aiming to compete for a first team role.

Broadly speaking though, Liverpool’s defensive numbers were reasonable but with room for improvement. Their numbers looked ok for a Champions League hopeful rather than a title challenger. A more mobile midfield should enhance the protection afforded to the central defence, however it lines up. Whether the individual errors were a bug and not a feature of this Liverpool team will likely determine how the narrative around the defence continues this year.

Under-powered attack

Liverpool’s decline in underlying performance in 2014/15 was driven by a significant drop-off in their attacking numbers. The loss of Suárez was compounded by Daniel Sturridge playing just 750 minutes in the league all season; Sturridge isn’t at the same level as Suárez (few are) but he does represent a truly elite forward and the alternatives at the club weren’t able to replace him.

The loss of Suárez and Sturridge meant that Coutinho and Sterling were now the principal conduits for Liverpool’s attack. Both performed admirably and were among the most dangerous attackers in the division. The figure below details Liverpool’s players according to the number of dangerous passes per 90 minutes played, which is related to my pass-danger rating score. In terms of volume, Coutinho and Sterling were way ahead of their teammates and both ranked in the top 15 in the league (minimum of 900 minutes played). James Milner actually ranked seventh by this metric, so he could well provide an additional source of creativity and link well with Liverpool’s forward players.

Dangerous passes per 90 minutes played metric for Liverpool players in 2014/15. Right hand side shows total number of completed passes per 90 minutes.

As good as Coutinho and Sterling were from a creative perspective, they did lag behind the truly elite players in the league by these metrics. As with many of Liverpool’s better players, you’re often left with the caveat of stating how good they are for their age. That’s not a criticism of the players themselves, merely a recognition of their overall standing relative to their peers.

What didn’t help was the lack of attacking contribution from Liverpool’s peak-age attacking players; Lallana’s contribution was decidedly average, Sturridge is obviously capable of making a stellar contribution but injuries curtailed him, while Balotelli certainly provided a high shot volume powered by a predilection for shooting from range but a potential dose of bad luck meant his goal-scoring record was well below expectation.

While there were clearly good elements to Liverpool’s attack, they were often left shooting from long range. According to numbers published by Michael Caley, Liverpool took more shots from outside the box than any other team last year and had the fourth highest proportion of shots from outside the box (48%). Unsurprisingly, they had the third lowest proportion of shots from the central region inside the penalty area (34%), which is the so-called ‘danger zone’ where shots are converted at much greater rates than wide in the box and outside the area. With their shot volumes being pretty good last season (third highest total shots and fourth highest shots on target), shifting the needle towards better quality chances would certainly improve Liverpool’s prospects. The question is where will that quality come from?

Bobby & Ben

With Sturridge not due back until the autumn coupled with his prior injury record, Liverpool moved to sign Christian Benteke as a frontline striker with youngsters Ings and Origi brought in to fill out the forward ranks. Roberto Firmino was added before Sterling’s departure but the expectation is that he will line-up in a similar role as the dynamic attacking midfielder/forward.

Firmino brings some impressive statistical pedigree with him: elite dribbler, dangerous passer, a tidy shot profile for a non-striker and stand-out tackling numbers for his position. If he can replicate his Bundesliga form then he should be a more than adequate replacement for Sterling, while also having the scope to develop over coming seasons.

Benteke brings a good but not great goal-scoring record, with his record in open-play being particularly average. Although there have been question marks regarding his stylistic fit within the team, Liverpool have seemingly been pursuing a physical forward to act as a ‘reference point’ in their tactical system over the past few years; Diego Costa was a target in 2013, while Wilfried Bony was linked in 2014. Benteke brings that to the table, alongside a more diverse range of skills than he is given credit for, having seemingly been cast by some as an immobile lump of a centre forward.

Whether he has the necessary quality to improve this Liverpool team is the more pertinent question. From open-play, Benteke averages 2.2 shots per 90 minutes and 0.34 goals per 90 minutes over the past three seasons, which is essentially the average rate for a forward in the top European leagues. For comparison, Daniel Sturridge averages 4.0 shots per 90 minutes and 0.65 goals per 90 minutes over the same period. Granted, Sturridge has played for far greater attacking units than Aston Villa over that period but based on some analysis of strikers moving clubs that I’ve done, there is little evidence that shot and goal rates rise when moving to a higher quality team. Benteke does provide a major threat from set-pieces, which has been a productive source of goals for him but I would prefer to view these as an added extra on top of genuine quality in open-play, rather than a fig leaf.

Benteke will need to increase his contribution significantly if he is to cover for Sturridge over the coming season, otherwise Liverpool may find themselves in the good but not great attacking category again.


So where does all of the above leave Liverpool going into the season? Most of the underlying numbers for last season suggested that Chelsea, Manchester City and Arsenal were well ahead of the pack and I don’t see much prospect of one of them dropping out of the top four. Manchester United, Liverpool and Southampton made up the trailing group, with these three plus perhaps Tottenham in a battle to be the ‘best of the rest’ or ‘least crap’ and claim the coveted fourth place trophy.

When framed this way, Liverpool’s prospects look more viable, although fourth place looks like the ceiling at present unless the club procure some adamantium to alleviate Sturridge’s injury woes. While Liverpool currently operate outside the financial Goldilocks zone usually associated with a title challenge, they should have the quality to mount a concerted challenge for that Champions League spot in what could be a tight race. They did put together some impressive numbers during the 3-4-3 phase of last season that was in-line with those expected of a Champions League contender; replicating and sustaining that level of quality should be the aim for the team this coming season.

Prediction: 4-6th, most likely 5th.

P.S. Can Liverpool be more fun this year? If you can’t be great, at least be fun.

Uncertain expectations

In this previous post, I describe a relatively simple version of an expected goals model that I’ve been developing recently. In this post, I want to examine the limitations and uncertainties relating to how well the model predicts goals.

Just to recap, I built the model using data from the Premier League from 2013/14 and 2014/15. For the analysis below, I’m just going to focus on non-penalty shots with the foot, so it includes both open-play and set piece shot situations. Mixing these will introduce some bias but we have to start somewhere. The data amounts to over 16,000 shots.

What follows is a long and technical post. You have been warned.

Putting the boot in

One thing to be aware of is how the model might differ if we used a different set of shots for input; ideally the answer we get shouldn’t change if we only used a subset of the data or if we resample the data. If the answer doesn’t change appreciably, then we can have more confidence that the results are robust.

Below, I’ve used a statistical technique known as ‘bootstrapping’ to assess how robust the regression is for expected goals. Bootstrapping belongs to a class of statistical methods known as resampling. The method works by randomly extracting shots from the dataset and rerunning the regression many times (1000 times in the plot below). Using this, I can estimate a confidence interval for my expected goal model, which should provide a reasonable estimate of goal expectation for a given shot.

For example, the base model suggests that a shot from the penalty spot has an xG value of 0.19. The bootstrapping suggests a 90% confidence interval ranging from 0.17 to 0.22. In other words, given this dataset, our best estimate of the conversion rate for Premier League footballers shooting from the penalty spot lies somewhere between 17% and 22% at the 90% confidence level.
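For the curious, a minimal sketch of the bootstrap itself is below, using the exponential distance model from the previous post; the function names and starting values are mine, and `distances` and `goals` are assumed to be NumPy arrays of adjusted shot distances and 0/1 outcomes.

```python
import numpy as np
from scipy.optimize import curve_fit

def xg_model(distance, alpha, beta):
    """xG = exp(-distance / alpha) + beta, as in the previous post."""
    return np.exp(-distance / alpha) + beta

def bootstrap_ci(distances, goals, eval_distance, n_boot=1000, seed=0):
    """90% bootstrap confidence interval for xG at a given shot distance.

    Resample the shots with replacement, refit the curve each time and
    take the 5th/95th percentiles of the predicted values.
    """
    rng = np.random.default_rng(seed)
    n = len(distances)
    predictions = []
    for _ in range(n_boot):
        sample = rng.integers(0, n, size=n)
        (alpha, beta), _ = curve_fit(
            xg_model, distances[sample], goals[sample], p0=(6.65, 0.017)
        )
        predictions.append(xg_model(eval_distance, alpha, beta))
    return np.percentile(predictions, [5, 95])
```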

The plot below shows the goal expectation for a shot taken in the centre of the pitch at varying distances from the goal. Generally speaking, the confidence interval range is around ±1-2%. I also ran the regressions on subsets of the data and found that after around 5000 shots, the central estimate stabilised and the addition of further shots in the regression just narrows the confidence intervals. After about 10,000 shots, the results don’t change too much.


Expected goal curve for shots in the centre of the pitch at varying distances from the goal. Shots with the foot only. The red line is the median expectation, while the blue shaded region denotes the 90% confidence interval.

I can use the above information to construct a confidence interval for the expected goal totals for each team, which is what I have done below. Each point represents a team in each season and I’ve compared their expected goals vs their actual goals. The error bars show the range for the 90% confidence intervals.

Most teams line up with the one-to-one line within their respective confidence intervals when comparing with goals for and against. As I noted in the previous post, the overall tendency is for actual goals to exceed expected goals at the team level.

Expected goals vs actual goals for teams in the 2013/14 and 2014/15 Premier League. Dotted line is the 1:1 line, the solid line is the line of best fit and the error bars denote the 90% confidence intervals based on the xG curve above.

As an example of what the confidence intervals represent, in the 2013/14 season, Manchester City’s expected goal total was 59.8, with a confidence interval ranging from 52.2 to 67.7 expected goals. In reality, they scored 81 non-penalty goals with their feet, which falls outside of their confidence interval here. On the plot below, Manchester City are the red marker on the far right of the expected goals for vs actual goals for plot.

Embracing uncertainty

Another method of testing the model is to look at the model residuals, which are calculated by subtracting the outcome of a shot (either zero or one) from its expected goal value. If you were an omnipotent being who knew every aspect relating to the taking of a shot, you could theoretically predict the outcome of a shot (goal or no goal) perfectly (plus some allowance for random variation). The residuals of such a model would always be zero as the outcome minus the expectation of a goal would equal zero in all cases. In the real world though, we can’t know everything so this isn’t the case. However, we might expect that over a sufficiently large sample, the residual will be close to zero.

In the figure below, I’ve again bootstrapped the data and looked at the model residuals as the number of shots increases. I’ve done this 10,000 times for each number of shots i.e. I extract a random sample from the data and then calculate the residual for that number of shots. The red line is the median residual (goals minus expected goals), while the blue shaded region corresponds to the standard error range (calculated as the 90% confidence interval). The residual is normalised to a per shot basis, so the overall uncertainty value is equal to this value multiplied by the number of shots taken.


Goals-Expected Goals versus number of shots calculated via bootstrapping. Inset focusses on the first 100 shots. The red line is the median, while the blue shaded region denotes the 90% confidence interval (standard error).

The inset shows how this evolves up to 100 shots and we see that over about 10 shots, the residual approaches zero but the standard errors are very large at this point. Consequently, our best estimate of expected goals is likely highly uncertain over such a small sample. For example, if we expected to score two goals from 20 shots, the standard error range would span 0.35 to 4.2 goals. To add a further complication, the residuals aren’t normally distributed at that point, which makes interpretations even more challenging.

Clearly there is a significant amount of variation over such small samples, which could be a consequence of both random variation and factors not included in the model. This is an important point when assessing xG estimates for single matches; while the central estimate will likely have a very small residual, the uncertainty range is huge.

As the sample size increases, the uncertainty decreases. After 100 shots, which would equate to a high shot volume for a forward, the uncertainty in goal expectation would amount to approximately ±4 goals. After 400 shots, which is close to the average number of shots a team would take over a single season, the uncertainty would equate to approximately ±9 goals. For a 10% conversion rate, our expected goal value after 100 shots would be 10±4, while after 400 shots, our estimate would be 40±9 (note the percentage uncertainty decreases as the number of shots increases).
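A toy binomial simulation (which ignores model error, so it won’t exactly match the bootstrapped figures above) shows the same behaviour: the absolute spread grows with shot volume while the percentage spread shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
for n_shots in (20, 100, 400):
    # Simulate many samples of shots converted at a flat 10% rate and look
    # at the 90% range of goals minus expected goals.
    goals = rng.binomial(n_shots, 0.1, size=10_000)
    lo, hi = np.percentile(goals - 0.1 * n_shots, [5, 95])
    print(f"{n_shots} shots: xG {0.1 * n_shots:.0f}, 90% range {lo:+.1f} to {hi:+.1f}")
```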


Same as above but with individual teams overlaid.

Above is the same plot but with the residuals shown for each team over the past two seasons (or one season if they only played for a single season). The majority of teams fall within the uncertainty envelope but there are some notable deviations. At the bottom of the plot are Burnley and Norwich, who significantly under-performed their expected goal estimate (they were also both relegated). On the flip side, Manchester City have seemingly consistently outperformed the expected goal estimate. Part of this is a result of the simplicity of the model; if I include additional factors such as how the chance is created, the residuals are smaller.

How well does an xG model predict goals?

Broadly speaking, the central estimates of expected goals appear to be reasonably good; the residuals tend to zero quickly and even though there is some bias, the correlations and errors are encouraging. When the uncertainties in the model are propagated through to the team level, the confidence intervals are on average around ±15% for expected goals for and against.

When we examine the model errors in more detail, they tend to be larger (around ±25% at the team level over a single season). The upshot of all this is that there appears to be a large degree of uncertainty in expected goal values when considering sample sizes relevant at the team and player level. While the simplicity of the model used here may mean that the uncertainty values shown represent a worst-case scenario, it is still something that should be considered when analysts make statements and projections. Having said this, based on some initial tests, adding extra complexity doesn’t appear to reduce the residuals to any great degree.

Uncertainty estimates and confidence intervals aren’t sexy and having spent the last 1500ish words writing about them, I’m well aware they aren’t that accessible either. However, I do think they are useful and important in the real world.

Quantifying these uncertainties can help to provide more honest assessments and recommendations. For example, I would say it is more useful to say that my projections estimate that player X will score 0.6-1.4 goals per 90 minutes next season along with some central value, rather than going with a single value of 1 goal per 90 minutes. Furthermore, it is better to state such caveats in advance – if you just provided the central estimate and the player posted say 0.65 goals per 90 and you then bring up your model’s uncertainty range, you will just sound like you’re making excuses.

This also has implications regarding over and under performance by players and teams relative to expected goals. I frequently see statements about regression to the mean without considering model errors. As George Box wisely noted:

Statisticians, like artists, have the bad habit of falling in love with their models.

This isn’t to say that expected goal models aren’t useful, just that if you want to wade into the world of probability and modelling, you should also illustrate the limitations and uncertainties associated with the analysis.

Perhaps those using expected goal models are well aware of these issues but I don’t see much discussion of it in public. Analytics is increasingly finding a wider public audience, along with being used within clubs. That will often mean that those consuming the results will not be aware of these uncertainties unless you explain them. Speaking as a researcher who is interested in the communication of science, I can give many examples of where not discussing uncertainty upfront can backfire in the long run.

Isn’t uncertainty fun!


Thanks to several people who were kind enough to read an initial draft of this article and the preceding method piece.

Great Expectations

One of the most popular metrics in football analytics is the concept of ‘expected goals’ or xG for short. There are various flavours of expected goal models but the fundamental objective is to assess the quality of chances created or conceded by a team. The models are also routinely applied to assessing players using various techniques.

Michael Caley wrote a nice explanation of the what and the why of expected goals last month. Alternatively, you could check out this video by Daniel Altman for a summary of some of the potential applications of the metric.

I’ve been building my own expected goals model recently and I’ve been testing out a fundamental question regarding the performance of the model, namely:

How well does it predict goals?

Do expected goal models actually do what they say on the tin? This is a really fundamental and dumb question that hasn’t ever been particularly clear to me in relation to the public expected goal models that are available.

This is a key aspect, particularly if we want to make statements about prior over or under-performance and any anticipated changes in the future. Further to this, I’m going to talk about uncertainty and how that influences the statements that we can make regarding expected goals.

In this post, I’m going to describe the model and make some comparisons with a ‘naive’ baseline. In a second post, I’m going to look at uncertainties relating to expected goal models and how they may impact our interpretations of them.

The model

Before I go further, I should note that the initial development closely resembles the work done by Michael Caley and Martin Eastwood, who detailed their own expected goal methods here and here respectively.

I built the model using data from the Premier League from 2013/14 and 2014/15. For the analysis below, I’m just going to focus on non-penalty shots with the foot, so it includes both open-play and set piece shot situations. Mixing these will introduce some bias but we have to start somewhere. The data amounts to over 16,000 shots.

I’m only including distance from the centre of the goal in the first instance, which I calculated in a similar manner to Michael Caley in the link above as the distance from the goal line divided by the relative angle. I didn’t raise the relative angle to any power though.

I then calculate the probability of a goal being scored with the adjusted distance of each shot as the input; shots are deemed either successful (goal) or unsuccessful (no goal). Similarly to Martin Eastwood, I found that an exponential decay formula represented the data well. However, I found that there was a tendency towards under-predicting goals on average, so I included an offset in the regression. The equation I used is below:

xG = exp(-Distance/α) + β

Based on the dataset, the fit coefficients were 6.65 for α and 0.017 for β. Below is what this looks like graphically when I colour each shot by the probability of a goal being scored; shots from close to the goal line in central positions are far more likely to be scored than long distance shots or shots from narrow angles, which isn’t a new finding.
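The fitted model is small enough to write out directly; the sketch below assumes the adjusted distance has already been computed as described above.

```python
import numpy as np

def expected_goal(adj_distance, alpha=6.65, beta=0.017):
    """xG = exp(-adj_distance / alpha) + beta, with the fitted coefficients.

    `adj_distance` is the distance from the goal line divided by the
    relative angle, as described above.
    """
    return np.exp(-adj_distance / alpha) + beta

# e.g. an adjusted distance of 11 gives an xG of roughly 0.21
```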


Expected goals based on shot location using data from the 2013/14 and 2014/15 Premier League seasons. Shots with the foot only.

So, now we have a pretty map and yet another expected goal model to add to the roughly 1,000,001 other models in existence.


In the figure below, I’ve compared the expected goal totals with the actual goals. Most teams are close to the one-to-one line when comparing with goals for and against, although the overall tendency is for actual goals to exceed expected goals at the team level. When looking at goal difference, there is some cancellation for teams, with the correlation being tighter and the line of best fit passing through zero.


Expected goals vs actual goals for teams in the 2013/14 and 2014/15 Premier League. Dotted line is the 1:1 line, the solid line is the line of best fit.

Inspecting the plot more closely, we can see some bias in the expected goal number at the extreme ends; high-scoring teams tend to out-perform their expected goal total, while the reverse is true for low scoring teams. The same is also true for goals against, to some extent, although the general relationship is less strong than for goals for. Michael Caley noted a similar phenomenon here in relation to his xG model. Overall, it looks like just using location does a reasonable job.


The table above includes R2 and mean absolute error (MAE) values for each metric and compares them to a ‘naïve’ baseline where just the average conversion rate is used to calculate the xG values i.e. the location of the shot is ignored. The R2 value assesses the strength of the relationship between expected goals and goals, with values closer to one indicating a stronger link. Mean absolute error takes an average of the difference between goals and expected goals; the lower the value, the better. In all cases, including location improves the comparison. ‘Naïve’ xG difference is effectively Total Shot Difference, as it assumes that all shots are equal.
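For concreteness, here’s a minimal sketch of how the two comparison metrics can be computed against the naïve baseline; the per-team totals are dummy values purely for illustration:

```python
import numpy as np

# Dummy per-team totals purely to illustrate the calculation; the real
# inputs are each team's goals, model xG and shot counts over two seasons.
goals   = np.array([68.0, 55.0, 43.0, 37.0])
team_xg = np.array([61.0, 52.0, 45.0, 40.0])
shots   = np.array([620, 540, 470, 430])

# Naive baseline: every shot is worth the league-average conversion rate.
avg_conversion = goals.sum() / shots.sum()
naive_xg = shots * avg_conversion

for name, pred in [("model xG", team_xg), ("naive xG", naive_xg)]:
    r2 = np.corrcoef(goals, pred)[0, 1] ** 2   # strength of the relationship
    mae = np.mean(np.abs(goals - pred))        # average size of the miss
    print(f"{name}: R2 = {r2:.2f}, MAE = {mae:.1f}")
```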

What is interesting is that the correlations are stronger in both cases for goals for than goals against. This could be a fluke of the sample I’m using but the differences are quite large. There is more stratification in goals for than goals against, which likely helps improve the correlations. James Grayson noted here that there is more ‘luck’ or random variation in goals against than goals for.

How well does an xG model predict goals?

Broadly speaking, the central estimates of expected goals appear to be reasonably good. Even though there is some bias, the correlations and errors are encouraging. Adding location into an xG model clearly improves our ability to predict goals compared to a naïve baseline. This obviously isn’t a surprise but it is useful to quantify the improvements.

The model can certainly be improved though and I also want to quantify the uncertainties within the model, which will be the topic of my next post.

Premier League Pass Masters

In this previous post, I combined territory and possession to create a Territorial-Possession Dominance (TPD) metric. The central basis for this metric is that it is more difficult to pass the ball into dangerous areas. Essentially teams that have the ball in areas closer to their opponent’s goal, while stopping their opponent moving the ball close to their own, will score more highly on this metric.

In the graphic below, I’ve looked at how the teams in the Premier League have been shaping up this year (data correct up to 24/04/15). The plot splits this performance on the offensive side (with the ball) and the defensive side (without the ball). For a frame of reference, league average is defined as a score of 100.

Broadly, these two terms show that teams who dominate territory with the ball also limit the amount of possession they concede close to their own goal. This makes sense given there is only one ball on the pitch, so pinning your opponent back in their half makes it more difficult to maintain possession in dangerous areas in return. Alternatively, teams may choose to sit back, soak up pressure and then aim to counter attack; this would yield a low rating offensively and a higher rating defensively.

Territorial-possession for and against for the 2014/15 English Premier League. A score of 100 denotes league average. Marker colour refers to Territorial-Possession Dominance. Data via Opta.

The top seven (plus Everton) tend to dominate territory and possession, while the bottom thirteen (minus Everton) are typically pinned back. Stoke City are somewhat peculiar, as they are below average on both scores; while they limit their opponents, they seemingly struggle to manoeuvre the ball into dangerous areas themselves. Michael Caley’s expected goals numbers suggest that Everton have struggled to convert their territorial and possession dominance into an abundance of good quality chances; essentially they look pretty in-between both boxes.

Sunderland’s passivity is evident, as they routinely see their opponents pass the ball into dangerous areas; based on where their defensive actions occur and the league-leading number of shots from outside the box that they concede, the aim is to get men behind the ball and prevent good quality chances from being created. That is possibly a reasonable tactical system if you can combine it with swift counter-attacking and high quality chances, but Poyet’s dismissal is indicative of how that worked out.

On the flip side, Manchester United rank lowest for territorial-possession against. Their system is designed to prevent their opponents from building pressure on their defence close to their own goal. Think of it as a system designed to prevent Phil Jones’ face from trending on Twitter. Of course, when the system breaks down and/or opposition skill breaks through, things look awful and high quality chances are conceded.

Finally, Manchester City clearly aren’t trying hard enough.

Passing maestros

The metric I’ve devised classifies each completed pass based on its destination, so it is relatively straightforward to break down the metric by the player passing the ball. Below are the top twenty players this season ranked according to the average ‘danger’ of their passes (non-headed passes only, minimum 900 minutes played). I can also do this for players receiving the ball but I’ll leave that for another time.

Players who routinely complete passes into dangerous areas will score highly here, so there is an obvious bias towards forwards and attacking midfielders/wingers. Bias will also be introduced by team systems, which would be a good thing to examine in the future. I’ve also noted on the right-hand-side the number of passes each player completes per 90 minutes to give a sense of their involvement.
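In code, the ranking boils down to a simple group-by over completed passes; here’s a minimal sketch with pandas, using made-up column names and values:

```python
import pandas as pd

# Hypothetical pass-level rows; the column names and numbers are mine.
passes = pd.DataFrame({
    "player":  ["Hazard", "Hazard", "Hazard", "Vardy", "Sakho"],
    "danger":  [0.30, 0.45, 0.25, 0.60, 0.55],  # weighting of each completed pass
    "minutes": [3000, 3000, 3000, 1400, 1100],  # season minutes, repeated per row
})

rating = passes.groupby("player").agg(
    danger_rating=("danger", "mean"),   # average 'danger' per completed pass
    completed=("danger", "size"),       # number of completed passes
    minutes=("minutes", "first"),
)
rating["passes_per90"] = 90 * rating["completed"] / rating["minutes"]
rating = rating[rating["minutes"] >= 900]  # minimum-minutes cut from the post
print(rating.sort_values("danger_rating", ascending=False))
```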

Some players, like Diafra Sakho and Jamie Vardy, are rarely involved but their passes are often dangerous. Others manage to combine a high volume of passes with danger; PFA Player of the Year, Eden Hazard, is the standout here (very much a Sum 41 kind of footballer). The link-up skills of Sánchez and Agüero are also evident.

Pass Danger Rating for English Premier League players in the 2014/15 season. Numbers on right indicate number of completed passes played per 90 minutes by each player. Minimum of 900 minutes played. Data via Opta.

I quite like this as a metric, as the results aren’t always obvious; it is nice to have confirmatory metrics but informative metrics are potentially more valuable from an analytics point of view. For instance, the metric can quickly identify the dangerous passers in the opposition, who could then be targeted to reduce their influence. It can also be useful in identifying players who could possibly do more on your own team (*cough* Lallana *cough*). Finally, it’s a metric that could be used as part of an analytics-based scouting system. I’m hoping to develop this further, so watch this space.

Square pegs for square holes: OptaPro Forum Presentation

At the recent OptaPro Forum, I was delighted to be selected to present to an audience of analysts and representatives from the football industry. I presented a technique to identify different player types using their underlying statistical performance. My idea was that this would aid player scouting by helping to find the “right fit” and avoid the “square peg for a round hole” cliché.

In the presentation, I outlined the technique that I used, along with how Dani Alves made things difficult. My vision for this technique is that the output from the analysis can serve as an additional tool for identifying potential transfer signings. Signings can be categorised according to their team role and their performance can then be compared against their peers in that style category based on the important traits of those player types.

The video of my presentation is below, so rather than repeating myself, go ahead and watch it! The slides are available here.

Each of the player types is summarised below in the figures. My plan is to build on this initial analysis by including a greater number of leagues and use more in-depth data. This is something I will be pursuing over the coming months, so watch this space.

Some of my work was featured in this article by Ben Lyttleton.

Forward player types.

Midfielder player types.

Defender player types.

Help me rondo

In my previous post, I looked at the relationship between controlling the pitch (territory) and the ball (possession). When looking at the final plot in that post, you might infer that ‘good’ teams are able to control both territory and possession, while ‘bad’ teams are dominated on both counts. There are also teams that dominate only one metric, which likely relates to their specific tactical make-up.

When I calculated the territory metric, I didn’t account for the volume of passes in each area of the pitch as I just wanted to see how things stacked up in a relative sense. Territory on its own has a pretty woeful relationship with things we care about like points (r2=0.27 for the 2013/14 EPL) and goal difference (r2=0.23 for the 2013/14 EPL).

However, maybe we can do better if we combine territory and possession into one metric.

To start with, I’ve plotted some heat maps (sorry) showing pass completion percentage based on the end point of the pass. The completion percentage is calculated by totalling all of the passes aimed at a particular area of the pitch and comparing that with the number successfully received there. I’ve done this for the 2013/14 season for the English Premier League, La Liga and the Bundesliga.

As you would expect, passes directed to areas closer to the goal are completed at lower rates, while passes within a team’s own half are completed routinely.
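Computing a completion-percentage surface like this is essentially two 2D histograms; below is a minimal numpy sketch with random end points standing in for the real pass data:

```python
import numpy as np

# Random end points in place of the real passes; pitch in metres,
# attacking towards x = 105 (my conventions, not Opta's).
rng = np.random.default_rng(1)
end_x = rng.uniform(0.0, 105.0, 50000)
end_y = rng.uniform(0.0, 68.0, 50000)
completed = rng.binomial(1, 0.8, 50000).astype(bool)

# Two 2D histograms: attempted passes and completed passes per zone.
bins_x = np.linspace(0.0, 105.0, 22)
bins_y = np.linspace(0.0, 68.0, 15)
attempts, _, _ = np.histogram2d(end_x, end_y, bins=[bins_x, bins_y])
successes, _, _ = np.histogram2d(end_x[completed], end_y[completed],
                                 bins=[bins_x, bins_y])
completion_pct = 100.0 * successes / np.maximum(attempts, 1)  # avoid 0/0
```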


Heat map of pass completion percentage based on the target of all passes in the 2013/14 English Premier League, La Liga and Bundesliga. Data via Opta.

What is interesting in these plots is the contrast between England and Germany; in the attacking half of the pitch, pass completion is 5-10% lower in the Bundesliga than in the EPL. La Liga sits in-between for the most part but is similar to the Bundesliga within the penalty area. My hunch is that this is a result of the contrasting styles in these leagues:

  1. Defences often sit deeper in the EPL, particularly when compared to the Bundesliga, which results in their opponents completing passes more easily as they knock the ball around in front of the defence.
  2. German and Spanish teams tend to press more than their English counterparts, which will make passing more difficult. In Germany, counter-pressing is particularly rife, which will make passing into the attacking midfield zone more challenging.

From the above information, I can construct a model* to judge the difficulty of a pass into each area of the pitch and, given the differences between the leagues, I do this for each league separately.

I can then use this pass difficulty rating along with the frequency of passes into that location to put a value on how ‘dangerous’ a pass is e.g. a completed pass received on the penalty spot in your opponent’s penalty area would be rated more highly than one received by your own goalkeeper in his six-yard box.

Below is the resulting weighting system for each league. Passes that are received in front of the goal within the six-yard box have a rating close to one, while passes within your own half are given very little weighting as they are relatively easy to complete and are frequent.

There are slight differences between each league, with the largest differences residing in the central zone within the penalty area.


Heat map of pass weighting model for the 2013/14 English Premier League, La Liga and Bundesliga. Data via Opta.

Using this pass weighting scheme, I can assign a score to each pass that a team completes, which ‘rewards’ them for completing more dangerous passes themselves and preventing their opponents from moving the ball into more dangerous areas. For example, a team that maintains possession in and around the opposition penalty area will increase their score. Similarly, if they also prevent their opponent from moving the ball into dangerous areas near their own penalty area, this will also be rewarded.

Below is how this Territorial-Possession Dominance (TPD) metric relates to goal difference. It is calculated by comparing the for and against figures as a ratio, which I’ve expressed as a percentage.
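One way to express this, sketched below, is a TSR-style share of the summed pass weightings for and against; treat the exact form as illustrative:

```python
def tpd(weighted_for, weighted_against):
    """Territorial-Possession Dominance expressed as a percentage.

    Inputs are a team's summed pass-danger scores for and against; the
    TSR-style share used here is an assumption about the exact formula.
    """
    return 100.0 * weighted_for / (weighted_for + weighted_against)
```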

Broadly speaking, teams with a higher TPD have a better goal difference (overall r2=0.59) but this varies across the leagues. Unsurprisingly, Barcelona and Bayern Munich are the stand-out teams on this metric as they pin teams in and also prevent them from possessing the ball close to their own goal. Manchester City (the blue dot next to Real Madrid) had the highest TPD in the Premier League.

In Germany, the relationship is much stronger (r2=0.87), which is actually better than both Total Shot Ratio (TSR, r2=0.74) and Michael Caley’s expected goals figures (xGR, r2=0.80). A major caveat here though is that this is just one season in a league with only 18 teams and Bayern Munich’s domination certainly helps to strengthen the relationship.

The relationship is much weaker in Spain (r2=0.35) and is worse than both TSR (r2=0.54) and xGR (r2=0.77). A lot of this is driven by the almost non-existent explanatory power of TPD when compared with goals conceded (r2=0.06). La Liga warrants further investigation.

England sits in-between (r2=0.69), which is on a par with TSR (r2=0.72). I don’t have xGR numbers for last season but I believe xGR is usually a few points higher than TSR in the Premier League.


Relationship between goal difference per game and territorial-possession dominance for the 2013/14 English Premier League, La Liga and Bundesliga. Data via Opta.

The relationship between TPD and points (overall r2=0.56) is shown below and is broadly similar to goal difference. The main difference is that the strength of the relationship in Germany is weakened.


Relationship between points per game and territorial-possession dominance for the 2013/14 English Premier League, La Liga and Bundesliga. Data via Opta.

Over the summer, I’ll return to these correlations in more detail when I have more data and the relationships are more robust. For now, the metric appears to be useful and I plan to improve it further. Also, I’ll be investigating what it can tell us about a team’s style when combined with other metrics.

——————————————————————————————————————–

*For those who are interested in the method: I calculated the relative distance of each pass from the centre of the opposition goal using the distance along the x-axis (the length of the pitch) and the angle relative to a centre line along the length of the pitch.

I then used logistic regression to calculate the probability of a pass being completed; passes are deemed either successful or unsuccessful, so logistic regression is ideal and avoids putting the passes into location buckets on the pitch.

I then weighted the resulting probability according to the frequency of passes received relative to the distance from the opposition goal-line. This gave me a ‘score’ for each pass, which I used to calculate the territory weighted possession for each team.
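To make that concrete, here is a minimal sketch of the completion-probability step; the synthetic data and coefficients below are purely illustrative, with only the two features taken from the description above (the frequency weighting is left out, as its exact form is a modelling choice):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic passes in place of the real data; the two features follow the
# footnote, the coefficients generating the outcomes are made up.
rng = np.random.default_rng(2)
n = 20000
dist = rng.uniform(0.0, 100.0, n)        # distance from the opposition goal (x-axis)
angle = rng.uniform(0.0, np.pi / 2, n)   # angle relative to the centre line
completed = rng.binomial(1, 1 / (1 + np.exp(2.5 - 0.08 * dist)))  # harder near goal

# Logistic regression on the raw features; no pitch-zone buckets required.
X = np.column_stack([dist, angle])
model = LogisticRegression().fit(X, completed)
p_complete = model.predict_proba(X)[:, 1]  # completion probability per pass

# The frequency weighting by distance from the goal line would be applied
# to p_complete here to give each pass its final 'score'.
```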