Luis Suárez: Home & away

Everyone’s favourite riddle wrapped in an enigma was a topic of Twitter conversation between various analysts yesterday. The matter at hand was Luis Suárez’s improved goal conversion this season compared to his previous endeavours. Suárez has previously been labelled as inefficient by members of the analytics community (not the worst thing he has been called mind), so explaining his upturn is an important puzzle.

In the 2012/13 season, Suárez scored 23 goals from 187 shots, giving him a 12.3% conversion rate. So far this season he has scored 25 goals from 132 shots, which works out at 18.9%.

What has driven this increased conversion?

Red Alert

Below I’ve broken down Suárez’s goal conversion exploits into matches played home and away over the past two seasons. In terms of sample sizes, in 2012/13 he took 98 shots at home and 89 shots away, while he has taken 69 and 63 respectively this season.

Season     Home     Away     Overall
2012/13    11.2%    13.5%    12.3%
2013/14    23.2%    14.3%    18.9%
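As a quick check of the arithmetic, the table can be reconstructed from goals and shots. The home/away goal split below is inferred from the quoted percentages and shot counts (it isn't stated directly in the post), so treat it as an approximation:

```python
# Suárez's shot numbers from the post; the home/away goal split is
# inferred from the quoted conversion percentages and shot counts.
seasons = {
    "2012/13": {"home": (11, 98), "away": (12, 89)},   # (goals, shots)
    "2013/14": {"home": (16, 69), "away": (9, 63)},
}

for season, splits in seasons.items():
    goals = sum(g for g, _ in splits.values())
    shots = sum(s for _, s in splits.values())
    home, away = splits["home"], splits["away"]
    print(f"{season}: home {home[0] / home[1]:.1%}, "
          f"away {away[0] / away[1]:.1%}, overall {goals / shots:.1%}")
```

Running this reproduces the table above to one decimal place.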

The obvious conclusion is that Suárez’s improved goal scoring rate has largely been driven by an increased conversion percentage at home. His improvement away is minor, coming in at 0.8 percentage points, but his home improvement is a huge 12 percentage points.

What could be driving this upturn?

Total Annihilation

Liverpool’s home goal scoring record this season has seen them average 3 goals per game compared to 1.7 last season. They have handed out several thrashings at home, scoring 3 or more goals in nine of their fourteen matches. For comparison, their away goal scoring has improved from 2 goals per game to 2.27.

Liverpool have been annihilating their opponents at home this season and I suspect Suárez is reaping some of the benefit of this with his improved goal scoring rate. Liverpool have typically gone ahead early in their matches at home this season but aside from their initial Suárez-less matches, that hasn’t generally seen them ease off in terms of their attacking play (they lead the league in shots per game at home with 20.7).

My working theory is that Suárez has benefited from such situations by taking his shots under less pressure and/or from better locations when Liverpool have been leading at home. I would love to hear from those who collect more detailed shot data on this.

Drilling down into some more shooting metrics at home adds some support to this. Suárez has seen a greater percentage of his shots hit the target at home this season compared with last (46.4% vs 35.7%). He has also seen a smaller percentage being blocked this season (13% vs 24.5%). Half of Suárez’s shots on target at home this season have resulted in a goal compared to 31.4% last season. Away from home, the comparison between this season and last is much closer.
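These figures multiply together neatly: overall conversion is shooting accuracy (shots on target divided by all shots) times on-target conversion (goals divided by shots on target). A quick sketch with the home numbers quoted above:

```python
# Shot conversion decomposes as accuracy x on-target conversion.
def conversion(accuracy, sot_conversion):
    return accuracy * sot_conversion

# Home figures quoted in the post: (accuracy, goals per shot on target)
home_2012_13 = conversion(0.357, 0.314)  # last season
home_2013_14 = conversion(0.464, 0.500)  # this season

print(f"{home_2012_13:.1%} vs {home_2013_14:.1%}")  # matches 11.2% and 23.2%
```

Both components have improved at home, which is why the overall conversion rate has roughly doubled.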

These numbers are consistent with Suárez taking his shots at home this season in better circumstances. I should stress that there is a degree of circularity here as Suárez’s goal scoring is not independent of Liverpool’s. Further analysis is required.

Starcraft

The above is an attempt to explain Suárez’s improved goal scoring form. I doubt it is the whole story but it hopefully provides some clues ahead of more detailed analysis. Suárez may well have also benefited from a hot-streak this season and the big question will be whether he can sustain his goal scoring form over the remainder of this season and into next.

As I’ve shown previously, there is a large amount of variability in player shot conversion from season to season. Some of this will be due to ‘luck’ or randomness but some of this could be due to specific circumstances such as those potentially aiding Suárez this season. Explaining the various factors involved in goal scoring is a tricky puzzle indeed.

——————————————————————————————————————–

All data in this post are from Squawka and WhoScored.

You’ll never win anything with crosses

You’ll probably have heard about Manchester United’s penchant for crossing in their match against Fulham yesterday. If you haven’t, all 81 of them are illustrated below in their full chalkboard glory.


Manchester United’s crosses in the Premier League match against Fulham on the 9th February 2014. All 81 of them. Image via Squawka.

Rather than focus on the tactical predictability of such a strategy, I’m going to take a look at whether it can be a successful one over the long term.

In the public work on attacking strategies, the analytics community isn’t quite at the stage where the merits of individual strategies have been quantified. The work so far suggests, though, that crossing is probably at the lower end in terms of effectiveness. Ted Knutson did a nice summary of the work in this area here.

Can crossing bring success?

Given this, I’m going to assess crosses from a different angle. Over the past five seasons, the Premier League champions have averaged 2.3 goals per game, while the fourth-placed team has averaged 1.8. This suggests that a top team needs an attacking strategy that can yield around two goals per game. Let’s see if crossing can get you there.

I’m going to focus on open play crosses as I feel that is more relevant from a tactical perspective; set piece crosses are a different (more effective) matter. Based on data from the 2011/12 Premier League season, I found that on average it took 79 crosses in open play for a single goal to be scored. On average, teams had 22 open play crosses per game. So an average Premier League team would expect to score a goal from an open play cross every three-to-four games. I only have data for one season, so let’s be generous and round that down to a goal every three matches. That is a long way off two goals per game.

Let’s consider an example of a team that converts its crosses into goals more efficiently than average, e.g. Manchester United in 2011/12. They averaged 22 open-play crosses per game and scored 19 goals from them, which works out at 43.5 crosses per goal. So even a really good crossing team in terms of goal return could only manage a goal from an open-play cross every two games. The caveat here is that I don’t have the data to look at whether that is a sustainable level of goal production from crosses.

Based on the above, I would say it is basically impossible to be an elite team and use crossing as your main strategy. If you were good at set pieces, you could probably add another 20 or so goals over a season but that still only puts you at a goal per game average.
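A back-of-the-envelope sketch of the arithmetic above, using the 2011/12 figures quoted (79 open-play crosses per goal league-wide, 43.5 for Manchester United):

```python
# Rough goal yield from open-play crossing, per the post's 2011/12 figures.
crosses_per_game = 22

league_goals_per_game = crosses_per_game / 79   # ~0.28, a goal every ~3.6 games
utd_goals_per_game = crosses_per_game / 43.5    # ~0.51, a goal every ~2 games

# Best case over a 38-game season, plus ~20 set-piece goals:
season_goals = utd_goals_per_game * 38 + 20     # ~39, i.e. ~1 goal per game
print(f"{league_goals_per_game:.2f}, {utd_goals_per_game:.2f}, {season_goals:.0f}")
```

Even on generous assumptions, crossing plus set pieces tops out around one goal per game, half the rate a title challenger needs.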

That isn’t to say that crossing is pointless – as part of a varied attacking approach and against an opponent who isn’t dug in and ready for them, crosses can be an effective source of goals (see the video of Dani Alves assists below and Luis Suárez’s sublime assists in the past two games).

This is where the problem occurs for Moyes. According to WhoScored, in the last three seasons under Ferguson, Manchester United averaged 27, 25 and 27 crosses per game while posting 6, 3 and 3 through-balls per game. The crossing figure is up to 29 per game now with through-balls down to a paltry one per game. Crossing is not a new thing at Manchester United but more of their play under Moyes is focussed down the flanks; around 30% of their attacks under Ferguson in his last three years came down the middle of the pitch. Under Moyes, that has dropped to 24%, which is the lowest proportion in the league. This was wonderfully illustrated in this piece by Mike Goodman for Grantland earlier this season.

Moyes’ tactics have seemingly reduced the effectiveness of Manchester United’s previously elite attacking play, which tallies with the lowered expectations surrounding the current champions’ prospects.

Scoring ability: the good, the bad and the Messi

Identifying scoring talent is one of the main areas of investigation in analytics circles, with the information provided potentially helping to inform decisions that can cost many, many millions. Players who can consistently put the ball in the net cost a premium; can we separate these players from their peers?

I’m using data from the 2008/09 to 2012/13 seasons across the top divisions in England, Spain, Germany and Italy from ESPN. An example of the data provided is available here for Liverpool in 2012/13. This gives me total shots (including blocked shots) and goals for over 8000 individual player seasons. I’ve also taken out penalties from the shot and goal totals using data from TransferMarkt. This should give us a good baseline for what looks good, bad and extraordinary in terms of scoring talent. Clearly this ignores the now substantial work being done in relation to shot location and different types of shot but the upside here is that the sample size (number of shots) is larger.

Below is a graph of shot conversion (defined as goals divided by total shots) against total shots. All of the metrics I’ll use will have penalties removed from the sample. The average conversion rate across the whole sample is 9.2%. Using this average, we can calculate the bounds of what average looks like in terms of shot conversion; we would expect some level of random variation around the average and for this variation to be larger for players who’ve taken fewer shots.

Shot conversion versus total shots for individual players in the top leagues in England, Italy, Spain and Germany from 2008/09-2012/13. Points are shown in grey with certain players highlighted, with the colours corresponding to the season. The solid black line is the average conversion rate of 9.2%, with the dotted lines above and below this line corresponding to two standard errors above the average. The dashed line corresponds to five standard errors. Click on the image for a larger view.

On the plot I’ve also added some lines to illustrate this. The solid black line is the average shot conversion rate, while the two dotted lines either side of it represent upper and lower confidence limits, calculated as two standard errors from the mean. These are known as funnel plots and, as far as I’m aware, they were introduced to football analysis by James Grayson in his work on penalties. Paul Riley has also used them when looking at shot conversion from different areas of the pitch. There is a third, dashed line but I’ll talk about that later.

So what does this tell us? Well, we would expect approximately 95% of the points to fall within this envelope around the average conversion rate; the actual number is 97%. From a statistical point of view, we can’t identify whether these players are anything other than average at shot conversion. Some players fall below the lower bound, which suggests that they are below average at converting their shots into goals, while those falling above the upper bound are potentially above average.

The Bad

I’m not sure if this is surprising or not, but it is actually quite hard to identify players who fall below the lower bound and qualify as “bad”. A player needs to take about 40 shots without scoring to fall beneath the lower bound, so I suspect “bad” shooters don’t get the opportunity to approach statistical significance. Some do though.
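The ~40-shot threshold drops straight out of the funnel bounds, assuming the standard binomial standard error sqrt(p(1-p)/n) around the 9.2% average:

```python
import math

# Two-standard-error funnel bounds around the sample-average conversion
# rate of 9.2%, assuming a binomial standard error.
P_BAR = 0.092

def bounds(shots, n_se=2):
    se = math.sqrt(P_BAR * (1 - P_BAR) / shots)
    return P_BAR - n_se * se, P_BAR + n_se * se

# How many goalless shots before a 0% conversion rate falls below the
# lower bound? (i.e. the first sample size with a positive lower bound)
shots = 1
while bounds(shots)[0] <= 0:
    shots += 1
print(shots)  # -> 40
```

Below 40 shots the lower bound sits at or under zero, so even a completely goalless season can’t be distinguished from average finishing.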

Only 62 player seasons fall below the lower bound, with Alessandro Diamanti, Antonio Candreva, Gökhan Inler and (drum-roll) Stewart Downing having the dubious record of appearing twice. Downing actually holds the record in my data for the most shots (80) without scoring in 2008/09, with his 2011/12 season coming in second with 71 shots without scoring.

The Good

Over a single season of shots, it is somewhat easier to identify “good” players in the sample, with 219 players lying above the two standard error curve. Some of these players are highlighted in the graph above and rather than list all of them, I’ll focus on players that have managed to consistently finish their shooting opportunities at an above average rate.

Only two players appear in each of the five seasons of this sample; Gonzalo Higuaín and Lionel Messi. Higuaín has scored an impressive 94 goals with a shot conversion rate of 25.4% over that sample. I’ll leave Messi’s numbers until a little later. Four players appear on four separate occasions; Álvaro Negredo, Stefan Kießling, Alberto Gilardino and Giampaolo Pazzini. Negredo is interesting here as while his 15.1% conversion rate over multiple seasons isn’t as exceptional as some other players, he has done this over a sustained period while taking a decent volume of shots each season (note his current conversion rate at Manchester City is 16.1%).

Eighteen players have appeared on this list three times; notable names include van Persie, Di Natale, Cavani, Agüero, Gómez, Soldado, Benzema, Raúl, Fletcher, Hernández and Agbonlahor (wasn’t expecting that last one). I would say that most of the players mentioned here are more penalty box strikers, which suggests they take more of their shots from closer to the goal, where conversion rates are higher. It would be interesting to cross-check these with analysts who are tracking player shot locations.

The Messi

To some extent, looking at players that lie two standard errors above or below the average shot conversion rate is somewhat arbitrary. The number of standard errors you use to judge a particular property typically depends on your application and how “sure” you want to be that the signal you are observing is “real” rather than due to “chance”. For instance, when scientists at CERN were attempting to establish the existence of the Higgs boson, they used a very stringent requirement that the observed signal is five standard errors above the typical baseline of their instruments; they want to be really sure that they’ve established the existence of a new particle. The tolerance here is that there be much less than a one in a million chance that any observed signal be the result of a statistical fluctuation.

As far as shot conversion is concerned, over the two seasons prior to this, Lionel Messi is the Higgs boson of football. While other players have had shot conversion rates above this five-standard-error level, Messi has done so while taking huge shot volumes. This sets him apart from his peers. Over the five seasons prior to this, Messi took 764 shots, from which an average player would be expected to score between 54 and 86 goals based on falling within two standard errors of the average; Messi has scored 162! Turns out Messi is good at the football…who knew?
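The 54–86 goal range follows from the same standard-error calculation as the funnel bounds (again assuming a binomial standard error around the 9.2% average):

```python
import math

# Expected goal range for an average finisher (9.2% conversion) over
# Messi's 764 shots, within two standard errors of the mean.
p, shots = 0.092, 764
se = math.sqrt(p * (1 - p) / shots)

low = (p - 2 * se) * shots    # ~54 goals
high = (p + 2 * se) * shots   # ~86 goals
sigmas = (162 / shots - p) / se  # how far above average 162 goals sits

print(f"{low:.0f}-{high:.0f} goals; Messi is {sigmas:.1f} standard errors above")
```

On these assumptions Messi’s 162 goals sit more than ten standard errors above the average, comfortably clearing even the CERN-style five-sigma bar.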

Stats! What are they good for?

I’ve been closely following the developments in the football analytics community for close to two years now, ever since WhoScored allied themselves with the Daily Mail and suggested Xavi wasn’t so good at the football and I was directed to James Grayson’s wonderful riposte.

There has been some discussion on Twitter about the state of football analytics recently and I thought I would commit some extended thoughts on this topic to writing.

Has football analytics stalled?

Part of the Soccermetrics podcast, featuring Howard Hamilton and Zach Slaton, revolved around how football analytics as an activity has “stalled” (Howard has since attributed the “stalled” statement to Chris Anderson, although he seemingly agrees with it). Even though this wasn’t really defined, I find it difficult to comprehend the view that analytics has stalled.

Over the past two years, the community has developed a lot as far as I can see. James Grayson and Mark Taylor continue to regularly publish smart work, while new bloggers have also emerged. The StatsBomb website has brought together a great collection of analysts and thinkers on the game and they appear to be gaining traction outside of the analytics echo chamber.

In addition to this, data is increasingly finding a place in the mainstream media; Zach Slaton writes at Forbes, Sean Ingle is regularly putting numbers into his Guardian columns and there is a collection of writers contributing to the Dutch newspaper De Volkskrant. Mike Goodman is doing some fantastic work at Grantland; his piece on Manchester United this season is an all too rare example of genuine insight in the wider football media. The Numbers Game book by Chris Anderson and David Sally was also very well received.

Allied to these writing developments, a number of analytics bloggers have joined professional clubs or data organisations recently – surely it is encouraging to see smart people being brought into these environments? (One side effect of this is that some great work is lost from the public sphere though e.g. the StatDNA blog).

To me, this all seems like progress on a number of fronts.

What are we trying to achieve?

The thing that isn’t clear to me is what people in the analytics community are actually aiming for. Some are showcasing their work with the aim of getting a job in the football industry, some are hoping to make some money, while others are doing it as a hobby (*waves hand*). Whatever the motivation, the work coming out of the community is providing insights, context and discussion points and there is an audience for it even if it is considered quite niche.

Football analytics is still in its infancy and expecting widespread acceptance in the wider football community at this stage is perhaps overly ambitious. However, strides are being made; tv coverage has started looking more at shot counts over a season and heat maps of touches have made a few appearances. These are small steps undoubtedly but I doubt there is much demand for scatter plots, linear regression and statistical significance tests from tv producers. Simple and accessible tables or metrics that can be overlaid on an image of a football pitch seem to go down well with a broader audience – the great work being done on shot locations seems ripe for this as it is accessible and intuitive without resorting to complex statistical language.


Gary Neville shows off his massive iPad. Courtesy of thedrum.com.

However, I don’t think the media should be the be all and end all for judging the success or progress of football analytics. Fan discussion of football is increasingly found online in the form of blogs, forums and Twitter, so the media don’t have to be the gatekeepers to analytics content. Saying that, I would love to see more intelligent discussion of football in the media and I feel that analytics is well placed to contribute to that. I’d be interested to hear what it is people in the football analytics community are aiming for in the longer term.

What about the clubs?

The obvious aspect of the analytics community that I’ve omitted from the discussion so far is the role of the clubs in all this. It’s difficult to know what goes on within clubs due to their secrecy. The general impression I get is that there are analytics teams toiling away but without necessarily making an impact in decision making at the club, whether that is in terms of team analysis or in the transfer market. Manchester City are one example of a team using data for such things based on this article.

With this in mind, I was interested to listen to the Sky Sports panel discussion show featuring Chris Anderson, Damien Comolli and Sam Allardyce. Chris co-authored the excellent The Numbers Game book and brought some nuance and genuine insight to the discussion. Comolli is mates with Billy Beane. Allardyce is held up as an advocate for football analytics at the managerial level in English football and I think this is the first time I’ve really heard him speak about it. I wasn’t impressed.

Allardyce clearly takes an interest in the numbers side of the game and reeled off plenty of figures, which on the surface seemed impressive. He seemingly revels in the idea that he is some sort of visionary with his interest in analytics, repeating on several occasions how he has been using data for over ten years. He seemed particularly pleased with the “discovery” of how many clean sheets, goals and other aspects of the game were required to gain a certain number of points in the Premier League; something that many analysts could work out in their lunch hour given the appropriate data.

I would question how actionable this analysis and many of the other nuggets he threw out actually are; much of it is stamp-collecting that doesn’t really move things forward in terms of identifying what is happening at the process level on the pitch. Take, for example, Comolli’s statistic that a team never loses when having ten or more shots on goal, which is valuable information for those footballers who don’t aim for the goal. Now it could be that they were holding back the good stuff, but several of their comments suggested they don’t really understand core analytics concepts such as regression to the mean and the importance of sample size, e.g. referring to Aaron Ramsey’s unsustainable early-season scoring run. I would have expected more from people purporting to be leading the take-up of analytics at club level.

I felt Allardyce’s comment about his “experience” being better than “maths” when discussing the relationship between money and success betrayed his actual regard for the numbers side of football. Many of the numbers he quoted seemed to be used to confirm his own ideas about the game. This is fine but I think to genuinely gain an edge using analytics, you need to see where the data takes you and make it actionable. This is hard and is something that the analytics community could do better (Paul Riley is doing great work on his blog in this area for goalkeepers). The context that analytics provides is very valuable but without identifying “why” certain patterns are observed, it is difficult to alter the process on the field.

Based on the points that Allardyce made, I have my doubts whether the clubs are any further ahead in this regard than the public work of the online analytics community. If there is a place where analytics has stagnated, maybe it is within the clubs. To my mind, they would do well to look at what is going on in the wider community and try to tap into that more.

——————————————————————————————————————–

Shorter version of this post, courtesy of Edward Monkton.

Where are we Going?

I don’t know, I thought you knew.

No I don’t know. Maybe he knows.

No, He definitely doesn’t know.

*PAUSE*

Maybe no-one knows.

*PAUSE*

Oh Well. I hope it’s nice when we get there.

Is shooting accuracy maintained from season to season?

This is a short follow-up to this post using the same dataset. Instead of shot conversion, we’re now looking at shooting accuracy which is defined as the number of shots on target divided by the total number of shots. The short story here is that shooting accuracy regresses more strongly to the mean than shot conversion at the larger shot samples (more than 70 shots) and is very similar below this.

Comparison between shooting accuracy for players in year zero and the following season (year one). Click on the image or here for a larger interactive version.


Minimum shots    Players    Year-to-year R^2    ‘Luck’    ‘Skill’
1                2301       0.045               79%       21%
10               1865       0.118               66%       34%
20               1428       0.159               60%       40%
30               951        0.214               54%       46%
40               632        0.225               53%       47%
50               456        0.219               53%       47%
60               311        0.190               56%       44%
70               180        0.245               51%       49%
80               117        0.305               45%       55%
90               75         0.341               42%       58%
100              43         0.359               40%       60%

Comparison of the level of ‘skill’ and ‘luck’ attributed to shooting accuracy (measured by shots on target divided by all shots) from one season to the next. The data is filtered by the total number of shots a player takes in consecutive seasons.

Essentially, there is quite a bit of luck involved in getting shots on target, and for large-volume shooters there is more luck involved in hitting the target than in converting shots into goals.

Is scoring ability maintained from season to season? (slight return)

In my previous post (many moons ago), I looked at whether a player’s shot conversion in one season was a good guide to their shot conversion in the next. While there were some interesting features in this, I was wary of being too definitive given the relatively small sample size used. Data analysis is a journey with no end, so this is the next step. I collated the last 5 seasons of data across the top divisions in England, Spain, Germany and Italy (I drew the line at collecting France) from ESPN. An example of the data provided is available here for Liverpool in 2012/13. The last 5 seasons on ESPN are Opta-provided data and matched up perfectly when I compared them with English Premier League data from EPL-Index.

Before digging into the results, a few notes on the data. The data is all shots and all goals i.e. penalties are not removed. Ideally, you would strip out penalty shots and goals but that would require player-level data that I don’t have and I’ve already done enough copy and pasting. I doubt including penalties will change the story too much but it would alter the absolute numbers. Shot conversion here is defined as goals divided by total shots, where total shots includes blocked shots. I then compared shot conversion for individual players in year zero with their shot conversion the following year (year one). The initial filter that I applied here was that the player had to have scored at least one goal in both years (so as to exclude players having 0% shot conversion).

Comparison between shot conversion rates for players in year zero and the following season (year one). Click on the image or here for a larger interactive version.

Starting out with the full dataset, we have 2301 data points where a player scored a goal in two consecutive seasons. The R^2 here (a measure of the strength of the relationship) is very low, with a value of 0.061 (where zero would mean no relationship and one would be perfect). Based on the method outlined here by James Grayson, this suggests that shot conversion regresses 75% towards the mean from one season to the next. The implication of this number is that shot conversion is 25% ‘skill’ and 75% is due to random variation, which is often described as ‘luck’.
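As I understand James Grayson’s method, the year-to-year correlation is the square root of R², and the regression to the mean (the ‘luck’ share) is one minus that correlation. A minimal sketch:

```python
import math

# Split a year-to-year R^2 into 'luck' and 'skill' shares, following
# James Grayson's regression-to-the-mean method as described in the post.
def luck_skill(r_squared):
    r = math.sqrt(r_squared)  # year-to-year correlation
    return 1 - r, r           # ('luck', 'skill')

luck, skill = luck_skill(0.061)
print(f"luck {luck:.0%}, skill {skill:.0%}")  # matches the 75%/25% split above
```

Feeding in the other R² values from the table below reproduces the quoted luck/skill percentages.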

As I noted in my previous post on this subject, the attribution to skill and luck is dependent on the number of shots taken. As the number of shots increases, we smooth out some of the randomness and skill begins to emerge. A visualisation of the relationship between shot conversion and total shots is available here. Below is a summary table showing how this evolves in 10 shot increments. After around 30 shots, skill and luck are basically equal and this is maintained up to 60 shots. Above 80 shots, we seem to plateau at a 70/30% split between ‘skill’ and ‘luck’ respectively.

Minimum shots    Players    Year-to-year R^2    ‘Luck’    ‘Skill’
1                2301       0.061               75%       25%
10               1865       0.128               64%       36%
20               1428       0.174               58%       42%
30               951        0.234               52%       48%
40               632        0.261               49%       51%
50               456        0.262               49%       51%
60               311        0.261               49%       51%
70               180        0.375               39%       61%
80               117        0.489               30%       70%
90               75         0.472               31%       69%
100              43         0.465               32%       68%

Comparison of the level of ‘skill’ and ‘luck’ attributed to scoring ability (measured by shot conversion) from one season to the next. The data is filtered by the total number of shots a player takes in consecutive seasons.

The results here are different to my previous post, where the equivalence of luck and skill was hit around 70 shots whereas it lies from 30-60 shots here. I suspect this is driven by the smaller sample size in the previous analysis. The song remains the same though; judging a player on around half a season of shots will be about as good as a coin toss. Really you want to assess a heavy shooter over at least a season with the proviso that there is still plenty of room for random variation in their shot conversion.

What is shot conversion anyway?

The past summer in the football analytics community saw a wonderful catalytic cycle of hypothesis, analysis and discussion. It’s been great to see the community feeding off each other; I would have liked to join in more but the academic conference season and the first UK heatwave in 7 years put paid to that. Much of the focus has been on shots and their outcomes. Increasingly the data is becoming more granular; soon we’ll know how many shots per game are taken within 10 yards of the corner flag at a tied game state by players with brown hair and blue eyes while their manager juggles on the sideline (corrected for strength of opposition of course). This increasing granularity is a fascinating and exciting development. While it was already clear that all shots aren’t created equal from purely watching the football, the past summer has quantified this very clearly. To me, this demonstrates that the traditional view of ‘shot conversion’ as a measure of finishing ability is erroneous.

As an illustrative example, consider two players who both take 66 shots in a season. Player A scores 11 goals, so has a shot conversion of 17%. Player B scores 2 goals, so has a shot conversion of 3%. The traditional view of shot conversion would suggest that Player A is a better finisher than Player B. However, if Player A took all of his shots from a central area within the 18-yard box, he would be bang in line with the Premier League average over the past 3 seasons. If Player B took all of his shots from outside the area, he would also be consistent with the average Premier League player. Both players are average when controlling for shot location. Clearly this is an extreme example but then again it is meant to be an illustration. To me at least, shot conversion seems more indicative of shooting efficiency i.e. taking shots from good positions under less defensive pressure will lead to an increased shot conversion percentage. Worth bearing in mind the next time someone mentions ‘best’ or ‘worst’ in combination with shot conversion.
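A minimal sketch of that example, assuming round zone-average conversion rates of 17% (central, inside the box) and 3% (outside the box) consistent with the text:

```python
# Two players with identical shot counts but different shot locations.
# Zone-average conversion rates are assumed round numbers for illustration.
zone_avg = {"central box": 0.17, "outside box": 0.03}

players = {
    "Player A": {"central box": (11, 66)},  # (goals, shots) by zone
    "Player B": {"outside box": (2, 66)},
}

for name, zones in players.items():
    goals = sum(g for g, _ in zones.values())
    shots = sum(s for _, s in zones.values())
    expected = sum(s * zone_avg[z] for z, (_, s) in zones.items())
    print(f"{name}: conversion {goals / shots:.1%}, expected goals {expected:.1f}")
```

Both players score almost exactly what an average finisher would from their locations, so the 14-percentage-point gap in raw conversion says nothing about finishing ability.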

The remaining question for me is how sustainable the more granular data is from season-to-season, especially given the smaller sample sizes.

Is scoring ability maintained from season to season?

With the football season now over across the major European leagues, analysis and discussion turns to reflection of the who, what and why of the past year. With the transfer window soon to do whatever the opposite of slam shut is, thoughts also turn to how such reflections might inform potential transfer acquisitions. As outlined by Gabriele Marcotti today in the Wall Street Journal, strikers are still the centre of attention when it comes to transfers:

The game’s obsession with centerforwards is not new. After all, it’s the glamour role. Little kids generally dream of being the guy banging in the goals, not the one keeping them out.

On the football analytics front, there has been a lot of discussion surrounding the relative merits of various forward players, with an increasing focus on their goal scoring efficiency (or shot conversion rate) and where players are shooting from. There has been a lot of great work produced but a very simple question has been nagging away at me:

Does being ‘good’ one year suggest that you’ll be ‘good’ next year?

We can all point to examples of forwards shining brightly for a short period during which they plunder a large number of goals, only to then fade away as regression to their (much lower) mean skill level ensues. With this in mind, let’s take a look at some data.

Scoring proficiency

I’ve put together data on players over the past two seasons who have scored at least 10 goals during a single season in the top division in either England, Spain, Germany or Italy from WhoScored. Choosing 10 goals is basically arbitrary but I wanted a reasonable number of goals so that calculated conversion rates didn’t oscillate too wildly and 10 seems like a good target for your budding goalscorer. So for example, Gareth Bale is included as he scored 21 in 2012/13 and 9 goals in 2011/12 but Nikica Jelavić isn’t as he didn’t pass 10 league goals in either season. Collecting the data is painful so a line had to be drawn somewhere. I could have based it on shots per game but that is prone to the wild shooting of the likes of Adel Taarabt and you end up with big outliers. If a player was transferred to or from a league within the WhoScored database (so including France), I retained the player for analysis but if they left the ‘Big 5’ then they were booted out.

In the end I had 115 players who had scored at least 10 league goals in one of the past two seasons. Only 43 players managed to score 10 league goals in both 2011/12 and 2012/13, and only 6 players not named Lionel Messi or Cristiano Ronaldo scored 20 or more in both seasons. Below is how they match up when comparing their shot conversion (goals divided by total shots) across both seasons. The conversion rates are based on all goals and all shots; ideally you would take out penalties, but that takes time to collate and I doubt it would make much difference to the conclusions.

Comparison between shot conversion rates for players in 2011/12 and 2012/13. Click on the image or here for a larger interactive version.

If we look at the whole dataset, we get a very weak relationship between shot conversion in 2012/13 and shot conversion in 2011/12. The R^2 here is 0.11, i.e. a correlation of around 0.33, which suggests that shot conversion by an individual player shows 67% regression to the mean from one season to the next. The upshot of this is that shot conversion above or below the mean is around two-thirds due to luck and one-third due to skill. Without filtering the data any further, this would suggest that predicting how a player will convert their chances next season based on the last will be very difficult.
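For reference, the step from the R^2 to a regression-to-the-mean percentage is just one minus the correlation:

```python
import math

r_squared = 0.11          # year-to-year R^2 for shot conversion, from the scatter above
r = math.sqrt(r_squared)  # correlation between the two seasons, ~0.33
regression = 1 - r        # expected regression to the mean, ~67%

print(f"correlation: {r:.2f}, regression to the mean: {regression:.0%}")
```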

A potential issue here is the sample size for the number of shots taken by an individual in a season. Dimitar Berbatov’s conversion rate of 44% in 2011/12 is based on only 16 shots; he’s good but not that good. If we filter by the number of shots, we can take out some of the outliers and hopefully retain a representative sample. Up to 50 shots, we’re still seeing a 65% regression to the mean and we’ve reduced our sample to 72 players. It is only when we get up to 70 shots and down to 44 players that we see a close to even split between ‘luck’ and ‘skill’ (54% regression to the mean). The problem here is that we’re in danger of ‘over-fitting’ as we rapidly reduce our sample size. If you are happy with a sample of 18 players, then you need to see around 90 shots per season to be able to attribute 80% of shot conversion to ‘skill’.
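To get a feel for why the shot threshold matters, here is a toy simulation (all parameters invented, not fitted to the real data): each player gets a fixed ‘true’ conversion skill, observed rates add binomial shot noise on top, and the season-to-season R^2 strengthens as we demand more shots per player.

```python
import numpy as np

# Toy simulation of the sample-size effect; all numbers are made up
rng = np.random.default_rng(0)
n_players = 2000
skill = rng.normal(0.12, 0.03, n_players).clip(0.02, 0.4)  # true conversion 'skill'
shots = rng.integers(10, 200, size=(2, n_players))         # shots in two 'seasons'

# Observed conversion = binomial goals / shots, per season
conv = rng.binomial(shots, skill) / shots

def r2_above(min_shots):
    # Year-to-year R^2 for players over the shot threshold in both seasons
    mask = (shots >= min_shots).all(axis=0)
    return np.corrcoef(conv[0, mask], conv[1, mask])[0, 1] ** 2

# Raising the threshold strips out binomial noise, so the year-to-year
# relationship strengthens and 'skill' shows through more clearly
print(f"R^2 at 10+ shots: {r2_above(10):.2f}, at 100+ shots: {r2_above(100):.2f}")
```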

Born again

So where does that leave us? Perhaps unsurprisingly, the results here for players are similar to what James Grayson found at the team level, with a 61% regression to the mean from season to season. Mark Taylor found that around 45 shots was where skill overtook luck for assessing goal scoring, so a little lower than what I found above, although I suspect this is due to Mark’s work being based on a larger sample over 3 seasons in the Premier League.

The above also points to the ongoing importance of sample size when judging players, although I’d want to do some more work on this before being too definitive. Judging a player on around half a season of shots appears rather unwise and is about as good as flipping a coin. Really you want around a full season for a sounder judgement, and even then you might be a little wary of spending too much cash. For something approaching a guarantee, you want some heavy shooting across two seasons, which, allied with a good conversion rate, can bring over 20 league goals in a season. I guess that is why the likes of Van Persie, Falcao, Lewandowski, Cavani and Ibrahimovic go for such hefty transfer fees.

Newcastle United vs Liverpool: passing network analysis

Liverpool defeated Newcastle 6-0 at St James’ Park. Below is the passing network analysis for Liverpool split between the first 75 minutes of the match and the rest of the match up to full time. I focussed just on Liverpool here. More information on how these are put together is available here in my previous posts on this subject.

The reason I separated the networks into these two periods was that I noticed how Liverpool’s passing rate changed massively after Steven Gerrard was substituted and the fifth goal was scored. During the first 75 minutes, Liverpool attempted 323 passes with a success rate of 74% and a 45% share of possession. After this, Liverpool attempted 163 passes with an accuracy of 96% and a 60% share of possession. Liverpool attempted 34% of their passes in this closing period. Let’s see how this looks in terms of their passing network.
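As a quick check on the 34% figure, it follows directly from the pass counts quoted above:

```python
# Pass attempts quoted above, before and after the 75th minute
early, late = 323, 163
late_share = late / (early + late)
print(f"{late_share:.0%}")  # share of Liverpool's passes in the closing period
```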

The positions of the players are loosely based on the formations played, although some creative license is employed for clarity. It is important to note that these are fixed positions, which will not always be representative of where a player passed/received the ball. The starting eleven is shown on the pitch for the first 75 minutes, with Borini replacing Gerrard in the second network.
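As a rough illustration of how a ‘passing influence’ score can be derived from a pass matrix, here is an eigenvector-style centrality computed by power iteration. The pass counts and the exact metric are invented for the sketch; this is not necessarily the influence measure behind these plots.

```python
import numpy as np

players = ["Lucas", "Gerrard", "Coutinho"]
# passes[i][j] = completed passes from players[i] to players[j] (made-up numbers)
passes = np.array([
    [0, 12, 8],
    [10, 0, 9],
    [5, 7, 0],
], dtype=float)

# A player is influential if they exchange many passes with other
# influential players; symmetrise so influence flows both ways
m = passes + passes.T
v = np.ones(len(players))
for _ in range(100):  # power iteration towards the dominant eigenvector
    v = m @ v
    v /= np.linalg.norm(v)

influence = dict(zip(players, v / v.sum()))  # normalise scores to sum to 1
print(max(influence, key=influence.get))
```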

Passing networks for Liverpool for the first 75 minutes and up to full time against Newcastle United from the match at St James’ Park on the 27th April 2013. Only completed passes are shown. Darker and thicker arrows indicate more passes between each player. The player markers are sized according to their passing influence, the larger the marker, the greater their influence. Click on the image for a larger view.

Liverpool’s passing was quite balanced for the first 75 minutes of the match, with a varied passing distribution. There was a stronger bias towards the right flank compared with the left flank as Gerrard drifted right to combine with Johnson and Downing. The passing influence scores were also evenly distributed across the whole team with Gerrard and Lucas being the top two. A contrast with some previous matches is the lack of strong links along the back line, which indicates less reliance on recycling of possession in deeper areas. Instead, Liverpool were seeking to move the ball forward more quickly and played the ball through the whole team.

He makes us happy

After Gerrard and Lucas, the next most influential player was Coutinho, who put in a wonderfully creative performance as the attacking fulcrum of the team. He linked well with all of Liverpool’s forward players and threaded several dangerous passes to his team-mates, including an assist and a ‘second goal assist’ (defined as a pass to the goal assist creator) for the second goal, according to EPL-Index. His creative exploits during his first 10 appearances have been hugely promising.

Sterile domination

The final period of the match saw Liverpool really rack up the passing numbers as mentioned earlier. Clearly, this is easier to do when 5 or 6 goals clear, but it is still potentially illustrative to see how this was accomplished. The main orchestrators of this were Lucas and Henderson, who completed 28 of 28 and 35 of 35 passes respectively during this period. Henderson had completed 21 of 24 in the first 75 minutes, so this was quite a step up, reflecting his shift in role after Gerrard went off and the state of the game.

Your challenge should you wish to accept it

Admittedly Newcastle were very poor in this match, but Liverpool took full advantage to hand out a severe thrashing. This was accomplished without Suárez, which leads to obvious (premature?) questions about whether his absence improved Liverpool’s overall balance and play. Assuming that Suárez doesn’t leave in the summer, one of Brendan Rodgers’ key tasks will be developing a system that gets the best out of the attacking talents of Suárez, Coutinho and Sturridge. It could be quite tasty if he manages to accomplish this.

Borussia Dortmund vs Real Madrid: passing network analysis

Borussia Dortmund defeated Real Madrid 4-1.

Below is the passing network for the match. The positions of the players are loosely based on the formations played by the two teams, although some creative license is employed for clarity. It is important to note that these are fixed positions, which will not always be representative of where a player passed/received the ball. Only the starting eleven is shown as the substitutes had little impact in a passing sense.

Passing networks for Borussia Dortmund and Real Madrid from the Champions League match at the Westfalenstadion on the 24th April 2013. Only completed passes are shown. Darker and thicker arrows indicate more passes between each player. The player markers are sized according to their passing influence, the larger the marker, the greater their influence. The size and colour of the markers is relative to the players on their own team i.e. they are on different scales for each team. Only the starting eleven is shown. Click on the image for a larger view.

The most striking difference between the sides’ respective passing networks was that Real had a greater emphasis down the flanks, with strong links between the wide players and their full backs. Dortmund were quite balanced in their passing approach, with much of their play going through the trio of Hummels, Gundogan and Gotze.

Influential potential

Dortmund’s number ‘ten’ (Gotze) had a greater influence on proceedings than Modric did for Real, with Gotze coming second only to Gundogan in terms of passing influence for Dortmund. Ozil was far more influential than Modric, although he rarely combined with Higuain and Ronaldo. Modric was well down the pecking order for Madrid, with the likes of Pepe, Varane and Coentrao ahead of him. On its own, this might not have been a problem, but aside from Ramos and Lopez, the only other Real players with less influence were Higuain and Ronaldo. This contrasts directly with Dortmund, where Reus and Lewandowski played important linking roles.

In summary, Dortmund’s attacking players were among their most influential passing performers; Real Madrid’s were not.

——————————————————————————————————————–

Passing matrices from Uefa.com press kits.

Bayern Munich vs Barcelona: passing network analysis

Bayern Munich defeated Barcelona 4-0 with a dominant performance. The way both teams approached the game in terms of their passing was interesting and worth some additional analysis.

Much of the post-match discussion on TV focussed on Barcelona’s dominance of possession not being reflected in the final scoreline. According to UEFA, Barcelona had 63%, while WhoScored/Opta had it at 66%. However, Bayern were well ahead in terms of shots (15-4 in favour of Bayern, with a 7-1 advantage for on-target shots). It seems that whenever Barcelona lose, their possession statistics are trotted out as a stick to beat them with. Given that Barcelona have gone more than 300 games and close to half a decade since they last played a game with less than 50% possession, I very much doubt there is causality between their possession statistics and match results. Barcelona choose to play this way and it has certainly been successful. However, it is worth remembering that not all teams play the same way, and searching for a single holy grail metric that can ‘explain’ winning football matches is probably a fool’s errand. Even if one does exist, it isn’t a match-aggregated possession statistic.

Process, not outcome

In terms of passing, I’ve tried to look more at the process using network analysis to establish how teams pass the ball and which players are the most influential in passing terms in a given match, rather than focussing on a single statistic. Below is the passing network for the match. The positions of the players are loosely based on the formations played by the two teams, although some creative license is employed for clarity. It is important to note that these are fixed positions, which will not always be representative of where a player passed/received the ball. Only the starting eleven is shown as the substitutes had little impact in a passing sense.

Passing network for Bayern Munich and Barcelona from the Champions League match at the Allianz Arena on the 23rd April 2013. Only completed passes are shown. Darker and thicker arrows indicate more passes between each player. The player markers are sized according to their passing influence, the larger the marker, the greater their influence. The size and colour of the markers is relative to the players on their own team i.e. they are on different scales for each team. Only the starting eleven is shown. Click on the image for a larger view.

As might be expected, the contrast between the two teams is quite clear. Bayern focussed their passing down the flanks, with Ribery and Robben combining well with their respective full-backs. Neuer, Dante and Boateng fed the full-backs well to begin these passing transitions. Barcelona, on the other hand, engaged in their familiar multitude of passing triangles, although with a bias towards their right flank. There were a number of strong links, although the somewhat uninspiring Bartra-Pique link was the strongest (23 passes).

Sterile domination

The issue for Barcelona was that their possession was largely in deeper areas, away from Bayern’s penalty area. This was neatly summed up in a graphic tweeted by Albert Larcada.

While Barcelona’s passing network showed plenty of combinations in deeper areas, their more attacking players combined much less, with the links between Alexis, Messi and Pedro being relatively weak. In particular, the passes to Messi were low in number, as he received just 7 passes combined from Iniesta (3), Pedro (2) and Alexis (2). Messi had much stronger links with Xavi (received 20 passes) and Alves (received 19 passes), although I suspect many of these were in deeper areas. While Barcelona’s midfield three exerted their usual influence, the next most influential players were Pique and Bartra. This is a stark contrast with the home match against AC Milan, where Messi was the most influential player after the midfield trio.

Bayern did a great job of limiting Messi’s influence, although his injury likely contributed also.

Avoid the puddle

Schweinsteiger was the most influential player for Bayern, linking well with Dante, Alaba and Ribery. After the centre-backs, Bayern’s next most influential players were Robben and Ribery who counter-attacked superbly, with excellent support from their full-backs. As discussed by Zonal Marking, Bayern preyed on Barcelona’s weakness on the counter-attack with speedy breaks down the flanks.

Bayern were incredibly effective and deservedly won the match and very likely the tie.

——————————————————————————————————————–

Passing matrices from Uefa.com press kits.