Germany vs Portugal: passing network analysis

Germany faced Portugal in their opening Group G match, with Germany winning 4-0 and Pepe being an idiot (surprise, surprise). Faced with the decision on which diminutive gifted midfielder to leave out of the starting eleven, Jogi Löw just went ahead and picked all of them. Furthermore, Germany’s best fullback, Philipp Lahm, played centre midfield. Ronaldo was fit enough to start for Portugal.

Below are the passing networks for both Germany (left) and Portugal (right) based on data from Fifa.com. More information on how these are put together is available here in my previous posts on this subject. For Germany, I’ve not included the substitutes as they contributed little in this aspect. For Portugal, I included Eder who came on for the injured Hugo Almeida after 28 minutes.

Passing networks for the World Cup Group G match between Germany and Portugal at the Arena Fonte Nova, Salvador on the 16th June 2014. Only completed passes are shown. Darker and thicker arrows indicate more passes between each player. The player markers are sized according to their passing influence, the larger the marker, the greater their involvement. Click on the image for a larger view.

Bear in mind that the passing networks above are likely skewed by game state effects, with Germany leading and playing 11 vs 10 for a large proportion of the match.

Germany

Germany lined up with something like a 4-1-5-0 formation in the first half, with their full backs being relatively unadventurous, Philipp Lahm playing ahead of the centre backs and Sami Khedira running from deep and often beyond his attacking compatriots. Khedira was less aggressive in the second half with Germany three goals ahead and with a numerical advantage. In the graphic above, I’ve got them lined up in a 4-2-4ish formation based on a mixture of their average positions and making the plot look pretty. In reality, the side was very compact with the central defenders playing a high line and the attackers dropping off continually.

Lahm and Khedira provided a controlling influence for Germany, forming the link between the defence and attack. Höwedes and Boateng were also well involved in build-up play, although they had limited involvement in terms of direct creativity, with just one cross and no key passes between them.

The attacking quartet were all about fluid movement and passing links, as can be seen in the passing network above. Kroos was similarly influential to Lahm/Khedira but with a slightly higher position up the pitch. Özil and Götze were also heavily involved, while Müller was the least involved (unsurprisingly). The relative balance between the German play-makers meant that their attacks were not simply funnelled through one individual, which led to some lovely passing inter-changes and several high-quality shooting opportunities.

Portugal

Portugal’s passing network was dominated by their central midfielders but they struggled to involve their attacking players in dangerous areas. Ronaldo in particular saw relatively little involvement and the passes he did receive were often well away from the danger-zone. The one Portuguese attacker who was well-involved was Nani; unfortunately for Portugal, he put in a fairly terrible performance. Despite his involvement, Nani created no shooting opportunities for his team mates and put in a total of six crosses with none finding a fellow Portuguese. He did have three shots, with one on target. Sometimes a relatively high passing influence is a bad thing if the recipient wastes their involvement.

Portugal did look dangerous on the counter-attack prior to Pepe’s sending off but failed to really create a clear chance from these opportunities. Overall, Portugal’s passing network was too heavily weighted away from their (potentially) dangerous attacking players and when they did get the ball, they didn’t do enough with it.

Moving forward

Germany were impressive, although this was likely facilitated by Pepe’s indiscretion and the game being essentially over at half-time. The game conditions were certainly in their favour but they capitalised fully. If they can keep their gifted band of play-makers weaving their magic, then they will do well. They’ll need Müller to keep finishing their passing moves, while Mario Götze found himself in several promising shooting situations which may well yield goals on future occasions.

Conversely, Portugal were hampered by the match situation although they looked worryingly dependent on Ronaldo in attack, as noted by the imperious Michael Cox in his recap of day five. Furthermore, the USA likely won’t give them as much space to attack as Germany did. They’ll need to improve the passing links to their dangerous attackers if they are to have much joy at this tournament.

Win, lose or draw

The dynamics of a football match are often dictated by the scoreline and teams will often try to influence this via their approach; a fast start in search of an early goal, keeping it tight with an eye on counter-attacking or digging a moat around the penalty area.

With this in mind, I’m going to examine the repeatability of the amount of time a team spends winning, losing and drawing from year to year. I’m basically copying the approach of James Grayson here who has looked at the repeatability of several statistical metrics. This is meant to be a broad first look; there are lots of potential avenues for further study here.

I’ve collected data from football-lineups.com (tip of the hat to Andrew Beasley for alerting me to the data via his blog) for the past 15 English Premier League seasons and then compared each team’s performance from one season (year zero) to the next (year one). Promoted or relegated teams are excluded as they don’t spend two consecutive seasons in the top flight.

Losers

Below is a plot showing how the time spent losing varies in consecutive seasons. Broadly speaking, there is a reasonable correlation from one season to the next but with a degree of variation also (R^2=0.41). The data suggests that 64% of time spent losing is repeatable, leaving 36% in terms of variation from one season to the next. This variation could result from many factors such as pure randomness/luck, systemic or tactical influences, injury, managerial and/or player changes etc.
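As a rough sketch of the arithmetic behind these percentages: the figures quoted are consistent with taking the repeatable share to be the correlation coefficient r, i.e. the square root of R^2. That reading is my assumption rather than something stated explicitly here, and the function name is purely illustrative.

```python
import math


def repeatability(r_squared: float) -> tuple[float, float]:
    """Split a year-to-year R^2 into (repeatable, variable) shares.

    Assumption: the repeatable share is the correlation coefficient
    r = sqrt(R^2); the remainder is season-to-season variation.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    r = math.sqrt(r_squared)
    return r, 1.0 - r


# R^2 quoted above for time spent losing.
repeatable, variable = repeatability(0.41)
print(f"repeatable: {repeatable:.0%}, variation: {variable:.0%}")
```

The same calculation applied to a very small R^2 gives a small repeatable share, which is another way of saying the metric regresses heavily towards the mean.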

Relationship between time spent losing per game from one season to the next.

As might be expected, title winning teams and relegated sides tend towards the extreme ends in terms of time spent losing. Generally, teams at these extreme ends in terms of success over and under perform respectively compared to the previous season.

Winners

Below is the equivalent plot for time spent winning. Again there is a reasonable correlation from one season to the next, with the relationship for time spent winning (R^2=0.47) being stronger than for time spent losing. The data suggests that 67% of time spent winning is repeatable, leaving 33% in terms of variation from one season to the next.

Relationship between time spent winning per game from one season to the next.

As might be expected, title winning teams spend a lot of time winning. The opposite is true for relegated teams. Title winners generally improve their time spent winning compared to the previous season. Interestingly, they often then see a drop off in the following season.

Manchester City and Liverpool really stick out here in terms of their improvement relative to 2012/13. Liverpool spent 19 minutes more per game in a winning position in 2013/14 than they did the previous season; I have this as the second biggest improvement in the past 15 seasons. They were narrowly pipped into second place (sounds familiar) by Manchester City this season, who improved by close to 22 minutes. City and Liverpool spent 51 and 48 minutes per game in a winning position respectively, which puts them in the top two slots for time spent winning in the past 15 seasons.

According to football-lineups.com, Manchester City and Liverpool scored their first goals of the match in the 26th and 27th minutes respectively. Chelsea were the next closest in the 38th minute. They were also in the top four for how late they conceded their first goal on average, with Liverpool conceding in the 55th minute and City in the 57th. Add in their ability to rack up the goals when leading and you have a recipe for spending a lot of time winning.

Illustrators

The final plot below is for time spent drawing. Football-lineups doesn’t report the figures for drawing directly so I just estimated it by subtracting the winning and losing figures from 90. There will be some error here as this doesn’t account for injury time but I doubt it would hugely alter the general picture. The relationship here from season to season is almost non-existent (R^2=0.013), which implies that time spent drawing regresses to the mean by 89% from season to season.

Relationship between time spent drawing per game from one season to the next.

Teams seemingly have limited control over the amount of time they spend drawing. I suspect this is a combination of team quality and incentives. Good teams have reasonable control over the amount of time they spend winning and losing (as seen above) and it is in their interests to push for a win. Bad teams will face a (literally) losing battle against better teams in general, leading to them spending a lot of time losing (and not winning). It should be noted that teams do spend a large proportion of their time drawing though (obviously this is the default setting for a football match given the scoreline starts at 0-0), so it is an important period.

We can also see the shift in Liverpool and Manchester City’s numbers; they replaced fairly average numbers for time spent drawing in 2012/13 with much lower numbers in 2013/14. Liverpool’s time spent drawing figure of 29.8 minutes this season was the lowest value in the past 15 seasons according to this data!

Baked

There we have it then. In broad terms, time spent winning and losing exhibit a reasonable degree of repeatability but with significant variation superimposed. In particular, it seems that title winners require a boost in their time spent winning and a drop in their time spent losing to claim their prize. Perhaps unsurprisingly, things have to go right for you to win the title.

As far as this season goes, Manchester City and Liverpool both improved their time spent winning dramatically. If history is anything to go by, both will likely regress next season and not have the scoreboard so heavily stacked in their favour. It will be interesting to see how they adapt to such potential challenges next year.

Luis Suárez: Home & away

Everyone’s favourite riddle wrapped in an enigma was a topic of Twitter conversation between various analysts yesterday. The matter at hand was Luis Suárez’s improved goal conversion this season compared to his previous endeavours. Suárez has previously been labelled as inefficient by members of the analytics community (not the worst thing he has been called mind), so explaining his upturn is an important puzzle.

In the 2012/13 season, Suárez scored 23 goals from 187 shots, giving him a 12.3% conversion rate. So far this season he has scored 25 goals from 132 shots, which works out at 18.9%.

What has driven this increased conversion?

Red Alert

Below I’ve broken down Suárez’s goal conversion exploits into matches played home and away over the past two seasons. In terms of sample sizes, in 2012/13 he took 98 shots at home and 89 shots away, while he has taken 69 and 63 respectively this season.

Season     Home     Away     Overall
2012/13    11.2%    13.5%    12.3%
2013/14    23.2%    14.3%    18.9%

The obvious conclusion is that Suárez’s improved goal scoring rate has largely been driven by an increased conversion percentage at home. His improvement away is minor, coming in at 0.8 percentage points, but his home improvement is a huge 12 percentage points.
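For anyone wanting to check the sums, here is a small sketch that recomputes the overall conversion rates from the goal and shot totals quoted earlier, and the season-to-season change from the home and away percentages in the table. The variable and function names are just for illustration.

```python
# Goals and shots quoted in the text for each season.
seasons = {
    "2012/13": {"goals": 23, "shots": 187},
    "2013/14": {"goals": 25, "shots": 132},
}
# Home/away conversion percentages as listed in the table above.
table = {
    "2012/13": {"home": 11.2, "away": 13.5},
    "2013/14": {"home": 23.2, "away": 14.3},
}


def conversion(goals: int, shots: int) -> float:
    """Shot conversion as a percentage: goals divided by shots."""
    return 100.0 * goals / shots


overall = {season: conversion(**totals) for season, totals in seasons.items()}

# Change between the two seasons, expressed in percentage points.
home_change = table["2013/14"]["home"] - table["2012/13"]["home"]
away_change = table["2013/14"]["away"] - table["2012/13"]["away"]

for season, pct in overall.items():
    print(f"{season}: {pct:.1f}% overall")
print(f"home: {home_change:+.1f} points, away: {away_change:+.1f} points")
```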

What could be driving this upturn?

Total Annihilation

Liverpool’s home goal scoring record this season has seen them average 3 goals per game compared to 1.7 last season. Liverpool have handed out several thrashings at home this season, scoring 3 or more goals in nine of their fourteen matches. Their away goal scoring has improved from 2 goals per game to 2.27 per game for comparison.

Liverpool have been annihilating their opponents at home this season and I suspect Suárez is reaping some of the benefit of this with his improved goal scoring rate. Liverpool have typically gone ahead early in their matches at home this season but aside from their initial Suárez-less matches, that hasn’t generally seen them ease off in terms of their attacking play (they lead the league in shots per game at home with 20.7).

My working theory is that Suárez has benefited from such situations by taking his shots under less pressure and/or from better locations when Liverpool have been leading at home. I would love to hear from those who collect more detailed shot data on this.

Drilling down into some more shooting metrics at home adds some support to this. Suárez has seen a greater percentage of his shots hit the target at home this season compared with last (46.4% vs 35.7%). He has also seen a smaller percentage being blocked this season (13% vs 24.5%). Half of Suárez’s shots on target at home this season have resulted in a goal compared to 31.4% last season. Away from home, the comparison between this season and last is much closer.

These numbers are consistent with Suárez taking his shots at home this season in better circumstances. I should stress that there is a degree of circularity here as Suárez’s goal scoring is not independent of Liverpool’s. Further analysis is required.

Starcraft

The above is an attempt to explain Suárez’s improved goal scoring form. I doubt it is the whole story but it hopefully provides some clues ahead of more detailed analysis. Suárez may well have also benefited from a hot-streak this season and the big question will be whether he can sustain his goal scoring form over the remainder of this season and into next.

As I’ve shown previously, there is a large amount of variability in player shot conversion from season to season. Some of this will be due to ‘luck’ or randomness but some of this could be due to specific circumstances such as those potentially aiding Suárez this season. Explaining the various factors involved in goal scoring is a tricky puzzle indeed.

——————————————————————————————————————–

All data in this post are from Squawka and WhoScored.

You’ll never win anything with crosses

You’ll probably have heard about Manchester United’s penchant for crossing in their match against Fulham yesterday. If you haven’t, all 81 of them are illustrated below in their full chalkboard glory.

Manchester United’s crosses in the Premier League match against Fulham on the 9th February 2014. All 81 of them. Image via Squawka.

Rather than focus on the tactical predictability of such a strategy, I’m going to take a look at whether it can be a successful one over the long term.

In the public work on attacking strategies, the analytics community isn’t quite at the stage where the merits of individual strategies have been quantified. The work so far suggests that crossing is probably on the lower end though in terms of effectiveness. Ted Knutson did a nice summary of the work in this area here.

Can crossing bring success?

Given this, I’m going to assess crosses from a different angle. Over the past five seasons, the Premier League Champions have averaged 2.3 goals per game. The fourth placed team has averaged 1.8 goals per game. This suggests that a top team needs an attacking strategy that can yield around two goals per game. Let’s see if crossing can get you there.

I’m going to focus on open play crosses as I feel that is more relevant from a tactical perspective; set piece crosses are a different (more effective) matter. Based on data from the 2011/12 Premier League season, I found that on average it took 79 crosses in open play for a single goal to be scored. On average, teams had 22 open play crosses per game. So an average Premier League team would expect to score a goal from an open play cross every three-to-four games. I only have data for one season, so let’s be generous and round that down to a goal every three matches. That is a long way off two goals per game.

Let’s consider an example of a team that both crosses more than average and converts those crosses into goals more efficiently e.g. Manchester United in 2011/12. They averaged 22 open-play crosses per game and scored 19 goals, which works out at 43.5 crosses per goal. So even a really good crossing team in terms of their goal return could only manage a goal from an open play cross every two games. The caveat to this last point also is that I don’t have the data to look at whether that is a sustainable level of goal production from crosses.
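The back-of-the-envelope sums above can be sketched as follows. The crosses-per-goal and crosses-per-game figures are the ones quoted in the text; the 38-game season length is my assumption for scaling up to a season total, and the names are illustrative.

```python
SEASON_GAMES = 38  # assumption: a standard 38-game Premier League season


def games_per_goal(crosses_per_goal: float, crosses_per_game: float) -> float:
    """Number of games needed, on average, to score once from an open-play cross."""
    return crosses_per_goal / crosses_per_game


league_average = games_per_goal(crosses_per_goal=79, crosses_per_game=22)
man_utd = games_per_goal(crosses_per_goal=43.5, crosses_per_game=22)

for label, games in (("League average", league_average), ("Man Utd 2011/12", man_utd)):
    per_season = SEASON_GAMES / games
    print(f"{label}: one goal every {games:.1f} games, about {per_season:.0f} a season")
```

Either way, the return is a long way short of the two goals per game that a top side needs.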

Based on the above, I would say it is basically impossible to be an elite team and use crossing as your main strategy. If you were good at set pieces, you could probably add another 20 or so goals over a season but that still only puts you at a goal per game average.

That isn’t to say that crossing is pointless – as a part of a varied attacking approach and against an opponent who isn’t dug in and ready for them, they can be an effective source of goals (see the video of Dani Alves assists below and Luis Suárez’s sublime assists in the past two games).

This is where the problem occurs for Moyes. According to WhoScored, in the last three seasons under Ferguson, Manchester United averaged 27, 25 and 27 crosses per game while posting 6, 3 and 3 through-balls per game. The crossing figure is up to 29 per game now with through-balls down to a paltry one per game. Crossing is not a new thing at Manchester United but more of their play under Moyes is focussed down the flanks; around 30% of their attacks under Ferguson in his last three years came down the middle of the pitch. Under Moyes, that has dropped to 24%, which is the lowest proportion in the league. This was wonderfully illustrated in this piece by Mike Goodman for Grantland earlier this season.

Moyes’ tactics have seemingly reduced the effectiveness of Manchester United’s previous elite attacking levels, which matches up with the successful lowering of expectations of the current champions’ prospects.

Scoring ability: the good, the bad and the Messi

Identifying scoring talent is one of the main areas of investigation in analytics circles, with the information provided potentially helping to inform decisions that can cost many, many millions. Players who can consistently put the ball in the net cost a premium; can we separate these players from their peers?

I’m using data from the 2008/09 to 2012/13 seasons across the top divisions in England, Spain, Germany and Italy from ESPN. An example of the data provided is available here for Liverpool in 2012/13. This gives me total shots (including blocked shots) and goals for over 8000 individual player seasons. I’ve also taken out penalties from the shot and goal totals using data from TransferMarkt. This should give us a good baseline for what looks good, bad and extraordinary in terms of scoring talent. Clearly this ignores the now substantial work being done in relation to shot location and different types of shot but the upside here is that the sample size (number of shots) is larger.

Below is a graph of shot conversion (defined as goals divided by total shots) against total shots. All of the metrics I’ll use will have penalties removed from the sample. The average conversion rate across the whole sample is 9.2%. Using this average, we can calculate the bounds of what average looks like in terms of shot conversion; we would expect some level of random variation around the average and for this variation to be larger for players who’ve taken fewer shots.

Shot conversion versus total shots for individual players in the top leagues in England, Italy, Spain and Germany from 2008/09-2012/13. Points are shown in grey with certain players highlighted, with the colours corresponding to the season. The solid black line is the average conversion rate of 9.2%, with the dotted lines above and below this line corresponding to two standard errors above and below the average. The dashed line corresponds to five standard errors. Click on the image for a larger view.

On the plot I’ve also added some lines to illustrate this. The solid black line is the average shot conversion rate, while the two dotted lines either side of it represent upper and lower confidence limits calculated as being two standard errors from the mean. These are known as funnel plots and as far as I’m aware, they were introduced to football analysis by James Grayson in his work on penalties. Paul Riley has also used them when looking at shot conversion from different areas of the pitch. There is a third, dashed line but I’ll talk about that later.

So what does this tell us? Well, we would expect approximately 95% of the points to fall within this envelope around the average conversion rate; the actual proportion is 97%. From a statistical point of view, we can’t identify whether these players are anything other than average at shot conversion. Some players fall below the lower bound, which suggests that they are below average at converting their shots into goals. On the other hand, those players falling above the upper bound are potentially above average.
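To make the envelope concrete, here is a minimal sketch of how such funnel limits can be calculated. It assumes the standard error is the usual binomial one, sqrt(p(1-p)/n), around the 9.2% average; the exact formula used for the plot isn’t spelled out above, so treat that as an assumption, along with the function names.

```python
import math

AVERAGE_CONVERSION = 0.092  # sample-wide conversion rate quoted above


def standard_error(shots: int, p: float = AVERAGE_CONVERSION) -> float:
    """Binomial standard error of a conversion rate measured over `shots` shots."""
    return math.sqrt(p * (1.0 - p) / shots)


def funnel_bounds(shots: int, n_se: float = 2.0,
                  p: float = AVERAGE_CONVERSION) -> tuple[float, float]:
    """Lower and upper limits n_se standard errors either side of the average.

    The limits are clipped to [0, 1] since a conversion rate cannot leave
    that range.
    """
    se = standard_error(shots, p)
    return max(0.0, p - n_se * se), min(1.0, p + n_se * se)


for shots in (20, 50, 100, 200):
    low, high = funnel_bounds(shots)
    print(f"{shots:>3} shots: {low:.1%} to {high:.1%}")
```

The limits narrow as the shot count grows, which is what gives the plot its funnel shape.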

The Bad

I’m not sure if this is surprising or not, but it is actually quite hard to identify players who fall below the lower bound and qualify as “bad”. A player needs to take about 40 shots without scoring to fall beneath the lower bound, so I suspect “bad” shooters don’t get the opportunity to approach statistical significance. Some do though.
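The “about 40 shots” figure can be recovered with a quick sketch, again assuming a binomial standard error around the 9.2% average (my assumption about how the limits were drawn). A goalless player has a conversion rate of zero, so the question is simply when the lower limit rises above zero.

```python
import math

AVERAGE_CONVERSION = 0.092  # sample-wide conversion rate quoted above


def min_goalless_shots(p: float = AVERAGE_CONVERSION, n_se: float = 2.0) -> int:
    """Smallest shot count at which scoring zero falls below the lower limit.

    The lower limit p - n_se * sqrt(p * (1 - p) / n) is positive once
    n exceeds n_se**2 * (1 - p) / p, so we take the next whole shot count.
    """
    threshold = n_se ** 2 * (1.0 - p) / p
    return math.floor(threshold) + 1


print(min_goalless_shots())
```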

Only 62 player seasons fall below the lower bound, with Alessandro Diamanti, Antonio Candreva, Gökhan Inler and (drum-roll) Stewart Downing having the dubious record of appearing twice. Downing actually holds the record in my data for the most shots (80) without scoring in 2008/09, with his 2011/12 season coming in second with 71 shots without scoring.

The Good

Over a single season of shots, it is somewhat easier to identify “good” players in the sample, with 219 players lying above the two standard error curve. Some of these players are highlighted in the graph above and rather than list all of them, I’ll focus on players that have managed to consistently finish their shooting opportunities at an above average rate.

Only two players appear in each of the five seasons of this sample; Gonzalo Higuaín and Lionel Messi. Higuaín has scored an impressive 94 goals with a shot conversion rate of 25.4% over that sample. I’ll leave Messi’s numbers until a little later. Four players appear on four separate occasions; Álvaro Negredo, Stefan Kießling, Alberto Gilardino and Giampaolo Pazzini. Negredo is interesting here as while his 15.1% conversion rate over multiple seasons isn’t as exceptional as some other players, he has done this over a sustained period while taking a decent volume of shots each season (note his current conversion rate at Manchester City is 16.1%).

Eighteen players have appeared on this list three times; notable names include van Persie, Di Natale, Cavani, Agüero, Gómez, Soldado, Benzema, Raúl, Fletcher, Hernández and Agbonlahor (wasn’t expecting that last one). I would say that most of the players mentioned here are more penalty box strikers, which suggests they take more of their shots from closer to the goal, where conversion rates are higher. It would be interesting to cross-check these with analysts who are tracking player shot locations.

The Messi

To some extent, looking at players that lie two standard errors above or below the average shot conversion rate is somewhat arbitrary. The number of standard errors you use to judge a particular property typically depends on your application and how “sure” you want to be that the signal you are observing is “real” rather than due to “chance”. For instance, when scientists at CERN were attempting to establish the existence of the Higgs boson, they used a very stringent requirement that the observed signal is five standard errors above the typical baseline of their instruments; they want to be really sure that they’ve established the existence of a new particle. The tolerance here is that there be much less than a one in a million chance that any observed signal be the result of a statistical fluctuation.

As far as shot conversion is concerned, over the five seasons prior to this, Lionel Messi is the Higgs boson of football. While other players have had shot conversion rates above this five-standard error level, Messi has done this while taking huge shot volumes. This sets him apart from his peers. Over those five seasons, Messi took 764 shots, from which an average player would be expected to score between 54 and 86 goals based on a player falling within two standard errors of the average; Messi has scored 162! Turns out Messi is good at the football…who knew?
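A short sketch of the Messi sums, using the shot and goal totals quoted above and assuming (as before) a binomial standard error around the 9.2% average. The variable names are illustrative only.

```python
import math

AVERAGE_CONVERSION = 0.092  # sample-wide conversion rate quoted above
shots, goals = 764, 162     # Messi's totals over the five seasons

# Assumed binomial standard error of the conversion rate at this shot volume.
se = math.sqrt(AVERAGE_CONVERSION * (1.0 - AVERAGE_CONVERSION) / shots)

# Goal range for an average player within two standard errors of the mean.
expected_low = shots * (AVERAGE_CONVERSION - 2.0 * se)
expected_high = shots * (AVERAGE_CONVERSION + 2.0 * se)

# How many standard errors above the average Messi's conversion rate sits.
standard_errors_above = (goals / shots - AVERAGE_CONVERSION) / se

print(f"average player: {expected_low:.0f} to {expected_high:.0f} goals")
print(f"Messi: {goals / shots:.1%} conversion, "
      f"{standard_errors_above:.1f} standard errors above average")
```

Under these assumptions Messi sits comfortably beyond the five-standard-error line rather than merely brushing it.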

Stats! What are they good for?

I’ve been closely following the developments in the football analytics community for close to two years now, ever since WhoScored allied themselves with the Daily Mail and suggested Xavi wasn’t so good at the football and I was directed to James Grayson’s wonderful riposte.

There has been some discussion on Twitter about the state of football analytics recently and I thought I would commit some extended thoughts on this topic to writing.

Has football analytics stalled?

Part of the Soccermetrics podcast, featuring Howard Hamilton and Zach Slaton, revolved around how football analytics as an activity has “stalled” (Howard has since attributed the “stalled” statement to Chris Anderson, although he seemingly agrees with it). Even though this wasn’t really defined, I find it difficult to comprehend the view that analytics has stalled.

Over the past two years, the community has developed a lot as far as I can see. James Grayson and Mark Taylor continue to regularly publish smart work, while new bloggers have emerged also. The StatsBomb website has brought together a great collection of analysts and thinkers on the game and they appear to be gaining traction outside of the analytics echo chamber.

In addition to this, data is increasingly finding a place in the mainstream media; Zach Slaton writes at Forbes, Sean Ingle is regularly putting numbers into his Guardian columns and there is a collection of writers contributing to the Dutch newspaper De Volkskrant. Mike Goodman is doing some fantastic work at Grantland; his piece on Manchester United this season is an all too rare example of genuine insight in the wider football media. The Numbers Game book by Chris Anderson and David Sally was also very well received.

Allied to these writing developments, a number of analytics bloggers have joined professional clubs or data organisations recently – surely it is encouraging to see smart people being brought into these environments? (One side effect of this is that some great work is lost from the public sphere though e.g. the StatDNA blog).

To me, this all seems like progress on a number of fronts.

What are we trying to achieve?

The thing that isn’t clear to me is what people in the analytics community are actually aiming for. Some are showcasing their work with the aim of getting a job in the football industry, some are hoping to make some money, while others are doing it as a hobby (*waves hand*). Whatever the motivation, the work coming out of the community is providing insights, context and discussion points and there is an audience for it even if it is considered quite niche.

Football analytics is still in its infancy and expecting widespread acceptance in the wider football community at this stage is perhaps overly ambitious. However, strides are being made; TV coverage has started looking more at shot counts over a season and heat maps of touches have made a few appearances. These are small steps undoubtedly but I doubt there is much demand for scatter plots, linear regression and statistical significance tests from TV producers. Simple and accessible tables or metrics that can be overlaid on an image of a football pitch seem to go down well with a broader audience – the great work being done on shot locations seems ripe for this as it is accessible and intuitive without resorting to complex statistical language.

Gary Neville shows off his massive iPad. Courtesy of thedrum.com.

However, I don’t think the media should be the be all and end all for judging the success or progress of football analytics. Fan discussion of football is increasingly found online in the form of blogs, forums and Twitter, so the media don’t have to be the gatekeepers to analytics content. Saying that, I would love to see more intelligent discussion of football in the media and I feel that analytics is well placed to contribute to that. I’d be interested to hear what it is people in the football analytics community are aiming for in the longer term.

What about the clubs?

The obvious aspect of the analytics community that I’ve omitted from the discussion so far is the role of the clubs in all this. It’s difficult to know what goes on within clubs due to their secrecy. The general impression I get is that there are analytics teams toiling away but without necessarily making an impact in decision making at the club, whether that is in terms of team analysis or in the transfer market. Manchester City are one example of a team using data for such things based on this article.

With this in mind, I was interested to listen to the Sky Sports panel discussion show featuring Chris Anderson, Damien Comolli and Sam Allardyce. Chris co-authored the excellent The Numbers Game book and brought some nuance and genuine insight to the discussion. Comolli is mates with Billy Beane. Allardyce is held up as an acolyte for football analytics at the managerial level in English football and I think this is the first time I’ve really heard him speak about it. I wasn’t impressed.

Allardyce clearly takes an interest in the numbers side of the game and reeled off plenty of figures, which on the surface seemed impressive. He seemingly revels in the idea that he is some sort of visionary with his interest in analytics, repeating on several occasions how he has been using data for over ten years. He seemed particularly pleased with the “discovery” of how many clean sheets, goals and other aspects of the game were required to gain a certain number of points in the Premier League; something that many analysts could work out in their lunch hour given the appropriate data.

I would question how this analysis and many of the other nuggets he threw out are actually actionable though; much of this is just stamp-collecting and doesn’t really move things forward in terms of actually identifying what is happening at the process level on the pitch. Take, for example, Comolli’s statistic on a team never losing when having ten or more shots on goal, which is valuable information for those footballers who don’t aim for the goal. Now it could be that they were holding back the good stuff but several of their comments suggested they don’t really understand core analytics concepts such as regression to the mean and the importance of sample size e.g. referring to Aaron Ramsey’s unsustainable early-season scoring run. I would have expected more from people purporting to be leading the take-up of analytics at club level.

I felt Allardyce’s comment about his “experience” being better than “maths” when discussing the relationship between money and success betrayed his actual regard for the numbers side of football. Many of the numbers he quoted seemed to be used to confirm his own ideas about the game. This is fine but I think to genuinely gain an edge using analytics, you need to see where the data takes you and make it actionable. This is hard and is something that the analytics community could do better (Paul Riley is doing great work on his blog in this area for goalkeepers). The context that analytics provides is very valuable but without identifying “why” certain patterns are observed, it is difficult to alter the process on the field.

Based on the points that Allardyce made, I have my doubts whether the clubs are any further ahead in this regard than what is done publicly by the online analytics community. If there is a place where analytics has stagnated, maybe it is within the clubs. To my mind, they would do well to look at what is going on in the wider community and try to tap into that more.

——————————————————————————————————————–

Shorter version of this post, courtesy of Edward Monkton.

Where are we Going?

I don’t know, I thought you knew.

No I don’t know. Maybe he knows.

No, He definitely doesn’t know.

*PAUSE*

Maybe no-one knows.

*PAUSE*

Oh Well. I hope it’s nice when we get there.

Is shooting accuracy maintained from season to season?

This is a short follow-up to this post using the same dataset. Instead of shot conversion, we’re now looking at shooting accuracy which is defined as the number of shots on target divided by the total number of shots. The short story here is that shooting accuracy regresses more strongly to the mean than shot conversion at the larger shot samples (more than 70 shots) and is very similar below this.

Comparison between shooting accuracy for players in year zero and the following season (year one). Click on the image or here for a larger interactive version.


Minimum shots | Players | Year-to-year r^2 | ‘Luck’ | ‘Skill’
1             | 2301    | 0.045            | 79%    | 21%
10            | 1865    | 0.118            | 66%    | 34%
20            | 1428    | 0.159            | 60%    | 40%
30            | 951     | 0.214            | 54%    | 46%
40            | 632     | 0.225            | 53%    | 47%
50            | 456     | 0.219            | 53%    | 47%
60            | 311     | 0.190            | 56%    | 44%
70            | 180     | 0.245            | 51%    | 49%
80            | 117     | 0.305            | 45%    | 55%
90            | 75      | 0.341            | 42%    | 58%
100           | 43      | 0.359            | 40%    | 60%

Comparison of the level of ‘skill’ and ‘luck’ attributed to shooting accuracy (measured by shots on target divided by all shots) from one season to the next. The data is filtered by the total number of shots a player takes in consecutive seasons.

Essentially, there is quite a bit of luck involved with getting shots on target and for large-volume shooters, there is more luck involved in getting accurate shots in than in scoring them.
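As a rough check on that comparison, the sketch below puts the r^2 values from the shooting accuracy table next to the corresponding shot conversion values from the post this follows up, and converts each into a ‘luck’ share. It assumes ‘luck’ is taken as 1 − r (the regression-to-the-mean share), which reproduces the percentages in the tables; the function and variable names are only illustrative.

```python
import math

# Year-to-year r^2 by minimum shots, copied from the two tables.
ACCURACY_R2 = {1: 0.045, 10: 0.118, 20: 0.159, 30: 0.214, 40: 0.225, 50: 0.219,
               60: 0.190, 70: 0.245, 80: 0.305, 90: 0.341, 100: 0.359}
CONVERSION_R2 = {1: 0.061, 10: 0.128, 20: 0.174, 30: 0.234, 40: 0.261, 50: 0.262,
                 60: 0.261, 70: 0.375, 80: 0.489, 90: 0.472, 100: 0.465}


def luck_share(r_squared):
    """Share attributed to 'luck', assumed here to be 1 - r."""
    return 1.0 - math.sqrt(r_squared)


def luck_gap(min_shots):
    """How much more 'luck' shooting accuracy carries than shot conversion."""
    return luck_share(ACCURACY_R2[min_shots]) - luck_share(CONVERSION_R2[min_shots])


if __name__ == "__main__":
    for min_shots in sorted(ACCURACY_R2):
        print(f"{min_shots:>3}+ shots: gap = {luck_gap(min_shots) * 100:4.1f} points")
```

On these numbers the gap stays under 8 percentage points up to 60 shots (and under 5 up to 50), then sits between roughly 8 and 15 points from 70 shots upwards.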

Is scoring ability maintained from season to season? (slight return)

In my previous post (many moons ago), I looked at whether a player’s shot conversion in one season was a good guide to their shot conversion in the next. While there were some interesting features in this, I was wary of being too definitive given the relatively small sample size that was used. Data analysis is a journey with no end, so this is the next step. I collated the last 5 seasons of data across the top divisions in England, Spain, Germany and Italy (I drew the line at collecting France) from ESPN. An example of the data provided is available here for Liverpool in 2012/13. The last 5 seasons on ESPN are Opta-provided data and matched up perfectly when I compared with English Premier League data from EPL-Index.

Before digging into the results, a few notes on the data. The data is all shots and all goals i.e. penalties are not removed. Ideally, you would strip out penalty shots and goals but that would require player-level data that I don’t have and I’ve already done enough copy and pasting. I doubt including penalties will change the story too much but it would alter the absolute numbers. Shot conversion here is defined as goals divided by total shots, where total shots includes blocked shots. I then compared shot conversion for individual players in year zero with their shot conversion the following year (year one). The initial filter that I applied here was that the player had to have scored at least one goal in both years (so as to exclude players having 0% shot conversion).

Comparison between shot conversion rates for players in year zero and the following season (year one). Click on the image or here for a larger interactive version.

Starting out with the full dataset, we have 2301 data points where a player scored a goal in two consecutive seasons. The R^2 here (a measure of the strength of the relationship) is very low, with a value of 0.061 (where zero would mean no relationship and one would be perfect). Based on the method outlined here by James Grayson, this suggests that shot conversion regresses 75% towards the mean from one season to the next. The implication of this number is that shot conversion is 25% ‘skill’ and 75% is due to random variation, which is often described as ‘luck’.
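The arithmetic behind that split can be sketched as follows. This assumes, as the numbers above imply, that the regression-to-the-mean share is 1 − r, where r is the square root of the year-to-year R^2; `luck_skill_split` is an illustrative name rather than anything taken from the linked method.

```python
import math


def luck_skill_split(r_squared):
    """Return (luck, skill) shares from a year-to-year R^2.

    Assumes skill = r (the correlation coefficient, taken as positive)
    and luck = 1 - r, i.e. the fraction by which a player is expected
    to regress towards the mean.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    skill = math.sqrt(r_squared)
    return 1.0 - skill, skill


if __name__ == "__main__":
    luck, skill = luck_skill_split(0.061)
    print(f"luck: {luck:.0%}, skill: {skill:.0%}")  # luck: 75%, skill: 25%
```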

As I noted in my previous post on this subject, the attribution to skill and luck is dependent on the number of shots taken. As the number of shots increases, we smooth out some of the randomness and skill begins to emerge. A visualisation of the relationship between shot conversion and total shots is available here. Below is a summary table showing how this evolves in 10 shot increments. After around 30 shots, skill and luck are basically equal and this is maintained up to 60 shots. Above 80 shots, we seem to plateau at a 70/30% split between ‘skill’ and ‘luck’ respectively.

Minimum shots | Players | Year-to-year r^2 | ‘Luck’ | ‘Skill’
1             | 2301    | 0.061            | 75%    | 25%
10            | 1865    | 0.128            | 64%    | 36%
20            | 1428    | 0.174            | 58%    | 42%
30            | 951     | 0.234            | 52%    | 48%
40            | 632     | 0.261            | 49%    | 51%
50            | 456     | 0.262            | 49%    | 51%
60            | 311     | 0.261            | 49%    | 51%
70            | 180     | 0.375            | 39%    | 61%
80            | 117     | 0.489            | 30%    | 70%
90            | 75      | 0.472            | 31%    | 69%
100           | 43      | 0.465            | 32%    | 68%

Comparison of the level of ‘skill’ and ‘luck’ attributed to scoring ability (measured by shot conversion) from one season to the next. The data is filtered by the total number of shots a player takes in consecutive seasons.
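A minimal sketch of how a table like this could be put together is given below. The player records are made up purely to show the mechanics (they are not the ESPN data), and the sketch assumes that the shot filter is applied to both seasons and that ‘skill’ is taken as r, with ‘luck’ as 1 − r. All names are hypothetical.

```python
import math
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable relationship
    return min(1.0, sxy * sxy / (sxx * syy))  # clamp floating-point overshoot


def summarise(players, min_shots):
    """Filter players and return (count, r^2, luck, skill).

    Each player is (shots_y0, goals_y0, shots_y1, goals_y1). A player is kept
    if they scored in both seasons and took at least `min_shots` in each
    (the both-seasons rule is an assumption).
    """
    kept = [p for p in players
            if p[1] > 0 and p[3] > 0 and p[0] >= min_shots and p[2] >= min_shots]
    if len(kept) < 2:
        return len(kept), None, None, None
    conv_y0 = [g0 / s0 for s0, g0, s1, g1 in kept]
    conv_y1 = [g1 / s1 for s0, g0, s1, g1 in kept]
    r2 = r_squared(conv_y0, conv_y1)
    skill = math.sqrt(r2)  # assumes the correlation is positive
    return len(kept), r2, 1.0 - skill, skill


def fake_players(n, seed=0):
    """Hypothetical players: a fixed 'true' conversion rate plus shot-by-shot noise."""
    rng = random.Random(seed)
    players = []
    for _ in range(n):
        true_rate = rng.uniform(0.05, 0.20)
        record = []
        for _ in range(2):  # two consecutive seasons
            shots = rng.randint(5, 120)
            goals = sum(rng.random() < true_rate for _ in range(shots))
            record += [shots, goals]
        players.append(tuple(record))
    return players


if __name__ == "__main__":
    players = fake_players(2000)
    for min_shots in range(10, 101, 10):
        count, r2, luck, skill = summarise(players, min_shots)
        if r2 is not None:
            print(f"{min_shots:>3} {count:>5} {r2:.3f} {luck:.0%} {skill:.0%}")
```

Because the fake players have a fixed underlying rate, the output only shows the general behaviour (fewer players and less noise as the threshold rises); it says nothing about the real values in the table.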

The results here are different to my previous post, where the equivalence of luck and skill was hit around 70 shots whereas it lies from 30-60 shots here. I suspect this is driven by the smaller sample size in the previous analysis. The song remains the same though; judging a player on around half a season of shots will be about as good as a coin toss. Really you want to assess a heavy shooter over at least a season with the proviso that there is still plenty of room for random variation in their shot conversion.

What is shot conversion anyway?

The past summer in the football analytics community saw a wonderful catalytic cycle of hypothesis, analysis and discussion. It’s been great to see the community feeding off each other; I would have liked to join in more but the academic conference season and the first UK heatwave in 7 years put paid to that. Much of the focus has been on shots and their outcomes. Increasingly the data is becoming more granular; soon we’ll know how many shots per game are taken within 10 yards of the corner flag at a tied game state by players with brown hair and blue eyes while their manager juggles on the sideline (corrected for strength of opposition of course). This increasing granularity is a fascinating and exciting development. While it was already clear that all shots aren’t created equal from purely watching the football, the past summer has quantified this very clearly. To me, this demonstrates that the traditional view of ‘shot conversion’ as a measure of finishing ability is erroneous.

As an illustrative example, consider two players who both take 66 shots in a season. Player A scores 11 goals, so has a shot conversion of 17%. Player B scores 2 goals, so has a shot conversion of 3%. The traditional view of shot conversion would suggest that Player A is a better finisher than Player B. However, if Player A took all of his shots from a central area within the 18-yard box, he would be bang in line with the Premier League average over the past 3 seasons. If Player B took all of his shots from outside the area, he would also be consistent with the average Premier League player. Both players are average when controlling for shot location. Clearly this is an extreme example but then again it is meant to be an illustration. To me at least, shot conversion seems more indicative of shooting efficiency i.e. taking shots from good positions under less defensive pressure will lead to an increased shot conversion percentage. Worth bearing in mind the next time someone mentions ‘best’ or ‘worst’ in combination with shot conversion.
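The illustration can be written out as a small calculation. The zone averages below (17% for central shots inside the box, 3% for shots from outside the area) are simply the figures implied by the example above rather than exact league values, and the zone labels and function name are hypothetical.

```python
# Assumed average conversion by shot zone, taken from the illustration above.
ZONE_AVERAGE = {"central_box": 0.17, "outside_box": 0.03}


def finishing_vs_average(goals, shots_by_zone):
    """Compare actual goals with what an average player would be expected
    to score from the same shot locations.

    Returns (raw_conversion, expected_goals, goals_minus_expected).
    """
    total_shots = sum(shots_by_zone.values())
    expected = sum(ZONE_AVERAGE[zone] * n for zone, n in shots_by_zone.items())
    return goals / total_shots, expected, goals - expected


if __name__ == "__main__":
    player_a = finishing_vs_average(11, {"central_box": 66})
    player_b = finishing_vs_average(2, {"outside_box": 66})
    for name, (conv, expected, diff) in (("A", player_a), ("B", player_b)):
        print(f"Player {name}: conversion {conv:.0%}, "
              f"expected goals {expected:.1f}, difference {diff:+.1f}")
```

Both players land within a fraction of a goal of the location-based expectation, despite raw conversion rates of 17% and 3%.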

The remaining question for me is how sustainable the more granular data is from season-to-season, especially given the smaller sample sizes.

Is scoring ability maintained from season to season?

With the football season now over across the major European leagues, analysis and discussion turn to reflection on the who, what and why of the past year. With the transfer window soon to do whatever the opposite of slam shut is, thoughts also turn to how such reflections might inform potential transfer acquisitions. As outlined by Gabriele Marcotti today in the Wall Street Journal, strikers are still the centre of attention when it comes to transfers:

The game’s obsession with centerforwards is not new. After all, it’s the glamour role. Little kids generally dream of being the guy banging in the goals, not the one keeping them out.

On the football analytics front, there has been a lot of discussion surrounding the relative merits of various forward players, with an increasing focus on their goal scoring efficiency (or shot conversion rate) and where players are shooting from. There has been a lot of great work produced but a very simple question has been nagging away at me:

Does being ‘good’ one year suggest that you’ll be ‘good’ next year?

We can all point to examples of forwards shining brightly for a short period during which they plunder a large number of goals, only to then fade away as regression to their (much lower) mean skill level ensues. With this in mind, let’s take a look at some data.

Scoring proficiency

I’ve put together data on players over the past two seasons who have scored at least 10 goals during a single season in the top division in either England, Spain, Germany or Italy from WhoScored. Choosing 10 goals is basically arbitrary but I wanted a reasonable number of goals so that calculated conversion rates didn’t oscillate too wildly and 10 seems like a good target for your budding goalscorer. So for example, Gareth Bale is included as he scored 21 in 2012/13 and 9 goals in 2011/12 but Nikica Jelavić isn’t as he didn’t pass 10 league goals in either season. Collecting the data is painful so a line had to be drawn somewhere. I could have based it on shots per game but that is prone to the wild shooting of the likes of Adel Taarabt and you end up with big outliers. If a player was transferred to or from a league within the WhoScored database (so including France), I retained the player for analysis but if they left the ‘Big 5’ then they were booted out.

I ended up with 115 players who had scored at least 10 league goals in one of the past two seasons. Only 43 players managed to score 10 league goals in both 2011/12 and 2012/13, with only 6 players not named Lionel Messi or Cristiano Ronaldo able to score 20 or more in both seasons. Below is how they match up when comparing their shot conversion, where their goals are divided by their total shots, across both seasons. The conversion rates are based on all goals and all shots. Ideally you would take out penalties but that takes time to collate and I doubt it will make much difference to the conclusions.

Comparison between shot conversion rates for players in 2011/12 and 2012/13. Click on the image or here for a larger interactive version.

If we look at the whole dataset, we get a very weak relationship between shot conversion in 2012/13 relative to shot conversion in 2011/12. The R^2 here is 0.11, which suggests that shot conversion by an individual player shows 67% regression to the mean from one season to the next. The upshot of this is that shot conversion above or below the mean is around two-thirds due to luck and one-third due to skill. Without filtering the data any further, this would suggest that predicting how a player will convert their chances next season based on the last will be very difficult.

A potential issue here is the sample size for the number of shots taken by an individual in a season. Dimitar Berbatov’s conversion rate of 44% in 2011/12 is for only 16 shots; he’s good but not that good. If we filter for the number of shots, we can take out some of the outliers and hopefully retain a representative sample. Up to 50 shots, we’re still seeing a 65% regression to the mean and we’ve reduced our sample to 72 players. It is only when we get up to 70 shots and down to 44 players that we see a close to even split between ‘luck’ and ‘skill’ (54% regression to the mean). The problem here is that we’re in danger of ‘over-fitting’ as we rapidly reduce our sample size. If you are happy with a sample of 18 players, then you need to see around 90 shots per season to be able to attribute 80% of shot conversion to ‘skill’.

Born again

So where does that leave us? Perhaps unsurprisingly, the results here for players are similar to what James Grayson found at the team level, with a 61% regression to the mean from season to season. Mark Taylor found that around 45 shots was where skill overtook luck for assessing goal scoring, so a little lower than what I found above although I suspect this is due to Mark’s work being based on a larger sample over 3 seasons in the Premier League.

The above also points to the ongoing importance of sample size when judging players, although I’d want to do some more work on this before being too definitive. Judgements on around half a season of shots appear rather unwise and are about as good as flipping a coin. Really you want around a season for a fuller judgement and even then you might be a little wary of spending too much cash. For something approaching a guarantee, you want some heavy shooting across two seasons, which allied with a good conversion rate can bring you over 20 league goals in a season. I guess that is why the likes of Van Persie, Falcao, Lewandowski, Cavani and Ibrahimovic go for such hefty transfer fees.

Newcastle United vs Liverpool: passing network analysis

Liverpool defeated Newcastle 6-0 at St James’ Park. Below is the passing network analysis for Liverpool split between the first 75 minutes of the match and the rest of the match up to full time. I focussed just on Liverpool here. More information on how these are put together is available here in my previous posts on this subject.

The reason I separated the networks into these two periods was that I noticed how Liverpool’s passing rate changed massively after Steven Gerrard was substituted and the fifth goal was scored. During the first 75 minutes, Liverpool attempted 323 passes with a success rate of 74% and a 45% share of possession. After this, Liverpool attempted 163 passes with an accuracy of 96% and a 60% share of possession. Liverpool attempted 34% of their passes in this closing period. Let’s see how this looks in terms of their passing network.
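To put the change in tempo on a per-minute footing, here is a quick calculation from the figures above. It assumes the closing period lasted 15 minutes (so it ignores stoppage time), and the function name is only illustrative.

```python
def passing_summary(attempted, accuracy, minutes):
    """Return (passes attempted per minute, approximate completed passes)."""
    return attempted / minutes, attempted * accuracy


if __name__ == "__main__":
    # Figures quoted above; the 15-minute closing period is an assumption.
    early_rate, early_completed = passing_summary(323, 0.74, 75)
    late_rate, late_completed = passing_summary(163, 0.96, 15)
    late_share = 163 / (323 + 163)

    print(f"First 75 minutes: {early_rate:.1f} passes per minute")
    print(f"Closing period:   {late_rate:.1f} passes per minute")
    print(f"Share of passes in closing period: {late_share:.0%}")
```

On these assumptions Liverpool went from a little over 4 attempted passes per minute to nearly 11, about two and a half times the earlier rate.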

The positions of the players are loosely based on the formations played, although some creative license is employed for clarity. It is important to note that these are fixed positions, which will not always be representative of where a player passed/received the ball. The starting eleven is shown on the pitch for the first 75 minutes, with Borini replacing Gerrard in the second network.


Passing networks for Liverpool for the first 75 minutes and up to full time against Newcastle United from the match at St James’ Park on the 27th April 2013. Only completed passes are shown. Darker and thicker arrows indicate more passes between each player. The player markers are sized according to their passing influence, the larger the marker, the greater their influence. Click on the image for a larger view.

Liverpool’s passing was quite balanced for the first 75 minutes of the match, with a varied passing distribution. There was a stronger bias towards the right flank compared with the left flank as Gerrard drifted right to combine with Johnson and Downing. The passing influence scores were also evenly distributed across the whole team with Gerrard and Lucas being the top two. A contrast with some previous matches is the lack of strong links along the back line, which indicates less reliance on recycling of possession in deeper areas. Instead, Liverpool were seeking to move the ball forward more quickly and played the ball through the whole team.

He makes us happy

After Gerrard and Lucas, the next most influential player was Coutinho, who put in a wonderfully creative performance as the attacking fulcrum of the team. He linked well with all of Liverpool’s forward players and threaded several dangerous passes to his team-mates including an assist and a ‘second goal assist’ (defined as a pass to the goal assist creator) for the second goal according to EPL-Index. His creative exploits thus far have been hugely promising during his first 10 appearances.

Sterile domination

The final period of the match saw Liverpool really rack up the passing numbers as mentioned earlier. Clearly, this is easier to do when 5 or 6 goals clear but it is still potentially illustrative to see how this was accomplished. The main orchestrators of this were Lucas and Henderson who were 28/28 and 35/35 for passes attempted/completed during this period. Henderson was 21/24 from the first 75 minutes, so this was quite a rapid increase with his shift in role after Gerrard went off and the state of the game.

Your challenge should you wish to accept it

Admittedly Newcastle were very poor in this match but Liverpool took advantage to enact a severe thrashing. This was accomplished without Suárez, which leads to obvious (premature?) questions about whether his absence improved Liverpool’s overall balance and play. Assuming that Suárez doesn’t leave in the summer, one of Brendan Rodgers’ key tasks will be developing a system that gets the best out of the attacking talents of Suárez, Coutinho and Sturridge. It could be quite tasty if he manages to accomplish this.