A day before the start of this season’s Premier League, the BBC announced that its flagship highlights programme, Match of the Day, would feature “expected goals” (also referred to as xG) as part of its coverage. For a relatively obscure term/methodology confined to Twitter debates between nerds a couple of years ago, this is a staggering rise.
Even considering the fact that it had become a popular tool on social media over the past couple of seasons, for Match of the Day—an institution of sorts in England—to acknowledge its value is a huge victory for the football analytics community.
Following suit, Sky Sports—England’s main broadcaster of the Premier League—did a substantial segment on xG for their highly-rated show Monday Night Football (MNF). MNF, which has been at the forefront of in-depth analysis, launching the careers of Gary Neville and Jamie Carragher as erudite football pundits, included the former Liverpool defender in its explanation of xG and how it can be used to assess teams better.
For football fans tired of lazy punditry riddled with cliches like “the winning team wanted it more”, “fighting spirit”, “determination”, “the players lacked conviction”, etc., the introduction of xG into mainstream analysis is a welcome addition.
Before we delve into the history of data usage and analytics in football, a short history of xG.
First off, just what is it? Opta—the company which collects all possible kinds of data for every Premier League match—defines xG as follows:
“Based on over 300,000+ shots, expected goals (xG) measures the quality of a shot based on a number of different variables. The metric gives an indication as to how many goals a player or team should have scored on average based on the chances they have had.”
The concept of shot quality is important here. In short, xG is a statistical model to assess the quality of chances created by a team. It can also be applied to individual players. Every shot taken in a match is assigned a probability of resulting in a goal, depending on a number of factors including, but not limited to, distance from the goal, location (outside the box, inside the box, centre of the box), the circumstances in which the shot was taken (open play, one-on-one, set piece, free kick), and how the shot was generated (through ball, cross from the wing, cross from the centre of the pitch).
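To make the idea concrete, here is a minimal sketch in Python of how per-shot probabilities add up to an xG figure for a team or a player. The player names and probabilities are invented purely for illustration; they are not Opta values, and a real model would derive each probability from the factors listed above.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    player: str  # hypothetical player label
    xg: float    # assumed probability (0 to 1) that this shot is scored


def total_xg(shots):
    """Team xG: the sum of the probabilities of all shots taken."""
    return sum(shot.xg for shot in shots)


def xg_by_player(shots):
    """Player xG: the same sum, grouped by the player taking the shot."""
    totals = {}
    for shot in shots:
        totals[shot.player] = totals.get(shot.player, 0.0) + shot.xg
    return totals


# Made-up shots from a single match (illustrative values only).
shots = [
    Shot("Striker A", 0.35),    # e.g. a one-on-one from the centre of the box
    Shot("Striker A", 0.08),    # e.g. a header from a cross
    Shot("Winger B", 0.12),     # e.g. a shot from the edge of the box
    Shot("Midfielder C", 0.03), # e.g. a long-range effort
]

print(round(total_xg(shots), 2))
print({name: round(value, 2) for name, value in xg_by_player(shots).items()})
```

With these invented numbers, the team would be "expected" to score a little over half a goal from four shots, most of it from a single good chance.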
In a low-scoring game like football where the scoreline plays a disproportionate role in how teams/players are assessed, xG can bring in some much needed objectivity. Take, for instance, last season’s Premier League match between Manchester United and Burnley that ended in a 0-0 draw.
In this game, Manchester United set a Premier League record for the number of shots taken. Anyone who had seen the match would know that the Red Devils were extremely unlucky to not have scored at least one goal. Zlatan Ibrahimovic hit the post and missed another sitter and so did a host of other United attackers. Notice in the graphic that the closer to the goal a shot is taken, the bigger the size of the bubble, indicating a higher probability of it getting converted to a goal.
Manchester United were thwarted by a barnstorming performance from Burnley goalkeeper Tom Heaton, besides having a bad day in terms of finishing. Burnley, on the other hand, were extremely lucky to get a point from this game. The post-match analysis, however, told a very different story. While Manchester United were blamed for their “lack of ruthlessness in front of goal”, Burnley were praised for their “resilience, grit and determination”.
Of course, football’s low-scoring nature means that it is prone to random variation, and intangibles like “fighting spirit” do play an important role in a team’s performance. But xG helps us assess the game in a more nuanced manner.
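The role of random variation can be sketched with a few lines of Python. Assuming each shot is an independent event (a simplification), the chance that a team fails to score at all is the product of the chances that each individual shot is missed. The shot probabilities below are hypothetical and are not taken from any real match.

```python
import math


def prob_no_goals(shot_probs):
    """Chance that none of the shots is scored, assuming independent shots."""
    return math.prod(1.0 - p for p in shot_probs)


# Hypothetical chances created by a dominant side (illustrative values only).
dominant_side = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]

expected_goals = sum(dominant_side)          # adds up to 1.0 xG
blank_chance = prob_no_goals(dominant_side)  # roughly 0.31

print(f"xG: {expected_goals:.2f}")
print(f"Chance of scoring no goals: {blank_chance:.0%}")
```

Under these assumptions, a side that created a full goal's worth of chances would still draw a blank in roughly three matches out of ten, which is why a single scoreline says little about how well a team actually played.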
Just in terms of statistics and numbers associated with football analysis, xG is a massive improvement on metrics like “possession %”, “shots taken” and “shots on target”, which were the only numbers publicly available during the Pep Guardiola-led Barcelona era, when possession was king and data availability was limited.
In many ways, the 2009-10 season can be considered a watershed moment for football analytics. Football had always lagged behind other sports (especially American sports) in embracing external sources of knowledge. Until 2009, even as most American sports franchises had embraced concepts such as Moneyball, employing a battery of data crunchers and stats nerds to help pick the right players, football was taking baby steps towards incorporating analytics.
It was only from 2009 that Opta started collecting data, and xG statistical models have only achieved maturity now—with sufficient volumes of data available to feed into these models. Among the analytics community, the post-2009 period is considered the “Enlightened Era”.
Before this, though, it wasn’t that there was no place for analytics in football. Sam Allardyce, in his stint as the Bolton Wanderers’ manager from 1999 to 2007, made extensive use of Prozone (a football analysis tool produced by a company of the same name) to lift a newly-promoted Premier League club first to safety and then to mid-table and ultimately to the UEFA Cup on a shoestring budget.
Allardyce used Prozone for match analysis and for player recruitment as he blended old English-style football with the continental flair of players like Jay Jay Okocha, Fernando Hierro, Stelios Giannakopoulos and Youri Djorkaeff. But it was Arsene Wenger who gave a big fillip to the xG “movement” by mentioning the term clearly in one of his interviews.
This brought expected goals to the attention of the mainstream football media and, since then, the “football analytics movement” has moved from the nerdy corners of Twitter to the back pages of all major English newspapers, with the likes of The Independent, Daily Mirror and even The Sun featuring expected goals as part of their coverage.
What does the future hold?
In a recent interview, Wenger commented that, 20 years from now, he foresees a robot replacing a manager in the dugout. While this may seem a bit far-fetched, his premise was that these days the role of a football manager is to process the mind-numbingly huge amount of information at his disposal. There is no escaping the fact that even the most minute information on a player (heartbeat, sleep patterns, fatigue levels) is available to a manager. How does a manager make decisions if his gut feeling goes against what the data shows?
Ultimately, football is a low-scoring game with randomness and luck playing a huge role. Given the big money at stake in the top leagues, a substitution in a crucial game can make or break a manager’s career. When and how does a manager decide to trust the data when it goes against his intuition, and vice versa?
Of course, footballing structures at clubs have evolved to accommodate these changes. Roles like “director of football” and “sporting director” have come to ease such burdens on the first team manager. In fact, most clubs now call the manager first team coach, a title clearly indicating that his primary responsibility is to take care of the first team squad.
These days, the likes of sporting directors work with teams of analysts, traditional scouts and data scouts to recruit players, decide wage structures, and so on—in conjunction with the first team coach.
Data versus tradition
Football analytics has had its biggest impact in the area of player identification, scouting and recruitment. All major English clubs now have analytics departments with a substantial say in the aforementioned areas.
As a New York Times piece illustrates, Arsenal have embraced analytics over the past three seasons to dodge big flops in the transfer market. Liverpool and Manchester City too have renowned analytics departments. In the era of huge transfer fees and inflating wages, it is important to find value and analytics provides the necessary edge to avoid costly mistakes.
Traditional scouting versus data scouting is where the battle lines are drawn now. With data now available at a more granular level than ever before and statistical models getting more robust, the much-romanticized “boots on the ground” scouting may take a backseat to analytics-based scouting. There is bound to be some upheaval as the situation evolves.
For instance, Matthew Benham, who made his fortune in the world of gambling with companies Matchbook and Smartodds, applied analytics principles to propel FC Midtjylland to the Danish Superliga title. The same approach, however, has shown mixed results at another club he owns: Brentford, who play in the English second division.
Then there is the question of buy-in from major decision-makers at the boardroom and club levels. Wenger, despite having spoken glowingly about the amount of data available to him, has actively tried to sell players supposedly bought on StatDNA’s recommendation. (StatDNA, a sports performance analytics firm, was bought by Arsenal for £2.15 million in 2012 and supposedly recommended the signings of Mohammed Elneny, Gabriel and Shkodran Mustafi. While Elneny has played adequately, Gabriel has already been off-loaded and Mustafi appeared to be on the verge of a transfer to Italy’s Inter Milan this summer.)
All said and done, the evolution of analytics in football corresponds to the explosion of big data, cloud and technology in almost every sphere of life. Football is a business like any other—it’s all about finding competitive advantage. And analytics is all about finding that much needed edge.
Chaitanya Lakkapragada works in the Indian IT industry to feed his stomach and makes his weekends miserable by watching Arsenal play. He has written for StarSportsIndia, Sportskeeda and TheHardTackle, and tweets at @chaitugooner.
Comments are welcome at firstname.lastname@example.org