The trouble with batting averages
In the One Day International (ODI) between India and Australia in Kolkata on 21 September, Ajinkya Rahane made a patient 55 before being run out. This knock, along with captain Virat Kohli’s 92, was instrumental in India getting to 252, which they ultimately managed to defend with ease.
Rahane’s opening partner Rohit Sharma, who unlike Rahane is an automatic selection for the Indian playing 11 for ODIs, made only 7 before he got out. While one game is simply too little data to determine a player’s contribution to the team, what is interesting is that both Rahane’s and Sharma’s innings on 21 September were rather characteristic of their overall playing careers.
The traditional metric for a batsman’s performance over a number of games is the batting average: the total number of runs he scores in a given period (a series, say, or a whole career) divided by the number of times he is dismissed in that period. Put simply, it is the number of runs a batsman scores for every time he is dismissed.
In fact, this metric made its way across the Atlantic to baseball as well, where it was introduced in the late 19th century by the English statistician Henry Chadwick. The measure was suitably modified, of course, to reflect the fact that runs in baseball are scored differently than in cricket. Yet it remained a sort of arithmetic average (number of hits divided by number of at-bats), and has been institutionalized in baseball as well.
However, it was the inadequacy of the batting average as a measure of a baseball batter’s efficiency that led to the Sabermetrics revolution pioneered by Bill James in the 1970s, and used to good effect by the Oakland A’s around the turn of the millennium (the story has been described expertly in Michael Lewis’s Moneyball, which was made into a movie starring Brad Pitt in 2011).
Coming back to Sharma and Rahane: going by the traditional batting average, there is no contest over who adds more value to the team. Until Thursday’s game, Sharma had scored 5,622 runs while being dismissed 132 times, giving him a batting average of 42.6. Rahane, on the other hand, had scored 2,549 runs while being dismissed 76 times, giving him an average of only 33.5. In cricketing terms, a nine-point difference in batting average is massive, and suggests that Sharma adds significantly more value than Rahane.
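The arithmetic behind those two averages can be checked in a few lines. This is only a sketch: the function name batting_average is our own, and the totals are the career figures quoted above.

```python
def batting_average(runs: int, dismissals: int) -> float:
    """Runs scored per dismissal; undefined if the batsman was never out."""
    if dismissals == 0:
        raise ValueError("batting average is undefined with zero dismissals")
    return runs / dismissals


# Career totals as quoted in the article (up to the Kolkata game).
sharma = batting_average(5622, 132)
rahane = batting_average(2549, 76)

print(f"Sharma: {sharma:.1f}")
print(f"Rahane: {rahane:.1f}")
print(f"Gap:    {sharma - rahane:.1f}")
```

Note that the divisor is dismissals, not innings, so a not-out innings adds runs without adding to the denominator.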
By definition, an average, or any measure of central tendency, destroys information. In taking an average (be it the mean, the median, the mode or some more esoteric measure), we try to summarize a set of numbers by a single number, and irrespective of how this single number is chosen, some information is bound to get lost. It is no different with the batting average.
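A small illustration of that loss of information, using made-up scores rather than any real batsman’s record: the two lists below have exactly the same arithmetic mean, yet describe very different players. (For simplicity this uses runs per innings, not runs per dismissal.)

```python
from statistics import mean, median

# Made-up innings scores, chosen only so that both lists have the same mean.
steady = [28, 32, 30, 29, 31]
streaky = [2, 5, 0, 8, 135]

# The mean cannot tell the two batsmen apart...
print(mean(steady), mean(streaky))

# ...while the median shows how differently the runs are distributed.
print(median(steady), median(streaky))
```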
While Sharma’s batting average is much higher than Rahane’s, it’s not clear that he is a much better batsman. Figure 1 shows the probability density of both Sharma’s and Rahane’s scores over their careers, and it shows that Sharma’s average has been boosted by a number of big scores (Sharma, in fact, holds the record for the highest score ever in a men’s ODI: 264 against Sri Lanka at Kolkata in 2014). Rahane, on the other hand, is a far more consistent batsman. By one measure, he is among India’s most consistent ODI batsmen ever.
If we abandon the tradition of the batting average and instead look at the median number of runs scored per innings, we find that half the time Rahane goes in to bat, he makes at least 28 runs. This ranks him fifth among all Indians who have played at least 20 innings, level with Sachin Tendulkar and behind only Mahendra Singh Dhoni, Virat Kohli, Shikhar Dhawan and Ambati Rayudu.
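A ranking of this kind could be produced along the following lines. This is a sketch under our own assumptions: the function rank_by_median, the player labels and the scores are all hypothetical, and the innings cut-off is lowered to 5 only so that the toy data passes it.

```python
from statistics import median


def rank_by_median(scores_by_player, min_innings=20):
    """Return (player, median score) pairs, highest median first,
    keeping only players with at least min_innings innings."""
    eligible = {
        name: median(scores)
        for name, scores in scores_by_player.items()
        if len(scores) >= min_innings
    }
    return sorted(eligible.items(), key=lambda item: item[1], reverse=True)


# Hypothetical players and scores, not real career data.
sample = {
    "Player A": [30, 25, 41, 28, 19],
    "Player B": [4, 120, 7, 15, 60],
    "Player C": [50, 55],  # too few innings for the cut-off used below
}

ranking = rank_by_median(sample, min_innings=5)
for name, med in ranking:
    print(name, med)
```

Player B has the bigger individual scores but ranks below Player A, because the median ignores how large the occasional big innings is.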
Sharma, on the other hand, scores poorly on this measure: his median score is only 20 runs. In other words, in terms of the median, Sharma is roughly as far behind Rahane as Rahane is behind Sharma when it comes to the batting average. Table 1 shows the top all-time Indian batsmen on this measure. The analysis only takes into account performances against Test-playing nations (Australia, Bangladesh, England, New Zealand, Pakistan, South Africa, Sri Lanka, West Indies and Zimbabwe).
Table 1 shows that Rahane is only three points behind Dhoni or Kohli on the median score, but their higher medians can be explained by the simple observation that their batting averages are so much higher than Rahane’s. This suggests that the ratio of the median score to the batting average is perhaps a good measure of a batsman’s consistency: a batsman with a higher ratio may not necessarily be the better batsman, but he is more consistent by this measure.
Table 2 is Table 1 ordered differently—in terms of the ratio of the median score to the batting average. We’ve limited this to batsmen who have a batting average of at least 20—to avoid those who are consistently bad. Rahane, with a median score that is 83% of his batting average, tops the list. In second place is Sunil Gavaskar, with a ratio of 73%. The only other contemporary batsmen near the top of the list are Suresh Raina (currently dropped from the team) and Dhawan (whose unavailability for the ongoing series has given Rahane his chance in the first place).
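The ratio is simple to compute. The sketch below uses the medians and career totals quoted earlier for Rahane and Sharma; the function name consistency_ratio and the choice to return None below the cut-off are our own assumptions, not a description of how the tables were actually built.

```python
def consistency_ratio(median_score, batting_average, min_average=20.0):
    """Median score as a fraction of batting average.

    Returns None for batsmen below the minimum average, mirroring the
    cut-off of 20 described in the text.
    """
    if batting_average < min_average:
        return None
    return median_score / batting_average


# Medians (28 and 20) and career totals as quoted in the article.
rahane = consistency_ratio(28, 2549 / 76)
sharma = consistency_ratio(20, 5622 / 132)

print(f"Rahane: {rahane:.0%}")
print(f"Sharma: {sharma:.0%}")
```

On these figures Sharma’s median is less than half his batting average, which is the gap the ratio is meant to expose.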
Figure 2 shows the batting average and median score for India’s top 10 run scorers who are still playing. The red line is the regression line—players above it can be assumed to be more consistent than average, and those below the line are less consistent. It is pertinent to note that Rahane lies far above the line while Sharma is far below the line.
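For readers who want to reproduce this kind of chart, here is a minimal sketch of the idea: fit an ordinary least-squares line and check which players sit above it. It assumes the batting average is on the horizontal axis and the median on the vertical axis, which the text does not state, and the five data points are invented for illustration; they are not the players in Figure 2.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (batting average, median score) pairs, not real players.
players = {
    "P1": (25, 18),
    "P2": (30, 26),
    "P3": (35, 22),
    "P4": (40, 24),
    "P5": (45, 30),
}

averages = [avg for avg, _ in players.values()]
medians = [med for _, med in players.values()]
slope, intercept = fit_line(averages, medians)

# Positive residual: the player's median is higher than the line predicts
# for his average, i.e. he sits above the line.
residuals = {
    name: med - (slope * avg + intercept)
    for name, (avg, med) in players.items()
}
for name, res in residuals.items():
    print(name, "above" if res > 0 else "below", round(res, 1))
```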
Does this, however, mean that Rahane is a better player? Not necessarily: the answer depends on the role the batsman is expected to play in the team. For an opening batsman, time is usually not that much of a constraint, and the ability to play a long innings can have a massive impact on the team’s score. So even if Sharma makes 20 runs or fewer half the time he goes in to bat, his ability to hit the occasional big score will weigh significantly in his favour. For someone batting down the order, where there is little opportunity to play a long innings, a consistent 30 might be valued much higher. Or the speed of scoring might trump the quantum of runs made.
Ultimately, it is worth recalling that an average of any kind destroys information, and the way we summarize information needs to depend on the context of how we intend to use it. The batting average might be a great metric for an opening batsman, but might result in a loss of information when trying to evaluate batsmen who play elsewhere.