
Not a random score

First Published: Thu, Nov 10 2011. 10 59 PM IST

Updated: Thu, Nov 10 2011. 10 59 PM IST
Task for you: search the records of all 270 cricketers who have played Test matches for India. Collect the first digits of their career aggregates (i.e. not individual innings scores, but the total over the years they played). For example, Rahul Dravid has scored 12,000+ Test runs, so that’s a 1; Ramakant “Tiny” Desai scored 418, that’s 4; and so on.
Go on. I’m waiting.
Oh all right, I’ve done it for you. I dug up the records of these 270 cricketers. Five didn’t bat at all. Another seven didn’t score even a run. That leaves 258. I noted the first digits of their aggregates, and I have the totals of the 1s, the 2s, etc.
Question: How many times do you think each digit appears? That is, how many of these aggregates start with each of the nine digits, 1 through 9? Would you imagine they’d be about equal, about 29 (258 divided by 9) each?
Good guess. Completely wrong, though. Try this: just 13 of the 258 aggregates begin with 9. 22, with 6. 49, with 2. And—drum roll please—80 with 1. Indeed: 80 of the 258 men—nearly one-third—had run aggregates that were either 1, or between 10 and 19, 100 and 199, 1,000 and 1,999, or 10,000+.
What’s going on here? Do Indian Test cricketers track their run aggregates, so that when it reaches a number starting with 1, they promptly begin preparations to retire? Nice, but not quite. What’s going on here is exactly the kind of odd phenomenon that mathematicians will stumble upon and then tirelessly work on till they have an explanation. In this case, the explanation lies in something called Benford’s Law.
Why does it hold for—of all things—Test run aggregates?
Let’s try to understand by considering the number 9. Batsman Bhaktavatsalu piles up runs, innings after innings. Until his aggregate reaches 90, there’s only one way it can begin with the digit 9—which is if his aggregate was exactly 9. In contrast, there are 11 ways it could begin with each of the other digits (for example, we’d get a 1 if he finished with just 1, or any total between 10 and 19). Progressing through the 90s redresses that balance: at 99, the chances for each digit are equal.
With me so far? But as soon as Bhaktavatsalu passes 100 career runs, the next 100 possible totals (till 199) give us a 1, completely skewing the balance again. So if he retires with fewer than 200 runs, there are 111 ways (1, 10-19 and 100-199) that our first digit can be 1, but only 11 ways each it can be one of the other eight digits. In particular, until his aggregate reaches 900, there remain just those 11 ways for us to get a 9. Progressing through the 900s redresses the balance for 9. But make it to 1,000, and the clock is reset again: once more, the next 1,000 possible totals (1,000-1,999) give us a 1.
And so on.
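If you'd rather let a machine do the counting, here is a minimal Python sketch of the argument above. It simply tallies, for every possible career total from 1 up to some ceiling, which digit that total starts with. The function names and the particular ceilings are my own choices for illustration, not anything from the record books.

```python
from collections import Counter


def first_digit(n: int) -> int:
    """Return the leading decimal digit of a positive integer."""
    return int(str(n)[0])


def ways_by_first_digit(max_total: int) -> dict:
    """Count how many totals from 1 to max_total start with each digit 1-9."""
    counts = Counter(first_digit(total) for total in range(1, max_total + 1))
    return {digit: counts[digit] for digit in range(1, 10)}


# Ceilings chosen to sit just before and just after each "reset" in the text.
for max_total in (89, 99, 199, 899, 999, 1999):
    print(max_total, ways_by_first_digit(max_total))
```

The digits draw level only at 99, 999 and so on; everywhere in between, 1 is ahead or 9 is behind.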
The result of this reasoning? With numbers generated cumulatively, like Test aggregates, about 30% will begin with 1, about 18% with 2, and progressively fewer with each higher digit: for 9, under 5% (there’s an explanation for these percentages, but that’s for another time). These Test aggregates follow this pattern very closely indeed; and if you trawled the stats for all Test cricketers (not just the Indians), the match would be even better. Note that this is different from reaching into a bag in which you have all the numbers between 1 and 9,999, say, and picking one; in that case, every digit would be equally likely to be the first digit. It is different because with aggregates, we must progress past every number from 1 on up to reach a particular aggregate. That process skews the chances.
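For the impatient: the usual statement of Benford's Law is that the share of numbers starting with digit d is log10(1 + 1/d). The column does not derive that formula, so treat the sketch below as a quick check rather than an explanation. It prints the predicted share for each digit and the count you would expect among 258 batsmen, next to the four counts quoted earlier (the counts for the other five digits are not given here, so they are left blank).

```python
import math

TOTAL_BATSMEN = 258  # cricketers with at least one Test run, per the article

# The only first-digit counts the article quotes; the rest are not given.
OBSERVED = {1: 80, 2: 49, 6: 22, 9: 13}


def benford_share(digit: int) -> float:
    """Share of numbers expected to start with `digit` under Benford's Law."""
    return math.log10(1 + 1 / digit)


for digit in range(1, 10):
    share = benford_share(digit)
    expected = share * TOTAL_BATSMEN
    observed = OBSERVED.get(digit, "not quoted")
    print(f"{digit}: {share:6.1%}  expected ~{expected:5.1f}  observed {observed}")
```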
And that’s Benford’s Law, and that’s why it applies only to naturally occurring numbers. Taken in the aggregate, all kinds of figures obey the law: lengths of rivers, populations of cities, Twitter followers, stock prices and more. In fact, it so accurately predicts some number behaviour that it is now used to detect fraud. People who fake data try to make it seem authentic by using each digit equally often. Sadly for them, that effort itself, says Benford’s Law, marks the data as fake.
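How might such a check look? One common approach, and it is my assumption that something like it is what fraud-hunters use, is a chi-square comparison between the first digits you observe and the counts Benford's Law predicts. The sketch below uses two made-up data sets purely for illustration: the first 100 powers of two, which are known to track Benford closely, and the numbers 100 to 999, where every first digit turns up equally often.

```python
import math
from collections import Counter


def benford_chi_square(values):
    """Chi-square distance between observed first digits and Benford's Law."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford's predicted count
        stat += (counts[d] - expected) ** 2 / expected
    return stat


# Illustrative data only, not real accounts.
growing = [2 ** k for k in range(100)]  # powers of two
evened_out = list(range(100, 1000))     # each first digit used equally often

print(round(benford_chi_square(growing), 2))
print(round(benford_chi_square(evened_out), 2))
```

As a rule of thumb, with nine digits (eight degrees of freedom) a statistic above roughly 15.5 would be unusual at the 5% level if the data really followed Benford. The powers of two fall well below that; the evened-out numbers land far above it.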
One final note: some years ago, fans tabulated the birthdays of several hundred American professional athletes. Surprisingly, a disproportionate number of them were born in January and February, with fewer in each subsequent month. Yep: another odd phenomenon worth explaining.
So what’s your guess: Are athletes’ parents particularly lovey-dovey in the middle of the year? Does the same thing hold for Indian cricketers?
Is this another instance of Benford’s Law? Or is there another explanation altogether?
(Thanks, Sunil Nanda, for discussions on Benford’s Law and more).
Once a computer scientist, Dilip D’Souza now lives in Mumbai and writes for his dinners. A Matter of Numbers will explore the joy of mathematics, with occasional forays into other sciences. Comments are welcome at dilip@livemint.com