WAR as a mathematical measure of war

A data scientist has found a way to rank every military leader in history based on a metric used to estimate a baseball player’s contributions to his team

As a few dozen books on my shelves will attest, I find war endlessly fascinating, if also disturbing. That’s because wars have a way of showing us the wide canvas of human nature in its most elemental, unvarnished form. There’s politics, courage, inhumanity, leadership, strategy, failure, insight … even mathematics, at times.

At some point while reading about one ferocious battle or another, I find myself starting to wonder about the men who led the soldiers, whether on the battlefield or from wherever their headquarters were. What does it take to be such a person, whether you’re leading the fighting or directing where your men should go? What makes you a success or a failure? Throughout history, there have been generals renowned for their skill and valour in victory: Hannibal, Erwin Rommel and King Henry V at Agincourt, for example. There have also been generals known for wrongheadedness, currying political favour or obstinacy that resulted in disaster: Lord Kitchener for Gallipoli, Friedrich Paulus in Stalingrad and Admiral Yamamoto at Midway come to mind.

There are plenty of subjective rankings of such men. As you can imagine, they can vary wildly. Was the American Confederate general Robert E. Lee a genius or a mediocrity? What about Julius Caesar? Field Marshal Manekshaw? Leopoldo Galtieri? Is there any objective way of ranking these and other military leaders, so that we get a clearer sense of their worth than from stories of their accomplishments?

Amazingly, somebody set out to do just that a few years ago. Ethan Arsht, an amateur data scientist with a degree in Near Eastern Studies, put his mind to the task and found a way to do it based on a metric used in baseball.

Baseball, you ask? Well, how do you measure the worth of a hitter in baseball? Let’s take cricket instead, perhaps more familiar to most of you reading this. How do you assess a batsman? You could consider the runs he scores each time he comes out to bat. Or his cumulative total of runs. Or the number of centuries he has racked up. Or how quickly he scores his runs. Or some other statistic, or even all these together. Cricket, like baseball, is awash in statistics.

Among them are more esoteric measures too. What was the quality of the opposition? (Do you rank a century against Papua New Guinea the same as one against Australia?) Where in the batting order did he start most of his innings, and thus who in general were his partners? (Is partnering with a first-rate batsman any different from batting with someone you have to protect from the bowling?) How much did he score in team victories? Losses? (Is he contributing to wins? Unable to stave off defeats?)

Anantha Narayanan, a writer for ESPNCricinfo who clearly enjoys such exercises in mining cricket numbers, has a “Batting Performance Rating” to rank individual Test-match performances. The BPR is based on 11 different factors, like the quality of the pitch, runs added with the lesser batsmen, relative team strengths and the batsman’s contribution to the team’s performance. It’s this rating that led him to pronounce that Kusal Perera had the greatest innings in Test cricket history: his 153 to take Sri Lanka to victory over South Africa early this year.

In a similar data-mining spirit, baseball fans have found plenty of ways to measure players in that game too. In particular, there’s the one Arsht examined: the conveniently named “Wins Above Replacement” (WAR) metric. For any given player, his WAR rating gives us an idea of how many more or fewer victories his team had with him than it would have had if he had been replaced by an average player. That is, it tells us if his team is tangibly better with him in it rather than someone else: a player with a WAR score of 5 is certainly more valuable to his team than another with a score of 2.

Or, as one explanation of WAR puts it: “WAR offers an estimate to answer the question, ‘If this player got injured and their team had to replace them with a freely available (average hitter), how much value would the team be losing?’”

WAR is the measure Arsht chose to use to rank several thousand generals through history. That is, his method asks about each general in each of his battles: “All else being equal, if this man had to be replaced in this battle by an average general, how would his army perform?” As Arsht wrote: “I would find the generals’ WAR, in war.”

Arsht combed through Wikipedia for data on battles. Among the 3,580 he considered were the tussle between Alexander and Porus on the Jhelum river in 326 BC, the 15th century Wars of the Roses in England, the Warsaw Uprising against Nazi Germany in 1944 and more. Among the 6,619 generals who figured in these battles were Alexander, Rommel, Tipu Sultan and J.S. Aurora. Arsht wrote software that extracted relevant information about each battle from the Wikipedia entries, like the number of soldiers who fought, and the result of the battle. This gave him a model for how an average general would perform. Against that average, then, Arsht could compare how individual generals performed. The model, says Arsht, was able to zero in on each general’s “ability as a tactician”. It also was “surprisingly conservative … suggesting that (soldier numbers) have a relatively small effect (on the outcome of a battle) compared to other factors such as terrain or technology.”
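To get a feel for what a “conservative” baseline might look like, here is a minimal sketch in Python. It is not Arsht’s model: this column does not say what form his model took, so the logistic curve, the function name and the `weight` value below are all assumptions chosen only for illustration. The idea is simply that an average general’s chance of winning creeps up slowly as his army outnumbers the enemy’s.

```python
import math


def average_general_win_probability(own_troops, enemy_troops, weight=0.1):
    """Hypothetical baseline, NOT Arsht's actual model.

    Maps the log of the troop ratio through a logistic curve. The small
    default `weight` is an invented value standing in for the idea that
    soldier numbers have only a modest effect on the outcome.
    """
    if own_troops <= 0 or enemy_troops <= 0:
        raise ValueError("troop counts must be positive")
    log_ratio = math.log(own_troops / enemy_troops)
    return 1.0 / (1.0 + math.exp(-weight * log_ratio))


# Evenly matched armies: the average general has a 50% chance.
even = average_general_win_probability(100_000, 100_000)

# Twice the soldiers: the chance rises, but only slightly with this weight.
doubled = average_general_win_probability(200_000, 100_000)
```

With a curve like this, the two sides’ chances in any battle always add up to 1, which is what lets one general’s gain be read off as the other’s loss.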

For each battle, Arsht assigned a WAR score to the generals involved. The example he uses to explain this is the Battle of Borodino in Russia on 7 September 1812. Napoleon led French forces to victory there, but it was at a tremendous cost in casualties on both sides, and in any case his invasion of Russia ultimately failed spectacularly three months later. Still, Borodino was indeed a French victory.

There were slightly more French than Russian troops involved in the fighting at Borodino. Thus, Arsht’s model suggests that an “average” general leading the French instead of Napoleon would have had a 51% chance of victory. Napoleon gets a score of 1 for actually winning, but we subtract that 51% chance, leaving Napoleon with a 0.49 WAR score for Borodino.

Conversely, consider Mikhail Kutuzov, the Russian general who faced Napoleon that day. He gets a score of -1 for losing, but since an average Russian general had a 51% chance of losing anyway, we add that 51% back, leaving Kutuzov with a WAR score of -0.49 for Borodino.
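The Borodino arithmetic can be written out as a small Python sketch. The function name and its arguments are my own, not Arsht’s; it only restates the rule as described here: 1 for a win less the average general’s chance of winning, or -1 for a loss plus the average general’s chance of losing.

```python
def battle_war(won, p_win_average):
    """Per-battle WAR as described in this column (names are illustrative).

    won            -- True if the general won the battle
    p_win_average  -- chance (0 to 1) that an average general in the
                      same position would have won
    """
    if not 0.0 <= p_win_average <= 1.0:
        raise ValueError("p_win_average must be between 0 and 1")
    if won:
        # Credit for the win, minus what an average general would expect.
        return 1.0 - p_win_average
    # Penalty for the loss, softened by the average chance of losing.
    p_loss_average = 1.0 - p_win_average
    return -1.0 + p_loss_average


# Borodino: an average French general had a 51% chance of victory,
# so an average Russian general had a 49% chance.
napoleon = battle_war(won=True, p_win_average=0.51)   # about 0.49
kutuzov = battle_war(won=False, p_win_average=0.49)   # about -0.49
```

Note that the two scores cancel out: whatever one general gains from a battle, his opponent loses.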

Sum up Napoleon’s WAR scores across all his battles and that total is his overall WAR score. Do the same for every other general. What we end up with is a ranking, via this single metric, of all these generals. Certainly there are problems here. For example, Napoleon fought 43 battles that Wikipedia lists, far more than anyone else, and no doubt that number itself inflates his score. Still, he did win 38 of them, so a high WAR score for him is not unreasonable. To our knowledge, Alexander the Great never lost a battle, which is why he remains such a revered military commander. But he fought only nine before he died on the way home to Greece.
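Here is a rough Python sketch of that summing-up, and of the problem with it. Only the two Borodino entries come from the example above; “General A” and “General B” are invented, as are their battles, and all the function names are mine rather than Arsht’s.

```python
from collections import defaultdict


def battle_war(won, p_win_average):
    """Per-battle WAR: win/loss score adjusted by the average general's odds."""
    if won:
        return 1.0 - p_win_average
    return -1.0 + (1.0 - p_win_average)


def career_war(records):
    """Sum per-battle WAR for each general.

    records -- iterable of (general, won, p_win_average) tuples
    """
    totals = defaultdict(float)
    for general, won, p_win_average in records:
        totals[general] += battle_war(won, p_win_average)
    return dict(totals)


def ranking(totals):
    """Return (general, total WAR) pairs, highest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


records = [
    ("Napoleon", True, 0.51),   # Borodino, from the example in the text
    ("Kutuzov", False, 0.49),   # Borodino, from the example in the text
]
# Hypothetical generals, both undefeated in evenly matched battles.
records += [("General A", True, 0.50)] * 4   # four invented battles
records += [("General B", True, 0.50)] * 2   # two invented battles

totals = career_war(records)
for general, score in ranking(totals):
    print(f"{general}: {score:.2f}")
```

Neither invented general ever loses, yet General A finishes well ahead of General B purely because he fought twice as often. That is the inflation at work in a simple sum.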

So who tops Arsht’s WAR ranking? Napoleon, with a score of 16.703. In second place is Julius Caesar, whose 17 battles give him a WAR score of 7.352. That is, Napoleon is so far ahead of Caesar that you might think Arsht’s model needs tweaking. But even if you discount Napoleon, the other numbers suggest both that Arsht is on to something here, and that there are some generals whose reputations could use a second look. Not far behind Caesar are names like Hannibal (5.489), the Duke of Wellington (7.133) and Kemal Atatürk (3.582)—and Alexander places 10th, with 4.37. Saddam Hussein has -1.707, Kutuzov -1.036, Kitchener 0.469 with a -0.524 for the Gallipoli catastrophe—perhaps no surprises there. Paulus scrapes in at just above 0, his Stalingrad misadventure offset by a victory in the Battle of Kharkov.

But Rommel has a WAR score of -1.953, which belies the reverence with which many military historians speak of him. (He gets my respect, though, for participating in a 1944 attempt on Hitler’s life.) Similarly, the US Confederate hero Robert E. Lee gets -1.89 from 27 battles. Could some commander other than him have turned the US Civil War, and so the course of that country’s history, around?

And what of Indian generals? Harbaksh Singh of our 1965 war is at 0.451, with just two battles. Manekshaw gets 0.467, but his “battle” in Arsht’s model is the entire 1971 war with Pakistan. In fact, there are arguments to be made with several Indian entries in Arsht’s ranking, which I’ll leave you to find.

War is fascinating, yes. The endless “what-ifs” Arsht’s exercise throws out, almost more so.

Once a computer scientist, Dilip D’Souza now lives in Mumbai and writes for his dinners. His Twitter handle is @DeathEndsFun