The release of linguistic data from the 2011 census allows us to objectively conclude that Nagaland is the most linguistically diverse state in India, with Kerala being the least diverse. The analysis is rather simple: we use a concept from industrial economics known as the Herfindahl-Hirschman Index (HHI).
Originally developed to quantify the degree of monopoly or competition in an industry, the HHI is defined as the sum of the squared market shares of all the companies in an industry, with each share expressed as a fraction of the total. For an industry with perfect competition (a large number of companies, each with infinitesimal market share), the HHI comes close to zero. For a monopoly, the HHI is one.
Inverting the HHI gives us an estimate of the “effective number of firms” in an industry (one for a monopoly, infinite for a perfectly competitive industry), a concept that has been extended to other fields of economics as well. For example, the inverted HHI formula is used as a measure of the “effective number of parties (votes)” in an election. Similarly, we can use it to examine the effective number of languages in a state.
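To make the definition concrete, here is a minimal Python sketch of the HHI and its inverse. The function names `hhi` and `effective_number` and the sample share lists are illustrative assumptions, not part of the census data or of any particular library.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of squared shares.

    `shares` are fractions of the total (0.25 means 25%) and are
    assumed to sum to roughly 1.
    """
    return sum(s * s for s in shares)


def effective_number(shares):
    """Inverse HHI: the 'effective number' of firms, parties or languages."""
    return 1.0 / hhi(shares)


# Hypothetical share profiles, for illustration only.
monopoly = [1.0]            # one firm (or language) holds everything
four_equal = [0.25] * 4     # four equally sized firms
uneven = [0.7, 0.2, 0.1]    # one dominant firm and two smaller ones

monopoly_n = effective_number(monopoly)      # exactly 1
four_equal_n = effective_number(four_equal)  # exactly 4
uneven_n = effective_number(uneven)          # between 1 and 3
```

Equal shares give back the plain head count, while unequal shares pull the effective number below it, which is why a state with dozens of recorded languages can still have an effective number close to one.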
Now, the 2011 census gives language at two levels, which the census describes as “language” and “mother tongue”, but which can also be described as “major language” and “minor language” or “language” and “dialect” (bringing to mind the old adage that a language is a dialect with an army and a navy). Here, we will look at the diversity of each state in India both at the language and the dialect level.
Nagaland is the clear winner in the diversity stakes on both the language and dialect axes, as the chart further demonstrates. Based on the 2011 census data, Nagaland effectively has 14 languages and 17 dialects, with the largest language (Konyak) having only about a 12% share. At the other extreme, Kerala has only 1.06 effective languages, with 97% of the state’s residents (in 2011) identifying Malayalam as their mother tongue.
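A quick sanity check on figures like these: since the HHI is at least the square of the largest share, the effective number can never exceed one divided by that square. The sketch below applies this bound to the Kerala figure quoted above; the helper name `max_effective_number` is a hypothetical label chosen for this illustration.

```python
def max_effective_number(largest_share):
    """Upper bound on the inverse HHI given only the largest share.

    The HHI is a sum of squares, so it is at least largest_share ** 2,
    which caps the effective number at 1 / largest_share ** 2.
    """
    return 1.0 / (largest_share ** 2)


# Kerala: 97% of residents reported Malayalam (figure from the text above).
kerala_bound = max_effective_number(0.97)
```

For Kerala the bound works out to roughly 1.06, so the remaining 3% of speakers barely move the index, consistent with the 1.06 effective languages reported above.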
These extreme states are not the most interesting, though. The more interesting states are the predominantly Hindi-speaking states where the effective number of dialects far outstrips the effective number of languages.
Himachal Pradesh, for example, has only 1.3 effective languages, with 86% of the population identifying Hindi as their mother tongue. Breaking it up into dialects, though, it turns out the state has nearly 6 effective dialects, with the largest being Pahari (a dialect of Hindi), which is spoken by 32% of the state’s population.
Similarly, 78% of Bihar’s population speaks Hindi (when measured at the language level), giving it 1.6 effective languages, but broken down to the dialect level, Hindi is spoken by only 26% of the population (the other prominent dialect in the state being Bhojpuri). In Rajasthan and Chhattisgarh, while Hindi dominates at the language level, its dialects Rajasthani and Chhattisgarhi respectively dominate if we measure at the dialect level.
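The gap between the two levels comes purely from how shares are grouped before squaring. The sketch below rolls dialect shares up into their parent languages and compares the two effective numbers. The shares, the labels ("A", "B", "A-standard" and so on) and the helper `roll_up` are all made up for illustration and are not census figures.

```python
from collections import defaultdict


def effective_number(shares):
    """Inverse HHI: one divided by the sum of squared shares."""
    return 1.0 / sum(s * s for s in shares)


def roll_up(dialect_shares, parent_of):
    """Add up dialect shares under their parent language."""
    language_shares = defaultdict(float)
    for dialect, share in dialect_shares.items():
        language_shares[parent_of[dialect]] += share
    return dict(language_shares)


# Hypothetical state: language A splits into three dialects, B has none.
dialect_shares = {
    "A-standard": 0.30,
    "A-dialect-1": 0.30,
    "A-dialect-2": 0.20,
    "B": 0.20,
}
parent_of = {
    "A-standard": "A",
    "A-dialect-1": "A",
    "A-dialect-2": "A",
    "B": "B",
}

language_shares = roll_up(dialect_shares, parent_of)         # A: 0.8, B: 0.2
dialect_level = effective_number(dialect_shares.values())    # about 3.8
language_level = effective_number(language_shares.values())  # about 1.5
```

In this toy state a single language holds 80% at the language level, giving about 1.5 effective languages, yet no dialect exceeds 30%, giving close to 4 effective dialects. This is the same pattern the Hindi-speaking states show.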
Barring languages primarily spoken in small states such as Arunachal Pradesh, Hindi is the only language that breaks down into multiple major dialects as seen above, which raises the question, “What is Hindi?” We will answer that another day!