Does North East India have the highest per capita language density in the world?

The baseline survey of Indian languages conducted by the People's Linguistic Survey of India (PLSI) throws important light on the state of languages in the country. Although the complete work of PLSI, covering all the states and union territories, is scheduled to be published in 50 volumes between September 2013 and December 2014, a wealth of information has already emerged from the preliminary findings. Some of the interesting facts that have come up are: 780 languages are spoken in the country; 250 languages have been lost in the last 50 years; there are close to 66 scripts; Arunachal Pradesh (90 languages), Assam (55) and Gujarat (48) are the three most linguistically diverse states; and West Bengal, with 38 languages but 9 scripts, has the greatest diversity of scripts.

It is surprising that such a comprehensive linguistic survey is being conducted almost a century after the first one, by the Gujarat-based Bhasha Research and Publication Centre, and that too not with government support but with funding from the charitable Jamsetji Tata Trust. The first such survey, the Linguistic Survey of India, was conducted over thirty years, from 1898 to 1928, by the British civil servant George A. Grierson. His work documents 364 languages and dialects. Although it did not adequately cover the languages of South India and North East India, it remains an authoritative work.

Against this backdrop, it may be worthwhile to bring forth statistics from Ethnologue, a widely trusted and respected reference work (and website) that has been cataloguing the world's living languages since 1951. Published by the Summer Institute of Linguistics, Ethnologue is the most authoritative and scholarly survey of the world's languages; although it is constantly updated, errors do creep in, which is only natural when data have to be collected and corroborated from across the world. According to the 17th edition of Ethnologue, published this year (2013), India has 461 individual indigenous languages, in addition to 7 immigrant languages. Of the 461 languages, 447 are living and 14 are extinct. Of the living languages, 75 are institutional, 127 are developing, 178 are vigorous, 55 are in trouble and 12 are dying.

Country                      LDI (Ethnologue, 17th ed., 2013)    LDI (UNESCO World Report, 2009)
Papua New Guinea             0.988                               0.990
Vanuatu                      0.973                               0.972
Cameroon                     0.972                               0.965
Solomon Islands              0.968                               0.965
Central African Republic     0.959                               0.960
India                        0.916 (ranked 14th)                 0.930 (ranked 9th)

Table 1: Linguistic Diversity Index (LDI) of selected countries, with India's rank shown in parentheses

Ten of the fourteen extinct languages belong to the Great Andamanese tribes, who today number only around 52 people; sadly, the language they now speak is a creolized mixture. The most recent of these languages to be declared extinct was Aka-Bo, whose last speaker, Boa Sr., died in 2010. Prof. Anvita Abbi, Professor of Linguistics at Jawaharlal Nehru University and a recent Padma Shri awardee, did pioneering work on the Andamanese languages with the help of Boa Sr. before Boa Sr. passed away, documenting a language that had thrived for thousands of years but is now extinct. Three languages of North East India, Turung, Lui and Nora, are also among the extinct languages listed by Ethnologue.

[Photo: Anvita Abbi eliciting data from Boa Sr. at Adi Basera, January 2006]

The PLSI survey, which involved over 3,000 people from varied backgrounds, will certainly be more authentic and comprehensive than Ethnologue as far as the languages of India are concerned. The PLSI survey is based on the 2001 Census report and on data provided by the language communities themselves, rather than on a house-to-house survey. The survey in Meghalaya found 22 varieties of Khasi and 9 of Garo, besides their standard forms. Among the Garo varieties, Chibok and Megam have fewer than 100 speakers. In Tripura, which was found to have 10 living languages, Shimal is spoken by only four people and Korbong by only 250 to 300 people. These facts by themselves provide valuable linguistic insight.

According to PLSI, North East India is home to 210 languages, many of which may well be on the path to gradual extinction. On 17 July, the Chairperson of PLSI, Prof. G. N. Devy, announced at a press conference in Guwahati the completion of the survey of 130 languages of Assam, Meghalaya, Manipur, Nagaland and Tripura. Several regional and national newspapers reported the press conference, and some quoted him as saying that North East India has the highest per capita language density in the world. It is plausible that he said the region is "one of the highest" rather than "the highest". The following crude analysis may provide a clearer picture, if not an answer.

According to the Census of India 2011, the total population of the 8 states that make up the region is 4,54,86,784, and their combined geographical area is 2,62,230 km². PLSI's survey lists 210 languages in the region, around 130 of which have been covered so far. If we divide the geographical area by the number of languages and call the result the Geographical Language Density, it works out to 1248.71 square kilometres per language. Similarly, dividing the population by the number of languages gives what we may call the Speaker Language Density: 2,16,603.73 people per language.
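The two ratios are simple divisions, and the short Python sketch below reproduces the North East India figures from the Census 2011 population and area and the PLSI count of 210 languages. The function names are illustrative only; they are not part of any published methodology.

```python
# A minimal sketch of the two crude ratios defined above.
# Function and constant names are hypothetical, chosen for illustration.

def geographical_language_density(area_km2: float, languages: int) -> float:
    """Square kilometres of territory per language."""
    return area_km2 / languages


def speaker_language_density(population: int, languages: int) -> float:
    """Number of people per language."""
    return population / languages


# North East India: Census of India 2011 population and area, PLSI language count.
NE_POPULATION = 45_486_784   # 4,54,86,784 in Indian numbering
NE_AREA_KM2 = 262_230        # 2,62,230 km²
NE_LANGUAGES = 210

geo = geographical_language_density(NE_AREA_KM2, NE_LANGUAGES)
spk = speaker_language_density(NE_POPULATION, NE_LANGUAGES)
print(f"{geo:.2f} km² per language")     # 1248.71 km² per language
print(f"{spk:.2f} people per language")  # 216603.73 people per language
```

Note that both ratios fall as diversity rises: the fewer square kilometres or people per language, the more densely packed the languages are.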


Compare this with Vanuatu, an island nation in the Pacific with a population of 2,24,564 and a geographical area of 12,190 km², which has 112 languages, 2 of them extinct. Counting only its 110 living languages, the Geographical Language Density and Speaker Language Density for Vanuatu work out to 110.82 square kilometres per language and 2041.49 people per language respectively.
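Written out, the Vanuatu figures follow from dividing by the living languages only, that is, the 112 listed less the 2 extinct ones:

```latex
\begin{aligned}
\text{Living languages} &= 112 - 2 = 110\\
\text{Geographical Language Density} &= \frac{12\,190\ \text{km}^2}{110} \approx 110.82\ \text{km}^2 \text{ per language}\\
\text{Speaker Language Density} &= \frac{224\,564\ \text{people}}{110} \approx 2041.49\ \text{people per language}
\end{aligned}
```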

Papua New Guinea, with 848 languages (12 of them extinct), has the largest number of languages of any country in the world; its 836 living languages account for roughly 12 percent of the world's 7105 known living languages. It has a geographical area of 4,62,840 km² and a population of 70,59,653. Its Geographical Language Density is 553.64 square kilometres per language and its Speaker Language Density is 8444.56 people per language.
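The same arithmetic for Papua New Guinea, again counting only living languages (848 listed less 12 extinct):

```latex
\begin{aligned}
\text{Living languages} &= 848 - 12 = 836\\
\text{Geographical Language Density} &= \frac{462\,840\ \text{km}^2}{836} \approx 553.64\ \text{km}^2 \text{ per language}\\
\text{Speaker Language Density} &= \frac{7\,059\,653\ \text{people}}{836} \approx 8444.56\ \text{people per language}
\end{aligned}
```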

Readers should note that these two measures, Geographical Language Density and Speaker Language Density, have been devised here solely to give some idea of the distribution and density of languages in the context of the linguistic diversity of North East India. They therefore have no basis in linguistics, nor do they conform to any established statistical analysis.

The Linguistic Diversity Index (LDI) measures the diversity of languages spoken in a country and is the globally accepted measurement of linguistic diversity. The scale ranges from 0 to 1. An index of 0 represents no linguistic diversity, meaning that everyone speaks the same language. An index of 1 represents total diversity, meaning that no two people speak the same language. No country has an index value of exactly 0 or 1. The Linguistic Diversity Index can provide insight into the multicultural nature of countries.
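Ethnologue describes its index as Greenberg's diversity index: the probability that two people chosen at random from a country's population have different mother tongues, which is 1 minus the sum of the squared population shares of each language. The Python sketch below assumes that formulation. The speaker counts in the usage lines are hypothetical toy numbers, not survey data.

```python
# A minimal sketch of the Linguistic Diversity Index, assuming Greenberg's
# formulation: LDI = 1 - sum(p_i ** 2), where p_i is the share of the
# population whose mother tongue is language i.
from collections.abc import Iterable


def linguistic_diversity_index(speakers: Iterable[int]) -> float:
    """Probability that two randomly chosen people have different mother tongues."""
    counts = [c for c in speakers if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one speaker")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical populations, for illustration only.
print(linguistic_diversity_index([1000]))               # 0.0: everyone speaks one language
print(linguistic_diversity_index([500, 500]))           # 0.5: two equally large languages
print(round(linguistic_diversity_index([1] * 100), 4))  # 0.99: 100 one-speaker languages
```

The last line shows why no real country reaches exactly 1: with n speakers who all speak different languages the index is 1 - 1/n, which approaches 1 but never equals it.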

Interestingly, Indonesia (707 languages, including one immigrant language) and Nigeria (529, including 7 immigrant languages), which have the second and third largest numbers of languages in the world respectively, do not rank very high on the Linguistic Diversity Index: Ethnologue ranks Nigeria 17th with an index of 0.891 and Indonesia 32nd with an index of 0.815.


Country/Region               Geographical Language Density    Speaker Language Density
                             (km² per language)               (people per language)
Papua New Guinea             553.64                           8444.56
Vanuatu                      110.82                           2041.49
Cameroon                     1698.01                          69,307.50
Solomon Islands              400.00                           7366.20
Central African Republic     8652.56                          61,416.67
Tanzania                     7501.61                          3,56,578.75
North East India             1248.71                          2,16,603.73
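To see where North East India stands among the seven rows of the density table, the values can be sorted smallest first (a lower value means more languages per square kilometre or per person). The Python sketch below copies the table values, taking Papua New Guinea's area figure as 553.64, as stated in the text; the helper name `rank_of` is hypothetical.

```python
# Rank regions in the density table; smaller values mean languages are packed
# more densely. Values are copied from the table in the text.

DENSITIES = {
    # region: (km² per language, people per language)
    "Papua New Guinea": (553.64, 8444.56),
    "Vanuatu": (110.82, 2041.49),
    "Cameroon": (1698.01, 69307.50),
    "Solomon Islands": (400.00, 7366.20),
    "Central African Republic": (8652.56, 61416.67),
    "Tanzania": (7501.61, 356578.75),
    "North East India": (1248.71, 216603.73),
}


def rank_of(region: str, column: int) -> int:
    """1-based rank of `region` when sorted by the given column, smallest first."""
    ordered = sorted(DENSITIES, key=lambda r: DENSITIES[r][column])
    return ordered.index(region) + 1


print(rank_of("North East India", 0))  # 4 (by area per language)
print(rank_of("North East India", 1))  # 6 (by people per language)
```

On these crude measures North East India is fourth of the seven by area and sixth by population, ahead of several highly diverse countries but well behind Vanuatu, the Solomon Islands and Papua New Guinea.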

The population figures for the countries were taken from Wikipedia, and only the living indigenous languages listed in the 17th web edition of Ethnologue have been counted. This crude analysis reinforces the view that North East India is among the most linguistically diverse regions of the world, but not the most diverse. To bring out the true picture, however, PLSI will have to work out the Linguistic Diversity Index of the region in line with standard global practice.

In the context of the country's linguistic diversity, the language figures of successive Census of India reports may provide some additional information. The 1961 Census identified 1,652 "mother tongues", while the 1991 Census recognized 1,576 classified "mother tongues". Linguists caution that mother tongues are not identical to languages or dialects. Nor is the boundary between a language and a dialect clear; it is quite fluid, for socio-political reasons.

To conclude, one important fact may be cited: 94 percent of the world's population speaks just 6 percent of the world's languages, while the remaining 6 percent of the population speaks the remaining 94 percent of the languages.

Pratap Chhetri