INTRODUCTION to the Atlas of the Languages of the World (ALW) series

Structure of the Guide

The data in the Guide are organized according to a language systematics based on genetic classification. To make this systematics more precise and comparable, a system of taxa based on lexicostatistical data is used.

Using lexicostatistical data. The lexicostatistical method is usually used to measure the degree of difference between related languages in terms of years of separation, calculated from the percentage of basic vocabulary items shared by two languages. The Guide gives only the percentage of cognates and no time of separation, since different formulas exist for calculating the time of divergence. Moreover, the percentage of cognates is sufficient on its own to classify the languages.
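The percentage of cognates can be illustrated with a minimal sketch. It assumes that each word list maps a basic-vocabulary meaning to a cognate class number assigned by a linguist, and that meanings missing from either list are simply left out of the comparison; the function name, the data layout and the toy lists are hypothetical and do not come from the Guide.

```python
from typing import Dict, Optional

# A word list maps a basic-vocabulary meaning to a cognate class number.
# Two words are treated as cognate when they carry the same class number.
# None marks a meaning for which no word is recorded (assumed convention).
WordList = Dict[str, Optional[int]]


def percent_cognates(a: WordList, b: WordList) -> float:
    """Return the percentage of shared cognates between two word lists."""
    # Compare only meanings attested in both lists.
    comparable = [
        meaning for meaning in a
        if a[meaning] is not None and b.get(meaning) is not None
    ]
    if not comparable:
        raise ValueError("the two lists have no comparable meanings")
    matches = sum(1 for meaning in comparable if a[meaning] == b[meaning])
    return 100.0 * matches / len(comparable)


# Hypothetical toy data: class numbers are invented for illustration only.
idiom_x: WordList = {"hand": 1, "water": 1, "stone": 1, "fire": 1, "two": 1, "eye": None}
idiom_y: WordList = {"hand": 1, "water": 1, "stone": 2, "fire": 1, "two": 1, "eye": 1}

print(percent_cognates(idiom_x, idiom_y))  # 4 matches out of 5 comparable meanings
```

Real lexicostatistical lists are much longer than this toy example, and the way gaps and borrowings are handled varies between researchers.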

The following taxa, or ranks, express degrees of relationship and are correlated with lexicostatistical percentages of cognates. The figures given are minimal bounds.

Family – the highest basic level, on which the whole systematics is founded. It is a group of definitely, though distantly, related languages which share at least 20 percent cognates.

Taxa for the levels below the family, down to the language, are not distinguished by name. They are all labelled simply as groups, with the percentage of cognates between component groups or languages given in square brackets:

adyghe-abkhaz group [53-55]. See also Table 2.

Language / dialect. Since language and dialect are usually distinguished on the basis of sociolinguistic rather than structural criteria, these terms cannot serve as a foundation for the systematics. Therefore we use here four levels for languages and dialects, each clearly defined structurally or lexicostatistically. Whether an idiom is traditionally treated as a language or a dialect is indicated in its reference data (see Language / dialect status below). The levels are shown in Table 1.

Table 1. Idiom levels with examples.

Idiom-1 level [89-95 percent cognates between component idioms] normally corresponds to a) quite distinct languages (which are almost mutually unintelligible) or b) a group of closely related languages.
  Examples: English, French
            East Slavonic, Ibero-Romance

Idiom-2 [95-99] – group of dialects or separate languages (with partial inherent intelligibility).
  Examples: Picard, Walloon, Standard French, …
            Belarussian, South Russian, North Russian, Ukrainian; Galician, Portuguese, Spanish

Idiom-3 [99-100] – dialects (with very good inherent intelligibility).
  Examples: namurois, liégeois, wallo-picard, …
            north portuguese, central portuguese, brazilian

Idiom-4 – subdialects (virtually one idiom with very slight differences); indicated only if necessary.
  Examples: liégeois proper, malmédien, verviétois, …
            coimbrese, lisbonese, …

No taxa for the language / dialect levels are indicated in the Guide. These levels are distinguished only by the conventional spelling of basic names (see Table 2).

This distinction is also important for maps. Idiom levels are distinguished by fills (different fills for idiom-2 and higher levels, and one fill for the idiom-3 and idiom-4 levels) and by outlines (see Key list for details).

In certain cases, the percentage of cognates between the component "dialects" of a traditional "language" is much lower than 89%, so that the "language" corresponds to the level of a group. Some of these languages are already treated by modern linguists as groups of languages (e.g. Chinese, Arabic, German); others are still viewed as single languages (often for lack of information).
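The correlation between percentages and levels can be sketched as a simple lookup. The sketch assumes that each bracketed figure is read as a minimal bound, so that a boundary value such as 95 falls into the closer level, and that the percentage is measured between two idioms; the function name and the wording of the returned labels are illustrative, not the Guide's own terminology.

```python
def closest_common_level(percent: float) -> str:
    """Return the lowest level at which two idioms fall into one unit,
    given the percentage of cognates they share."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    # Minimal bounds from the text, checked from the closest level down.
    # Idiom-4 has no percentage of its own and is therefore not listed.
    bounds = [
        (99, "same idiom-3"),
        (95, "same idiom-2"),
        (89, "same idiom-1"),
        (20, "same family (a group above idiom-1)"),
    ]
    for lower_bound, label in bounds:
        if percent >= lower_bound:
            return label
    return "below the family bound"


for value in (99.5, 95, 92, 54, 10):
    print(value, "->", closest_common_level(value))
```

A traditional "language" whose component "dialects" share, say, 54 percent cognates would come out here as a group rather than as an idiom-1, which is the situation described in the preceding paragraph.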

All lexicostatistical information is brought together in the genealogical chart (see below).

Data in the Guide

The Guide contains the following categories of data.

See Table 2 for an example. Different levels are also distinguished by different indents, as can be seen from the table.

Table 2. Example of reference coding and typography of basic names.

Basic name: Typography of basic names

family [24]: Bold, full capitals, bigger size
group [32]: Bold, full capitals, fixed width font
Bold, full capitals, variable width font (vwf)
North Avar: Bold, initial capital, vwf
idiom-3 (dialect): Normal, expanded, all smalls, vwf
idiom-3 (dialect): north-east avar
idiom-4 (subdialect): Normal, exp., all smalls, vwf, smaller size

To make the Guide and Maps easier to read, group numbers are omitted from the codes for idiom-2 and lower levels, and only the last letter is kept in the code for the idiom-4 level.

Some frequently used language names are abbreviated for reference use; the abbreviations are listed for each set.

Exonyms in Cyrillic are Russian by default; otherwise the language is indicated as well.

In this edition the following definitions of terms are used: first language (L1) is a language which is known at least as well as any other (and possibly better) and is used most frequently; mother tongue, or L0 (if it differs from L1), is an ethnic language which is known at least well enough to be spoken.

The number of speakers for ex-USSR states is given by default according to the last Soviet census, that of 1989. Otherwise the year is indicated. If a figure is an estimate, it is preceded by a tilde (~). «Thousands» are often abbreviated to «k» and «millions» to «m»: 10k, 5m.
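The notation for speaker figures can be read mechanically, as the following minimal sketch suggests. It assumes that a figure consists of an optional tilde, a number and an optional «k» or «m» suffix; support for decimal values such as 1.5m is an assumption, and the names `SpeakerCount` and `parse_speakers` are hypothetical.

```python
import re
from typing import NamedTuple


class SpeakerCount(NamedTuple):
    value: int         # number of speakers as a plain integer
    is_estimate: bool  # True when the figure was preceded by a tilde


# Optional tilde, a number (decimals are an assumed extension), optional k/m.
_FIGURE = re.compile(r"^(~?)\s*(\d+(?:\.\d+)?)\s*([km]?)$")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_speakers(text: str) -> SpeakerCount:
    """Convert a figure such as '10k' or '~5m' into a SpeakerCount."""
    match = _FIGURE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized speaker figure: {text!r}")
    tilde, number, suffix = match.groups()
    value = round(float(number) * _MULTIPLIER[suffix])
    return SpeakerCount(value, tilde == "~")


print(parse_speakers("10k"))
print(parse_speakers("~5m"))
```

The census year is not part of the figure itself in this sketch; it would have to be stored separately, with 1989 as the default for ex-USSR states.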

