Introduction to ALW series


LINGVARIUM Publications

INTRODUCTION to the Atlas of the Languages of the World (ALW) series

Structure of the Guide

The data in the Guide are structured according to a language systematics based on genetic classification. To make this systematics more precise and its levels comparable across families, it uses a system of taxa based on lexicostatistical data.

Using lexicostatistical data. The lexicostatistical method is usually applied to measure the degree of difference between related languages in terms of years of separation, calculated from the percentage of basic vocabulary items shared by two languages. The Guide gives only the percentage of cognates and no time of separation, because different formulas exist for calculating the time of divergence. Besides, the percentage of cognates is sufficient for classifying the languages.
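As a rough illustration of the measure the Guide reports, the sketch below computes the percentage of cognates between two languages from a basic-vocabulary list. It assumes that cognacy judgments have already been made and encoded as cognate-class labels per meaning; the meanings and labels are invented placeholders, not data from the Guide, and meanings unattested in either language are simply skipped.

```python
from typing import Dict, Optional


def percent_cognates(
    lang_a: Dict[str, Optional[str]],
    lang_b: Dict[str, Optional[str]],
) -> float:
    """Percentage of compared meanings whose words share a cognate class.

    Each dict maps a basic-vocabulary meaning (e.g. "water") to a cognate-class
    label assigned by the researcher, or None if no word is attested.
    """
    # Compare only meanings attested in both languages.
    compared = [
        m for m in lang_a
        if m in lang_b and lang_a[m] is not None and lang_b[m] is not None
    ]
    if not compared:
        raise ValueError("no meanings attested in both languages")
    shared = sum(1 for m in compared if lang_a[m] == lang_b[m])
    return 100.0 * shared / len(compared)


# Hypothetical cognate-class labels for a four-item list.
a = {"water": "W1", "fire": "F1", "eye": "E1", "two": "T1"}
b = {"water": "W1", "fire": "F2", "eye": "E1", "two": "T1"}
print(percent_cognates(a, b))  # 75.0
```

Real lists are much longer, but the arithmetic is the same: shared cognates divided by the number of meanings compared.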

The following taxa, or ranks, express degrees of relationship and are correlated with lexicostatistical percentages of cognates. The figures given are lower bounds.

Family: the uppermost basic level, on which the whole systematics is founded. A family is a group of languages that are definitely, but distantly, related and share at least 20 percent cognates.

Taxa at the levels between the family and the language are not distinguished by separate names. They are all simply labelled as groups, with the percentage of cognates between their component groups or languages given in square brackets, for example:

Adyghe-Abkhaz group [53-55]. See also Table 2.

Language / dialect. Since languages and dialects are usually distinguished on sociolinguistic rather than structural grounds, these terms cannot serve as the foundation of the systematics. Therefore the Guide uses four levels for languages and dialects, each clearly defined structurally or lexicostatistically. Whether an idiom is traditionally treated as a language or a dialect is indicated in its reference data (see Language / dialect status below). These levels are shown in Table 1.

Table 1. Idiom levels with examples.

Level	Examples a)	Examples b)
Idiom-1 [89-95 percent cognates between component idioms]: normally corresponds to a) quite distinct languages (almost mutually unintelligible) or b) a group of closely related languages.	English, French	East Slavonic, Ibero-Romance
Idiom-2 [95-99]: a group of dialects or separate languages (with partial inherent intelligibility).	Picard, Walloon, Standard French, …	Belarusian, South Russian, North Russian, Ukrainian; Galician, Portuguese, Spanish
Idiom-3 [99-100]: dialects (with very good inherent intelligibility).	namurois, liégeois, wallo-picard, …	north portuguese, central portuguese, brazilian
Idiom-4: subdialects (virtually one idiom with very slight differences); indicated only if necessary.	liégeois "proper", malmédien, verviétois, …	coimbrese, lisbonese, …
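The thresholds of Table 1, together with the 20 percent minimum for a family, can be read as a lookup from a cognate percentage to the lowest taxon at which two idioms would be joined. The sketch below treats every figure as an inclusive lower bound (so exactly 95 counts as idiom-2); that boundary handling and the function name are assumptions made for illustration. Idiom-4 has no percentage range in the Guide and is therefore never returned.

```python
from typing import Optional

# Inclusive lower bounds (percent cognates between component idioms),
# from Table 1 and the 20 percent minimum given for a family.
THRESHOLDS = [
    (99.0, "idiom-3"),
    (95.0, "idiom-2"),
    (89.0, "idiom-1"),
    (20.0, "group"),  # any group within a family; the family is the uppermost one
]


def lowest_shared_taxon(percent: float) -> Optional[str]:
    """Lowest taxon level at which two idioms sharing `percent` cognates are joined.

    Returns None below 20 percent, i.e. too low to place both in one family.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    for bound, taxon in THRESHOLDS:
        if percent >= bound:
            return taxon
    return None


for p in (99.5, 95.0, 90.0, 54.0, 24.0, 12.0):
    print(p, lowest_shared_taxon(p))
```

Under this reading, a traditional "language" whose dialects share well under 89 percent falls into the "group" band, which is the situation the Guide describes for Chinese, Arabic and German.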

No taxon labels are given in the Guide for the language and dialect levels. These levels are distinguished only by the typographic conventions for basic names (see Table 2).

This distinction also matters for the maps. Idiom levels are distinguished by fills (different fills for idiom-2 and higher levels, one common fill for idiom-3 and idiom-4) and by outlines (see the Key list for details).

In certain cases, traditional "languages" show a percentage of cognates between their component "dialects" much lower than 89% and thus correspond to the level of a group. Some of these languages are already treated by modern linguists as groups of languages (e.g. Chinese, Arabic, German); others are still viewed as single languages, often for lack of information.

All lexicostatistical information is compiled in the genealogical chart (see below).

Data in the Guide

The Guide contains the following categories of data.

Index code, used for cross-reference within this edition. Every idiom and group has its own unique code. Each code consists of the family number (1-99), followed by upper-case letters marking successive levels of groups within the family (from none to four, depending on the depth of the family), then a number for the idiom-1 and lower-case letters for the idiom-2, idiom-3 and idiom-4 levels.

See Table 2 for an example. As the table shows, the levels are also distinguished by indentation.

Table 2. Example of reference coding and typography of basic names.

Code	Taxon	Basic name	Typography of basic names
1	family [24]	NORTH CAUCASIAN	Bold, full capitals, bigger size
  1B	group [32]	NAKH-DAGESTANIC	Bold, full capitals, fixed-width font
    1BB-1	idiom-1	AVAR	Bold, full capitals, variable-width font (vwf)
      1a	idiom-2	North Avar	Bold, initial capital, vwf
        1ab	idiom-3 (dialect)	bolmats	Normal, expanded, all small letters, vwf
        1ag	idiom-3 (dialect)	north-east avar	(as above)
          a	idiom-4 (subdialect)	teletlin	Normal, expanded, all small letters, vwf, smaller size

To make the Guide and the maps easier to read, the family and group part of the code (e.g. 1BB-) is omitted in codes at the idiom-2 and lower levels, and only the last letter is kept in codes at the idiom-4 level.
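The coding scheme can be made concrete with a small parser. This is a sketch under stated assumptions: a full code is taken to be the family number, optional upper-case group letters, then a hyphen and the idiom-1 number (as in 1BB-1 and 2-1), then up to three lower-case letters for idiom-2, -3 and -4. The full form 1BB-1aga for teletlin is inferred from Table 2, and the names `parse_code`, `display_code` and `IndexCode` are hypothetical. `display_code` applies the shortening rule described above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed full-code pattern: family number, group letters, "-" + idiom-1 number,
# then lower-case letters for idiom-2/3/4 (none to three).
CODE_RE = re.compile(r"^(\d{1,2})([A-Z]{0,4})(?:-(\d+)([a-z]{0,3}))?$")

LEVELS = ["idiom-2", "idiom-3", "idiom-4"]


@dataclass
class IndexCode:
    family: int
    groups: str             # e.g. "BB": two nested group levels
    idiom1: Optional[int]   # None for family and group codes
    lower: str              # e.g. "aga": idiom-2 "a", idiom-3 "g", idiom-4 "a"

    def level(self) -> str:
        if self.idiom1 is None:
            return "group" if self.groups else "family"
        return LEVELS[len(self.lower) - 1] if self.lower else "idiom-1"


def parse_code(code: str) -> IndexCode:
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError(f"not a valid index code: {code!r}")
    family, groups, idiom1, lower = m.groups()
    return IndexCode(int(family), groups, int(idiom1) if idiom1 else None, lower or "")


def display_code(code: str) -> str:
    """Shortened code as printed: family/group part dropped below idiom-1,
    and only the last letter kept at idiom-4."""
    c = parse_code(code)
    if c.idiom1 is None or not c.lower:
        return code  # family, group and idiom-1 codes are printed in full
    if len(c.lower) == 3:
        return c.lower[-1]
    return f"{c.idiom1}{c.lower}"


for full in ["1", "1B", "1BB-1", "1BB-1a", "1BB-1ag", "1BB-1aga"]:
    print(full, parse_code(full).level(), display_code(full))
```

Because the shortened forms (1a, 1ag, a) are ambiguous on their own, a tool would need the enclosing entries, as a human reader does, to recover the full code.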

Map numbers: the maps on which the group or idiom is shown. Numbers of basic maps (i.e. those with the idiom in their legend) are in normal face, and numbers of additional maps are in italics (e.g. #3, 15; 8). For groups of languages, only the maps on which the majority of the component languages appear are listed. #0 means that the idiom is not plotted on any map. If an idiom has no map number, see its higher taxa.

Basic name in English: the linguonym recommended for use in any linguistic work to denote the idiom concerned. Basic names are printed first in each entry. For the typographic conventions for basic names at different levels, see Table 2.

Other names in English follow the basic name. They are set in light face with lower-case initials, as are all linguonyms in other languages (as opposed to the initial capitals used for geographical and personal names). This convention does not apply to textual notes, which are printed in italics.
Linguonyms in other languages («exonyms») are preceded by the (often abbreviated) name of that language in parentheses. For example, under (2-1) Mingrelian, the Georgian exonyms are recorded as: (Grg) megruli, odišuri; … Linguonyms in different languages are separated by semicolons.

Some frequently used language names are abbreviated for reference and listed for each set.

Exonyms in Cyrillic are Russian by default; otherwise their language is indicated as well.

Auto(linguo)nyms ('own names') are cited last, after the at-sign (@).
In certain cases ethnic names (ethnonyms) are also given; these are usually autoethnonyms.
Nomenclature and etymological notes are always in italics and are usually preceded by the symbol #.
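A sketch of how the names part of an entry could be split mechanically, assuming it is available as a plain string in which exonym sets are separated by semicolons, each set starts with a parenthesised language abbreviation, names within a set are separated by commas, and autonyms follow @. The notation comes from the description above; the parser itself and the second sample string (with the placeholder label Xx and placeholder names) are hypothetical. Italic notes, # notes and unlabelled English names are not handled.

```python
import re
from typing import Dict, List, Tuple

LABEL_RE = re.compile(r"^\((\w+)\)\s*(.*)$")
CYRILLIC_RE = re.compile(r"[\u0400-\u04FF]")


def parse_names(entry: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split an entry's names into exonyms by language and autonyms after '@'."""
    names_part, _, auto_part = entry.partition("@")
    autonyms = [n.strip() for n in auto_part.split(",") if n.strip()]
    exonyms: Dict[str, List[str]] = {}
    for segment in names_part.split(";"):
        segment = segment.strip()
        if not segment or segment == "…":
            continue  # skip empty or elided segments
        m = LABEL_RE.match(segment)
        if m:
            lang, rest = m.group(1), m.group(2)
        elif CYRILLIC_RE.search(segment):
            lang, rest = "Russian", segment  # Cyrillic exonyms are Russian by default
        else:
            lang, rest = "unlabelled", segment
        names = [n.strip() for n in rest.split(",") if n.strip()]
        exonyms.setdefault(lang, []).extend(names)
    return exonyms, autonyms


print(parse_names("(Grg) megruli, odišuri; …"))
print(parse_names("(Xx) name-one; имя @ own-name"))
```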

Statistics. One or more of the following figures are usually given [in square brackets]: the total number of first-language (L1) or mother-tongue (L0) speakers; their percentage of the ethnic group (EG); the number of second-language (L2) speakers; and the number of speakers in each country where the language is spoken.

In this edition the terms are defined as follows: a first language (L1) is a language that a speaker knows at least as well as any other (possibly better) and uses most frequently; a mother tongue, or L0 (where it differs from L1), is the speaker's ethnic language, known at least well enough to be spoken.

For ex-USSR states, numbers of speakers are by default taken from the last Soviet census of 1989; otherwise the year is indicated. Estimates are preceded by a tilde (~). «Thousands» is often abbreviated to «k» and «millions» to «m»: 10k, 5m.
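Speaker figures written in this notation can be normalised mechanically. A minimal sketch, assuming figures appear as plain strings such as ~10k or 5m (decimal figures like 1.5k are an assumption); the function name is hypothetical, and ranges or census-year annotations are not handled.

```python
import re
from typing import Tuple

COUNT_RE = re.compile(r"^(~?)\s*(\d+(?:\.\d+)?)\s*([km]?)$")
MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_speakers(figure: str) -> Tuple[int, bool]:
    """Return (number of speakers, is_estimate) for figures like '~10k' or '5m'."""
    m = COUNT_RE.match(figure.strip())
    if m is None:
        raise ValueError(f"unrecognised figure: {figure!r}")
    tilde, number, suffix = m.groups()
    # A leading tilde marks an estimate; k/m scale the number.
    return round(float(number) * MULTIPLIER[suffix]), tilde == "~"


print(parse_speakers("~10k"))  # (10000, True)
print(parse_speakers("5m"))    # (5000000, False)
print(parse_speakers("1.5k"))  # (1500, False)
```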

Location (preceded by a special symbol): the countries where the language is spoken, indicating where it has official or national status (marked OL or NL), with a more detailed location in each country (towns, regions). Names of countries are in small caps and underlined. Countries or regions where the language is spoken because of recent migration of speakers are preceded by ▶; subsequent migration is indicated by ▶▶.

Period: the time when the idiom was spoken (for ancient and extinct languages).

Scripts (preceded by a special symbol): a note on the script used for the language, with the approximate date when the written tradition began; if there is a written standard, the variety it is based on is indicated.

Language / dialect status: whether the idiom is traditionally treated as a language or a dialect.

Multilingualism: the other language(s) in which speakers are bilingual or multilingual, and to what degree.

Interlinguistic relationships: transitions to or between related idioms; mixed languages; notes on language history, convergence and divergence.
Condition of language "health": extinction, near extinction, replacement by another language, endangerment. Extinct idioms are marked with the symbol † before the index code, and possibly extinct ones with the same symbol in parentheses: (†). If all components of a level are (possibly) extinct, only the uppermost level is marked.
Notes on the ethnic group: subsistence type and mobility (nomads, hunter-gatherers, fishermen, etc.); religion; migrations (including forced ones); and so on.
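Because only the uppermost level is marked when all its components are (possibly) extinct, finding an idiom's effective status means looking up the tree to the nearest marked ancestor. The sketch below does this for codes linked by a child-to-parent map; the tree, the codes 9, 9A and so on, and the marks are hypothetical sample data, not entries from the Guide.

```python
from typing import Dict, Optional

# Marks as printed before the index code.
MARKS = {"†": "extinct", "(†)": "possibly extinct"}


def effective_status(
    code: str,
    parent: Dict[str, Optional[str]],
    marks: Dict[str, str],
) -> str:
    """Walk up from `code` to the nearest marked level (itself included).

    `parent` maps each code to its parent code (None at the family level);
    `marks` maps marked codes to "†" or "(†)".
    """
    current: Optional[str] = code
    while current is not None:
        if current in marks:
            return MARKS[marks[current]]
        current = parent.get(current)
    return "not marked"


# Hypothetical tree: family 9 > group 9A > idiom-1 9A-1 > idiom-2 9A-1a; group 9B.
parent = {"9": None, "9A": "9", "9A-1": "9A", "9A-1a": "9A-1", "9B": "9"}
marks = {"9A": "†"}
print(effective_status("9A-1a", parent, marks))  # extinct
print(effective_status("9B", parent, marks))     # not marked
```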

This page is part of the Lingvarium project website: www.lingvarium.org

Supported by Linguistic Community

Maintained by: Yuri Koryakov

e-mail: lingvarium [at] gmail.com

Created on January 16, 2001 ¦ Last updated on January 11, 2008 15:12
