If you use these data, please cite:
- the original source
List, Johann-Mattis; Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; Gray, Russell D. (2021): Lexibank: A publicly available repository of standardized lexical datasets with automatically computed phonological and lexical features for more than 2000 language varieties [Dataset, Version 1.0]. Geneva: Zenodo.
- the derived dataset, using the DOI of the particular released version you used
This dataset is licensed under the CC-BY-4.0 license.
Available online at https://lexibank.clld.org
The core sets are defined using the following criteria:
- Varieties: 4,737 (linked to 2,789 different Glottocodes)
- Concepts: 3,195 (linked to 3,195 different Concepticon concept sets)
- Lexemes: 1,671,479
- Sources: 127
- Synonymy: 1.10
- Invalid lexemes: 0
- Tokens: 9,325,843
- Segments: 2,376 (0 BIPA errors, 0 CLTS sound class errors, 2,367 CLTS modified)
- Inventory size (avg): 39.49
- Languages linked to bookkeeping languoids in Glottolog:
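The list above does not state how "Synonymy" and "Inventory size (avg)" are computed. A minimal sketch, assuming synonymy is the mean over varieties of (number of forms / number of distinct concepts with a form), and inventory size is the mean number of distinct segments per variety. The toy `FORMS` rows and the function names are hypothetical illustrations, not the pylexibank API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (variety_id, concept_id, space-separated segments)
FORMS = [
    ("lang1", "HAND", "h a n d"),
    ("lang1", "HAND", "p a l m"),
    ("lang1", "WATER", "w a t e r"),
    ("lang2", "HAND", "m a n o"),
    ("lang2", "WATER", "a k w a"),
]


def synonymy(forms):
    """Assumed definition: mean over varieties of forms per distinct concept."""
    form_counts = defaultdict(int)
    concepts = defaultdict(set)
    for variety, concept, _ in forms:
        form_counts[variety] += 1
        concepts[variety].add(concept)
    return mean(form_counts[v] / len(concepts[v]) for v in form_counts)


def avg_inventory_size(forms):
    """Assumed definition: mean number of distinct segments per variety."""
    inventories = defaultdict(set)
    for variety, _, segments in forms:
        inventories[variety].update(segments.split())
    return mean(len(inv) for inv in inventories.values())


print(synonymy(FORMS))            # lang1: 3/2, lang2: 2/2 -> 1.25
print(avg_inventory_size(FORMS))  # lang1: 11, lang2: 6 -> 8.5
```

Under these assumptions, a synonymy value of 1.10 means that varieties list, on average, slightly more than one form per concept.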
Name | GitHub user | Description | Role |
---|---|---|---|
Frederic Blum | @FredericBlum | maintainer | Author |
Robert Forkel | @xrotwang | maintainer | Author |
Simon J. Greenhill | @simongreenhill | maintainer | Author |
Christoph Rzymski | @chrzyki | maintainer | Author |
Johannes Englisch | @johenglisch | maintainer | Author |
Russell D. Gray | | maintainer | Author |
Johann-Mattis List | @LinguList | maintainer | Author |
The following CLDF datasets are available in the `cldf` directory: