lexibank/lexibank-analysed

Lexibank Analysed

How to cite

If you use these data, please cite:

  • the original source

    List, Johann-Mattis; Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; and Russell D. Gray (2021): Lexibank: A publicly available repository of standardized lexical datasets with automatically computed phonological and lexical features for more than 2000 language varieties [Dataset, Version 1.0]. Geneva: Zenodo.

  • the derived dataset, using the DOI of the particular released version you used

Description

This dataset is licensed under the CC-BY-4.0 license.

Available online at https://lexibank.clld.org

Notes

Core Sets

The core-sets are defined by using the following criteria:

Statistics

Glottolog: 100% | Concepticon: 100% | Source: 100% | BIPA: 100% | CLTS SoundClass: 100%

  • Varieties: 4,737 (linked to 2,789 different Glottocodes)
  • Concepts: 3,195 (linked to 3,195 different Concepticon concept sets)
  • Lexemes: 1,671,479
  • Sources: 127
  • Synonymy: 1.10
  • Invalid lexemes: 0
  • Tokens: 9,325,843
  • Segments: 2,376 (0 BIPA errors, 0 CLTS sound class errors, 2,367 CLTS modified)
  • Inventory size (avg): 39.49
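
As an illustration of what a figure such as "Synonymy" can mean, the sketch below computes one plausible version of it: the number of lexemes per concept within each variety, averaged over varieties. This definition is an assumption made for the example, not a statement of how the value reported above was produced. The function name `synonymy_index` and the sample rows are hypothetical; `Language_ID` and `Parameter_ID` are the default CLDF FormTable column names.

```python
from collections import Counter, defaultdict
from statistics import mean


def synonymy_index(forms):
    """Average number of forms per concept, computed per variety and
    then averaged over varieties (assumed definition, see lead-in)."""
    per_variety = defaultdict(Counter)
    for form in forms:
        # Count how many forms each variety has for each concept.
        per_variety[form["Language_ID"]][form["Parameter_ID"]] += 1
    # For each variety: total forms divided by distinct concepts.
    return mean(
        sum(counts.values()) / len(counts) for counts in per_variety.values()
    )


# Hypothetical toy data: variety "a" has two forms for "hand".
sample = [
    {"Language_ID": "a", "Parameter_ID": "hand"},
    {"Language_ID": "a", "Parameter_ID": "hand"},
    {"Language_ID": "a", "Parameter_ID": "foot"},
    {"Language_ID": "b", "Parameter_ID": "hand"},
]

print(synonymy_index(sample))  # 1.25
```

A value of 1.00 would mean exactly one lexeme per concept in every variety; values above 1.00 indicate that some concepts are expressed by more than one lexeme.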

Possible Improvements:

Contributors

Name                 GitHub user      Description  Role
Frederic Blum        @FredericBlum    maintainer   Author
Robert Forkel        @xrotwang        maintainer   Author
Simon J. Greenhill   @simongreenhill  maintainer   Author
Christoph Rzymski    @chrzyki         maintainer   Author
Johannes Englisch    @johenglisch     maintainer   Author
Russell D. Gray                       maintainer   Author
Johann-Mattis List   @LinguList       maintainer   Author

CLDF Datasets

The following CLDF datasets are available in cldf: