ChildesNgrams

About

This repository contains research code for exploring n-gram statistics of a 5M-word corpus of transcribed English child-directed speech.

The primary aim is to produce visualisations demonstrating that the statistical structure of speech to younger children (ages 1-3) is less complex than that of speech to older children (ages 3-6). For example, an n-gram model trained on the first half of the (age-ordered) corpus achieves better perplexity when evaluated on the same corpus.
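To make the perplexity comparison concrete, here is a minimal, self-contained sketch. It is not the repository's code (which relies on kenlm): it assumes a toy token list standing in for the age-ordered corpus and swaps in a simple add-one-smoothed bigram model. All names (`train_bigram`, `perplexity`, `toy_tokens`) are hypothetical.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and their left contexts in a list of tokens."""
    contexts = Counter(tokens[:-1])              # how often each word precedes another
    bigrams = Counter(zip(tokens, tokens[1:]))   # how often each word pair occurs
    return contexts, bigrams


def perplexity(tokens, contexts, bigrams, vocab_size):
    """Per-token perplexity under an add-one-smoothed bigram model."""
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)


# Toy stand-in for an age-ordered corpus (assumption: not real AO-CHILDES data).
toy_tokens = "look at the ball look at the dog where is the big red ball now".split()
half = len(toy_tokens) // 2
first_half, second_half = toy_tokens[:half], toy_tokens[half:]

vocab_size = len(set(toy_tokens))
contexts, bigrams = train_bigram(first_half)

print("perplexity on first half :", perplexity(first_half, contexts, bigrams, vocab_size))
print("perplexity on second half:", perplexity(second_half, contexts, bigrams, vocab_size))
```

Lower perplexity means the model finds the evaluated text more predictable. The same train-on-one-half, evaluate-on-another procedure carries over to a kenlm model, with kenlm's smoothing in place of the add-one estimate used here.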

Dependencies

kenlm

To install kenlm:

pip install https://github.com/kpu/kenlm/archive/master.zip

You may need to install additional binaries and update the following paths accordingly:

scripts.LMPLZ_PATH
scripts.BINARIZE_PATH
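One way to find values for these two settings is to search for the binaries on your system. The sketch below assumes, from the variable names alone, that they should point to kenlm's `lmplz` and `build_binary` executables. The helper `find_binary` and the fallback directory `kenlm/build/bin` are hypothetical and not part of this repository.

```python
import shutil
from typing import Optional


def find_binary(name: str, fallback_dir: Optional[str] = None) -> Optional[str]:
    """Return the path to executable `name`, or None if it cannot be found."""
    found = shutil.which(name)  # search the directories on PATH first
    if found is None and fallback_dir is not None:
        # Then try a directory that is not on PATH, e.g. a local build folder.
        found = shutil.which(name, path=fallback_dir)
    return found


# Assumption: a local kenlm build placed its executables in kenlm/build/bin.
lmplz_path = find_binary("lmplz", fallback_dir="kenlm/build/bin")
binarize_path = find_binary("build_binary", fallback_dir="kenlm/build/bin")

print("lmplz:", lmplz_path)
print("build_binary:", binarize_path)
```

If either value prints as `None`, the binary is not installed or lives elsewhere. Once both are found, copy the printed paths into `scripts.LMPLZ_PATH` and `scripts.BINARIZE_PATH`.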

AOCHILDES

The AO-CHILDES corpus is dynamically generated using the Python package AOCHILDES.

Compatibility

Developed with Python 3.7 on Ubuntu 18.04.
