🌏 Infolingo – Efficient Vocabulary Selection for Foreign-Language Learning

Infolingo uses probability to pick the best words to learn next to improve understanding of a foreign language text.

Check out the live demo.

Installation

Use the package manager pip to install infolingo.

pip install infolingo

(Optional) Streamlit Demo GUI

# download repo
git clone
python -m venv .venv
source .venv/bin/activate

# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py

You should then see the demo running as a locally hosted website.

Usage

Quickstart, using English (the default language) and cross-entropy (the default vocabulary-picking function):

from infolingo import Infolingo

il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab) # prints ["jumps", "fox"]

Supported Languages

Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")

Custom Corpus

Format your corpus file as a CSV with the fields word,frequency, using the double quote (") as the quote character.

Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")
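As a rough illustration of producing such a file, the sketch below builds word,frequency rows with Python's standard csv module, wrapping fields in double quotes only where needed. The helper name corpus_to_csv and the sample counts are hypothetical, and the README does not say whether a header row is expected, so the sketch assumes none.

```python
import csv
import io

def corpus_to_csv(counts):
    """Return CSV text with one word,frequency row per entry.

    Assumption: no header row; the README does not state whether one is expected.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL)
    for word, frequency in counts.items():
        writer.writerow([word, frequency])
    return buffer.getvalue()

# Hypothetical sample data; the second entry contains a comma, so it gets quoted.
sample_counts = {"hola": 120, "de vez en cuando, a veces": 7, "mundo": 45}
csv_text = corpus_to_csv(sample_counts)
print(csv_text)

# To save it, write the text to a file opened with newline="" so the csv
# module's line endings are preserved as-is:
# with open("my_corpus.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

The resulting file path could then be passed as custom_vocab_file.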

Vocabulary Picking Functions

We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are most effective for language comprehension.

Cross-Entropy

Select the n words that decrease the cross-entropy of the text the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="cross-entropy")
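This README does not spell out how the cross-entropy score is computed, so the following is only one plausible reading, not infolingo's actual implementation. It assumes p is the word distribution of the text, q is an add-one-smoothed distribution from a reference corpus, and that each word contributes -p(w) * log2 q(w) to the cross-entropy H(p, q); the words with the largest contributions are picked. All function names and the toy counts are hypothetical.

```python
import math
from collections import Counter

def cross_entropy_contributions(text_words, corpus_counts):
    """Per-word terms -p(w) * log2 q(w); their sum is H(p, q) in bits."""
    text_counts = Counter(text_words)
    total_text = sum(text_counts.values())
    # Add-one smoothing over the joint vocabulary so unseen words get q > 0.
    vocabulary = set(corpus_counts) | set(text_counts)
    total_corpus = sum(corpus_counts.values()) + len(vocabulary)
    scores = {}
    for word, count in text_counts.items():
        p = count / total_text
        q = (corpus_counts.get(word, 0) + 1) / total_corpus
        scores[word] = -p * math.log2(q)
    return scores

def pick_by_cross_entropy(text_words, corpus_counts, n):
    """Return the n words with the largest cross-entropy contribution."""
    scores = cross_entropy_contributions(text_words, corpus_counts)
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy reference counts (made up for illustration).
toy_corpus = {"the": 1000, "cat": 20, "sat": 10, "on": 500, "mat": 2}
toy_words = "the cat sat on the mat".split()
top_words = pick_by_cross_entropy(toy_words, toy_corpus, n=2)
print(top_words)
```

Under this reading, rare words that still appear in the text score highest, because they are the most "surprising" under the reference corpus.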

KL-Divergence

Select the n words that decrease the KL-divergence of the text the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="kl-divergence")
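How infolingo scores words by KL-divergence is not described here, so the sketch below is an assumption-laden illustration rather than the library's method. It treats p as the text's word distribution and q as an add-one-smoothed reference-corpus distribution, and ranks words by their term p(w) * log2(p(w) / q(w)) in D_KL(p || q). Function names and counts are hypothetical.

```python
import math
from collections import Counter

def kl_contributions(text_words, corpus_counts):
    """Per-word terms p(w) * log2(p(w) / q(w)); their sum is D_KL(p || q)."""
    text_counts = Counter(text_words)
    total_text = sum(text_counts.values())
    # Add-one smoothing over the joint vocabulary so q(w) is never zero.
    vocabulary = set(corpus_counts) | set(text_counts)
    total_corpus = sum(corpus_counts.values()) + len(vocabulary)
    scores = {}
    for word, count in text_counts.items():
        p = count / total_text
        q = (corpus_counts.get(word, 0) + 1) / total_corpus
        scores[word] = p * math.log2(p / q)
    return scores

def pick_by_kl(text_words, corpus_counts, n):
    """Return the n words with the largest KL-divergence contribution."""
    scores = kl_contributions(text_words, corpus_counts)
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Toy reference counts (made up for illustration).
toy_corpus = {"the": 1000, "cat": 20, "sat": 10, "on": 500, "mat": 2}
toy_words = "the cat sat on the mat".split()
kl_scores = kl_contributions(toy_words, toy_corpus)
top_words = pick_by_kl(toy_words, toy_corpus, n=2)
print(top_words)
```

Unlike cross-entropy terms, individual KL terms can be negative: a word that is relatively more common in the reference corpus than in the text (such as "the" here) lowers the total, although the sum itself is never negative.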

Frequent

Select the top n most frequent words in the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="frequent")
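Conceptually, this baseline amounts to counting words and keeping the most common ones. A minimal standalone sketch follows; the tokenization (lowercasing and keeping runs of letters) is an assumption and may differ from what infolingo does internally, and most_frequent_words is a hypothetical helper name.

```python
import re
from collections import Counter

def most_frequent_words(text, n):
    """Return the n most frequent words in text, most frequent first."""
    # Assumed tokenization: lowercase, then keep runs of Unicode letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]

frequent = most_frequent_words("The dog saw the cat, and the dog barked.", n=2)
print(frequent)  # ['the', 'dog']
```

As the example shows, a pure frequency ranking tends to surface function words such as "the" first.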

Random

Select n random words from the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="random")
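For comparison, a random baseline can be sketched in a few lines with the standard library. This is an illustration under assumptions (whitespace tokenization, sampling distinct words without replacement), not a description of infolingo's internals; pick_random_words and its seed parameter are hypothetical.

```python
import random

def pick_random_words(text, n, seed=None):
    """Sample up to n distinct words from text without replacement."""
    # Sorting the unique words makes the result reproducible for a given seed.
    unique_words = sorted(set(text.lower().split()))
    rng = random.Random(seed)
    return rng.sample(unique_words, min(n, len(unique_words)))

picked = pick_random_words("the quick brown fox jumps", n=2, seed=0)
print(picked)
```

Passing a fixed seed keeps runs repeatable, which is convenient when comparing this baseline against the other picking functions.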

Default Corpora

The default corpora used are listed below:

English: Brown Corpus
Spanish: Wortschatz Leipzig. spa_news_2023_300K-words
French: Wortschatz Leipzig. fra_news_2023_300K-words

To use your own corpus (as an alternative to the ones above, or to support a new language), see "Custom Corpus" above.

Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion to improve this, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement." Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

Changelog

1.1.0

Update README.md and links.

1.0.0

Initial infolingo PyPI submission. This version supports the cross-entropy, kl-divergence, frequent, and random vocabulary-picking functions. It includes a Streamlit demo for testing.

License

MIT

aliceheiman/infolingo
