Infolingo uses probability to pick the best words to learn next to improve your understanding of a foreign-language text.
Check out the live demo.
Use the package manager pip to install infolingo.
pip install infolingo
# download repo
git clone
python -m venv .venv
source .venv/bin/activate
# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py
You should then see a locally hosted website like this:
Quickstart, using English as the default language and cross-entropy as the default vocabulary-picking function:
from infolingo import Infolingo
il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab)  # prints ['jumps', 'fox']
Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")
Format your corpus file as a CSV with the fields word,frequency, using the double quote (") as the quote character.
Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")
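If your word counts start out as raw text, a CSV in this format can be generated with Python's standard library. The sketch below is a hypothetical helper, not part of Infolingo. It assumes the file should begin with a word,frequency header row, that frequency means a raw occurrence count, and that lowercased runs of word characters count as words; adjust these to whatever Infolingo actually expects.

```python
import csv
import io
import re
from collections import Counter


def corpus_csv(text: str) -> str:
    """Return word frequencies from `text` as CSV text with fields word,frequency.

    Hypothetical helper, not part of Infolingo. It assumes a header row is
    expected and treats lowercased runs of word characters as words.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    buf = io.StringIO()
    # Double quote (") is the quote character; any field containing a comma
    # or a quote is wrapped in it (csv.QUOTE_MINIMAL).
    writer = csv.writer(buf, quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["word", "frequency"])
    for word, freq in counts.most_common():  # most frequent words first
        writer.writerow([word, freq])
    return buf.getvalue()
```

Write the returned string to a file opened with newline="" and pass that file's path as custom_vocab_file, as in the line above.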
We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are the most effective of the four for language comprehension.
Select the n words whose learning decreases the cross-entropy of the text the most.
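from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"  # text in the target language
vocab = il.pick_vocab(text, n=3, method="cross-entropy")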
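The README does not spell out which probability model the cross-entropy is measured against. As a rough illustration only, the sketch below assumes a toy reader model: p is the word distribution of the text, and the reader's distribution q gives each known word its corpus probability and each unknown word a small floor probability eps. The cross-entropy H(p, q) = -Σ_w p(w) log2 q(w) then drops each time a word is learned, and the function greedily adds whichever word lowers it most. The function names, the eps floor, and the tokenization are all hypothetical, not Infolingo's implementation.

```python
import math
import re
from collections import Counter


def cross_entropy(text_probs, known, corpus_probs, eps=1e-6):
    """H(p, q) = -sum_w p(w) * log2 q(w) under a toy reader model.

    Assumption (not Infolingo's definition): a known word gets its corpus
    probability, an unknown word only the floor probability `eps`.
    """
    total = 0.0
    for word, p in text_probs.items():
        q = max(corpus_probs.get(word, 0.0), eps) if word in known else eps
        total -= p * math.log2(q)
    return total


def pick_by_cross_entropy(text, corpus_counts, n, eps=1e-6):
    """Greedily learn the word that lowers H(p, q) the most, n times over."""
    tokens = re.findall(r"\w+", text.lower())  # hypothetical tokenizer
    text_probs = {w: c / len(tokens) for w, c in Counter(tokens).items()}
    corpus_total = sum(corpus_counts.values())
    corpus_probs = {w: c / corpus_total for w, c in corpus_counts.items()}

    known, picked = set(), []
    for _ in range(min(n, len(text_probs))):
        # Try each still-unknown word and keep the one giving the lowest H(p, q).
        best = min(
            (w for w in text_probs if w not in known),
            key=lambda w: cross_entropy(text_probs, known | {w}, corpus_probs, eps),
        )
        known.add(best)
        picked.append(best)
    return picked
```

Under this toy model, learning a word w lowers H(p, q) by p(w) * (log2 c(w) - log2 eps), where c(w) is its corpus probability, so words common in both the text and the corpus are picked first. For example, with corpus counts {"the": 1000, "dog": 50, "cat": 10, "saw": 5}, pick_by_cross_entropy("the cat saw the dog", counts, n=2) returns ["the", "dog"].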
Select the n words whose learning decreases the KL divergence of the text the most.
from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"  # text in the target language
vocab = il.pick_vocab(text, n=3, method="kl-divergence")
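The two criteria are closely related. For distributions p and q, the KL divergence satisfies D_KL(p || q) = H(p, q) - H(p), where H(p) is the entropy of p. If both are measured against the same fixed text distribution p and only the reader's distribution q changes as words are learned, H(p) is a constant, so ranking candidate words by the drop in KL divergence or by the drop in cross-entropy gives the same order. The README does not say whether Infolingo defines p and q this way, so its two methods may still behave differently. The helper functions below are a small illustrative sketch of the identity, not part of Infolingo.

```python
import math


def entropy(p):
    """H(p) = -sum_x p(x) * log2 p(x), skipping zero-probability outcomes."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2 q(x); q must be > 0 wherever p is."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log2(p(x) / q(x))."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)
```

For example, with p = {"a": 0.5, "b": 0.5} and q = {"a": 0.25, "b": 0.75}, kl_divergence(p, q) equals cross_entropy(p, q) - entropy(p) up to floating-point rounding (about 0.2075 bits).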
Select the n most frequent words in the text.
from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"  # text in the target language
vocab = il.pick_vocab(text, n=3, method="frequent")
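Frequency-based picking can be sketched with the standard library alone. This is not Infolingo's code: lowercasing and treating runs of word characters as words are assumptions, and ties are broken by first appearance in the text, which is how collections.Counter.most_common orders equal counts.

```python
import re
from collections import Counter


def pick_frequent(text: str, n: int) -> list[str]:
    """Return the n most frequent words in `text` (a sketch, not Infolingo's code)."""
    # Hypothetical tokenizer: lowercase, then take runs of word characters.
    counts = Counter(re.findall(r"\w+", text.lower()))
    # most_common keeps first-encountered order among equal counts.
    return [word for word, _ in counts.most_common(n)]
```

For example, pick_frequent("The quick brown fox jumps over the lazy dog", 2) returns ["the", "quick"].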
Select n random words from the text.
from infolingo import Infolingo
il = Infolingo()
text = "The quick brown fox jumps over the lazy dog"  # text in the target language
vocab = il.pick_vocab(text, n=3, method="random")
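Random picking can be sketched the same way. This assumes, without confirmation from the README, that words are sampled uniformly and without replacement from the distinct words of the text; the seed parameter is a hypothetical addition that makes the result reproducible.

```python
import random
import re
from typing import Optional


def pick_random(text: str, n: int, seed: Optional[int] = None) -> list[str]:
    """Return up to n distinct words chosen uniformly at random from `text`.

    Sketch only: sampling over distinct lowercased words without replacement
    is an assumption, and `seed` is a hypothetical parameter for reproducibility.
    """
    # dict.fromkeys drops duplicate words while keeping first-occurrence order.
    words = list(dict.fromkeys(re.findall(r"\w+", text.lower())))
    rng = random.Random(seed)
    return rng.sample(words, min(n, len(words)))
```

Random picking is mainly useful as a baseline against which the other functions can be compared.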
The default corpora used are listed below:
- English: Brown Corpus
- Spanish: Wortschatz Leipzig, spa_news_2023_300K-words
- French: Wortschatz Leipzig, fra_news_2023_300K-words
To use your own corpus (in place of the ones above, or to support a new language), see "Custom Corpus" above.
Any contributions you make are greatly appreciated.
If you have a suggestion to improve this, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement." Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Update README.md and links.
Initial Infolingo PyPI submission. This version supports the cross-entropy, kl-divergence, frequent, and random vocabulary-picking functions and includes a Streamlit demo for testing.