Skip to content

Infolingo uses probability to pick the best words to learn next to improve understanding of a foreign language text.

License

Notifications You must be signed in to change notification settings

aliceheiman/infolingo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Infolingo logo

🌏 Infolingo – Efficient Vocabulary Selection for Foreign-Language Learning

Python MIT license

Infolingo uses probability to pick the best words to learn next to improve understanding of a foreign language text.

Check out the live demo.

Installation

Use the package manager pip to install infolingo.

pip install infolingo

(Optional) Streamlit Demo GUI

# download repo
git clone
python -m venv .venv
source .venv/bin/activate

# start demo GUI
cd streamlit_demo
pip install -r requirements.txt
streamlit run app.py

You should then see a locally hosted website like this:

Infolingo demo

Usage

Quickstart using English as the default language and Cross-Entropy as the default vocabulary picking function.

from infolingo import Infolingo

il = Infolingo(language="english")
vocab = il.pick_vocab("The quick brown fox jumps over the lazy dog", n=2)
print(vocab) # prints ["jumps", "fox"]

Supported Languages

Infolingo(language="english")
Infolingo(language="spanish")
Infolingo(language="french")

Custom Corpus

Format your corpus file as a CSV with fields word,frequency and double quote (") as a delimiter character.

Infolingo(language="language", custom_vocab_file="path/to/custom/corpus")

Vocabulary Picking Functions

We evaluated four vocabulary-picking functions. The results indicate that cross-entropy and KL-divergence are most effective for language comprehension.

Cross-Entropy

Selects the top n vocabulary that decreases cross-entropy for the text the most.

il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="cross-entropy")

KL-Divergence

Select the top n vocabulary that decreases KL-Divergence for the text the most.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="kl-divergence")

Frequent

Select the top n most frequent words in the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="frequent")

Random

Select n random words from the text.

from infolingo import Infolingo
il = Infolingo()
vocab = il.pick_vocab(text, n=3, method="random")

Default Corpora

The default corpora used are listed below:

To use your corpus (alternative corpus to the ones above or to support a new language), see "Custom Corpus" above.

Contributing

Any contributions you make are greatly appreciated.

If you have a suggestion to improve this, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement." Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Changelog

1.1.0

Update README.md and links.

1.0.0

Initial infolingo PyPi submission. This version supports cross-entropy, kl-divergence, frequent, and random vocabulary picking functions. It contains a streamlit demo for testing.

License

MIT

About

Infolingo uses probability to pick the best words to learn next to improve understanding of a foreign language text.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published