Implementation of a guess-the-word game for three Indonesian languages
Contexto is a word-based game in which players try to guess a secret word in an unlimited number of guesses. After each guess, the player receives a number that indicates how similar the guess is to the secret word. The back-end language model calculates this similarity score from the contextual relevance between the player's guess and the secret word, learned from thousands of pre-analyzed texts.
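As a rough illustration of how a guess can be turned into a single number, the sketch below ranks a guess against the secret word. It assumes the number shown to the player is the guess's position in a list sorted by similarity, with the secret word itself at position 1. The function name `rank_guess` and the toy scores are hypothetical and are not taken from the project code.

```python
def rank_guess(guess, similarity_to_secret):
    """Return the 1-based rank of `guess` among all scored words.

    `similarity_to_secret` maps each vocabulary word to its similarity
    with the secret word (higher means closer). Rank 1 is the secret word.
    """
    # Sort the vocabulary from most similar to least similar.
    ordered = sorted(similarity_to_secret, key=similarity_to_secret.get, reverse=True)
    return ordered.index(guess) + 1


# Toy scores (made up for illustration); "ocean" plays the secret word.
toy_scores = {"ocean": 1.00, "sea": 0.82, "river": 0.61, "boat": 0.47, "chair": 0.05}

print(rank_guess("river", toy_scores))  # prints 3
```

A lower number means the guess is closer to the secret word, so a player can use the feedback to steer later guesses.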
This project develops a web application that implements the Contexto game in three different languages and studies optimal playing strategies. The work covered the following steps:
- Derive the most common words for each language, to be used by the back-end model as the secret-word dataset.
- Perform NLP stemming on the common-word lists for the three languages (for example, Arabic), which reduces each word to its root form. Note that the root form depends heavily on the language and its grammar.
- Fine-tune a BERT-based model (a Transformer model) for each of the three languages to obtain contextual embeddings for the words of that language.
- Use the embeddings from the previous step to calculate the cosine similarity between the player's word and the secret word.
- Implement the front end of the web application with the Streamlit library.
- Simulate players and study different playing strategies, including Maximum Likelihood Using (MLU) and Player Rank Using (PRU), for each of the three languages under study.
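Two of the steps above, deriving the most common words and scoring a guess by cosine similarity, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the helper names `most_common_words` and `cosine_similarity` and the two-dimensional toy vectors are hypothetical. In the actual project the vectors would come from the fine-tuned BERT-based model, and a library such as NumPy or PyTorch would likely do the arithmetic.

```python
import math
from collections import Counter


def most_common_words(tokens, k):
    """Return the k most frequent tokens, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(k)]


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus: "a" appears three times, "b" twice, "c" once.
print(most_common_words("a b a c a b".split(), 2))  # prints ['a', 'b']

# Toy embeddings (assumed values, not real model output).
secret_vec = [1.0, 0.0]
guess_vec = [1.0, 1.0]
print(round(cosine_similarity(secret_vec, guess_vec), 4))  # prints 0.7071
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing word embeddings.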
The final product can be viewed using the deployment code or by running the code on a local machine.
Skills developed: NLP | Model Deployment | Streamlit | pandas | PyTorch | NumPy | model fine-tuning | LLM | word embeddings | BERT-based models | Transformers | Hugging Face | gensim | NLTK | Python.