The English-Persian Tokenizer is a simple Python program that classifies input strings as English or Persian words. It uses a Deterministic Finite Automaton (DFA) to perform the classification, making it a handy tool for distinguishing English and Persian words within text.
- Tokenize input text into English and Persian words.
- Utilizes a DFA for efficient classification.
- Easily customizable for additional languages or character sets.
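To illustrate the idea, here is a minimal sketch of how a DFA-based classifier could work. This is a hypothetical illustration, not the actual `tokenizer.py` code: the state names, the `classify`/`tokenize` helpers, and the use of the full Arabic Unicode block for Persian letters are all assumptions.

```python
# Hypothetical DFA sketch; the real tokenizer.py may differ.
ENGLISH_LETTERS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

def is_persian_char(ch):
    # Approximation: Persian letters fall within the Arabic Unicode block
    # (U+0600..U+06FF), which includes Persian-specific letters such as
    # pe, che, zhe, and gaf.
    return "\u0600" <= ch <= "\u06FF"

def classify(word):
    """Run a simple DFA over the word's characters.

    States: START -> ENGLISH_WORD / PERSIAN_WORD / REJECT.
    The automaton stays in ENGLISH_WORD only while it reads English
    letters, and in PERSIAN_WORD only while it reads Persian letters;
    any other character, or a script switch, moves it to REJECT.
    """
    state = "START"
    for ch in word:
        if ch in ENGLISH_LETTERS:
            target = "ENGLISH_WORD"
        elif is_persian_char(ch):
            target = "PERSIAN_WORD"
        else:
            return "REJECT"
        if state == "START":
            state = target
        elif state != target:  # mixed-script word
            return "REJECT"
    return state

def tokenize(text):
    # Split on whitespace and classify each word.
    return [(word, classify(word)) for word in text.split()]
```

Extending the sketch to another language would only require adding a character-range test and a corresponding accepting state.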
- Clone or download this repository to your local machine.
- Ensure you have Python installed (Python 3 is recommended).
- Open a terminal and navigate to the repository's directory.
- Run the `tokenizer.py` script, providing the text you want to classify as an argument:

```
python tokenizer.py "Your input text here."
```
Thank you for using the English-Persian Tokenizer!