The structure of this project (submitted for fulfillment of the assignment component of LLT3510) is based on 3 steps.
Setting up into Training and Testing sets with X and Y correspondents
Tokenization
Creating a Linear Regression Model to classify the sentence gender label
An additional task was to find the top 10 words in the whole dataset
A RNN was opted to be used instead of a CNN due to better performance historically in RNNs when it comes to text processing
All of the code can be found in the text_classification Jupyter Notebook, whereas the datasets can be found as the only CSVs in the repository