- Create and activate a conda environment named, for example, `group23`, with the dependencies specified in the file `environment.yml`:

  ```
  conda env create -n group23 --file environment.yml
  conda activate group23
  ```
- Create a file named `.env` and set the following environment variables:
  - `USER_NAME`: your Twitter usernames (e.g. `'username1, username2, username3'`)
  - `PASSWORD`: your Twitter account passwords (e.g. `'pass1, pass2, pass3'`)
  - `EMAIL`: the email addresses associated with the Twitter accounts (e.g. `'email1@gmail.com, email2@gmail.com, email3@gmail.com'`)
  - `EMAIL_PASWORD`: the passwords of your email accounts
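
  For example, a minimal `.env` might look like the following (all values are placeholders; list the values for multiple accounts as comma-separated strings, as in the examples above):

  ```
  USER_NAME='username1, username2, username3'
  PASSWORD='pass1, pass2, pass3'
  EMAIL='email1@gmail.com, email2@gmail.com, email3@gmail.com'
  EMAIL_PASWORD='emailpass1, emailpass2, emailpass3'
  ```
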
- Configure where to save the data and the log in the file `config.py` (a sketch of what these settings might look like follows this list).
- Run the script `twitter_scraper/crawler.py`.
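
The exact setting names are defined in `config.py` itself; the sketch below is only illustrative, and the variable names in it are assumptions rather than the repository's actual ones:

```python
# config.py -- illustrative sketch; the real setting names may differ.

# Where the crawler should write the scraped tweet data.
DATA_OUTPUT_DIR = "data/crawled_tweets"

# Where the crawler should write its log file.
LOG_FILE = "logs/crawler.log"
```
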
There are two datasets to preprocess:

- the data we crawl using the process above, which has no bot/human label
- data from BotRepository, with a human/bot label for each Twitter account, which we're going to use for training and testing our bot detection model
To run a data preprocessing job:

- Configure the input and output locations in the file `config.py`.
- Run the script `data_preprocessing/preprocess_our_data.py` or `data_preprocessing/preprocess_bot_repository_data.py`.
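
Assuming the scripts are invoked directly with Python from the repository root (they may instead need to be run as modules, depending on how imports are organized), a run would look like:

```
python data_preprocessing/preprocess_our_data.py
python data_preprocessing/preprocess_bot_repository_data.py
```
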
Following the paper "Scalable and Generalizable Social Bot Detection through Data Selection" (Yang et al., 2020), we implement a random forest that uses 19 account metadata features to predict whether an account is a human or a bot. A trained model is available at `bot_detection_model/model_storage`.
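
For reference, here is a minimal sketch of the kind of classifier described above. The actual feature set, hyperparameters, and training code live under `bot_detection_model/`; the paths, column names, and parameters below are placeholders, not the repository's real configuration:

```python
# Illustrative sketch of a random-forest bot classifier over account
# metadata features, in the spirit of Yang et al. (2020).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical preprocessed training data: one row per account,
# 19 metadata feature columns plus an "id" and a "label" column.
accounts = pd.read_parquet("preprocessed_bot_repository.parquet")

X = accounts.drop(columns=["id", "label"])  # the 19 metadata features
y = accounts["label"]                       # 1 = bot, 0 = human

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
```
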
To use the trained bot detection model on crawled tweets:

- Configure, in the file `config.py`, the location of the preprocessed data and the location for the model output ((id, prediction) rows in Parquet format).
- Run the script `bot_detection_model/detect_bot.py`.
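
The resulting predictions can then be inspected with pandas, for example (the path below is a placeholder for whatever output location is configured in `config.py`):

```python
import pandas as pd

# Hypothetical output path -- use the location set in config.py.
predictions = pd.read_parquet("output/bot_predictions.parquet")
print(predictions.head())  # rows of (id, prediction)
```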