In Tanzania, scammers often use SMS to steal money by impersonating people you trust, such as close friends or relatives, or by continuing fake conversations about money transfers. These scams are often recognizable by phrases like "NI TUMIE KWA NAMBA HII" ("send it to me on this number"). In other cases the sender claims to be a Freemason agent, a landlord, or an employer offering a fake job.
To address this problem, I created a dataset of 1,508 Swahili SMS messages from Tanzania that covers a range of scam patterns. The dataset is available on Kaggle as swahili-sms-detection. This project also includes a basic machine learning model that classifies messages as fraudulent or trustworthy.
- Real-time SMS scam detection using machine learning
- Clean and modern UI built with Next.js and Tailwind CSS
- Flask backend API with scikit-learn model
- Supports Swahili language messages
- 98.7% accuracy on test data
- Next.js 15+ with App Router
- TypeScript
- Tailwind CSS
- Shadcn UI Components
- Python
- Flask
- Scikit-learn
- Pandas
- Joblib
- Clone the repository
git clone https://github.com/Henryle-hd/BongoScamDetection
cd BongoScamDetection
- Install frontend dependencies (this step and each step after it starts from the repository root)
cd frontend
npm install
- Install backend dependencies
cd backend
pip install -r requirements.txt
- Start the backend server
cd backend
python main.py
- Start the frontend development server
cd frontend
npm run dev
- Open http://localhost:3000 in your browser
The SMS scam detection model was trained on a custom dataset of Swahili messages labeled as either scam or trust. The model uses:
- CountVectorizer for text feature extraction
- Multinomial Naive Bayes classifier
- Achieved 98.7% accuracy on test set
- Trained on 10000 messages
- Dataset available on Kaggle: swahili-sms-detection
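To make the two modelling steps above concrete, here is a minimal, dependency-free sketch of what a bag-of-words count step followed by a multinomial Naive Bayes classifier computes. It is not the project's training code: the class name `TinyMultinomialNB`, the `tokenize` helper, and the four toy messages are all made up for illustration. The tokenizer and the smoothing value `alpha=1.0` are chosen to resemble the scikit-learn defaults for `CountVectorizer` and `MultinomialNB`.

```python
import math
import re
from collections import Counter

# Word tokens of two or more characters, similar in spirit to the
# default CountVectorizer token pattern (an assumption, not project code).
TOKEN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


class TinyMultinomialNB:
    """Hypothetical, illustrative multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        # Bag-of-words counts per class (the "count vectorizer" step).
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        # Class priors from label frequencies.
        doc_counts = Counter(labels)
        self.log_prior = {
            c: math.log(doc_counts[c] / len(labels)) for c in self.classes
        }
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            score = self.log_prior[c]
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy, made-up training messages (NOT taken from the Kaggle dataset).
train_messages = [
    "Ni tumie kwa namba hii",
    "Iyo ela tuma humu kwenye vodacom",
    "Habari za asubuhi, tukutane kesho",
    "Nimefika nyumbani salama",
]
train_labels = ["scam", "scam", "trust", "trust"]

model = TinyMultinomialNB().fit(train_messages, train_labels)

if __name__ == "__main__":
    print(model.predict("tuma ela kwa namba hii"))
    print(model.predict("Habari, nimefika salama"))
```

With only four training messages this says nothing about real accuracy. It only shows how per-class word counts and smoothed log-probabilities lead to a "scam" or "trust" label.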
Predicts whether an SMS is a scam.
Request body:
{
"sms": "Iyo ela tuma humu kwenye vodacom 0655251448 Jina lije ALLY ISSA "
}
Response:
{
"prediction": "scam" | "trust",
"sms": "Original message"
}
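The request and response shapes above could be exercised from Python with a small client like the sketch below, which uses only the standard library. The URL is an assumption: this README does not state the route or port, so `http://localhost:5000/predict` (Flask's default port plus a hypothetical `/predict` path) is a placeholder to adjust to whatever `main.py` actually serves. The helper names `build_request`, `parse_response`, and `classify` are likewise illustrative.

```python
import json
import urllib.request

# ASSUMPTION: route and port are not given in the README; change as needed.
API_URL = "http://localhost:5000/predict"


def build_request(sms, url=API_URL):
    """Build a POST request whose JSON body is {"sms": <message>}."""
    body = json.dumps({"sms": sms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(raw):
    """Extract the prediction and check it is one of the documented labels."""
    payload = json.loads(raw)
    prediction = payload["prediction"]
    if prediction not in ("scam", "trust"):
        raise ValueError(f"unexpected prediction: {prediction!r}")
    return prediction


def classify(sms, url=API_URL):
    """Send the message to the backend and return "scam" or "trust"."""
    with urllib.request.urlopen(build_request(sms, url), timeout=10) as resp:
        return parse_response(resp.read())


if __name__ == "__main__":
    # Requires the backend server to be running.
    print(classify("Iyo ela tuma humu kwenye vodacom 0655251448 Jina lije ALLY ISSA "))
```

Keeping request construction and response parsing in separate functions means both can be checked without a running server.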
Contributions are welcome! Please feel free to submit a Pull Request.