Please leave an upvote and drop a comment!
📊 This project aims to predict diabetes using the Pima Indian Diabetes Dataset through binary classification.
Please visit the notebook posted on Kaggle for the best experience.
- 📊 Data splitting for training and testing
- 🛠️ Data preprocessing to prepare the data for modeling
- 🔍 Simple EDA (Exploratory Data Analysis) to understand the dataset
- ⚖️ Class balancing using SMOTE to address imbalanced data
- 🎛️ Parameter tuning using GridSearchCV to optimize model performance
- 🤖 Training 5 different classification models to predict diabetes
- 💾 Exporting the trained models for future use
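To make the class-balancing step above more concrete, here is a minimal, dependency-free sketch of the interpolation idea behind SMOTE. It is a simplified illustration, not the implementation used in the notebook; the function name `smote_like_oversample`, the toy points, and the value of `k` are all hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up): 6 majority samples and 3 minority samples, 2 features each.
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.3)]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote_like_oversample(
    minority, n_new=len(majority) - len(minority), k=2
)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution.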
If you can't see the notebook preview on GitHub, open the PDF; it should load most of the time.
🤔 Diabetes is a prevalent health concern, and early prediction can aid in timely intervention and management. This project utilizes machine learning algorithms to predict the onset of diabetes based on various health parameters.
📈 The dataset contains health-related information such as glucose levels, blood pressure, and BMI of individuals from the Pima Native American tribe. It's widely used in research for diabetes prediction.
🔍 The project explores multiple machine learning algorithms (Logistic Regression, Decision Trees, Support Vector Machines, K-Nearest Neighbors, and Random Forests) to find the best model for diabetes prediction.
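As a rough sketch of how the five model families could be tuned and compared with scikit-learn's `GridSearchCV`, consider the snippet below. It runs on synthetic data from `make_classification` rather than the real dataset, and the hyperparameter grids, split ratio, and random seeds are illustrative assumptions, not the values used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; shape and class weights are arbitrary placeholders.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Model name -> (estimator, small illustrative grid). Grid keys use the step
# names that make_pipeline derives from the lowercased class names.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=42),
        {"decisiontreeclassifier__max_depth": [3, 5, None]},
    ),
    "svm": (SVC(), {"svc__C": [0.1, 1.0, 10.0]}),
    "knn": (
        KNeighborsClassifier(),
        {"kneighborsclassifier__n_neighbors": [3, 5, 7]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=42),
        {"randomforestclassifier__n_estimators": [50, 100]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each CV fold.
    pipeline = make_pipeline(StandardScaler(), estimator)
    search = GridSearchCV(pipeline, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)
    # With a single scoring string, score() reports F1 on the held-out split.
    results[name] = (search.best_params_, search.score(X_test, y_test))

for name, (params, f1) in results.items():
    print(f"{name}: test F1 = {f1:.3f}, best params = {params}")
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation folds into training, which matters most for the distance-based models (SVM and K-Nearest Neighbors).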
💻 To use this project:
- Clone the repository.
- Run the notebook or scripts to train and evaluate the models. This requires a code editor with the appropriate libraries installed; alternatively, you can upload the notebook to Kaggle.
- Analyze the results and fine-tune/download the models as needed.
📈 The models are evaluated on accuracy and F1 score, which provide insight into their performance and suitability for diabetes prediction.
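For reference, both metrics can be computed by hand from the predicted and true labels. The sketch below uses made-up labels and a hypothetical helper, `accuracy_and_f1`, to show why F1 is a useful companion to accuracy when the positive class is rare.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Made-up labels: 8 negatives, 2 positives; the model misses one positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")  # prints "accuracy=0.90, f1=0.67"
```

Here accuracy looks strong at 0.90 even though half of the positive cases were missed, while the F1 score of about 0.67 reflects that gap.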