This project was developed for the Introduction to Machine Learning course. It applies dimensionality reduction with Principal Component Analysis (PCA) together with ensemble learning methods built from Decision Trees, including Random Forests. The goal is to improve classification performance on a dataset by optimizing the number of retained features and the number of weak learners. Key metrics such as accuracy, precision, recall, F1-score, and AUPRC (Area Under the Precision-Recall Curve) are used to evaluate the model's performance.
- Dimensionality Reduction with PCA: Reducing the number of features while keeping the components that capture the most variance.
- Ensemble Learning: Combining multiple weak learners (Decision Trees) with both hard and soft voting strategies to improve prediction accuracy (see the sketch after this list).
- Performance Metrics: The model's output is evaluated using accuracy, precision, recall, F1-score, and AUPRC.
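As an illustration of the two voting strategies, here is a minimal sketch assuming scikit-learn's `VotingClassifier` over shallow Decision Trees and synthetic data; the ensemble size, tree depth, and dataset are placeholders rather than the project's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (placeholder for the project's dataset).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Several shallow trees act as weak learners; max_features adds diversity.
trees = [
    (f"tree_{i}", DecisionTreeClassifier(max_depth=3, max_features="sqrt", random_state=i))
    for i in range(5)
]

# Hard voting takes a majority over predicted labels;
# soft voting averages the predicted class probabilities.
hard_vote = VotingClassifier(estimators=trees, voting="hard").fit(X_train, y_train)
soft_vote = VotingClassifier(estimators=trees, voting="soft").fit(X_train, y_train)

print("hard voting accuracy:", hard_vote.score(X_test, y_test))
print("soft voting accuracy:", soft_vote.score(X_test, y_test))
```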
Data Preprocessing:
- Mean normalization and zero-centering of the data.
- PCA to reduce the dimensionality of the dataset based on explained variance.
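A minimal sketch of this preprocessing, assuming NumPy and scikit-learn; the synthetic data and the 95% explained-variance threshold are assumptions, not values taken from the project:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder feature matrix of shape (n_samples, n_features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))

# Zero-center each feature, then mean-normalize by the feature range.
X_centered = X - X.mean(axis=0)
X_normalized = X_centered / (X.max(axis=0) - X.min(axis=0))

# Keep enough principal components to explain 95% of the variance
# (the 0.95 threshold is an illustrative choice).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_normalized)

print("components kept:", pca.n_components_)
print("explained variance captured:", pca.explained_variance_ratio_.sum())
```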
Model Training:
- Train and test the model using the Random Forest estimator.
- Implement ensemble learning with different numbers of weak learners.
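A minimal training sketch, assuming scikit-learn's `RandomForestClassifier` and synthetic data standing in for the PCA-reduced features; the candidate ensemble sizes are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the PCA-reduced features.
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train forests with different numbers of weak learners (decision trees).
for n_trees in (10, 50, 100, 200):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    print(f"{n_trees:>3} trees -> test accuracy {forest.score(X_test, y_test):.3f}")
```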
Performance Evaluation:
- Calculate key metrics including accuracy, precision, recall, F1-score, and AUPRC for both PCA-reduced data and ensemble learners.
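A minimal sketch of the metric computation with scikit-learn, using `average_precision_score` as the AUPRC estimate; the classifier and data below are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Placeholder data and model; swap in the PCA-reduced features and tuned ensemble.
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("AUPRC    :", average_precision_score(y_test, y_score))  # area under the PR curve
```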
Optimization:
- The number of PCA components and the number of weak learners are tuned to balance performance and computational cost.
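One way to run this trade-off study is a grid search over the two hyperparameters; the sketch below assumes a scikit-learn `Pipeline` scored by cross-validated accuracy, and the candidate values are illustrative rather than the project's actual grid:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Placeholder data standing in for the preprocessed dataset.
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

pipeline = Pipeline([
    ("pca", PCA()),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Candidate values are illustrative, not the project's actual grid.
param_grid = {
    "pca__n_components": [5, 10, 15, 20],
    "forest__n_estimators": [10, 50, 100, 200],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```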
Results:
- Accuracy: 97.7%
- Precision: 98.4%
- Recall: 98.7%
- F1-Score: 98.6%
- AUPRC: 98.1%
To run the project, you need the following Python libraries:
- numpy
- scikit-learn
- matplotlib