It is a simple project on Breast Cancer Prediction using Binary Classification.

flawed-hooman/Breast_Cancer_Prediction

Breast_Cancer_Prediction

A project on breast cancer prediction using K-Nearest Neighbours (KNN) and Logistic Regression, including a comparison of the two methods.

This repository contains:

  • Breast Cancer Wisconsin (Diagnostic) Data Set
  • Sample program

About the Dataset

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. A separating plane was obtained using the Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree Construction Via Linear Programming." Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society, pp. 97-101, 1992], a classification method which uses linear programming to construct a decision tree. Relevant features were selected using an exhaustive search in the space of 1-4 features and 1-3 separating planes.

Attribute Information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
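3-32) Ten real-valued features are computed for each cell nucleus. The mean, standard error, and "worst" (mean of the three largest values) of each feature are recorded per image, giving 30 feature columns: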
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
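
As a quick check of the compactness definition in item (f), the sketch below evaluates perimeter^2 / area - 1.0 for two simple shapes. The function name `compactness` and the example shapes are illustrative assumptions, not taken from the dataset or the notebook.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as defined in the attribute list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# Illustrative shapes (not dataset values): a circle of radius 10 and a square of side 10.
radius = 10.0
circle_value = compactness(2 * math.pi * radius, math.pi * radius ** 2)

side = 10.0
square_value = compactness(4 * side, side ** 2)

print(f"circle: {circle_value:.3f}")  # 4*pi - 1, prints "circle: 11.566"
print(f"square: {square_value:.3f}")  # 16 - 1, prints "square: 15.000"
```

A circle gives the smallest possible value (4π - 1), so larger values indicate a less regular outline.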

All feature values are recoded with four significant digits.

Missing attribute values: none

Class distribution: 357 benign, 212 malignant
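
The class counts above imply a simple reference point: a model that always predicts the majority class (benign) would already be right on about 62.7% of the samples. The short sketch below computes that baseline from the stated counts. The variable names are illustrative.

```python
# Class counts as stated in the dataset description.
benign = 357
malignant = 212

total = benign + malignant
# Accuracy of a trivial classifier that always predicts the majority class.
baseline_accuracy = max(benign, malignant) / total

print(f"samples: {total}")                              # prints "samples: 569"
print(f"majority baseline: {baseline_accuracy:.3f}")    # prints "majority baseline: 0.627"
```

Any accuracy reported for KNN or Logistic Regression is best read against this figure rather than against 50%.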

Model Training

Open "breast-cancer-prediction-knn-logistic-regression(1).ipynb" in Jupyter Notebook and execute the cells from top to bottom.
The notebook uses the provided dataset and performs the classification with both the K-Nearest Neighbours and Logistic Regression algorithms.

KNN:-

K-Nearest Neighbours is one of the simplest supervised machine learning algorithms. It classifies a new case by measuring its similarity to the available labelled cases and assigning it the category that is most common among its k most similar (nearest) neighbours. It is a non-parametric algorithm, which means it makes no assumption about the distribution of the underlying data.
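
The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the function `knn_predict`, the choice of Euclidean distance, k = 3, and the two-feature toy points are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Rank training points by Euclidean distance to the query point.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    # Majority vote among the k nearest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-feature points, not values from the dataset.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # prints "B"
print(knn_predict(train_X, train_y, (3.0, 3.0)))  # prints "M"
```

Because the vote depends on raw distances, features on very different scales (for example area versus smoothness) are usually standardised before KNN is applied.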

Logistic Regression:-

Logistic regression is one of the most popular supervised machine learning algorithms. It predicts a categorical dependent variable from a given set of independent variables, so the outcome must be a categorical or discrete value such as Yes or No, 0 or 1, True or False. Instead of returning exactly 0 or 1, it returns a probability between 0 and 1. The S-shaped curve of the logistic function gives the likelihood that a sample belongs to a class.
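
The mapping from inputs to a probability can be sketched as follows. This is an illustration only: the weights, bias, feature values, and the 0.5 decision threshold are hypothetical, and no model fitting is shown.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features) -> float:
    """Probability of the positive class for one sample."""
    # Linear combination of the features, then squashed by the logistic function.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5) -> int:
    """Turn the probability into a 0/1 decision using an assumed threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical parameters and inputs, chosen only to show the behaviour.
weights = [1.0, -1.0]
bias = 0.0

print(sigmoid(0.0))                                 # prints 0.5
print(predict_label(weights, bias, [3.0, 1.0]))     # z = 2, prints 1
print(predict_label(weights, bias, [1.0, 3.0]))     # z = -2, prints 0
```

In practice the weights and bias are learned from the training data, and the threshold can be moved away from 0.5 when one kind of error is more costly than the other.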

Comparing the algorithms

The training accuracy and model accuracy score of both algorithms have been compared and shown using histograms.
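
A comparison of this kind comes down to computing the same accuracy metric for each model's predictions. The sketch below shows the calculation on placeholder labels. The predictions are made up for illustration and are not results from the notebook, and the names `accuracy`, `knn_pred`, and `logreg_pred` are assumptions.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Placeholder labels only; these are NOT the notebook's results.
y_true = ["M", "B", "B", "M"]
knn_pred = ["M", "B", "M", "M"]
logreg_pred = ["M", "B", "B", "M"]

scores = {
    "KNN": accuracy(y_true, knn_pred),
    "Logistic Regression": accuracy(y_true, logreg_pred),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Computing the metric on both the training split and a held-out split for each model gives the values that a side-by-side chart would display.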

License

Breast_Cancer_Prediction is released under the Apache License 2.0.
