Detection of Mines vs. Rocks Using SONAR Data: Accuracy Comparison of Several Classification Models
During the Russo-Japanese War of 1904–1905, two mines blew up when the Petropavlovsk struck them near Port Arthur, sending the holed vessel to the bottom and killing most of her crew in the process. This shows why detecting underwater mines accurately is crucial to the navy of any country.
- I have used a publicly available SONAR dataset. It contains 60 numerical features.
- The dataset has no missing values and all features are already numeric, so the only remaining step is to find the relevant features. I used principal component analysis (PCA) to remove correlation among the features and to reduce the dimensionality.
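To illustrate the decorrelation claim, here is a minimal sketch assuming scikit-learn's `PCA`. The exact data file is not specified here, so the sketch uses a synthetic stand-in from `make_classification` with 60 features, 20 of which are deliberately redundant (linear combinations of the informative ones, hence correlated); the sample count and generator parameters are arbitrary choices for illustration only. After projection, the principal components are uncorrelated on the data the PCA was fit on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic stand-in for the SONAR data: 60 numeric features, 2 classes.
# n_redundant=20 makes 20 features linear combinations of the informative ones,
# so many features are correlated (illustrative parameters, not the real data).
X, y = make_classification(
    n_samples=200, n_features=60, n_informative=10, n_redundant=20,
    random_state=0,
)

# Fit PCA and project the data onto the first 10 principal components.
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)

# Feature-by-feature correlation matrices before and after the projection.
corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(X_pca, rowvar=False)

def max_offdiag(c):
    """Largest absolute off-diagonal entry of a square matrix."""
    return np.abs(c - np.diag(np.diag(c))).max()

print(X_pca.shape)  # (200, 10)
print("max |corr| before PCA:", round(float(max_offdiag(corr_before)), 3))
print("max |corr| after PCA:", float(max_offdiag(corr_after)))
```

On the fitted data the off-diagonal correlations after PCA are zero up to floating-point error, which is what "removing correlation among the features" means here.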
The plot below shows how the explained variance changes with the number of PCA components.
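The numbers behind an explained-variance plot can be read from scikit-learn's `explained_variance_ratio_` attribute. A minimal sketch, again on an illustrative synthetic stand-in rather than the real SONAR file; the 95% cutoff is an arbitrary example threshold, not one taken from this project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Synthetic stand-in with the same number of features (60) as the SONAR data.
X, _ = make_classification(n_samples=200, n_features=60, n_informative=10,
                           n_redundant=20, random_state=0)

# Keep every component so the full variance spectrum is available.
pca = PCA().fit(X)

# Cumulative fraction of total variance explained by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components that explains at least 95% of the variance.
k_95 = int(np.searchsorted(cumulative, 0.95) + 1)

for k in (5, 10, 20, 40, 60):
    print(k, round(float(cumulative[k - 1]), 3))
print("components needed for 95% of the variance:", k_95)
```

Plotting `cumulative` against `range(1, 61)` gives a curve of the kind described above.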
I compared the test accuracy against the number of PCA components for three models: LinearRegression, LogisticRegression and RandomForestClassifier.
The plot below shows how the test accuracy varies with the number of PCA components for the LinearRegression model.
The plot below shows how the test accuracy varies with the number of PCA components for the LogisticRegression model.
The plot below shows how the test accuracy varies with the number of PCA components for the RandomForestClassifier model.
- LinearRegression has the worst performance among the above models.
- One interesting observation from the accuracy vs. number of PCA components graphs: RandomForestClassifier (an ensemble model) outperforms LogisticRegression when the number of PCA components is low. However, the two models perform nearly the same as the number of PCA components increases.
- When the number of PCA components is in the range 30 to 40, all the models generalize very poorly and perform badly on the test data.
- Using 8 to 12 PCA components with RandomForestClassifier gives the best performance on unseen (test) data.
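A single train/test split of a small dataset can give noisy accuracy estimates, so one way to double-check a preferred range such as 8 to 12 components (and the weak results around 30 to 40) is k-fold cross-validation. This technique is swapped in for illustration and is not necessarily what was used here. The sketch assumes scikit-learn and the same kind of synthetic stand-in data, so its numbers will not match the SONAR results; the fold count and model settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the SONAR data (illustrative parameters).
X, y = make_classification(n_samples=200, n_features=60, n_informative=10,
                           n_redundant=20, random_state=0)

results = {}
# Compare a few component counts, including the 8-12 and 30-40 ranges.
for n_components in (8, 10, 12, 30, 40):
    # PCA sits inside the pipeline, so it is refit on each training fold
    # and never sees the fold used for scoring.
    pipe = make_pipeline(
        PCA(n_components=n_components),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    results[n_components] = float(scores.mean())
    print(f"{n_components:>2} components: mean CV accuracy {scores.mean():.3f} "
          f"(std {scores.std():.3f})")
```

Putting PCA inside the pipeline is the design choice that matters: fitting PCA on the full dataset before splitting would leak information from the scoring folds into the features.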