A sample submission file in the correct format. target=1 means that the customer subscribes to Netflix.
Sample_submission - https://www.kaggle.com/competitions/netflix-appetency/data?select=sample_submission.csv
Steps:
- IMPORTS
  - NUMPY
  - PANDAS
  - SEABORN
  - MATPLOTLIB.PYPLOT
  - DATETIME
  - SKLEARN.PREPROCESSING
  - CATBOOST
  - XGBOOST
  - LIGHTGBM
  - sklearn.preprocessing (LabelEncoder)
  - sklearn.model_selection (cross_val_score, KFold, RepeatedStratifiedKFold, StratifiedKFold, cross_val_predict)
  - sklearn.model_selection (train_test_split, GridSearchCV)
  - sklearn.metrics (roc_auc_score)
- IMPORT AND READ THE DATASET
  - TRAIN.CSV - the training set. It consists of an id column, the customers' features, and a target column: target.
  - TEST.CSV - the test set. It consists of everything except target.

NOTE: the %config Completer.use_jedi = False magic command was used. Once you have run it, you can trigger code autocompletion by pressing the Tab key after the "." character.
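As a minimal sketch of the reading step, the snippet below loads a train and a test table with pandas and checks their shapes. The helper name `load_datasets`, the name `df_test`, the column `feature_0`, and the tiny in-memory CSV contents are assumptions made for illustration; the real files come from the competition page.

```python
import io

import pandas as pd


def load_datasets(train_source, test_source):
    """Read the train and test CSVs (paths or file-like objects) into DataFrames."""
    df_train = pd.read_csv(train_source)
    df_test = pd.read_csv(test_source)
    return df_train, df_test


# Tiny in-memory stand-ins for train.csv and test.csv (made-up values).
train_csv = io.StringIO("id,feature_0,target\n1,0.5,0\n2,1.5,1\n")
test_csv = io.StringIO("id,feature_0\n3,2.5\n")

df_train, df_test = load_datasets(train_csv, test_csv)
print(df_train.shape, df_test.shape)  # the test set has no target column
```

With the real data, the same helper would be called with the file paths, for example `load_datasets("train.csv", "test.csv")`.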
- ANALYSIS
  - head()
  - shape
  - describe()
  - Find missing data:
    - get_missing: a function made to find missing data.
    - A histplot to show the distribution of missing data.
    - A histplot to show the distribution of missing data in Missing_df.
    - Get the columns with more than 25% missing values.
    - Drop them from the train and test sets.
  - Print categorical features of datatype object.
  - Print numerical features (numeric datatypes).
  - Plot a pie chart to show the percentage of numeric and categorical features.
  - Fill missing values with the median (numerical features).
  - Fill missing values with the mode (categorical features).
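The feature split and the two fill steps could look roughly like the sketch below. The toy frame, its column names, and the list name `Numerical_Features` are assumptions for illustration; only `Categorical_Features` is a name taken from these notes.

```python
import numpy as np
import pandas as pd

# Made-up data with one numerical and one categorical (object) column.
df_train = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "plan": pd.Series(["basic", None, "basic", "premium"], dtype=object),
})

# Split the columns by datatype.
Categorical_Features = df_train.select_dtypes(include="object").columns.tolist()
Numerical_Features = df_train.select_dtypes(include="number").columns.tolist()

# Numerical features: fill gaps with the column median.
for col in Numerical_Features:
    df_train[col] = df_train[col].fillna(df_train[col].median())

# Categorical features: fill gaps with the most frequent value (mode).
for col in Categorical_Features:
    df_train[col] = df_train[col].fillna(df_train[col].mode()[0])

print(df_train)
```

The same loops would be repeated for the test set; whether to reuse the training median and mode there is a design choice these notes do not specify.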
  - Find columns that contain date objects.
  - Apply datetime format.
  - Show datetime features.
  - Get each part of the datetime using pandas DatetimeIndex.
  - Drop the original date columns from train/test.
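The datetime steps above can be sketched as follows. The column name `signup_date`, the date format, and the suffixes `_year`, `_month`, `_day` are hypothetical; the date columns are assumed to have been identified already, and it is assumed that the columns dropped at the end are the original date columns.

```python
import pandas as pd

# Made-up data; "signup_date" is a hypothetical date column stored as text.
df_train = pd.DataFrame({
    "id": [1, 2],
    "signup_date": ["2021-03-15", "2020-11-02"],
})

date_columns = ["signup_date"]  # assumed to be known at this point

for col in date_columns:
    # Apply the datetime format to the text column.
    df_train[col] = pd.to_datetime(df_train[col], format="%Y-%m-%d")
    # Extract each part of the date through a DatetimeIndex.
    parts = pd.DatetimeIndex(df_train[col])
    df_train[col + "_year"] = parts.year
    df_train[col + "_month"] = parts.month
    df_train[col + "_day"] = parts.day

# Drop the original date columns once their parts have been extracted.
df_train = df_train.drop(columns=date_columns)
print(df_train)
```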
  - Update Categorical_Features list.
  - Create a copy of the datasets.
    - For train and test.
  - Get the number of unique values for each feature.
  - Categorical features sorted by cardinality.
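One way to count unique values and sort the categorical features by cardinality is shown in this small sketch. The toy frame and its column names are made up, and the variable name `cardinality` is an assumption.

```python
import pandas as pd

# Made-up categorical data.
df_train = pd.DataFrame({
    "plan": ["basic", "premium", "basic", "basic"],
    "city": ["Paris", "Lyon", "Nice", "Paris"],
    "device": ["tv", "tv", "tv", "tv"],
})
Categorical_Features = ["plan", "city", "device"]

# Work on a copy so the original frame stays untouched.
train_copy = df_train.copy()

# Number of unique values per categorical feature, highest cardinality first.
cardinality = train_copy[Categorical_Features].nunique().sort_values(ascending=False)
print(cardinality)
```

Sorting by cardinality helps when choosing an encoding, since high-cardinality features tend to need different handling than features with only a few levels.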
- Create a function "get_missing" to find the missing data.
- Store the missing values from df_train in a dataframe "Missing_Df".
- Make a plot to visualize the missing values in "df_train".
- Create a dataframe "Missing_custom" to drop the columns with a missing percentage greater than 25.
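A possible shape for these missing-data steps is sketched below (the plot is left out). The body of `get_missing`, the column names `missing_count` and `missing_percent`, the name `columns_to_drop`, and the toy data are assumptions; only the names `get_missing`, `df_train`, `Missing_Df`, and `Missing_custom` come from these notes.

```python
import numpy as np
import pandas as pd


def get_missing(df):
    """Return missing-value counts and percentages for columns with gaps."""
    counts = df.isna().sum()
    percent = 100 * counts / len(df)
    missing = pd.DataFrame({"missing_count": counts, "missing_percent": percent})
    # Keep only columns that actually have missing values, worst first.
    missing = missing[missing["missing_count"] > 0]
    return missing.sort_values("missing_percent", ascending=False)


# Made-up data: feature_a is 50% missing, feature_b is 25% missing.
df_train = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "feature_a": [1.0, np.nan, np.nan, 4.0],
    "feature_b": ["x", "y", None, "z"],
    "target": [0, 1, 0, 1],
})
df_test = df_train.drop(columns="target")

Missing_Df = get_missing(df_train)

# Columns whose missing percentage is greater than 25 are dropped from both sets.
Missing_custom = Missing_Df[Missing_Df["missing_percent"] > 25]
columns_to_drop = Missing_custom.index.tolist()

df_train = df_train.drop(columns=columns_to_drop)
df_test = df_test.drop(columns=columns_to_drop)
print(columns_to_drop)
```

Note that the threshold is strict here, so a column with exactly 25% missing values is kept.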