ShubhamKumar2202/NETFLIX-APPTENCY

NETFLIX-APPTENCY

test.csv: the test set. It consists of everything in train.csv except target.

train.csv: the training set. It consists of an id column, the customer's features, and a target column: target.

A sample submission file in the correct format. target=1 means that the customer subscribes to Netflix.

Steps:

  1. IMPORTS
  • NUMPY
  • PANDAS
  • SEABORN
  • MATPLOTLIB.PYPLOT
  • DATETIME
  • SKLEARN.PREPROCESSING
  • CATBOOST
  • XGBOOST
  • LIGHTGBM
  • sklearn.preprocessing (LabelEncoder)
  • sklearn.model_selection (cross_val_score, KFold, RepeatedStratifiedKFold, StratifiedKFold, cross_val_predict)
  • sklearn.model_selection(train_test_split, GridSearchCV)
  • sklearn.metrics(roc_auc_score)

  2. IMPORT AND READ THE DATASET

  • TRAIN.CSV - the training set. It consists of an id column, the customer's features, and a target column: target.
  • TEST.CSV - the test set. It consists of everything in TRAIN.CSV except target.
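The loading step can be sketched as below. This is a minimal illustration, not the notebook's actual code: the helper name `load_datasets` and the feature columns `age` and `plan` are hypothetical, and the in-memory CSV text stands in for the real files so the snippet runs on its own. Only `id` and `target` are column names taken from the description above.

```python
import io

import pandas as pd


def load_datasets(train_source, test_source):
    """Read the training and test sets into DataFrames.

    Each source may be a file path or a file-like object.
    """
    df_train = pd.read_csv(train_source)
    df_test = pd.read_csv(test_source)
    return df_train, df_test


# Tiny in-memory stand-ins for train.csv and test.csv (made-up values).
train_csv = io.StringIO("id,age,plan,target\n1,34,basic,1\n2,51,premium,0\n")
test_csv = io.StringIO("id,age,plan\n3,28,basic\n")

df_train, df_test = load_datasets(train_csv, test_csv)

# The test set should hold every training column except the target.
feature_columns = [c for c in df_train.columns if c != "target"]
print(df_train.shape, df_test.shape)
```

With the real files in the working directory, the call would be `load_datasets("train.csv", "test.csv")`.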

NOTE: The notebook uses the magic command %config Completer.use_jedi = False. Once this command has been run, code autocompletion can be triggered by pressing the Tab key after the "." character.

  3. ANALYSIS
  • HEAD()

  • SHAPE

  • DESCRIBE(), followed by a search for missing data:

  • get_missing - a function made to find missing data

  • A histplot to show the distribution of missing data

  • A histplot to show the distribution of missing data in Missing_df

    • Get the columns with more than 25% missing values.
  • Drop them from the train and test sets.

  • Print categorical features of datatype object.

  • Print numerical features (columns whose datatype is not object).

  • Plot a pie chart to show the percentage of numeric and categorical features.

  • Fill missing values with the median (numerical features).

  • Fill missing values with the mode (categorical features).

  • Find columns that contain date objects.

  • Apply datetime format.

  • Show datetime features.

  • Get each part of datetime using pandas DatetimeIndex.

  • Drop the original date columns from train/test.

  • Update the Categorical_Features list.

  • Create a copy of the datasets (train and test).

  • Get the number of unique values for each feature.

  • Sort the categorical features by cardinality.
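The preprocessing steps above (splitting features by type, filling missing values, expanding date columns, and ranking by cardinality) can be sketched roughly as follows. The function names and the `_year`, `_month`, `_day` suffixes are illustrative assumptions, not the notebook's own identifiers, and the sketch assumes categorical columns are stored with the object datatype.

```python
import pandas as pd


def split_feature_types(df):
    """Return (categorical, numerical) column name lists."""
    categorical = df.select_dtypes(include="object").columns.tolist()
    numerical = df.select_dtypes(include="number").columns.tolist()
    return categorical, numerical


def fill_missing(df, categorical, numerical):
    """Fill numerical gaps with the median and categorical gaps with the mode."""
    df = df.copy()
    for col in numerical:
        df[col] = df[col].fillna(df[col].median())
    for col in categorical:
        mode = df[col].mode()
        if not mode.empty:  # a column that is entirely missing has no mode
            df[col] = df[col].fillna(mode.iloc[0])
    return df


def expand_datetime(df, date_cols):
    """Replace each date column with year, month, and day columns."""
    df = df.copy()
    for col in date_cols:
        idx = pd.DatetimeIndex(pd.to_datetime(df[col]))
        df[col + "_year"] = idx.year
        df[col + "_month"] = idx.month
        df[col + "_day"] = idx.day
        df = df.drop(columns=col)  # the raw date column is no longer needed
    return df


def cardinality(df, columns):
    """Number of unique values per column, highest first."""
    return df[columns].nunique().sort_values(ascending=False)
```

The same functions would be applied to both the train and test copies. After `expand_datetime`, the expanded date columns should be removed from the categorical feature list, which corresponds to the "update" step in the list above.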

STEPS to deal with missing data

  • Create a function "get_missing" to find the missing data.
  • Store the missing values from df_train in a dataframe "Missing_Df".
  • Make a plot to visualize the missing values in "df_train".
  • Create a dataframe "Missing_custom" with the columns whose missing percentage is greater than 25, and drop those columns.
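A minimal sketch of these steps is shown below. The body of `get_missing` is an assumption about what such a helper might do, since the README only names it; the column labels `missing_count` and `missing_percent` and the helper `drop_sparse_columns` are hypothetical.

```python
import pandas as pd


def get_missing(df):
    """Summarize missing values per column as counts and percentages."""
    missing_count = df.isnull().sum()
    missing_percent = 100 * missing_count / len(df)
    summary = pd.DataFrame(
        {"missing_count": missing_count, "missing_percent": missing_percent}
    )
    # Keep only columns that actually have gaps, worst first.
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_percent", ascending=False)


def drop_sparse_columns(df_train, df_test, threshold=25.0):
    """Drop columns whose missing percentage in df_train exceeds threshold."""
    missing_df = get_missing(df_train)
    to_drop = missing_df.index[missing_df["missing_percent"] > threshold].tolist()
    train_out = df_train.drop(columns=to_drop)
    # errors="ignore" covers columns that exist only in the training set.
    test_out = df_test.drop(columns=to_drop, errors="ignore")
    return train_out, test_out, to_drop
```

The threshold is computed on the training set only and then applied to both sets, so train and test keep the same feature columns.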
