The dataset used in this project originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The project aims to build a logistic regression model that predicts diabetes from several diagnostic measurements used as predictors (listed below). The outcome variable is binary, indicating whether or not a patient has diabetes. Monte Carlo cross-validation, which repeatedly splits the data at random into training and test sets, is used as the resampling method to validate the model.
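Monte Carlo cross-validation fits the model on a random training subset, scores it on the remaining rows, and repeats this over many independent random splits, so the performance estimate does not hinge on one particular split. The base-R sketch that follows illustrates the idea on synthetic data. The helper name `mc_cv_logistic`, the 75/25 split, the 0.5 probability threshold, and accuracy as the metric are assumptions for illustration, not necessarily the settings used in this project.

```r
# Monte Carlo cross-validation (repeated random train/test splits) for a
# logistic regression, base R only. mc_cv_logistic is a hypothetical helper;
# the 75/25 split, 0.5 threshold and repetition count are assumptions.
mc_cv_logistic <- function(data, formula, n_reps = 100, train_prop = 0.75,
                           threshold = 0.5, seed = 1) {
  set.seed(seed)
  outcome <- all.vars(formula)[1]          # name of the 0/1 response column
  n <- nrow(data)
  n_train <- floor(train_prop * n)
  accuracy <- numeric(n_reps)
  for (r in seq_len(n_reps)) {
    # Draw a fresh random split on every repetition
    train_idx <- sample.int(n, n_train)
    train <- data[train_idx, ]
    test <- data[-train_idx, ]
    # Fit the logistic regression on the training rows only
    fit <- glm(formula, data = train, family = binomial)
    # Predicted probabilities of Outcome = 1 for the held-out rows
    prob <- predict(fit, newdata = test, type = "response")
    pred <- as.integer(prob > threshold)
    accuracy[r] <- mean(pred == test[[outcome]])
  }
  accuracy                                  # one held-out accuracy per split
}

# Usage on synthetic data only (not the project's dataset)
set.seed(42)
n <- 500
sim <- data.frame(Glucose = rnorm(n, 120, 30), BMI = rnorm(n, 32, 7))
sim$Outcome <- rbinom(n, 1, plogis(-10 + 0.06 * sim$Glucose + 0.08 * sim$BMI))
acc <- mc_cv_logistic(sim, Outcome ~ Glucose + BMI, n_reps = 50)
mean(acc)   # average held-out accuracy across the 50 splits
```

Unlike k-fold cross-validation, the held-out sets of different repetitions may overlap. In tidymodels, the same resampling scheme is available as `rsample::mc_cv()`.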
The R code in this project for data visualization and logistic regression was written in R Markdown and knitted to HTML.
Pregnancies
: Number of times pregnant

Glucose
: Plasma glucose concentration in an oral glucose tolerance test

BloodPressure
: Diastolic blood pressure (mm Hg)

SkinThickness
: Triceps skinfold thickness (mm)

Insulin
: Two-hour serum insulin

BMI
: Body mass index

DiabetesPedigreeFunction
: Diabetes pedigree function

Age
: Age in years

Outcome
: Whether or not the patient has diabetes (0 = no, 1 = yes)
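Given the columns above, the full model regresses `Outcome` on all eight predictors, and exponentiating its coefficients gives odds ratios. A minimal sketch, assuming the data sit in a data frame called `diabetes` whose column headers match the names listed (with the first predictor spelled `Pregnancies`). To keep the snippet runnable, `diabetes` is filled here with random synthetic values, so the printed numbers carry no information about the real data.

```r
# Predictor columns as listed in the data dictionary (assumed header names)
predictors <- c("Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
                "Insulin", "BMI", "DiabetesPedigreeFunction", "Age")

# Placeholder: synthetic stand-in for the real data, for illustration only
set.seed(1)
n <- 300
diabetes <- as.data.frame(matrix(runif(n * length(predictors)),
                                 ncol = length(predictors)))
names(diabetes) <- predictors
diabetes$Outcome <- rbinom(n, 1, 0.35)   # 0 = no diabetes, 1 = diabetes

# Logistic regression of Outcome on every other column
fit <- glm(Outcome ~ ., data = diabetes, family = binomial)

# Odds ratios: exp() of the log-odds coefficients
odds_ratios <- exp(coef(fit))
round(odds_ratios, 3)
```

Each odds ratio is the multiplicative change in the odds of `Outcome = 1` for a one-unit increase in that predictor, holding the other predictors fixed.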
Data source: Diabetes Healthcare: Comprehensive Dataset-AI