This repository contains a Jupyter notebook with an Exploratory Data Analysis (EDA) of physicochemical measurements and sensory quality scores for red and white samples of Portuguese "vinho verde" wine.
The objective of this analysis is to understand the data and identify correlations between variables before applying a machine learning model to predict wine quality based on physicochemical characteristics.
The data was obtained from: Wine Quality Data
- Fixed Acidity: The concentration of non-volatile acids, primarily tartaric, malic, and citric acids, contributing to the wine's tartness and stability.
- Volatile Acidity: The amount of acetic acid and other volatile acids present; at high levels these give the wine an unpleasant, vinegar-like aroma and taste.
- Citric Acid: A minor acid in wine that can add freshness and enhance the fruity flavors.
- Residual Sugar: The sugar remaining after fermentation, influencing sweetness and body.
- Chlorides: The chloride ion concentration, which can affect the taste and stability of the wine.
- Free Sulfur Dioxide: The portion of sulfur dioxide that is not bound to other compounds, acting as an antimicrobial and antioxidant.
- Total Sulfur Dioxide: The total amount of sulfur dioxide, both free and bound, used as a preservative.
- Density: The mass per unit volume of the wine, related to the alcohol and sugar content.
- pH: The measure of the acidity or basicity of the wine, affecting taste, color, and microbial stability.
- Sulphates: Compounds that can contribute to wine stability and preservation, also influencing mouthfeel.
- Alcohol: The ethanol content resulting from fermentation, affecting the wine's body, warmth, and preservation.
- Type: A categorical variable indicating whether the wine is red or white.
- Quality: A sensory-based score ranging from 0 to 10, reflecting the overall perceived quality of the wine based on taste, aroma, balance, and overall enjoyment.
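As a rough illustration of the kind of correlation check described above, the sketch below ranks each variable by the strength of its Pearson correlation with the quality score. It is not taken from the notebook: the column names (`type`, `quality`, `alcohol`, `volatile acidity`), the helper `quality_correlations`, and the derived `is_red` column are assumptions, and the four-row `toy` frame holds made-up placeholder values rather than real samples.

```python
import pandas as pd


def quality_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every column with 'quality', strongest first.

    Assumes a categorical 'type' column ("red"/"white") and a numeric
    'quality' column; both names are hypothetical.
    """
    numeric = df.copy()
    # Encode the categorical wine type as 0/1 so it can enter the correlation.
    numeric["is_red"] = (numeric["type"].str.lower() == "red").astype(int)
    numeric = numeric.drop(columns=["type"])
    # Take the 'quality' column of the correlation matrix, minus quality itself.
    corr = numeric.corr()["quality"].drop("quality")
    # Sort by absolute value so strong negative correlations rank high too.
    return corr.sort_values(key=abs, ascending=False)


# Made-up placeholder rows, for demonstration only (not real wine data).
toy = pd.DataFrame(
    {
        "alcohol": [9.0, 10.0, 11.0, 12.0],
        "volatile acidity": [0.7, 0.5, 0.6, 0.3],
        "type": ["red", "white", "red", "white"],
        "quality": [4, 5, 6, 7],
    }
)

print(quality_correlations(toy))
```

With the real dataset, the same function could be called on the loaded DataFrame, provided its column names are adjusted to match. Pearson correlation only captures linear relationships, so a low value does not rule out a non-linear dependence on quality.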
Thanks to Paulo Cortez for providing the wine quality dataset.