This project demonstrates a comprehensive data warehouse implementation for the Brazilian E-commerce dataset from Olist. It showcases both traditional ETL processes and modern data transformation approaches, combining Python-based ETL, dbt (data build tool) for transformations, and Docker for containerization. The project creates a dimensional model optimized for business intelligence and analysis.
The data warehouse architecture has four parts: a star-schema data model, a Python ETL layer, a dbt transformation layer, and a Docker-based runtime.
The warehouse implements a star schema with:
- Central Fact Table: fact_orders, containing transactional metrics and foreign keys
- Dimension Tables:
  - dim_customer (customer demographics)
  - dim_seller (seller information)
  - dim_product (product details)
  - dim_date (time-based analysis)
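To make the star schema concrete, the following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table names come from the list above; every column name (customer_key, customer_state, date_key, order_total) and all sample rows are assumptions made for illustration, not the project's actual schema or data. Only two of the four dimensions are shown.

```python
import sqlite3


def build_star_schema(conn):
    """Create fact_orders plus two dimensions (column names are hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key   INTEGER PRIMARY KEY,
            customer_state TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- yyyymmdd, e.g. 20180115
            year     INTEGER,
            month    INTEGER
        );
        CREATE TABLE fact_orders (
            order_id     TEXT PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            order_total  REAL
        );
    """)


def load_sample_rows(conn):
    """Insert a few made-up rows so the query below has something to aggregate."""
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "SP"), (2, "RJ")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20180115, 2018, 1), (20180220, 2018, 2)])
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)",
                     [("o1", 1, 20180115, 100.0),
                      ("o2", 1, 20180220, 50.0),
                      ("o3", 2, 20180115, 30.0)])


def revenue_by_state(conn):
    """Join the fact table to a dimension and aggregate a metric per attribute."""
    return conn.execute("""
        SELECT c.customer_state, SUM(f.order_total)
        FROM fact_orders AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.customer_state
        ORDER BY c.customer_state
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    print(revenue_by_state(connection))
```

The query shape is the point: metrics live in the fact table, descriptive attributes live in the dimensions, and each analysis is a join on a surrogate key followed by a GROUP BY.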
The base implementation uses Python for:
- Data extraction from CSV sources
- Complex transformations
- Loading into PostgreSQL
- Data quality validation
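The four steps above can be sketched with the standard library alone. This is an illustrative outline, not the project's actual ETL code: the function names, the stg_orders table, the in-line sample CSV, and the use of sqlite3 in place of PostgreSQL are all assumptions. The column names are assumed to match the Olist orders CSV.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical sample; the last row has no order_id and should be rejected.
SAMPLE_CSV = """order_id,customer_id,order_purchase_timestamp
o1,c1,2018-01-15 10:30:00
o2,c2,2018-02-20 08:00:00
,c3,2018-03-01 12:00:00
"""


def extract(source):
    """Read rows from a CSV file-like object into dictionaries."""
    return list(csv.DictReader(source))


def validate(rows):
    """Split rows into (valid, rejected); reject missing or duplicate order_id."""
    seen, valid, rejected = set(), [], []
    for row in rows:
        order_id = row["order_id"].strip()
        if not order_id or order_id in seen:
            rejected.append(row)
        else:
            seen.add(order_id)
            valid.append(row)
    return valid, rejected


def transform(rows):
    """Derive a yyyymmdd integer date key from the purchase timestamp."""
    records = []
    for row in rows:
        ts = datetime.strptime(row["order_purchase_timestamp"], "%Y-%m-%d %H:%M:%S")
        records.append((row["order_id"], row["customer_id"], int(ts.strftime("%Y%m%d"))))
    return records


def load(conn, records):
    """Insert records in one transaction and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders "
        "(order_id TEXT PRIMARY KEY, customer_id TEXT, date_key INTEGER)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", records)
    return conn.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]


def run_pipeline(source, conn):
    """Extract, validate, transform, load; return (rows loaded, rows rejected)."""
    valid, rejected = validate(extract(source))
    return load(conn, transform(valid)), len(rejected)


if __name__ == "__main__":
    print(run_pipeline(io.StringIO(SAMPLE_CSV), sqlite3.connect(":memory:")))
```

Validating before transforming keeps bad rows out of the parsing step, and returning the rejected count gives the caller something to log or alert on.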
Our dbt layer provides:
- SQL-first transformations
- Automated testing framework
- Documentation generation
- Version-controlled transformations
- Modular model development
The containerized environment, orchestrated with Docker Compose, includes:
- PostgreSQL database
- Python ETL application
- dbt transformations
ecommerce-data-warehouse/
├── config/
│ └── database.ini # Database configuration
├── data/
│ ├── processed/ # Transformed data
│ └── raw/ # Olist CSV files
├── dbt_olist/ # dbt implementation
│ ├── models/
│ │ ├── staging/ # Initial transformations
│ │ ├── intermediate/ # Helper models
│ │ ├── dim/ # Dimension tables
│ │ ├── fact/ # Fact tables
│ │ └── mart/ # Business-specific models
│ └── tests/ # Data quality tests
├── docker/ # Docker configuration
│ ├── Dockerfile # ETL application
│ ├── Dockerfile.dbt # dbt environment
│ └── docker-compose.yml # Service orchestration
├── notebooks/ # Analysis notebooks
├── src/ # Source code
└── tests/ # Test files
Prerequisites:
- Docker and Docker Compose
- Python 3.8+
- PostgreSQL 12+
- dbt 1.9.0+
# Clone the repository
git clone https://github.com/LeoRigasaki/ecommerce-data-warehouse.git
cd ecommerce-data-warehouse
# Start the containerized environment
docker-compose up --build
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure database
cp config/database.ini.example config/database.ini
# Update database.ini with your credentials
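Once database.ini is filled in, the ETL code needs to read it. The sketch below shows one way to do that with Python's configparser. The layout of the file is an assumption: the [postgresql] section name and its keys are hypothetical, since the contents of database.ini.example are not shown here. The database and user names are the ones used in the psql command in this README, and the password is a placeholder.

```python
from configparser import ConfigParser

# Hypothetical file contents; adjust section and key names to the real template.
EXAMPLE_INI = """
[postgresql]
host = localhost
port = 5432
database = ecommerce_dwh
user = dwh_user
password = change_me
"""

REQUIRED_KEYS = ("host", "port", "database", "user", "password")


def parse_db_config(ini_text, section="postgresql"):
    """Parse INI text and return the connection settings as a dictionary."""
    # interpolation=None lets passwords contain a literal '%' character.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found")
    settings = dict(parser.items(section))
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    settings["port"] = int(settings["port"])
    return settings


if __name__ == "__main__":
    config = parse_db_config(EXAMPLE_INI)
    # Never print the password; show only the non-secret settings.
    print({key: value for key, value in config.items() if key != "password"})
```

With a real file, pass `Path("config/database.ini").read_text()` (from pathlib) instead of EXAMPLE_INI. Failing early on missing keys gives a clearer error than a connection failure later in the pipeline.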
Each branch contains a different implementation (Docker, dbt, or Python-based ETL). The commands for each are listed below:
# Run entire pipeline
docker-compose up
# Run dbt transformations
docker-compose run dbt run
# Run Python ETL
python run.py
# Run dbt models
cd dbt_olist
dbt run
- Access the data warehouse:
psql -U dwh_user -d ecommerce_dwh
- Run business queries:
python -m src.analysis.business_queries
- View dbt documentation:
dbt docs generate
dbt docs serve
- Python ETL:
  - Robust data extraction and transformation
  - Error handling and logging
  - Data quality validation
  - Incremental loading capability
- dbt:
  - Modular SQL transformations
  - Built-in testing framework
  - Automated documentation
  - Dependency management
- Docker:
  - Containerized services
  - Reproducible environment
  - Easy deployment
  - Scalable architecture
- main: Core implementation
- feature/dbt: dbt transformations
- feature/docker: Containerization
- Future: Airflow implementation
- dbt community
- Docker community
- PostgreSQL community
- Dataset provided by Olist Store
- Brazilian E-commerce Public Dataset by Olist on Kaggle