This repo provides a proof of concept (POC) demonstrating how a portable data stack can power an end-to-end ETL process with simple BI. It calls the PiwikPRO API for a demo account and uses a GCP service account to interact with Google Sheets.
All the context is in this blog post: 📦 Portable Data Stack Concept With dbt, DuckDB, Sheets and Looker Studio
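For orientation, here is a minimal sketch of the PiwikPRO side of the extraction, assuming an OAuth2 client_credentials flow; the host and environment variable names are illustrative assumptions, not this repo's actual values, so check the PiwikPRO API docs and your own credentials:

```python
# Minimal sketch: exchange PiwikPRO client credentials for a bearer token.
# The host and env var names are assumptions for illustration only.
import os

import requests

PIWIK_HOST = os.environ["PIWIK_HOST"]  # e.g. "https://demo.piwik.pro" (assumed name)

resp = requests.post(
    f"{PIWIK_HOST}/auth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": os.environ["PIWIK_CLIENT_ID"],
        "client_secret": os.environ["PIWIK_CLIENT_SECRET"],
    },
    timeout=30,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

# Subsequent Analytics API calls send the token as a Bearer header.
headers = {"Authorization": f"Bearer {access_token}"}
```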
A portable data stack with:
- Custom Python ETL
- DuckDB for OLAP
- dbt for transformations
- Google Sheets as the warehouse
- Looker Studio (formerly Data Studio) for visualization
- Docker Desktop
- PiwikPRO demo account.
- A GCP service account to work with Google Sheets. Add your service_account.json file to the /reporting path (see the authentication sketch after this list).
- An empty spreadsheet shared with the email address generated for the GCP service account.
- A Looker Studio report connected to the spreadsheet.
- DuckDB to run local queries.
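As referenced above, a minimal sketch of the Sheets authentication, assuming the key file lives at reporting/service_account.json; the spreadsheet title is a placeholder, not this repo's actual sheet:

```python
# Sketch: authenticate gspread with the service account key and open the
# spreadsheet shared with the service account's email address.
import gspread

gc = gspread.service_account(filename="reporting/service_account.json")
worksheet = gc.open("portable-data-stack").sheet1  # placeholder spreadsheet title

# Write a header row to confirm end-to-end access.
worksheet.update(range_name="A1", values=[["channel", "sessions"]])
```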
- Rename .env.example to .env and fill in your values if you want to test it!
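At runtime those values are typically read with python-dotenv; a sketch below, where the variable names are assumptions matching the auth sketch above rather than the actual keys in .env.example:

```python
# Sketch: load .env values at startup with python-dotenv.
# Variable names are illustrative; use the keys listed in .env.example.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

PIWIK_CLIENT_ID = os.getenv("PIWIK_CLIENT_ID")
PIWIK_CLIENT_SECRET = os.getenv("PIWIK_CLIENT_SECRET")
```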
- To get started with the Docker image:

```sh
make build
```
- This will generate data, save it to DuckDB, run dbt, and export to Sheets. The container will keep running:

```sh
make run
```
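For context, a sketch of what that pipeline entrypoint could look like; the module, file, and table names here are hypothetical, except model_export.py, which docker-compose.yml references:

```python
# Sketch of the pipeline flow: extract -> DuckDB -> dbt -> Sheets.
# File and table names are hypothetical except model_export.py.
import subprocess

import duckdb
import pandas as pd

# 1) Extract: raw PiwikPRO data as a DataFrame (stubbed here for illustration).
raw = pd.DataFrame({"channel": ["organic", "paid"], "sessions": [42, 17]})

# 2) Load: persist the raw data into a local DuckDB file (assumed file name).
con = duckdb.connect("warehouse.duckdb")
con.execute("CREATE OR REPLACE TABLE raw_sessions AS SELECT * FROM raw")
con.close()

# 3) Transform: run the dbt project against the same DuckDB file.
subprocess.run(["dbt", "run"], check=True)

# 4) Export: push the final models to Google Sheets.
subprocess.run(["python", "model_export.py"], check=True)
```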
- Once the data is present in the Google Sheet, connect Looker Studio to it.
- If you want to query the data, uncomment lines 45, 46 and 47 of docker-compose.yml and replace the command value with:

```sh
sh -c "python model_export.py && tail -f /dev/null"
```
- This will keep the container running so you can copy out the database files and query them with the DuckDB engine:

```sh
make duckdb-files
make duckdb
```

```sql
show tables;
select * from attribution limit 5;
```
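The same queries also work from Python; a sketch below, assuming the copied database file is named warehouse.duckdb (match it to whatever `make duckdb-files` actually copies out):

```python
# Sketch: query the copied DuckDB file directly from Python.
# "warehouse.duckdb" is an assumed file name.
import duckdb

con = duckdb.connect("warehouse.duckdb", read_only=True)
print(con.sql("SHOW TABLES"))
print(con.sql("SELECT * FROM attribution LIMIT 5"))
con.close()
```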
- If you want to remove all dependencies from the container, run:

```sh
make clean
```
- To run in GitHub Actions, create Actions secrets for the values in your .env file.
- Get tips, learnings, and tricks for your data career: join the Substack newsletter for more content like this!