Skip to content

This repo provides a POC to demostrate how a portable data stack can provide an end-to-end ETL process with simple BI. It's calling PiwikPRO API for a demo account and using GCP service account to interact with Google Sheets.

Notifications You must be signed in to change notification settings

aboyalejandro/data-stack-to-go-duckdb-dbt

Repository files navigation

🚀 Portable Data Stack with dbt, DuckDB and Sheets!

This repo provides a POC to demostrate how a portable data stack can provide an end-to-end ETL process with simple BI. It's calling PiwikPRO API for a demo account and using GCP service account to interact with Google Sheets.

All the context in this blog post: 📦 Portable Data Stack Concept With dbt, DuckDB, Sheets and Looker Studio

A portable data-stack with:

  • Custom Python ETL
  • DuckDB for OLAP
  • dbt for transformations
  • Google Sheets for Warehouse
  • Data Studio for Visualization

🙋🏻‍♂️ Pre-requisites

  • Docker Desktop
  • PiwikPRO demo account.
  • GCP Service account to work with Google Sheets. Add your service_account.json file in /reporting path.
  • An Empty Spreadsheet that shares access to the email generated by the GCP Service account.
  • A Looker Studio connected to the Spreadhsheet.
  • DuckDB to run local queries.
  • rename .env.example to .env and fill in your values if you want to test it!

📝 Steps

  • To get started with the Docker Image:
make build
  • This will generate data, save to DuckDB, run dbt and export to sheets. Container will keep running:
make run 
  • Once the data is present on the Google Sheet, connect Looker Studio to it.
  • If you want to query data. Uncomment line 45, 46, 47 of docker-compose.yml and replace the command value with:
sh -c "python model_export.py && tail -f /dev/null
  • This will leave the container open so you can copy the files to query and use DuckDB querying engine:
make duckdb-files
make duckdb 
 show tables; 
 select * from attribution limit 5;
  • If you want to remove all dependencies from the container, run:
make clean 
  • To run in Github Actions, create your Actions secrets for the .env file.
  • Get tips, learnings and tricks for your Data career!
  • Join the Substack newsletter to get similar content to this one and more to improve your Data career!

About

This repo provides a POC to demostrate how a portable data stack can provide an end-to-end ETL process with simple BI. It's calling PiwikPRO API for a demo account and using GCP service account to interact with Google Sheets.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages