Skip to content

Program that constructs a PostgreSQL database with a dataset of electric vehicles registered in Washington state

Notifications You must be signed in to change notification settings

chrisadamsv/EV_Database_ETL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dream-Team-Project-3

Authors: Chris Adams, Brandon Johnson, and Anna Yayloyan

Project Overview

The goal of this project is to extract data on registered electric vehicles in the U.S. State of Washington as well as income data in U.S counties to analyze who is buying what kind of electric vehicle and where in Washington state. This data and project could be used in various applications, such as government agencies and vehicle companies. The grain of the primary dataset is individual electric vehicles registered in Washington state and the secondary dataset is income per county for a selected year.

Ethical Considerations

Data can contain personal identifying information about individuals that could potentially result in harm being done to them and their privacy. In this poltical climate, electric vehicles are particularly politicized. To avoid this data being used against indivdual vehicle owners in any way, we blocked out most of the VIN number on the vehicles. The location info is too broad to be concenred with, so we only focused on the VIN.

Database Overview

To store our data, we're using PostgreSQL to establish a database called 'ev_db.'

ERD for Electric Vehicle Database:
EV_DB_ERD

The data will be divided into the following six tables:
Screenshot 2024-12-16 at 13 47 02

Here are examples of the data from the six tables using SQL statements:
Screenshot 2024-12-16 at 14 15 06
Screenshot 2024-12-16 at 14 14 28

Screenshot 2024-12-16 at 14 18 51
Screenshot 2024-12-16 at 14 18 02

Screenshot 2024-12-16 at 14 21 25
Screenshot 2024-12-16 at 14 20 59

Screenshot 2024-12-16 at 14 23 35
Screenshot 2024-12-16 at 14 24 02

Screenshot 2024-12-16 at 14 26 14
Screenshot 2024-12-16 at 14 26 41

Screenshot 2024-12-16 at 14 27 59
Screenshot 2024-12-16 at 14 28 22

Repository Overview

DB_init directory

For this program to work, you must store two new files in the 'DB_Init' directory: one called 'api_keys.py' and another called 'postgres_info.py'

The 'api_keys.py' file will store a variable called 'bea_key' that will hold your personal API key for the Bureau of Economic Analysis which can be created here:
https://apps.bea.gov/api/signup/

The 'postgres_info.file' will contain a 'user' field and a 'password' field to store the username and password for your PostgreSQL server.

  • db_init.py
    This python file initiates the 'ev_db' database in PostgreSQL using 'psycopg2.' It extracts income data by county for the year the user selects, then combines that with an Electric Vehicle csv that contains data for every registered electric vehicle in Washington state. It then will divide this dataframe into six seperate dataframes and import it into six seperate tables in our 'ev_db' PostgreSQL database.
    Executing the program looks like this:
Screenshot 2024-12-16 at 14 39 46

Doing this will set the entire database up as it is displayed in the section above.

  • db_init_notebook.ipynb
    This jupyter notebook file breaks down the code in the 'db_init.py' file step by step. You can use this to view how the data is transformed through the process before it is uploaded to PostgreSQL

  • Output_CSVs Directory
    This directory holds all the data in the individual tables in csv format as they are being processed by the db_init app.

  • Resources Directory
    This directory holds the "Electric Vehicle Population Data" csv that is used as the primary dataset in our daabase

Input_App directory

For this program to work, you must store a python new file in the 'Input_App' directory called 'postgres_info.py'

The 'postgres_info.file' will contain a 'user' field and a 'password' field to store the username and password for your PostgreSQL server.

  • input_app.py
    This python program extracts data from our 'ev_db' PostgreSQL database by asking the user to input their own custom SQL statements. It then will ask the user to make a name for csv that the data is being extracted to. The program can be looped through several times if the user chooses to run it again.

Initiating the app will give you a list of all available tables and their fields like this:
Screenshot 2024-12-16 at 15 00 36

Then it will ask you to input a SQL query and to name the CSV file
Screenshot 2024-12-16 at 15 07 30

Finally it will display a sample of the data like this and ask if you would like to run the program again
Screenshot 2024-12-16 at 15 09 57

  • input_app_notebook.ipynb
    This jupyter notebook runs the above input program, but allows you to use the data extracted as a pandas dataframe that can be manipulated

  • Input_App_CSVs Directory
    This directory stores the CSVs from the input apps above.

Extraction directory

For this program to work, you must store a python new file in the 'Input_App' directory called 'postgres_info.py'

The 'postgres_info.file' will contain a 'user' field and a 'password' field to store the username and password for your PostgreSQL server.

  • db_bokeh_charts.ipynb
    This notebook gives examples of data being extracted from the database and then being maniplated to make into interative charts with the bokeh library. The first example of extarcted data uses the following SQL query to pull all registered CYBERTRUCKs:

SELECT county, model
FROM vehicles
JOIN location_info
ON vehicles.postal_code = location_info.postal_code
JOIN vehicle_types
ON vehicles.vehicle_type_id = vehicle_types.vehicle_type_id
WHERE model = 'CYBERTRUCK'
AND state = 'WA';

It results in the following chart:
Screenshot 2024-12-16 at 15 26 11

The next example uses the following query to extarct all vehicles where the 'electric_range' is above 0:
SELECT vehicles.vehicle_type_id, electric_range, model_year, make, model
FROM vehicles
JOIN vehicle_types
ON vehicles.vehicle_type_id = vehicle_types.vehicle_type_id
WHERE electric_range > 0;

It results in the following chart:
Screenshot 2024-12-16 at 15 28 21

Bokeh is an extremely capable data visualization tool and is able to display many different types of interactive charts. For more information on Bokeh, please view their documentation here:
https://docs.bokeh.org/en/latest/docs/user_guide.html

  • Graphs Directory
    This directory hold the images of the graphs above.

SQL directory

  • ev_db_tables.sql
    This file contains the outline for the SQL tables created in the initialization step.

  • EV_DB_ERD.png
    This is the image of the ERD posted above.


Ressoures

Electric Vehicle Population Data for Washington State
https://catalog.data.gov/dataset/electric-vehicle-population-data

Bureau of Economic Analysis - PerCapita Income by County for 2022
https://apps.bea.gov/api/data?&UserID={INSERT_YOUR_CODE_HERE}&method=GetData&datasetname=Regional&TableName=CAINC1&LineCode=1&Year=2022&GeoFips=COUNTY&ResultFormat=json

About

Program that constructs a PostgreSQL database with a dataset of electric vehicles registered in Washington state

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published