Authors: Chris Adams, Brandon Johnson, and Anna Yayloyan
The goal of this project is to extract data on registered electric vehicles in the U.S. State of Washington, along with income data for U.S. counties, to analyze who is buying what kind of electric vehicle and where in Washington state. This data and project could be useful to a variety of audiences, such as government agencies and vehicle companies. The grain of the primary dataset is one registered electric vehicle in Washington state; the grain of the secondary dataset is income per county for a selected year.
Data can contain personally identifying information that could potentially be used to harm individuals or their privacy. In the current political climate, electric vehicles are particularly politicized. To avoid this data being used against individual vehicle owners in any way, we masked most of each vehicle's VIN. The location information is too broad to be a concern, so we only focused on the VIN.
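The exact masking scheme is not shown here, but a minimal sketch of this kind of VIN masking, assuming only the first few characters are kept (the number of characters retained and the helper name 'mask_vin' are illustrative assumptions), could look like this:

import pandas as pd

def mask_vin(vin: str, keep: int = 3) -> str:
    # Keep the first `keep` characters and replace the rest with asterisks.
    return vin[:keep] + "*" * max(len(vin) - keep, 0)

# Placeholder VINs for illustration only.
ev_df = pd.DataFrame({"vin": ["1ABCD23EFGH456789", "9ZYXW87VUTS654321"]})
ev_df["vin"] = ev_df["vin"].astype(str).apply(mask_vin)
print(ev_df)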
To store our data, we're using PostgreSQL to establish a database called 'ev_db.'
ERD for Electric Vehicle Database:
The data will be divided into the following six tables:
Here are examples of the data from the six tables using SQL statements:
For this program to work, you must store two new files in the 'DB_Init' directory: one called 'api_keys.py' and another called 'postgres_info.py'
The 'api_keys.py' file will store a variable called 'bea_key' that will hold your personal API key for the Bureau of Economic Analysis, which can be created here:
https://apps.bea.gov/api/signup/
The 'postgres_info.py' file will contain a 'user' field and a 'password' field that store the username and password for your PostgreSQL server.
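For reference, the two files might look like the following minimal sketch (the variable names come from the description above; the values are placeholders you replace with your own):

# DB_Init/api_keys.py
bea_key = "YOUR_BEA_API_KEY"

# DB_Init/postgres_info.py
user = "YOUR_POSTGRES_USERNAME"
password = "YOUR_POSTGRES_PASSWORD"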
- db_init.py
This Python file initializes the 'ev_db' database in PostgreSQL using 'psycopg2.' It extracts income data by county for the year the user selects, then combines that with an electric vehicle CSV that contains data for every registered electric vehicle in Washington state. It then divides this dataframe into six separate dataframes and imports them into six separate tables in our 'ev_db' PostgreSQL database.
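A minimal sketch of that flow using psycopg2 (the 'vehicles' table name, its column list, and the connection details are assumptions inferred from the queries later in this README, not the exact schema in 'ev_db_tables.sql'):

import psycopg2
from postgres_info import user, password  # local config file described above

# Connect to the default database and create 'ev_db'.
# CREATE DATABASE cannot run inside a transaction, so enable autocommit first.
conn = psycopg2.connect(dbname="postgres", user=user, password=password, host="localhost")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE ev_db;")
conn.close()

# Reconnect to 'ev_db' and create one of the six tables as an example.
conn = psycopg2.connect(dbname="ev_db", user=user, password=password, host="localhost")
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS vehicles (
            vin VARCHAR,
            model_year INT,
            make VARCHAR,
            model VARCHAR,
            electric_range INT,
            postal_code VARCHAR,
            vehicle_type_id INT
        );
    """)
    # The real script loads the rows of its six dataframes with parameterized
    # INSERTs along these lines; a single placeholder row is shown here.
    cur.execute(
        "INSERT INTO vehicles VALUES (%s, %s, %s, %s, %s, %s, %s)",
        ("1AB***", 2023, "EXAMPLE_MAKE", "EXAMPLE_MODEL", 100, "98101", 1),
    )
conn.commit()
conn.close()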
Executing the program looks like this:
Doing this will set the entire database up as it is displayed in the section above.
- db_init_notebook.ipynb
This Jupyter notebook breaks down the code in the 'db_init.py' file step by step. You can use it to view how the data is transformed through the process before it is uploaded to PostgreSQL.
- Output_CSVs Directory
This directory holds all the data from the individual tables in CSV format as they are processed by the db_init app.
- Resources Directory
This directory holds the "Electric Vehicle Population Data" CSV that is used as the primary dataset in our database.
For this program to work, you must store a new Python file called 'postgres_info.py' in the 'Input_App' directory.
The 'postgres_info.py' file will contain a 'user' field and a 'password' field that store the username and password for your PostgreSQL server.
- input_app.py
This Python program extracts data from our 'ev_db' PostgreSQL database by asking the user to input their own custom SQL statements. It then asks the user to name the CSV file that the data will be exported to. The program can be looped through several times if the user chooses to run it again.
Initiating the app will give you a list of all available tables and their fields like this:
Then it will ask you to input a SQL query and to name the CSV file.
Finally, it will display a sample of the data like this and ask if you would like to run the program again.
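Putting those steps together, the core of the input loop might look like this minimal sketch (assuming pandas is used to run the query and write the CSV; the actual prompts and paths in 'input_app.py' may differ):

import pandas as pd
import psycopg2
from postgres_info import user, password

conn = psycopg2.connect(dbname="ev_db", user=user, password=password, host="localhost")

while True:
    # Ask the user for a custom SQL statement and a name for the output CSV.
    query = input("Enter a SQL query to run against ev_db: ")
    csv_name = input("Enter a name for the output CSV file: ")

    # Run the query (pandas may warn that it prefers a SQLAlchemy connection,
    # but a psycopg2 connection works) and show a small sample of the result.
    df = pd.read_sql(query, conn)
    print(df.head())

    # Save the result to the Input_App_CSVs directory.
    df.to_csv(f"Input_App_CSVs/{csv_name}.csv", index=False)

    if input("Run the program again? (y/n): ").strip().lower() != "y":
        break

conn.close()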
- input_app_notebook.ipynb
This Jupyter notebook runs the above input program, but allows you to use the extracted data as a pandas dataframe that can be manipulated.
- Input_App_CSVs Directory
This directory stores the CSVs from the input apps above.
For this program to work, you must store a new Python file called 'postgres_info.py' in the 'Input_App' directory.
The 'postgres_info.py' file will contain a 'user' field and a 'password' field that store the username and password for your PostgreSQL server.
- db_bokeh_charts.ipynb
This notebook gives examples of data being extracted from the database and then manipulated into interactive charts with the Bokeh library. The first example of extracted data uses the following SQL query to pull all registered CYBERTRUCKs:
SELECT county, model
FROM vehicles
JOIN location_info
ON vehicles.postal_code = location_info.postal_code
JOIN vehicle_types
ON vehicles.vehicle_type_id = vehicle_types.vehicle_type_id
WHERE model = 'CYBERTRUCK'
AND state = 'WA';
It results in the following chart:
The next example uses the following query to extract all vehicles where the 'electric_range' is above 0:
SELECT vehicles.vehicle_type_id, electric_range, model_year, make, model
FROM vehicles
JOIN vehicle_types
ON vehicles.vehicle_type_id = vehicle_types.vehicle_type_id
WHERE electric_range > 0;
It results in the following chart:
Bokeh is an extremely capable data visualization tool and is able to display many different types of interactive charts. For more information on Bokeh, please view their documentation here:
https://docs.bokeh.org/en/latest/docs/user_guide.html
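As an illustration, the result of the first query above (one row per registered CYBERTRUCK with its county) might be turned into a bar chart of counts per county with a sketch like this (not the exact code in 'db_bokeh_charts.ipynb'; the county values shown are placeholders):

import pandas as pd
from bokeh.plotting import figure, show

# Placeholder query result: one row per registered CYBERTRUCK with its county.
df = pd.DataFrame({"county": ["King", "King", "Snohomish", "Pierce", "King"],
                   "model": ["CYBERTRUCK"] * 5})

# Count vehicles per county and plot a vertical bar chart.
counts = df["county"].value_counts()
p = figure(x_range=list(counts.index), title="Registered CYBERTRUCKs by County",
           x_axis_label="County", y_axis_label="Vehicle Count")
p.vbar(x=list(counts.index), top=list(counts.values), width=0.8)
show(p)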
- Graphs Directory
This directory holds the images of the graphs above.
- ev_db_tables.sql
This file contains the outline for the SQL tables created in the initialization step.
- EV_DB_ERD.png
This is the image of the ERD posted above.
Electric Vehicle Population Data for Washington State
https://catalog.data.gov/dataset/electric-vehicle-population-data
Bureau of Economic Analysis - Per Capita Income by County for 2022
https://apps.bea.gov/api/data?&UserID={INSERT_YOUR_CODE_HERE}&method=GetData&datasetname=Regional&TableName=CAINC1&LineCode=1&Year=2022&GeoFips=COUNTY&ResultFormat=json
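This endpoint can also be queried programmatically. A minimal sketch using the requests library, mirroring the parameters in the URL above and reading the key from the 'api_keys.py' file described earlier:

import requests
from api_keys import bea_key

# Parameters mirror the URL above: the Regional dataset, table CAINC1, LineCode 1,
# for every county in 2022, returned as JSON.
params = {
    "UserID": bea_key,
    "method": "GetData",
    "datasetname": "Regional",
    "TableName": "CAINC1",
    "LineCode": 1,
    "Year": 2022,
    "GeoFips": "COUNTY",
    "ResultFormat": "json",
}
response = requests.get("https://apps.bea.gov/api/data", params=params)
income_json = response.json()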