A repository for the WRK Group Dashboard in 2022.
The environment variables are stored in the .Renviron file:
- CENSUS_API_KEY: the Census API key, obtained from the Census Bureau's API key signup page
The survey results are private data pulled from an Azure blob using the following environment variables:
- AZURE_WRK_SURVEY_RAW_URL: URL to the raw survey Excel file
- AZURE_WRK_SURVEY_RAW_SAS_TOKEN: SAS token for the raw file
- AZURE_WRK_SURVEY_PROCESSED_URL: URL to the processed data Excel file
- AZURE_WRK_SURVEY_PROCESSED_SAS_TOKEN: SAS token for the processed file
Note that SAS tokens expire. If a download fails, check the token in the Azure portal and generate a new one if it has expired.
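A minimal sketch of how a script might read these variables and build a download URL. It assumes the SAS token is used the standard Azure way, as the query string appended to the blob URL, and that it may be stored with or without the leading `?`. The helper names `require_env()` and `sas_url()` are hypothetical, not functions from this repo.

```r
# Hypothetical helpers; not taken from the repo's ETL scripts.

# Read an environment variable (e.g. one defined in .Renviron) and stop if it is unset.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop(sprintf("Environment variable %s is not set; check your .Renviron file.", name))
  }
  value
}

# Append a SAS token to a blob URL as its query string.
# Assumption: the token may be stored with or without a leading "?".
sas_url <- function(url_var, token_var) {
  url <- require_env(url_var)
  token <- sub("^\\?", "", require_env(token_var))
  paste0(url, "?", token)
}

# Example:
# raw_url <- sas_url("AZURE_WRK_SURVEY_RAW_URL", "AZURE_WRK_SURVEY_RAW_SAS_TOKEN")
```

Because .Renviron is read when R starts, restart the session (or call `readRenviron("~/.Renviron")`) after editing it.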
```mermaid
graph LR;
subgraph sourcing["Data Sources"]
HUD["Housing and Urban Development"]
ODD["Open Data Delaware"]
kidsCount["Kids Count"]
census["Census Bureau"]
WRK["WRK Group Data"]
end
HUD --> housingUnits;
ODD --> HSach;
ODD --> HSgrad;
kidsCount --> kinderReadiness;
census --> employment;
WRK --> kinderReadiness;
WRK --> events;
WRK --> surveyResponses;
housingUnits["Subsidised units data"] --> housingTab;
HSach["High school achievement (ELA & Math)"] --> educationTab;
HSgrad["High school graduation"] --> educationTab;
kinderReadiness["Kindergarten readiness"] --> educationTab;
employment["Employment rates"] --> workforceTab;
surveyResponses["Safety survey responses"] --> safetyTab;
events["Events data"] --> eventsTab;
subgraph tabs["Dashboard Tabs"]
housingTab["Housing"]
educationTab["Education"]
workforceTab["Workforce"]
safetyTab["Safety"]
eventsTab["Events"]
end
```
The scripts that download and transform the data are stored in the /ETL folder.
Housing data are sourced from the assisted housing data published by the US Department of Housing and Urban Development (HUD). The data are distributed as Excel files, and each year's dataset is split into two files by state (AK-MN and MO-WY). Delaware's data are in the AK-MN files, so those are the files downloaded into the data/raw/hud folder.
Note: To add new data, download the new Excel files into the data/raw/hud folder and rerun the ETL script.
The ETL script (housing.R) looks for all the Excel files in the raw folder, combines them, and saves the result as an RDS file (data/processed/hud_DE_combined.rds).
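The "read every file in a folder and stack them" step can be sketched in base R as follows. This is a sketch under assumptions, not the code in housing.R: the real script reads Excel files (for example with `readxl::read_excel`), while this version defaults to `read.csv` so it runs without extra packages, and it assumes all yearly files share the same columns. `combine_raw_files()` is a hypothetical name.

```r
# Hypothetical helper; housing.R itself may be structured differently.
combine_raw_files <- function(raw_dir, pattern, reader = utils::read.csv) {
  files <- list.files(raw_dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) {
    stop("No files matching '", pattern, "' found in ", raw_dir)
  }
  # Read each file and record which file every row came from
  tables <- lapply(files, function(f) {
    df <- reader(f)
    df$source_file <- basename(f)
    df
  })
  # Stack all files into one data frame (assumes identical columns)
  do.call(rbind, tables)
}

# Usage with Excel files; the pattern skips Excel "~$" lock files:
# combined <- combine_raw_files("data/raw/hud", "^[^~].*\\.xlsx$", readxl::read_excel)
# saveRDS(combined, "data/processed/hud_DE_combined.rds")
```

Because the function scans the folder on every run, dropping a new year's file into data/raw/hud is enough for it to be picked up.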
Kindergarten readiness data are sourced from the "Kindergarten readiness in Delaware" dataset on the Kids Count Data Center. The ETL script (education_kindergarten.R) downloads the Excel data as a temporary file and saves the processed data as an RDS file (education_kinder_readiness_wide.rds).
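The "download to a temporary file, then read it" pattern might look like the sketch below. `read_remote_file()` and `kids_count_url` are hypothetical names; the actual download URL used by education_kindergarten.R is not reproduced here.

```r
# Hypothetical helper illustrating the temporary-file download pattern.
read_remote_file <- function(url, reader, fileext = ".xlsx") {
  tmp <- tempfile(fileext = fileext)
  # Delete the temporary file when the function exits, even if reading fails
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps binary files such as .xlsx intact on Windows
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
  reader(tmp)
}

# kinder <- read_remote_file(kids_count_url, readxl::read_excel)
# saveRDS(kinder, "education_kinder_readiness_wide.rds")
```

Using a temporary file keeps the raw download out of the repository, so only the processed RDS file is stored.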
In addition, the dashboard uses the WRK Group's internal data on kindergarten readiness. The summary dataset is stored in an RDS file (education_kinder_readiness_WRK.rds).
High school achievement and graduation rate data are sourced from the Delaware Open Data Portal.
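As an illustration of calling the portal's API, the sketch below builds a query URL. It assumes the Delaware Open Data Portal exposes a Socrata-style SODA endpoint (`/resource/<dataset-id>.json` with `$where` and `$limit` parameters); the dataset ID and filter are hypothetical placeholders, and `build_soda_url()` is not a function from this repo.

```r
# Sketch assuming a Socrata (SODA) endpoint; dataset IDs are placeholders.
build_soda_url <- function(dataset_id, where = NULL, limit = 50000,
                           base = "https://data.delaware.gov/resource/") {
  params <- c(`$limit` = sprintf("%d", as.integer(limit)))
  if (!is.null(where)) params <- c(params, `$where` = where)
  # Percent-encode parameter values (spaces, comparison operators, quotes)
  query <- paste0(names(params), "=",
                  vapply(params, utils::URLencode, character(1), reserved = TRUE),
                  collapse = "&")
  paste0(base, dataset_id, ".json?", query)
}

# url <- build_soda_url("xxxx-xxxx", where = "school_year >= 2015")
# data <- jsonlite::fromJSON(url)
```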
The high school achievement data are sourced from the Student Assessment Performance dataset. The ETL script (education_achievement.R) calls the portal's API, adds labels, and produces two separate datasets: one for literacy/ELA achievement (education_achievement_wide_ELA.rds) and one for math achievement (education_achievement_wide_math.rds).
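Splitting one table into per-subject files can be sketched as follows. The column name `content_area` and its values `"ELA"` and `"Math"` are assumptions for illustration; the real dataset's field names may differ.

```r
# Hypothetical helper; column and value names are assumptions.
split_by_subject <- function(df, subject_col = "content_area",
                             subjects = c(ELA = "ELA", math = "Math")) {
  # which() drops rows where the subject is NA instead of returning NA rows
  lapply(subjects, function(s) df[which(df[[subject_col]] == s), , drop = FALSE])
}

# by_subject <- split_by_subject(achievement)
# for (name in names(by_subject)) {
#   saveRDS(by_subject[[name]], sprintf("education_achievement_wide_%s.rds", name))
# }
```

The names of `subjects` double as the file suffixes, which matches the ELA/math naming of the two output files.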
The high school graduation data are sourced from the Student Graduation dataset. The ETL script (education_graduation.R) calls the portal's API, downloads the data, adds labels, and saves the result into two RDS files: a district-level dataset (education_graduation.rds) and a summary dataset (education_graduation_summary.rds).
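One way to derive a summary from district-level rows is to pool graduates and cohort sizes per year rather than averaging district rates, so large districts carry proportionally more weight. This is a sketch with hypothetical columns (`year`, `graduates`, `cohort`); the repo's summary may be computed differently.

```r
# Hypothetical columns: year, graduates, cohort (one row per district and year).
summarise_graduation <- function(district_df) {
  # Sum counts across districts for each year
  totals <- aggregate(cbind(graduates, cohort) ~ year, data = district_df, FUN = sum)
  # Pooled rate: total graduates / total cohort
  totals$grad_rate <- totals$graduates / totals$cohort
  totals
}
```

For example, districts with 80/100 and 45/50 graduates give a pooled rate of 125/150 (about 0.833), whereas a simple mean of the two rates would give 0.85.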
Employment data are sourced from the American Community Survey (ACS) 5-year estimates published by the US Census Bureau. The ETL script (workforce_unemployment.R) calls the Census API using the key stored in the CENSUS_API_KEY environment variable. It retrieves the civilian labor force (B23025_003) and the number of unemployed people (B23025_005) for census tracts, for each year from 2014 to the latest available year. The script saves two files: a tract-level dataset (workforce_unemployment.rds) and a summary table (workforce_unemployment_sum_long.rds).
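The unemployment rate is the number of unemployed people divided by the civilian labor force (B23025_005 / B23025_003). The sketch below computes it per tract and as a pooled rate per year. It assumes the data have already been fetched into columns named `labor_force` and `unemployed`, for example with `tidycensus::get_acs(geography = "tract", variables = c(labor_force = "B23025_003", unemployed = "B23025_005"), state = "DE", year = y, survey = "acs5")`; whether workforce_unemployment.R uses tidycensus is an assumption.

```r
# Assumed input: one row per tract and year with labor_force and unemployed counts.
add_unemployment_rate <- function(tracts) {
  # Tracts with no labor force get NA instead of NaN/Inf
  tracts$unemployment_rate <- ifelse(tracts$labor_force > 0,
                                     tracts$unemployed / tracts$labor_force,
                                     NA_real_)
  tracts
}

# Pooled yearly rate: sum counts over tracts first, then divide
summarise_unemployment <- function(tracts) {
  totals <- aggregate(cbind(labor_force, unemployed) ~ year, data = tracts, FUN = sum)
  totals$unemployment_rate <- totals$unemployed / totals$labor_force
  totals
}
```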
Safety data are sourced from the 2021 WRK Group Community survey. The original participant-level dataset is private and stored in Azure Blob Storage. The ETL script (safety_WRK_survey.R) uses the environment variables to download the Excel file containing the data, then transforms it into a summary table. Along the way, the script also uploads the processed dataset back to Azure. Finally, it saves the summary dataset as an RDS file (safety_WRK_survey_2021.rds).
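Turning participant-level answers into a summary table can be sketched as counting each response per question and converting the counts to shares. The wide layout (one column per question) and the question names are assumptions; the real survey columns are private and not shown here.

```r
# Hypothetical: one row per participant, one column per survey question.
summarise_responses <- function(responses, question_cols) {
  rows <- lapply(question_cols, function(q) {
    answers <- responses[[q]][!is.na(responses[[q]])]  # ignore skipped questions
    if (length(answers) == 0) return(NULL)
    counts <- table(answers)
    data.frame(question = q,
               response = names(counts),
               n = as.integer(counts),
               pct = as.numeric(counts) / sum(counts),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

Only aggregated counts and shares leave this step, which is consistent with keeping the participant-level file private.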
Events data are sourced from The Warehouse Calendar. The ETL script (events_calendar.R) sends a POST request to the calendar API and repeats it to retrieve the calendar from 2020 through the latest year. The script then transforms the data and saves the dataset as an RDS file (events_warehouse_calendar.rds).
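The repeated-request loop can be sketched as one call per calendar year, with the results stacked into a single table. `fetch_year()` is a hypothetical stand-in for the actual POST request (which could be made with `httr::POST(url, body = ..., encode = "json")`); the endpoint URL and request fields used by events_calendar.R are not reproduced here.

```r
# Hypothetical: fetch_year(start, end) performs the POST request and
# returns a data frame of events for that date range.
fetch_calendar_years <- function(first_year, last_year, fetch_year) {
  years <- first_year:last_year
  pages <- lapply(years, function(y) {
    events <- fetch_year(start = sprintf("%d-01-01", y), end = sprintf("%d-12-31", y))
    events$year <- rep(y, nrow(events))  # rep() also works for empty results
    events
  })
  do.call(rbind, pages)
}

# latest <- as.integer(format(Sys.Date(), "%Y"))
# events <- fetch_calendar_years(2020, latest, fetch_year)
```

Requesting one year at a time keeps each response small and makes it easy to extend the range as new years are added.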
This repo was created by us, the Data Innovation Lab at Tech Impact. Our team uses advanced data analytics and artificial intelligence to help organizations solve problems that will make our communities a better place. Visit our website or email us to get in touch.