[Project Summary] [Project Goals] [Deliverables] [Project Planning] [Initial Hypothesis] [Equipment Rail Data Dictionary] [Highway Rail Data Dictionary] [FIPS State Codes] [DS Pipeline] [Conclusion] [Next Steps] [Reproduce This Project]
Our team used the U.S. Department of Transportation’s database to analyze 8 years of rail accidents across the United States. We applied the full data science pipeline to the data and built a classification model that predicts which company would be involved in a rail accident.
Data Source: U.S. Department of Transportation
- Determine which features are drivers of highway rail accidents and equipment rail accidents.
- Build a classification model for predicting which railroad operator is most likely to be involved with a given accident. This information is used to enhance the overall analysis.
- Showcase highlighted findings in a presentation delivered to stakeholders.
- Create a reproducible Jupyter notebook report that includes process, takeaways, and discoveries from every stage of the pipeline.
All of our project planning can be found on Trello. Below is a link to our board along with a visual snapshot.
Trello board: https://trello.com/b/msJXyeEv/off-the-rails
Here is a snapshot of our project planning/setup on the evening of 5/23/21
- For each type of rail incident, the railroad operator influences the frequency and severity of incidents.
- Geography and location have an impact on the type of rail incident.
- Time of year and weather conditions have an impact on the frequency and scale of a rail accident.
- There will be a difference in severity of accident based on whether the accident was a highway rail accident or an equipment accident.
* - Indicates the target feature in this data
Features | Description | Data Type |
---|---|---|
railroad_company * | railroad code | object |
accident_type | type of accident | int |
state | FIPS state code | int |
temp | temp in degrees | int |
visibility | encoded visibility | int |
weather | weather conditions | int |
train_speed | speed of train in mph | int |
train_direction | train direction | float |
train_weight | gross tonnage of train | int |
train_type | type of equipment | object |
track_type | type of track | int |
front_engines | # of head end locomotives | int |
loadfrght_cars | # of loaded freight cars | int |
loadpass_cars | # of loaded passenger cars | int |
emptyfrght_cars | # of empty freight cars | int |
emptypass_cars | # of empty passenger cars | int |
equip_damage | equipment damage in USD | int |
track_damage | track damage in USD | int |
cause | cause of incident | object |
total_killed | # of killed | int |
total_injured | # of injured | int |
max_speed | maximum equipment speed | int |
total_damage | total damage | int |
engineers_onduty | # of engineers on duty | float |
conductors_onduty | # of conductors on duty | float |
brakemen_onduty | # of brakemen on duty | float |
region | FRA designated region | int |
typrr | type of railroad | object |
lat | latitude | object |
long | longitude | object |
signal_type | type of signal | int |
date | date of incident | datetime |
season | season of year | object |
Features | Description | Data Type |
---|---|---|
railroad_company * | railroad code | object |
station | nearest timetable station | object |
county | FIPS county code | object |
state | FIPS state code | int |
region | FRA designated region | int |
city | FIPS city code | object |
vehicle_speed | estimated vehicle speed | float |
vehicle_type | encoded highway user | object |
vehicle_direction | encoded user direction | object |
position | encoded user position | object |
accident_type | circumstance of accident | int |
hazmat_entity | entity transporting hazmat | object |
temp | temperature in degrees | int |
visibility | encoded visibility | int |
weather | weather conditions | object |
train_type | type of equipment | object |
track_type | type of track | object |
front_engines | # of head end locomotives | int |
railcar_quantity | quantity of railcar | int |
train_speed | estimated train speed | float |
train_direction | direction of train | object |
warning_location | location of warning | object |
warning_signal | crossing with highway signal | object |
lights | crossing illuminated | object |
standveh | passed a standing vehicle | object |
other_train | another train involved | object |
motorist_action | action of highway user | object |
view_obstruction | track view obstruction | int |
vehicle_damage | vehicle damage in USD | float |
driver_fate | fate of driver | object |
vehicle_occupied | was the vehicle occupied | object |
total_killed | total # of deaths | int |
total_injured | total # of injuries | int |
vehicle_occupants | # of vehicle occupants | int |
ispublic_crossing | is this a public crossing | object |
fips | FIPS code | int |
whistle_ban | whistle ban in effect | object |
driver_age | age of driver | object |
driver_gender | gender of driver | object |
train_occupants | # people on train | int |
user_killed | # of drivers killed | int |
user_injured | # of drivers injured | int |
rail_killed | # of rr employees killed | int |
rail_injured | # of rr employees injured | int |
train_pass_killed | # train passengers killed | int |
train_pass_injured | # train passengers injured | int |
road_conditions | encoded road conditions | object |
date | date of incident | datetime |
season | season of year | object |
FIPS Code | Corresponding State |
---|---|
01 | Alabama |
02 | Alaska |
04 | Arizona |
05 | Arkansas |
06 | California |
08 | Colorado |
09 | Connecticut |
10 | Delaware |
11 | District of Columbia |
12 | Florida |
13 | Georgia |
15 | Hawaii |
16 | Idaho |
17 | Illinois |
18 | Indiana |
19 | Iowa |
20 | Kansas |
21 | Kentucky |
22 | Louisiana |
23 | Maine |
24 | Maryland |
25 | Massachusetts |
26 | Michigan |
27 | Minnesota |
28 | Mississippi |
29 | Missouri |
30 | Montana |
31 | Nebraska |
32 | Nevada |
33 | New Hampshire |
34 | New Jersey |
35 | New Mexico |
36 | New York |
37 | North Carolina |
38 | North Dakota |
39 | Ohio |
40 | Oklahoma |
41 | Oregon |
42 | Pennsylvania |
44 | Rhode Island |
45 | South Carolina |
46 | South Dakota |
47 | Tennessee |
48 | Texas |
49 | Utah |
50 | Vermont |
51 | Virginia |
53 | Washington |
54 | West Virginia |
55 | Wisconsin |
56 | Wyoming |
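The data dictionaries above list `state` as an int, while this table uses zero-padded two-digit codes. Below is a minimal sketch of translating between the two. The `FIPS_STATES` dictionary and the `fips_to_state` helper are hypothetical names for illustration, not part of the project's modules, and only an excerpt of the table is included.

```python
# Excerpt of the FIPS table above; extend with the remaining rows as needed.
# FIPS_STATES and fips_to_state are illustrative names, not project code.
FIPS_STATES = {
    "01": "Alabama",
    "02": "Alaska",
    "04": "Arizona",
    "06": "California",
    "48": "Texas",
    "56": "Wyoming",
}


def fips_to_state(code):
    """Return the state name for an int or str FIPS code, or None if unknown."""
    key = str(code).strip().zfill(2)  # 1 -> "01", "6" -> "06"
    return FIPS_STATES.get(key)


print(fips_to_state(1))
print(fips_to_state("48"))
```

Zero-padding before the lookup matters because reading the code as an int drops the leading zero for states such as Alabama and California.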
- The data is acquired from csv files sourced from the US Department of Transportation.
- Two dataframes are created by concatenating the csv files for highway rail accidents and equipment rail accidents.
- All functions to acquire the data are included in acquire.py.
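The concatenation step can be sketched roughly as follows. This assumes pandas; `concat_csvs` is a hypothetical helper, not necessarily what acquire.py contains, and the in-memory CSV text stands in for the real files with made-up values.

```python
import io

import pandas as pd


def concat_csvs(sources):
    """Read each CSV source (path or file-like object) and stack the rows."""
    frames = [pd.read_csv(src) for src in sources]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Tiny in-memory stand-ins for two yearly CSV files (made-up values).
year_one = io.StringIO("railroad_company,state\nAAA,48\nBBB,6\n")
year_two = io.StringIO("railroad_company,state\nAAA,17\n")

combined = concat_csvs([year_one, year_two])
print(combined.shape)
```

With real data, the same function could be given a list of file paths, one list for the highway files and one for the equipment files.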
- Prepare the data for analysis with the prepare.py module, which has functions for each type of rail accident (highway and equipment).
- The prepare module returns the respective dataframes, each split into train, validate, and test. The train dataset is then ready for exploratory analysis.
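One way to produce a three-way split is sketched below. The function name, the 60/20/20 proportions, and the seed are assumptions for illustration; the actual prepare.py may split differently, for example by stratifying on the target.

```python
import pandas as pd


def split_train_validate_test(df, validate_frac=0.2, test_frac=0.2, seed=123):
    """Shuffle df and cut it into train, validate, and test frames."""
    # frac=1 returns every row in random order; the seed makes it repeatable
    shuffled = df.sample(frac=1, random_state=seed)
    n_test = int(len(df) * test_frac)
    n_validate = int(len(df) * validate_frac)
    test = shuffled.iloc[:n_test]
    validate = shuffled.iloc[n_test:n_test + n_validate]
    train = shuffled.iloc[n_test + n_validate:]
    return train, validate, test


# Made-up rows standing in for a prepared dataframe.
demo = pd.DataFrame(
    {"railroad_company": list("AABBCCDDEE"), "train_speed": range(10)}
)
train, validate, test = split_train_validate_test(demo)
print(len(train), len(validate), len(test))
```

Only the train frame would be used for exploration, so that validate and test stay unseen until modeling.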
- Run univariate, bivariate, and multivariate visualizations to see how features interact with each other and with the target, railroad_company.
- Run statistical tests to help answer questions that arose from exploration
- All functions to explore the data are included in explore.py.
- Preprocessing module prepares the dataframes for modeling. All functions for preprocessing are included in preprocessing.py.
- Develop a baseline model for predicting the railroad operator based on the type of incident and incident features
- Build classification models that improve upon the baseline accuracy with the understanding that model performance and evaluation metrics will better inform the overall analysis.
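A common choice of baseline for a classification target is to always predict the most frequent class. The sketch below assumes that approach; the source does not say how the project's baseline was built, and `baseline_accuracy` and the operator codes are made up for illustration.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Return (most common label, accuracy of always predicting it)."""
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


# Made-up operator codes standing in for the railroad_company column.
train_labels = ["AAA", "AAA", "BBB", "CCC"]
label, accuracy = baseline_accuracy(train_labels)
print(label, accuracy)
```

Any classifier worth keeping should beat this accuracy on the validate split.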
- A deployed model that takes in new, preprocessed data and returns similar results. This supports the analysis and takeaways drawn from the modeling stage.
- A report with visuals that highlight the findings from the project analysis
- A reproducible notebook that is well-documented
- US railroads seem to be operating in a consistent manner with regard to equipment rail accidents.
- Characteristics of the highway driver and location, rather than the Railroad, appear to account for the variance in outcomes of highway rail accidents.
- Industry and local governments should further investigate the drivers for highway rail accidents, especially in locations with recurring incidents.
- Read and follow this README.md.
- Download the following files to your working directory:
- Run our final Jupyter Notebook (final_report_and_findings.ipynb) to reproduce our findings and analysis.