
This repo contains our Codeup capstone project, Off the Rails. The team of David Berchelmann, Stephen Kane, Justin Sullivan, and Gabriela Tijerina set out to find the drivers of railway accidents.


OTR-Capstone/Off_The_Rails



[Project Summary] [Project Goals] [Deliverables] [Project Planning] [Initial Hypothesis] [Equipment Rail Data Dictionary] [Highway Rail Data Dictionary] [FIPS State Codes] [DS Pipeline] [Conclusion] [Next Steps] [Reproduce This Project]

Project Summary:

Our team used the U.S. Department of Transportation’s database to analyze eight years of rail accidents across the United States. We applied the full data science pipeline to that data and built a classification model that predicts which railroad company is involved in a given rail accident.

Data Source: U.S. Department of Transportation


Project Goals:

  • Determine which features are drivers of highway rail accidents and equipment rail accidents.
  • Build a classification model for predicting which railroad operator is most likely to be involved with a given accident. This information is used to enhance the overall analysis.

[Back to top]


Deliverables:

  • Showcase highlighted findings in a presentation delivered to stakeholders.
  • Create a reproducible Jupyter notebook report that includes process, takeaways, and discoveries from every stage of the pipeline.

[Back to top]


Project Planning:

All of our project planning can be found on our Trello board, linked here along with a visual snapshot of our progress.

Trello board: https://trello.com/b/msJXyeEv/off-the-rails

Here is a snapshot of our project planning/setup on the evening of 5/23/21

[Image: Reg-ppline]

[Back to top]


Initial Hypotheses:

  • For each type of rail incident, the railroad operator affects both the frequency and the severity of incidents.
  • Geography and location have an impact on the type of rail incident.
  • Time of year and weather conditions have an impact on the frequency and scale of a rail accident.
  • There will be a difference in severity of accident based on whether the accident was a highway rail accident or an equipment accident.

[Back to top]


Data Dictionary

* - Indicates the target feature in this data

Equipment Data

| Feature | Description | Data Type |
| --- | --- | --- |
| railroad_company * | railroad code | object |
| accident_type | type of accident | int |
| state | FIPS state code | int |
| temp | temperature in degrees | int |
| visibility | encoded visibility | int |
| weather | weather conditions | int |
| train_speed | speed of train in mph | int |
| train_direction | train direction | float |
| train_weight | gross tonnage of train | int |
| train_type | type of equipment | object |
| track_type | type of track | int |
| front_engines | # of head end locomotives | int |
| loadfrght_cars | # of loaded freight cars | int |
| loadpass_cars | # of loaded passenger cars | int |
| emptyfrght_cars | # of empty freight cars | int |
| emptypass_cars | # of empty passenger cars | int |
| equip_damage | equipment damage in USD | int |
| track_damage | track damage in USD | int |
| cause | cause of incident | object |
| total_killed | # of killed | int |
| total_injured | # of injured | int |
| max_speed | maximum equipment speed | int |
| total_damage | total damage | int |
| engineers_onduty | # of engineers on duty | float |
| conductors_onduty | # of conductors on duty | float |
| brakemen_onduty | # of brakemen on duty | float |
| region | FRA designated region | int |
| typrr | type of railroad | object |
| lat | latitude | object |
| long | longitude | object |
| signal_type | type of signal | int |
| date | date of incident | datetime |
| season | season of year | object |

[Back to top]


Highway Data

| Feature | Description | Data Type |
| --- | --- | --- |
| railroad_company * | railroad code | object |
| station | nearest timetable station | object |
| county | FIPS county code | object |
| state | FIPS state code | int |
| region | FRA designated region | int |
| city | FIPS city code | object |
| vehicle_speed | estimated vehicle speed | float |
| vehicle_type | encoded highway user | object |
| vehicle_direction | encoded user direction | object |
| position | encoded user position | object |
| accident_type | circumstance of accident | int |
| hazmat_entity | entity transporting hazmat | object |
| temp | temperature in degrees | int |
| visibility | encoded visibility | int |
| weather | weather conditions | object |
| train_type | type of equipment | object |
| track_type | type of track | object |
| front_engines | # of head end locomotives | int |
| railcar_quantity | quantity of railcars | int |
| train_speed | estimated train speed | float |
| train_direction | direction of train | object |
| warning_location | location of warning | object |
| warning_signal | crossing with highway signal | object |
| lights | crossing illuminated | object |
| standveh | passed a standing vehicle | object |
| other_train | another train involved | object |
| motorist_action | action of highway user | object |
| view_obstruction | track view obstruction | int |
| vehicle_damage | vehicle damage in USD | float |
| driver_fate | fate of driver | object |
| vehicle_occupied | was the vehicle occupied | object |
| total_killed | total # of deaths | int |
| total_injured | total # of injuries | int |
| vehicle_occupants | # of vehicle occupants | int |
| ispublic_crossing | is this a public crossing | object |
| fips | FIPS code | int |
| whistle_ban | whistle ban in effect | object |
| driver_age | age of driver | object |
| driver_gender | gender of driver | object |
| train_occupants | # of people on train | int |
| user_killed | # of drivers killed | int |
| user_injured | # of drivers injured | int |
| rail_killed | # of railroad employees killed | int |
| rail_injured | # of railroad employees injured | int |
| train_pass_killed | # of train passengers killed | int |
| train_pass_injured | # of train passengers injured | int |
| road_conditions | encoded road conditions | object |
| date | date of incident | datetime |
| season | season of year | object |

[Back to top]


FIPS State Codes

| FIPS Code | Corresponding State |
| --- | --- |
| 01 | Alabama |
| 02 | Alaska |
| 04 | Arizona |
| 05 | Arkansas |
| 06 | California |
| 08 | Colorado |
| 09 | Connecticut |
| 10 | Delaware |
| 11 | District of Columbia |
| 12 | Florida |
| 13 | Georgia |
| 15 | Hawaii |
| 16 | Idaho |
| 17 | Illinois |
| 18 | Indiana |
| 19 | Iowa |
| 20 | Kansas |
| 21 | Kentucky |
| 22 | Louisiana |
| 23 | Maine |
| 24 | Maryland |
| 25 | Massachusetts |
| 26 | Michigan |
| 27 | Minnesota |
| 28 | Mississippi |
| 29 | Missouri |
| 30 | Montana |
| 31 | Nebraska |
| 32 | Nevada |
| 33 | New Hampshire |
| 34 | New Jersey |
| 35 | New Mexico |
| 36 | New York |
| 37 | North Carolina |
| 38 | North Dakota |
| 39 | Ohio |
| 40 | Oklahoma |
| 41 | Oregon |
| 42 | Pennsylvania |
| 44 | Rhode Island |
| 45 | South Carolina |
| 46 | South Dakota |
| 47 | Tennessee |
| 48 | Texas |
| 49 | Utah |
| 50 | Vermont |
| 51 | Virginia |
| 53 | Washington |
| 54 | West Virginia |
| 55 | Wisconsin |
| 56 | Wyoming |
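
Both data dictionaries store `state` as an int, so the leading zero shown in codes such as 01 is not present in the data itself. The sketch below shows one way to map a stored code back to a state name. The names `FIPS_STATES` and `state_name` are hypothetical and are not part of the project's modules, and the lookup is abridged to a few rows of the table.

```python
# Abridged lookup taken from the FIPS table; add the remaining rows as needed.
FIPS_STATES = {
    "01": "Alabama",
    "06": "California",
    "48": "Texas",
    "56": "Wyoming",
}


def state_name(code):
    """Return the state for an int or str FIPS code, or None if it is not in the lookup."""
    key = str(code).strip().zfill(2)  # 1 -> "01", "48" -> "48"
    return FIPS_STATES.get(key)


print(state_name(1))     # Alabama
print(state_name("48"))  # Texas
print(state_name(3))     # None (03 does not appear in the table)
```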

[Back to top]


Data Science Pipeline:

1. Acquire

  • The data is acquired from csv files sourced from the US Department of Transportation.
  • Two dataframes are created by concatenating the csv files for highway rail accidents and equipment rail accidents.
  • All functions to acquire the data are included in acquire.py.
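
The concatenation step can be sketched as follows. This is not the code in acquire.py: it uses only the standard library `csv` module so that it runs without dependencies, whereas the project builds dataframes (with pandas, `pandas.concat` over the individual files would play the same role). The function name `concat_csv`, the in-memory files, and the row values are made up for illustration.

```python
import csv
import io


def concat_csv(sources):
    """Read several CSV sources that share a header and return one list of row dicts."""
    rows = []
    for source in sources:
        rows.extend(csv.DictReader(source))
    return rows


# Two tiny in-memory stand-ins for yearly files (values are made up for illustration).
year_a = io.StringIO("railroad_company,state,train_speed\nAAA,48,25\n")
year_b = io.StringIO("railroad_company,state,train_speed\nBBB,6,40\nAAA,48,10\n")

equipment_rows = concat_csv([year_a, year_b])
print(len(equipment_rows))                    # 3
print(equipment_rows[1]["railroad_company"])  # BBB
```

The same function could be called once per accident type (highway and equipment) to produce the two combined datasets.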

2. Prepare

  • Prepare the data for analysis with the prepare.py module, which has functions for each type of rail accident (highway and equipment).
  • The prepare module will return respective dataframes, split into train, validate and test. The train dataset will be ready for exploratory analysis.
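
A train, validate, and test split of the kind described above might look like the sketch below. The split proportions, the seed, and the function name `split_rows` are assumptions for illustration. This README does not state the values used in prepare.py.

```python
import random


def split_rows(rows, validate_frac=0.2, test_frac=0.2, seed=123):
    """Shuffle a copy of rows and return (train, validate, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_validate = int(len(shuffled) * validate_frac)
    test = shuffled[:n_test]
    validate = shuffled[n_test:n_test + n_validate]
    train = shuffled[n_test + n_validate:]
    return train, validate, test


train, validate, test = split_rows(range(100))
print(len(train), len(validate), len(test))  # 60 20 20
```

Only the train portion would then be used for exploration, which keeps validate and test unseen until modeling.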

3. Explore & Preprocessing

  • Run univariate, bivariate, and multivariate visualizations to see how features interact with each other and with the target, railroad_company
  • Run statistical tests to help answer questions that arose from exploration
  • All functions to explore the data are included in explore.py.
  • Preprocessing module prepares the dataframes for modeling. All functions for preprocessing are included in preprocessing.py.
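
This README does not say which statistical tests were run. As one example of a test that fits a categorical target such as railroad_company, the sketch below computes a chi-square statistic of independence between a categorical feature and the target, without a continuity correction. The function name `chi_square` and the (season, railroad_company) pairs are made up for illustration and are not project results.

```python
from collections import Counter


def chi_square(pairs):
    """Return (statistic, degrees_of_freedom) for a list of (feature, target) pairs."""
    observed = Counter(pairs)
    row_totals = Counter(feature for feature, _ in pairs)
    col_totals = Counter(target for _, target in pairs)
    n = len(pairs)
    statistic = 0.0
    for feature, row_total in row_totals.items():
        for target, col_total in col_totals.items():
            expected = row_total * col_total / n  # count expected under independence
            statistic += (observed[(feature, target)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up (season, railroad_company) pairs, for illustration only.
pairs = (
    [("winter", "AAA")] * 30 + [("winter", "BBB")] * 10
    + [("summer", "AAA")] * 10 + [("summer", "BBB")] * 30
)
stat, dof = chi_square(pairs)
print(round(stat, 2), dof)  # 20.0 1
```

With one degree of freedom, a statistic above 3.841 rejects independence at the 0.05 level, so this made-up sample would suggest that the feature and the target are related.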

4. Model/Evaluate

  • Develop a baseline model for predicting the railroad operator based on the type of incident and incident features
  • Build classification models that improve upon the baseline accuracy with the understanding that model performance and evaluation metrics will better inform the overall analysis.
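
A common baseline for a classification target is to always predict the most frequent class. The sketch below assumes that definition, which this README does not confirm, and uses made-up railroad codes. The function name `baseline_accuracy` is hypothetical.

```python
from collections import Counter


def baseline_accuracy(labels):
    """Return (most_common_label, accuracy of always predicting that label)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Made-up railroad codes, for illustration only.
train_labels = ["AAA", "AAA", "BBB", "AAA", "CCC"]
label, accuracy = baseline_accuracy(train_labels)
print(label, accuracy)  # AAA 0.6
```

Any classification model would then need to beat this accuracy on the validate split to be considered an improvement.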

5. Deliver

  • A deployed model that takes in new, preprocessed data and returns similar results. This is important for supporting the analysis and takeaways drawn from the modeling stage.
  • A report with visuals that highlight the findings from the project analysis
  • A reproducible notebook that is well-documented

[Back to top]


Conclusion:

  • US railroads seem to be operating in a consistent manner with regard to equipment rail accidents.
  • Characteristics of the highway driver and location, rather than the Railroad, appear to account for the variance in outcomes of highway rail accidents.

Next Steps:

  • Industry and local governments should further investigate the drivers for highway rail accidents, especially in locations with recurring incidents.

[Back to top]


Instructions for Reproducing Project:

  1. Read and follow this README.md.

  2. Download the following files to your working directory:

  3. Run our final Jupyter Notebook (final_report_and_findings.ipynb) to reproduce our findings and analysis.

[Back to top]

