TY - JOUR
T1 - Analysis of Traffic Crashes Involving Pedestrians Using Big Data
T2 - Investigation of Contributing Factors and Identification of Hotspots
AU - Xie, Kun
AU - Ozbay, Kaan
AU - Kurkcu, Abdullah
AU - Yang, Hong
N1 - Funding Information:
The work was partially funded by the CitySMART laboratory of the UrbanITS center at the Tandon School of Engineering, and the Center for Urban Science and Progress (CUSP) at New York University (NYU). The authors would like to thank the New York State Department of Transportation, the New York City Department of Transportation, the New York Metropolitan Transportation Council, the Metropolitan Transportation Authority, and the New York City Department of City Planning for providing data for the study. The contents of this article reflect views of the authors who are responsible for the facts and accuracy of the data presented herein. The contents of the article do not necessarily reflect the official views or policies of the agencies.
Publisher Copyright:
© 2017 Society for Risk Analysis
PY - 2017/8
Y1 - 2017/8
N2 - This study explores the potential of big data for advancing pedestrian risk analysis, including the investigation of contributing factors and the identification of hotspots. Massive amounts of data for Manhattan were collected from a variety of sources, integrated, and processed, including taxi trips, subway turnstile counts, traffic volumes, road network, land use, sociodemographic, and social media data. The whole study area was uniformly split into grid cells as the basic geographical units of analysis. The cell-structured framework makes it easy to incorporate rich and diversified data into risk analysis. The cost of each crash, weighted by injury severity, was assigned to the cells based on the relative distance to the crash site using a kernel density function. A tobit model was developed to relate grid-cell-specific contributing factors to crash costs that are left-censored at zero. The potential for safety improvement (PSI), computed as the actual crash cost minus the cost of “similar” sites estimated by the tobit model, was used as a measure to identify and rank pedestrian crash hotspots. The proposed hotspot identification method takes into account two important factors that are generally ignored, i.e., injury severity and the effects of exposure indicators. Big data, on the one hand, enable more precise estimation of the effects of risk factors by providing richer data for modeling and, on the other hand, enable large-scale hotspot identification at a higher resolution than conventional methods based on census tracts or traffic analysis zones.
AB - This study explores the potential of big data for advancing pedestrian risk analysis, including the investigation of contributing factors and the identification of hotspots. Massive amounts of data for Manhattan were collected from a variety of sources, integrated, and processed, including taxi trips, subway turnstile counts, traffic volumes, road network, land use, sociodemographic, and social media data. The whole study area was uniformly split into grid cells as the basic geographical units of analysis. The cell-structured framework makes it easy to incorporate rich and diversified data into risk analysis. The cost of each crash, weighted by injury severity, was assigned to the cells based on the relative distance to the crash site using a kernel density function. A tobit model was developed to relate grid-cell-specific contributing factors to crash costs that are left-censored at zero. The potential for safety improvement (PSI), computed as the actual crash cost minus the cost of “similar” sites estimated by the tobit model, was used as a measure to identify and rank pedestrian crash hotspots. The proposed hotspot identification method takes into account two important factors that are generally ignored, i.e., injury severity and the effects of exposure indicators. Big data, on the one hand, enable more precise estimation of the effects of risk factors by providing richer data for modeling and, on the other hand, enable large-scale hotspot identification at a higher resolution than conventional methods based on census tracts or traffic analysis zones.
KW - Big data
KW - grid cell analysis
KW - pedestrian risk
UR - http://www.scopus.com/inward/record.url?scp=85016436528&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016436528&partnerID=8YFLogxK
U2 - 10.1111/risa.12785
DO - 10.1111/risa.12785
M3 - Article
C2 - 28314046
AN - SCOPUS:85016436528
SN - 0272-4332
VL - 37
SP - 1459
EP - 1476
JO - Risk Analysis
JF - Risk Analysis
IS - 8
ER -