TY - JOUR
T1 - Feature selection for ranking of most influential variables for evacuation behavior modeling across disasters
AU - Demiroluk, Sami
AU - Yazici, M. Anil
AU - Ozbay, Kaan
AU - Carnegie, Jon A.
N1 - Publisher Copyright: © 2016, National Research Council. All rights reserved.
PY - 2016
Y1 - 2016
N2 - The extensive list of factors that affect the evacuee decision process makes it difficult to design effective surveys and to develop decision models with high predictive power. Regression models and significance levels can help identify relevant variables and overcome this problem to an extent. However, such approaches fall short of ranking these variables or recognizing the redundant ones. In this study, a feature selection method was proposed to ensure that the selected features were relevant but not redundant. This method, called conditional mutual information maximization, picks at each step the feature that most reduces the uncertainty in the decision, conditional on the response of any feature already picked. As a case study, the variables influencing evacuation behavior in the Northern New Jersey Evacuation Survey were ranked and compared across disaster scenarios. To validate the method and to demonstrate how it compared with traditional methods, logistic regression models were also estimated with the same data set. It was found that the top-ranked variables might be available through an existing database such as the U.S. census and that some could be calculated on the basis of the threat type and government action. This finding can be useful for emergency planners when an evacuation survey for a study area is not readily available. Overall, the feature selection algorithm succeeds in identifying the most influential factors for all threat types. The suggested approach can help with both preprocessing (e.g., defining a set of input variables) and postprocessing (e.g., identifying variables that should be kept) for behavioral modeling.
UR - http://www.scopus.com/inward/record.url?scp=85012039875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85012039875&partnerID=8YFLogxK
U2 - 10.3141/2599-04
DO - 10.3141/2599-04
M3 - Article
AN - SCOPUS:85012039875
SN - 0361-1981
VL - 2599
SP - 24
EP - 32
JO - Transportation Research Record
JF - Transportation Research Record
ER -