TY - JOUR
T1 - Multi-Armed Bandit On-Time Arrival Algorithms for Sequential Reliable Route Selection under Uncertainty
AU - Zhou, Jinkai
AU - Lai, Xuebo
AU - Chow, Joseph Y.J.
N1 - Publisher Copyright:
© National Academy of Sciences: Transportation Research Board 2019.
PY - 2019/10
Y1 - 2019/10
N2 - Traditionally vehicles act only as servers in transporting passengers and goods. With increasing sensor equipment in vehicles, including automated vehicles, there is a need to test algorithms that consider the dual role of vehicles as both servers and sensors. The paper formulates a sequential route selection problem as a shortest path problem with on-time arrival reliability under a multi-armed bandit setting, a type of reinforcement learning model. A decision-maker has to make a finite set of decisions sequentially on departure time and path between a fixed origin-destination pair such that on-time reliability is maximized while travel time is minimized. The upper confidence bound algorithm is extended to handle this problem. Several tests are conducted. First, simulated data successfully verifies the method, then a real-data scenario is constructed of a hotel shuttle service from midtown Manhattan in New York City providing hourly access to John F. Kennedy International Airport. Results suggest that route selection with multi-armed bandit learning algorithms can be effective but neglecting passenger scheduling constraints can have negative effects on on-time arrival reliability by as much as 4.8% and combined reliability and travel time by 66.1%.
AB - Traditionally vehicles act only as servers in transporting passengers and goods. With increasing sensor equipment in vehicles, including automated vehicles, there is a need to test algorithms that consider the dual role of vehicles as both servers and sensors. The paper formulates a sequential route selection problem as a shortest path problem with on-time arrival reliability under a multi-armed bandit setting, a type of reinforcement learning model. A decision-maker has to make a finite set of decisions sequentially on departure time and path between a fixed origin-destination pair such that on-time reliability is maximized while travel time is minimized. The upper confidence bound algorithm is extended to handle this problem. Several tests are conducted. First, simulated data successfully verifies the method, then a real-data scenario is constructed of a hotel shuttle service from midtown Manhattan in New York City providing hourly access to John F. Kennedy International Airport. Results suggest that route selection with multi-armed bandit learning algorithms can be effective but neglecting passenger scheduling constraints can have negative effects on on-time arrival reliability by as much as 4.8% and combined reliability and travel time by 66.1%.
UR - http://www.scopus.com/inward/record.url?scp=85067847194&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067847194&partnerID=8YFLogxK
U2 - 10.1177/0361198119850457
DO - 10.1177/0361198119850457
M3 - Article
AN - SCOPUS:85067847194
SN - 0361-1981
VL - 2673
SP - 673
EP - 682
JO - Transportation Research Record
JF - Transportation Research Record
IS - 10
ER -