Traditionally vehicles act only as servers in transporting passengers and goods. With increasing sensor equipment in vehicles, including automated vehicles, there is a need to test algorithms that consider the dual role of vehicles as both servers and sensors. The paper formulates a sequential route selection problem as a shortest path problem with on-time arrival reliability under a multi-armed bandit setting, a type of reinforcement learning model. A decision-maker has to make a finite set of decisions sequentially on departure time and path between a fixed origin-destination pair such that on-time reliability is maximized while travel time is minimized. The upper confidence bound algorithm is extended to handle this problem. Several tests are conducted. First, simulated data successfully verifies the method, then a real-data scenario is constructed of a hotel shuttle service from midtown Manhattan in New York City providing hourly access to John F. Kennedy International Airport. Results suggest that route selection with multi-armed bandit learning algorithms can be effective but neglecting passenger scheduling constraints can have negative effects on on-time arrival reliability by as much as 4.8% and combined reliability and travel time by 66.1%.
ASJC Scopus subject areas
- Civil and Structural Engineering
- Mechanical Engineering