Unified Automatic Control of Vehicular Systems With Reinforcement Learning

Zhongxia Yan, Abdul Rahman Kreidieh, Eugene Vinitsky, Alexandre M. Bayen, Cathy Wu

Research output: Contribution to journal › Article › peer-review


Emerging vehicular systems with increasing proportions of automated components present opportunities for optimal control to mitigate congestion and increase efficiency. There has been a recent interest in applying deep reinforcement learning (DRL) to these nonlinear dynamical systems for the automatic design of effective control strategies. Despite conceptual advantages of DRL being model-free, studies typically nonetheless rely on training setups that are painstakingly specialized to specific vehicular systems. This is a key challenge to efficient analysis of diverse vehicular and mobility systems. To this end, this article contributes a streamlined methodology for vehicular microsimulation and discovers high performance control strategies with minimal manual design. A variable-agent, multi-task approach is presented for optimization of vehicular Partially Observed Markov Decision Processes. The methodology is experimentally validated on mixed autonomy traffic systems, where fractions of vehicles are automated; empirical improvement, typically 15-60% over a human driving baseline, is observed in all configurations of six diverse open or closed traffic systems. The study reveals numerous emergent behaviors resembling wave mitigation, traffic signaling, and ramp metering. Finally, the emergent behaviors are analyzed to produce interpretable control strategies, which are validated against the learned control strategies.

Note to Practitioners: As vehicular systems such as real-world traffic systems and robotic warehouses become increasingly automated, optimizing vehicle movements sees an increasing potential to reduce congestion and increase efficiency. For many vehicular systems, simulations of varying fidelity are commonly used for analysis and optimization without the need to deploy real vehicles. This article describes a unified and practical approach for optimal control of vehicles in arbitrary simulated vehicular systems while permitting partial automation, where the behavior of fractions of vehicles at given times can be modelled but not controlled. As illustrated by the diverse traffic systems considered in this article, the presented methodology emphasizes ease of application within any simulated vehicular system while minimizing manual efforts by the practitioner. The control inputs consist of local information around each automated vehicle, while the control outputs are commands for longitudinal acceleration and lateral lane change. Experimental results are presented for relatively small simulated traffic systems, though the methodology can be adapted to larger vehicular systems with minor modifications. Experimentally optimized behaviors provide insights to the practitioner which may assist in designing simplified and interpretable control strategies. Implementation in real-world systems depends on two requirements: 1) a reliable fallback mechanism for ensuring safety of vehicles, and 2) sufficient fidelity of the simulator for simulated behaviors to transfer. These requirements are under active research for traffic systems and may be practical in some robotic settings. To facilitate robust transfer of policies from simulated to real-world systems, future extensions of this work may inject additional randomization into simulation while reducing the unmodeled stochasticity of targeted real-world systems as much as possible.

Original language: English (US)
Pages (from-to): 789-804
Number of pages: 16
Journal: IEEE Transactions on Automation Science and Engineering
Issue number: 2
State: Published - Apr 1 2023


Primary topics

  • Mobile traffic control
  • automated vehicles
  • multi-agent systems
  • reinforcement learning

Secondary topic keywords

  • Mixed autonomy

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering

