Learning-Based Adaptive Optimal Control for Connected Vehicles in Mixed Traffic: Robustness to Driver Reaction Time

Research output: Contribution to journalArticlepeer-review


Through vehicle-to-vehicle (V2V) communication, both human-driven and autonomous vehicles can actively exchange data, such as velocities and bumper-to-bumper distances. Employing the shared data, control laws with improved performance can be designed for connected and autonomous vehicles (CAVs). In this article, taking into account human-vehicle interaction and heterogeneous driver behavior, an adaptive optimal control design method is proposed for a platoon mixed with multiple preceding human-driven vehicles and one CAV at the tail. It is shown that by using reinforcement learning and adaptive dynamic programming techniques, a near-optimal controller can be learned from real-time data for the CAV with V2V communications, but without the precise knowledge of the accurate car-following parameters of any driver in the platoon. The proposed method allows the CAV controller to adapt to different platoon dynamics caused by the unknown and heterogeneous driver-dependent parameters. To improve the safety performance during the learning process, our off-policy learning algorithm can leverage both the historical data and the data collected in real time, which leads to considerably reduced learning time duration. The effectiveness and efficiency of our proposed method is demonstrated by rigorous proofs and microscopic traffic simulations.

Original languageEnglish (US)
Pages (from-to)5267-5277
Number of pages11
JournalIEEE Transactions on Cybernetics
Issue number6
StatePublished - Jun 1 2022


  • Adaptive dynamic programming (ADP)
  • autonomous vehicles
  • connected vehicles
  • time-delay system

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering


Dive into the research topics of 'Learning-Based Adaptive Optimal Control for Connected Vehicles in Mixed Traffic: Robustness to Driver Reaction Time'. Together they form a unique fingerprint.

Cite this