Tracking and Relative Localization of Drone Swarms with a Vision-Based Headset

Maxim Pavliv, Fabrizio Schiano, Christopher Reardon, Dario Floreano, Giuseppe Loianno

Research output: Contribution to journal › Article › peer-review


We address the detection, tracking, and relative localization of the agents of a drone swarm from a human perspective, using a headset equipped with a single camera and an Inertial Measurement Unit (IMU). We train and deploy a deep neural network detector that locates the drones in the camera images. A joint probabilistic data association filter (JPDAF) resolves ambiguities in these detections, such as missed detections, false positives, and the assignment of detections to individual drones, and fuses this information with the headset IMU data to track the agents. To estimate the drones' relative poses in 3D space with respect to the human, we use an additional deep neural network that processes the image regions of the drones provided by the tracker. Finally, to speed up the training of the deep neural networks, we introduce an automated labeling process that relies on a motion capture system. Experimental results validate the effectiveness of the proposed approach. The approach runs in real time, does not rely on any communication between the human and the drones, and can scale to a large number of agents, often called swarms. It can be used to spatially task a swarm of drones, and it can also be employed without a headset for formation control and coordination of terrestrial vehicles.
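The abstract names a joint probabilistic data association filter but gives no parameters or update equations. As a hedged illustration only, the sketch below shows a generic textbook JPDA association step, not the authors' implementation: it enumerates every feasible joint assignment of detections to tracks, weights each event by Gaussian detection likelihoods, a detection probability, and a clutter density, and marginalizes to per-track association probabilities. The pixel-space Gaussian model, the values of `p_d` and `clutter_density`, and all function names are assumptions; the IMU coupling and the Kalman covariance update are omitted.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = x - mean
    k = x.shape[0]
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** k * np.linalg.det(cov))
    return norm * np.exp(-0.5 * d @ np.linalg.solve(cov, d))


def enumerate_events(n_tracks, n_meas):
    """Yield every feasible joint association event.

    event[i] is the measurement index assigned to track i, or -1 if the track
    is missed. Each measurement is used by at most one track; measurements
    left unassigned are treated as clutter.
    """
    def rec(i, used, current):
        if i == n_tracks:
            yield tuple(current)
            return
        yield from rec(i + 1, used, current + [-1])  # track i not detected
        for j in range(n_meas):
            if j not in used:
                yield from rec(i + 1, used | {j}, current + [j])

    yield from rec(0, frozenset(), [])


def jpda_weights(z_hat, S, Z, p_d=0.9, clutter_density=1e-4):
    """Marginal association probabilities beta[i, j] (hypothetical helper).

    Column 0: probability that track i received no measurement.
    Column j + 1: probability that measurement j belongs to track i.
    """
    n_tracks, n_meas = len(z_hat), len(Z)
    lik = np.array([[gaussian_pdf(Z[j], z_hat[i], S[i]) for j in range(n_meas)]
                    for i in range(n_tracks)])
    beta = np.zeros((n_tracks, n_meas + 1))
    total = 0.0
    for event in enumerate_events(n_tracks, n_meas):
        n_assigned = sum(1 for a in event if a >= 0)
        # Measurements not claimed by any track are explained as clutter.
        p = clutter_density ** (n_meas - n_assigned)
        for i, a in enumerate(event):
            if a >= 0:
                p *= p_d * lik[i, a]
            else:
                p *= 1.0 - p_d
        total += p
        for i, a in enumerate(event):
            beta[i, a + 1] += p  # a == -1 maps to column 0
    return beta / total


def combined_innovation(beta, z_hat, Z):
    """Probability-weighted innovation that would drive each track's update."""
    return np.array([
        sum(beta[i, j + 1] * (Z[j] - z_hat[i]) for j in range(len(Z)))
        for i in range(len(z_hat))
    ])


if __name__ == "__main__":
    # Two predicted drone positions (pixels) and two detections.
    z_hat = [np.array([100.0, 100.0]), np.array([200.0, 100.0])]
    S = [25.0 * np.eye(2), 25.0 * np.eye(2)]
    Z = [np.array([102.0, 99.0]), np.array([199.0, 103.0])]
    beta = jpda_weights(z_hat, S, Z)
    print(np.round(beta, 3))
    print(combined_innovation(beta, z_hat, Z))
```

Exact enumeration grows combinatorially with the number of tracks and detections, so a tracker for a large swarm would typically gate detections per track or approximate the marginals; this sketch favors transparency over speed.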

Original language: English (US)
Article number: 9324934
Pages (from-to): 1455-1462
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Issue number: 2
State: Published - Apr 2021

Keywords

  • Aerial systems: applications
  • human-centered robotics
  • localization

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence
