TY - GEN
T1 - Siamese Adaptive Transformer Network for Real-Time Aerial Tracking
AU - Xing, Daitao
AU - Tsoukalas, Athanasios
AU - Evangeliou, Nikolaos
AU - Giakoumidis, Nikolaos
AU - Tzes, Anthony
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Recent visual object trackers provide strong discriminability for accurate tracking in challenging scenarios but neglect inference efficiency. These methods process all inputs with identical computation and fail to reduce intrinsic computational redundancy, which constrains their deployment on Unmanned Aerial Vehicles (UAVs). In this work, we propose a dynamic tracker that selectively activates individual model components and allocates computational resources on demand during inference, enabling deep network inference on an onboard CPU at real-time speed. The tracking pipeline is divided into several stages, each consisting of a transformer-based encoder that generates a robust target representation by learning pixel interdependence. An adaptive network selection module controls the propagation routing path, determining the optimal computational graph according to confidence-based criteria. We further propose a spatial adaptive attention network to avoid computational overhead in the transformer encoder, in which self-attention aggregates dependency information only among selected points. Our model balances accuracy and efficiency across varying scenarios, giving it notable advantages over static models with a fixed computational cost. Comprehensive experiments on aerial and prevalent tracking benchmarks show competitive results at high speed, demonstrating the tracker's suitability for UAV platforms that do not carry a dedicated GPU.
UR - http://www.scopus.com/inward/record.url?scp=85136152355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136152355&partnerID=8YFLogxK
U2 - 10.1109/ICUAS54217.2022.9836047
DO - 10.1109/ICUAS54217.2022.9836047
M3 - Conference contribution
AN - SCOPUS:85136152355
T3 - 2022 International Conference on Unmanned Aircraft Systems, ICUAS 2022
SP - 570
EP - 575
BT - 2022 International Conference on Unmanned Aircraft Systems, ICUAS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Unmanned Aircraft Systems, ICUAS 2022
Y2 - 21 June 2022 through 24 June 2022
ER -