Abstract
Recognizing the activities that cause distraction in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, which limits their generalization ability, efficiency, and scalability. We aim to develop a generalized framework that performs robustly with limited or no annotated training data. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning such as distracted driving activity recognition. Vision-language pretraining models like CLIP have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos. CLIP's vision embedding supports both zero-shot transfer and task-based finetuning, either of which can be used to classify distracted activities from naturalistic driving video. Our results show that this framework achieves state-of-the-art performance with zero-shot transfer, finetuning, and video-based models for predicting the driver's state on four public datasets. We propose frame-based and video-based frameworks built on top of CLIP's visual representation for distracted driving detection and classification tasks and report the results. Our code is available at https://github.com/zahid-isu/DriveCLIP
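As a rough illustration of the zero-shot transfer mentioned in the abstract, the sketch below shows how CLIP-style zero-shot classification generally works: one text prompt is embedded per class, the image and prompt embeddings are L2-normalized, compared by cosine similarity, scaled by a logit scale, and turned into class probabilities with a softmax. This is not DriveCLIP's code. The prompts, the toy 3-D embeddings, and the function names are hypothetical; in practice the embeddings would come from CLIP's image and text encoders, and the scale of 100 mirrors the value used in the zero-shot example of the original CLIP repository.

```python
import math

# Hypothetical class prompts (assumption, not the paper's label set).
PROMPTS = [
    "a photo of a driver driving safely",
    "a photo of a driver texting on a phone",
    "a photo of a driver drinking",
]


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_classify(image_emb, text_embs, logit_scale=100.0):
    """Return (best_class_index, probabilities) for one image embedding.

    image_emb: image embedding (assumed to come from a CLIP image encoder).
    text_embs: one embedding per class prompt (assumed CLIP text encoder output).
    """
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        t = l2_normalize(t)
        cosine = sum(a * b for a, b in zip(img, t))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Toy embeddings standing in for real encoder outputs (hypothetical).
    toy_text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    toy_image_emb = [0.1, 0.9, 0.2]
    best, probs = zero_shot_classify(toy_image_emb, toy_text_embs)
    print(PROMPTS[best])  # prints: a photo of a driver texting on a phone
```

No classifier weights are trained here: changing the label set only requires new prompts, which is why this style of inference needs little or no annotated driving data.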
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 11602-11616 |
Number of pages | 15 |
Journal | IEEE Transactions on Intelligent Transportation Systems |
Volume | 25 |
Issue number | 9 |
State | Published - 2024 |
Keywords
- CLIP
- Distracted driving
- computer vision
- embedding
- vision-language model
- zero-shot transfer
ASJC Scopus subject areas
- Automotive Engineering
- Mechanical Engineering
- Computer Science Applications