TY - GEN
T1 - EdgeL3: Compressing L3-Net for Mote Scale Urban Noise Monitoring
T2 - 33rd IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
AU - Kumari, Sangeeta
AU - Roy, Dhrubojyoti
AU - Cartwright, Mark
AU - Bello, Juan Pablo
AU - Arora, Anish
N1 - Funding Information:
§ Co-primary authors. The authors would like to thank Jason Cramer, Ho-Hsiang Wu, and Justin Salamon for their valuable feedback. This work is supported by NSF award 1544753.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Urban noise sensing in deeply embedded devices at the edge of the Internet of Things (IoT) is challenging not only because of the lack of sufficiently labeled training data but also because device resources are quite limited. Look, Listen, and Learn (L3), a recently proposed state-of-the-art transfer learning technique, mitigates the first challenge by training self-supervised deep audio embeddings through binary Audio-Visual Correspondence (AVC), and the resulting embeddings can be used to train a variety of downstream audio classification tasks. However, with close to 4.7 million parameters, the multi-layer L3-Net CNN is still prohibitively expensive to run on small edge devices, such as 'motes' that use a single microcontroller and limited memory to achieve long-lived self-powered operation. In this paper, we comprehensively explore the feasibility of compressing the L3-Net for mote-scale inference. We use pruning, ablation, and knowledge distillation techniques to show that the originally proposed L3-Net architecture is substantially overparameterized, not only for AVC but also for the target task of sound classification, as evaluated on two popular downstream datasets. Our findings demonstrate the value of fine-tuning and knowledge distillation in regaining the performance lost through aggressive compression strategies. Finally, we present EdgeL3, the first L3-Net reference model compressed by 1-2 orders of magnitude for real-time urban noise monitoring on resource-constrained edge devices, which fits in just 0.4 MB of memory using half-precision floating-point representation.
KW - Audio embedding
KW - Convolutional neural nets
KW - Deep learning
KW - Edge network
KW - Fine-tuning
KW - Knowledge distillation
KW - Pruning
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85070367374&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85070367374&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2019.00145
DO - 10.1109/IPDPSW.2019.00145
M3 - Conference contribution
AN - SCOPUS:85070367374
T3 - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
SP - 877
EP - 884
BT - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 May 2019 through 24 May 2019
ER -