TY - JOUR
T1 - Smart audio signal classification for tracking of construction tasks
AU - Mannem, Karunakar Reddy
AU - Mengiste, Eyob
AU - Hasan, Saed
AU - de Soto, Borja García
AU - Sacks, Rafael
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2024/9
Y1 - 2024/9
N2 - This paper presents a model for sound classification in construction that leverages a unique combination of Mel spectrograms and Mel-Frequency Cepstral Coefficient (MFCC) values. The model combines deep neural network architectures, namely Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, to create CNN-LSTM and MFCCs-LSTM architectures, enabling the extraction of spectral and temporal features from audio data. Audio data generated from construction activities in a real-time closed environment is used to evaluate the proposed model, yielding an overall Precision, Recall, and F1-score of 91%, 89%, and 91%, respectively. This performance surpasses other established models, including Deep Neural Networks (DNN), CNN, and Recurrent Neural Networks (RNN), as well as combinations of these models such as CNN-DNN, CNN-RNN, and CNN-LSTM. These results underscore the potential of combining Mel spectrograms and MFCC values to provide a more informative representation of sound data, thereby enhancing sound classification in noisy environments.
AB - This paper presents a model for sound classification in construction that leverages a unique combination of Mel spectrograms and Mel-Frequency Cepstral Coefficient (MFCC) values. The model combines deep neural network architectures, namely Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, to create CNN-LSTM and MFCCs-LSTM architectures, enabling the extraction of spectral and temporal features from audio data. Audio data generated from construction activities in a real-time closed environment is used to evaluate the proposed model, yielding an overall Precision, Recall, and F1-score of 91%, 89%, and 91%, respectively. This performance surpasses other established models, including Deep Neural Networks (DNN), CNN, and Recurrent Neural Networks (RNN), as well as combinations of these models such as CNN-DNN, CNN-RNN, and CNN-LSTM. These results underscore the potential of combining Mel spectrograms and MFCC values to provide a more informative representation of sound data, thereby enhancing sound classification in noisy environments.
KW - Activity tracking
KW - Audio
KW - CNN
KW - LSTM
KW - MFCC
KW - Mel spectrograms
KW - Sound
UR - http://www.scopus.com/inward/record.url?scp=85194565655&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194565655&partnerID=8YFLogxK
U2 - 10.1016/j.autcon.2024.105485
DO - 10.1016/j.autcon.2024.105485
M3 - Article
AN - SCOPUS:85194565655
SN - 0926-5805
VL - 165
JO - Automation in Construction
JF - Automation in Construction
M1 - 105485
ER -
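
Note: the abstract's core idea is pairing two complementary audio representations, Mel spectrograms and MFCCs, as inputs to CNN-LSTM and MFCCs-LSTM branches. The snippet below is a minimal sketch of that feature-extraction step only, not the authors' implementation; it assumes librosa is used, and the clip path and parameter values (n_mels, n_mfcc) are illustrative assumptions.

import librosa
import numpy as np

# Illustrative clip path and parameters -- not taken from the paper.
y, sr = librosa.load("construction_clip.wav", sr=None)

# Mel spectrogram: spectral energy on a perceptual (Mel) frequency scale,
# converted to decibels for a more useful dynamic range.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)

# MFCCs: a compact cepstral summary derived from the same Mel-scaled spectrum.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Both features are (bins x frames) matrices over the same time axis, so one
# could feed mel_db to a CNN-LSTM branch and mfcc to an LSTM branch, in the
# spirit of the architecture the abstract describes.
print(mel_db.shape, mfcc.shape)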