TY - GEN
T1 - TinyDigiClones
T2 - 2024 International Joint Conference on Neural Networks, IJCNN 2024
AU - Basit, Abdul
AU - Shafique, Muhammad
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Conversational AI has made significant strides; however, the integration of multi-modal interactions, particularly on edge devices, presents a novel frontier to be explored, primarily due to computational constraints. This paper proposes the tinyDigiClones framework, which enables communication with a personalized AI assistant by leveraging optimized large language models (LLMs) for natural language processing (NLP) and deep-learning models for automatic speech recognition (ASR) and realistic voice synthesis. We explore various options for the AI models employed in our framework, with a primary focus on ensuring efficient deployment on edge devices while maintaining high accuracy. To replicate users' voice fonts and learn their unique vocal characteristics, the Text-to-Speech (TTS) models are trained on a custom dataset of audio-text pairs. This dataset is generated automatically by the ASR module, which segments extended sentences into shorter, transcript-matched audio files. Moreover, deploying state-of-the-art LLMs on resource-constrained devices presents a significant challenge, particularly in maintaining minimal latency, given their extensive parameter counts. Towards this, we explore several lightweight LLMs and employ optimization techniques aimed at reducing computational costs. The integration of these models is personified through a digital avatar mirroring the user's facial and voice likeness, offering an immersive experience. Deployment on the edge alleviates server latency and enhances privacy, enabling the real-time interaction capabilities of AI chatbots that are ideal for interactive digital avatars.
AB - Conversational AI has made significant strides; however, the integration of multi-modal interactions, particularly on edge devices, presents a novel frontier to be explored, primarily due to computational constraints. This paper proposes the tinyDigiClones framework, which enables communication with a personalized AI assistant by leveraging optimized large language models (LLMs) for natural language processing (NLP) and deep-learning models for automatic speech recognition (ASR) and realistic voice synthesis. We explore various options for the AI models employed in our framework, with a primary focus on ensuring efficient deployment on edge devices while maintaining high accuracy. To replicate users' voice fonts and learn their unique vocal characteristics, the Text-to-Speech (TTS) models are trained on a custom dataset of audio-text pairs. This dataset is generated automatically by the ASR module, which segments extended sentences into shorter, transcript-matched audio files. Moreover, deploying state-of-the-art LLMs on resource-constrained devices presents a significant challenge, particularly in maintaining minimal latency, given their extensive parameter counts. Towards this, we explore several lightweight LLMs and employ optimization techniques aimed at reducing computational costs. The integration of these models is personified through a digital avatar mirroring the user's facial and voice likeness, offering an immersive experience. Deployment on the edge alleviates server latency and enhances privacy, enabling the real-time interaction capabilities of AI chatbots that are ideal for interactive digital avatars.
KW - Automatic Speech Recognition
KW - Large Language Models
KW - Personalized Digital Avatars
KW - Text-to-speech Synthesis
UR - http://www.scopus.com/inward/record.url?scp=85204956803&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204956803&partnerID=8YFLogxK
U2 - 10.1109/IJCNN60899.2024.10649909
DO - 10.1109/IJCNN60899.2024.10649909
M3 - Conference contribution
AN - SCOPUS:85204956803
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2024 through 5 July 2024
ER -