TinyDigiClones: A Multi-Modal LLM-Based Framework for Edge-optimized Personalized Avatars

Abdul Basit, Muhammad Shafique

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Conversational AI has made significant strides; however, the integration of multi-modal interactions, particularly on edge devices, remains a largely unexplored frontier, primarily because of computational constraints. This paper proposes the tinyDigiClones framework, which enables communication with a personalized AI assistant by leveraging optimized large language models (LLMs) for natural language processing (NLP) and deep-learning models for automatic speech recognition (ASR) and realistic voice synthesis. We explore several candidate models for each component of the framework, with a primary focus on efficient deployment on edge devices while maintaining high accuracy. To replicate a user's voice font and learn its unique vocal characteristics, the text-to-speech (TTS) models are trained on a custom dataset of audio-text pairs. This dataset is generated automatically by the ASR module, which segments extended sentences into shorter, transcript-matched audio files. Deploying state-of-the-art LLMs on resource-constrained devices is a significant challenge, particularly in maintaining minimal latency, given their large parameter counts. To this end, we explore several lightweight LLMs and employ optimization techniques aimed at reducing computational cost. The integrated models are embodied in a digital avatar that mirrors the user's facial and voice likeness, offering an immersive experience. Deployment on the edge avoids server latency and enhances privacy, enabling real-time interaction with AI chatbots, which is ideal for interactive digital avatars.
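
The abstract does not specify how the ASR module cuts long recordings into transcript-matched clips. As a rough illustration only, the sketch below assumes the ASR step yields word-level timestamps and groups consecutive words into clips no longer than a chosen duration. The `Word` type, the `segment_words` function, and the `max_seconds` parameter are all hypothetical names introduced for this example, not part of the paper.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its time span in seconds (assumed ASR output)."""
    text: str
    start: float
    end: float


def _to_clip(words):
    # A clip is (start time, end time, transcript) for a run of words.
    return (words[0].start, words[-1].end, " ".join(w.text for w in words))


def segment_words(words, max_seconds=10.0):
    """Group timestamped words into clips no longer than max_seconds.

    Hypothetical helper: a new clip starts whenever adding the next word
    would stretch the current clip past max_seconds.
    """
    clips = []
    current = []
    for word in words:
        if current and word.end - current[0].start > max_seconds:
            clips.append(_to_clip(current))
            current = []
        current.append(word)
    if current:
        clips.append(_to_clip(current))
    return clips
```

Each returned tuple gives the time span to cut from the source audio and the text to pair with it, which is the audio-text pair format a TTS training set needs. A real pipeline would likely also split on pauses or punctuation rather than on duration alone.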

Original language: English (US)
Title of host publication: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359312
State: Published - 2024
Event: 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Yokohama, Japan
Duration: Jun 30 2024 to Jul 5 2024

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2024 International Joint Conference on Neural Networks, IJCNN 2024
Country/Territory: Japan
City: Yokohama
Period: 6/30/24 to 7/5/24

Keywords

  • Automatic Speech Recognition
  • Large Language Models
  • Personalized Digital Avatars
  • Text-to-speech Synthesis

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
