Closing the Loop on Speech to Music Translation: Automatically Generating Synthetic Percussive Sequences on the Mridangam from Konnakol

Gopika Krishnan, Julia Drabek, Akshay Anantapadmanabhan, Kaustuv Kanti Ganguli, Carlos Guedes

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper presents a pipeline that converts spoken sequences of Konnakol, a South Indian vocal percussion language, into synthetic rhythmic sequences performed on the mridangam. We fine-tune the Whisper speech-to-text model on Konnakol data, which enables accurate transcription of spoken sequences despite the small size of our dataset (approximately 15 minutes). The transcriptions are rhythmically encoded in a format compatible with the Konnakol Typewriter, a web application that converts such sequences into mridangam audio. The transcriptions also serve as input to a Markov model, which generates new rhythmic sequences that the Konnakol Typewriter can likewise render as mridangam audio. The fine-tuned Whisper model achieves very low error rates, which makes it well suited to this task. Beyond transcribing Konnakol, the pipeline opens possibilities for creating educational tools, preserving cultural heritage, and generating data for rhythm-based applications. Future work will focus on refining the process to improve accuracy and versatility.
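The Markov generation step described above can be illustrated with a minimal sketch. The abstract does not state the model's order, the token inventory, or the encoding expected by the Konnakol Typewriter, so everything here is an assumption: a first-order chain, whitespace-separated syllable tokens, a made-up two-phrase corpus, and hypothetical helper names `train_markov` and `generate`. It is not the authors' implementation.

```python
import random
from collections import defaultdict


def train_markov(sequences):
    """Collect first-order transitions: syllable -> list of observed successors.

    Successors are stored with repetition, so sampling uniformly from the
    list reproduces the empirical transition frequencies.
    """
    transitions = defaultdict(list)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current].append(nxt)
    return dict(transitions)


def generate(transitions, start, length, seed=None):
    """Walk the chain from `start`, emitting at most `length` syllables."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        choices = transitions.get(out[-1])
        if not choices:  # syllable never seen with a successor: stop early
            break
        out.append(rng.choice(choices))
    return out


# Hypothetical transcriptions; the real input would be the Whisper output.
corpus = [
    "tha ka dhi mi tha ka jo nu".split(),
    "tha ki ta tha ka dhi mi".split(),
]
model = train_markov(corpus)
phrase = generate(model, start="tha", length=8, seed=0)
print(" ".join(phrase))
```

Every adjacent pair in a generated phrase was observed in the training transcriptions, so the output stays within the vocabulary and local patterns of the corpus. A higher-order chain or explicit duration tokens would be natural extensions, but the abstract does not say whether the paper uses either.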

Original language: English (US)
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2025 - Workshop Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331519315
State: Published - 2025
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2025 - Hyderabad, India
Duration: Apr 6 2025 → Apr 11 2025

Publication series

Name: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2025 - Workshop Proceedings

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2025
Country/Territory: India
City: Hyderabad
Period: 4/6/25 → 4/11/25

Keywords

  • Automatic Speech Recognition (ASR)
  • Carnatic Music
  • Konnakol Transcription
  • Machine Learning
  • Markov Chain Generation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Signal Processing
  • Media Technology
  • Acoustics and Ultrasonics
