T3M: Text Guided 3D Human Motion Synthesis from Speech

Wenshuo Peng, Kaipeng Zhang, Sai Qian Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Speech-driven 3D motion synthesis seeks to create lifelike animations based on human speech, with potential uses in virtual reality, gaming, and film production. Existing approaches rely solely on speech audio for motion generation, leading to inaccurate and inflexible synthesis results. To mitigate this problem, we introduce a novel text-guided 3D human motion synthesis method, termed T3M. Unlike traditional approaches, T3M allows precise control over motion synthesis via textual input, enhancing diversity and user customization. Experimental results demonstrate that T3M greatly outperforms state-of-the-art methods in both quantitative metrics and qualitative evaluations. We have publicly released our code at https://github.com/Gloria2tt/naacl2024.git.
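To make the problem setting concrete, the sketch below shows one way a text-guided speech-to-motion model could be wired up in PyTorch: per-frame audio features attend to a pooled text-prompt embedding before a recurrent decoder predicts a 3D pose sequence. This is a hypothetical illustration of the task interface only; the module names, dimensions, and fusion scheme are assumptions and do not reflect the T3M implementation.

```python
import torch
import torch.nn as nn


class TextGuidedMotionSynthesizer(nn.Module):
    """Hypothetical sketch of a text-guided speech-to-motion model.

    All layer choices and sizes are illustrative assumptions,
    not the architecture described in the T3M paper.
    """

    def __init__(self, audio_dim=128, text_dim=256, hidden_dim=256,
                 num_joints=55, num_heads=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)   # per-frame speech features
        self.text_proj = nn.Linear(text_dim, hidden_dim)     # pooled text-prompt embedding
        # Cross-attention: audio frames (queries) attend to the text condition.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.pose_head = nn.Linear(hidden_dim, num_joints * 3)  # 3D pose parameters per frame

    def forward(self, audio_feats, text_emb):
        # audio_feats: (batch, frames, audio_dim); text_emb: (batch, text_dim)
        a = self.audio_proj(audio_feats)
        t = self.text_proj(text_emb).unsqueeze(1)              # (batch, 1, hidden_dim)
        fused, _ = self.cross_attn(query=a, key=t, value=t)    # inject text guidance
        h, _ = self.decoder(fused + a)                         # residual over the audio stream
        return self.pose_head(h)                               # (batch, frames, num_joints * 3)


if __name__ == "__main__":
    model = TextGuidedMotionSynthesizer()
    audio = torch.randn(2, 120, 128)   # e.g. 120 frames of speech features
    text = torch.randn(2, 256)         # e.g. a sentence embedding of the prompt
    motion = model(audio, text)
    print(motion.shape)                # torch.Size([2, 120, 165])
```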

Original language: English (US)
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2024 - Findings
Editors: Kevin Duh, Helena Gomez, Steven Bethard
Publisher: Association for Computational Linguistics (ACL)
Pages: 1168-1177
Number of pages: 10
ISBN (Electronic): 9798891761193
DOIs
State: Published - 2024
Event: 2024 Findings of the Association for Computational Linguistics: NAACL 2024 - Mexico City, Mexico
Duration: Jun 16 2024 to Jun 21 2024

Publication series

Name: Findings of the Association for Computational Linguistics: NAACL 2024 - Findings

Conference

Conference: 2024 Findings of the Association for Computational Linguistics: NAACL 2024
Country/Territory: Mexico
City: Mexico City
Period: 6/16/24 to 6/21/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
