Naturalistic Head Motion Generation From Speech

In collaboration with University of Maryland

Authors: Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald

Synthesizing natural head motion to accompany speech is necessary for an embodied conversational agent to provide a rich interactive experience. Most prior work assesses the quality of generated head motion by comparing it against a single ground-truth sequence using an objective metric. Yet many different head motion sequences can plausibly accompany the same speech utterance. In this work, we study the variation in the perceptual quality of head motions sampled from a generative model. We show that, although the generative model provides more diverse head motions, the motions it produces vary in perceptual quality. Finally, we show that objective metrics commonly used in previous research do not accurately reflect the perceptual quality of generated head motions. These results open an interesting avenue for future work: finding better objective metrics that correlate with human perception of quality.

Related readings and updates.

EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning

January 24, 2025. Research area: Human-Computer Interaction

This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in human-like non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interactions. Despite the advancements in robotic behaviors, existing methods often fall short in mimicking the diversity and subtlety of…

On the Role of Lip Articulation in Visual Speech Perception

April 24, 2023. Research areas: Human-Computer Interaction, Speech and Natural Language Processing. Conference: ICASSP

Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understanding what impacts human…
