Carnegie Mellon University
Adobe Research
ACM IUI 2026

Visual Lyrics: Generating Animated Text for Music Lyric Videos with an Augmented Text Editor

¹Carnegie Mellon University
²Adobe Research
Visual Lyrics interface. The system analyzes a song's audio and language features to suggest words that can be highlighted with image, animation, or visual stylizations. On the Annotation Panel (left), suggestions appear as annotations over the lyrics (a). The user can edit the annotations to steer the creative direction of the lyric video. The Generation Panel (right) displays a generated animated scene for each line of lyrics (b). The user can inspect the intermediate LLM instructions used to create the images, animations, and visuals (c), and can regenerate new instructions or edit them manually for fine-grained control.
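The caption describes lyrics carrying editable annotations, each tagged as an image, animation, or visual stylization. As a minimal sketch, assuming a simple in-memory model, an annotation layer over one lyric line might look like the following. The names `Stylization`, `Annotation`, `LyricLine`, and `annotate` are hypothetical and not taken from the system, and the lyric line is made up.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stylization(Enum):
    """The three stylization kinds named on this page."""
    IMAGE = "image"
    ANIMATION = "animation"
    VISUAL = "visual"


@dataclass
class Annotation:
    """Hypothetical schema: a suggested stylization attached to one word."""
    word_index: int          # position of the word within the line
    kind: Stylization
    instruction: str = ""    # intermediate instruction text the user may regenerate or edit


@dataclass
class LyricLine:
    text: str
    start: float             # line start time in seconds
    end: float               # line end time in seconds
    annotations: list[Annotation] = field(default_factory=list)

    def words(self) -> list[str]:
        return self.text.split()

    def annotate(self, word_index: int, kind: Stylization, instruction: str = "") -> Annotation:
        """Add or replace the annotation on a word, mimicking a user edit."""
        if not 0 <= word_index < len(self.words()):
            raise IndexError("word_index out of range for this line")
        # Keep at most one annotation per word: drop any existing one first.
        self.annotations = [a for a in self.annotations if a.word_index != word_index]
        ann = Annotation(word_index, kind, instruction)
        self.annotations.append(ann)
        return ann


# Usage: a suggestion pass followed by one user override (made-up lyric).
line = LyricLine("watch the stars fall down tonight", start=12.0, end=15.5)
line.annotate(2, Stylization.IMAGE, "draw a night sky full of stars")    # "stars"
line.annotate(3, Stylization.ANIMATION, "drop the word from the top")    # "fall"
line.annotate(2, Stylization.VISUAL)  # the user switches "stars" to a font-style change
```

Allowing only one annotation per word mirrors the idea of a user overriding a suggestion; whether the actual system permits several stylizations on the same word is not stated here.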

Summary

Animated lyric videos transform song lyrics into dynamic visual experiences, offering a powerful medium for artistic expression and audience engagement. However, creating these videos is challenging: it requires expertise in audio, typography, graphic design, and animation, which puts it out of reach for novices. To address this challenge, we introduce Visual Lyrics, a proof-of-concept system for generating animated lyric videos, controlled through an augmented text editor interface. We examined existing lyric videos to distill a taxonomy and design guidelines that informed the design of Visual Lyrics. Our key insight is a multimodal music analysis pipeline that builds on this taxonomy and leverages the strong natural language understanding and code generation capabilities of LLMs to synthesize creative and semantically meaningful animations. We collected, and open-source, a dataset of over 300 code-driven creative text animations that serve as inspiration for our LLM-driven pipeline. In a user study, Visual Lyrics enabled novices to easily create high-quality animated lyric videos, with high ratings of enjoyment, inspiration, and exploration.

Video


Semantically Matching Stylizations

Three distinct ways of stylizing lyrics: image, animation, and visual.

🖼️

Image

Image stylization generates a supporting graphic for the video. It suits words that name visually concrete objects, as well as abstract metaphors that can be associated with concrete objects.

Animation

Animation stylization animates the word itself. It can be applied to words related to motion, or to words sung with distinctive vocal attributes such as an upward or downward pitch shift, elongation, or vibrato.
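Pitch shift and elongation can be measured from the vocal track. The sketch below is a simple heuristic under stated assumptions, not the system's analysis pipeline: it assumes per-word pitch estimates in Hz (for example from a pitch tracker such as librosa's `pyin`) and a known sung duration per word. The function names and thresholds are hypothetical, and vibrato detection is omitted.

```python
import math


def semitones(f_from: float, f_to: float) -> float:
    """Pitch interval in semitones between two frequencies in Hz."""
    return 12.0 * math.log2(f_to / f_from)


def classify_vocal_attribute(f0: list[float], duration: float,
                             shift_threshold: float = 2.0,
                             elongation_threshold: float = 1.0) -> list[str]:
    """Return heuristic animation cues for one sung word (hypothetical helper).

    f0: pitch estimates in Hz over the word; values <= 0 mark unvoiced frames.
    duration: how long the word is held, in seconds.
    Thresholds (2 semitones, 1 second) are illustrative assumptions.
    """
    voiced = [f for f in f0 if f > 0]
    cues = []
    if len(voiced) >= 2:
        # Average the first and last quarter of voiced frames to smooth frame-level noise.
        k = max(1, len(voiced) // 4)
        start = sum(voiced[:k]) / k
        end = sum(voiced[-k:]) / k
        interval = semitones(start, end)
        if interval >= shift_threshold:
            cues.append("pitch_up")
        elif interval <= -shift_threshold:
            cues.append("pitch_down")
    if duration >= elongation_threshold:
        cues.append("elongated")
    return cues
```

For instance, a word held for 1.4 seconds whose pitch glides from 220 Hz to 330 Hz (about 7 semitones) yields both `"pitch_up"` and `"elongated"`, which could then be matched to a rising, stretched animation.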

🎨

Visual

Visual stylization modifies font attributes such as font family, size, color, or rotation. It can be used for words related to size, color, or emotional qualities, or to reflect the energy of the vocals (loud or quiet).
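Tying font size to vocal energy can be as simple as a clamped linear map from loudness to pixels. This is a minimal sketch, assuming mono samples normalized to [-1, 1] for each sung word; the function names, the -20 dBFS reference level, and the pixel ranges are illustrative choices rather than values from Visual Lyrics.

```python
import math


def rms_db(samples: list[float]) -> float:
    """Root-mean-square level of an audio segment in dBFS (samples in [-1, 1])."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def font_size_for_level(level_db: float, base_px: float = 48.0,
                        reference_db: float = -20.0, px_per_db: float = 2.0,
                        min_px: float = 24.0, max_px: float = 96.0) -> float:
    """Scale font size linearly with loudness relative to a reference level, then clamp.

    All defaults are hypothetical: a word at the reference level gets base_px,
    and each dB above or below it adds or removes px_per_db pixels.
    """
    if math.isinf(level_db):
        return min_px  # silence: smallest allowed size
    size = base_px + (level_db - reference_db) * px_per_db
    return max(min_px, min(max_px, size))
```

In practice the reference level could be set to the song's average vocal loudness so that sizes adapt to each track; clamping keeps shouted or whispered words legible.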

Example Results

Lyric video results

🔊 Videos include audio

Hip-hop/Trap (multilingual) (ZOOM)

Rap/Rock (Lose Yourself)

Disco/Pop (Espresso)

Indie/Electronica (Fireflies)

Pop/R&B (7 Rings)

Trap/Rap (Money)

Rap/Hip-hop (portrait video, multilingual) (TEAM TOMODACHI)

More examples of creative stylizations

Failure Cases

🔊 Videos include audio

Multiple simultaneous vocal lines (overlapping text)

Very fast rap (animation cut off early)

Dataset

We collected a dataset of 306 code-driven creative text animations to serve as inspiration for our generation pipeline.
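To make "code-driven" concrete, the snippet below computes one classic procedural text effect: a sine wave travelling across the characters of a word. It is an illustrative example written for this page, not an entry from the dataset, and `wave_offsets` and its parameters are hypothetical. A renderer would call it once per frame and shift each glyph vertically by the returned offset.

```python
import math


def wave_offsets(text: str, t: float, amplitude: float = 10.0,
                 wavelength_chars: float = 6.0,
                 period_s: float = 1.0) -> list[tuple[str, float]]:
    """Vertical offset in pixels of each character at time t (seconds).

    The phase advances with character index and recedes with time,
    so the wave appears to travel from left to right across the word.
    """
    offsets = []
    for i, ch in enumerate(text):
        phase = 2 * math.pi * (i / wavelength_chars - t / period_s)
        offsets.append((ch, amplitude * math.sin(phase)))
    return offsets
```

Because the motion is a closed-form function of time, the same word can be re-rendered at any frame rate or scrubbed to any timestamp, which is one reason code-driven animations are convenient for an LLM to generate and edit.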