Visual Lyrics: Generating Animated Text for Music Lyric Videos with an Augmented Text Editor

Summary
Animated lyric videos transform song lyrics into dynamic visual experiences, offering a powerful medium for artistic expression and audience engagement. However, creating these videos is challenging: it requires expertise in audio, typography, graphic design, and animation, which puts it out of reach for novices. To address this challenge, we introduce Visual Lyrics, a proof-of-concept system for generating animated lyric videos controlled with an augmented text editor interface. We examined existing lyric videos to distill a taxonomy and design guidelines, which informed the design of Visual Lyrics. Our key insight is a multimodal music analysis pipeline that builds on this taxonomy and leverages the strong natural language understanding and code generation capabilities of LLMs to synthesize creative and semantically meaningful animations. We collected a dataset of over 300 code-driven creative text animations to serve as inspiration for our LLM-driven pipeline, which we open source. In a user study, Visual Lyrics enabled novices to easily create high-quality animated lyric videos with high ratings of enjoyment, inspiration, and exploration.
Video
Semantically-Matching Stylizations
Three distinct ways of stylizing lyrics: image, animation, and visual.
Image
Image stylization generates a supporting graphic for the video. It can be used for words that name visually concrete objects or for abstract metaphors that can be associated with concrete objects.
Animation
Animation stylization animates the word itself. It can be applied to words related to motion or to words sung with special vocal attributes such as an upward or downward pitch shift, word elongation, or vibrato.
Visual
Visual stylization modifies font attributes such as font family, size, color, or rotation. It can be used for words related to size, color, or emotional qualities, or to reflect the energy of the vocals (loud/quiet).
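As a rough illustration of the three categories above, the sketch below maps a lyric word to the stylization types it could qualify for. It is not the Visual Lyrics pipeline, which is LLM-driven; a small keyword lookup stands in for that step here. The word lists, the `Word` fields, the vocal attribute names, and the energy thresholds are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class Stylization(Enum):
    IMAGE = "image"          # supporting graphic for the video
    ANIMATION = "animation"  # the word itself is animated
    VISUAL = "visual"        # font family, size, color, or rotation


@dataclass
class Word:
    text: str
    # Hypothetical labels, e.g. {"pitch_up", "elongated", "vibrato"}.
    vocal_attributes: FrozenSet[str] = frozenset()
    # Hypothetical vocal energy in [0, 1]; None when unknown.
    energy: Optional[float] = None


# Tiny illustrative lexicons (assumptions, not taken from the system).
CONCRETE_WORDS = {"fire", "money", "diamond", "espresso"}
MOTION_WORDS = {"fall", "spin", "jump", "fly"}
APPEARANCE_WORDS = {"big", "tiny", "red", "blue", "sad", "angry"}
SPECIAL_VOCALS = {"pitch_up", "pitch_down", "elongated", "vibrato"}


def candidate_stylizations(word: Word) -> List[Stylization]:
    """Return every stylization type the word qualifies for."""
    token = word.text.lower()
    candidates = []

    # Image: visually concrete objects.
    if token in CONCRETE_WORDS:
        candidates.append(Stylization.IMAGE)

    # Animation: motion words or special vocal attributes.
    if token in MOTION_WORDS or (word.vocal_attributes & SPECIAL_VOCALS):
        candidates.append(Stylization.ANIMATION)

    # Visual: size, color, or emotion words, or notably loud/quiet vocals.
    # The 0.8 and 0.2 cutoffs are arbitrary values chosen for this sketch.
    loud_or_quiet = word.energy is not None and (
        word.energy >= 0.8 or word.energy <= 0.2
    )
    if token in APPEARANCE_WORDS or loud_or_quiet:
        candidates.append(Stylization.VISUAL)

    return candidates
```

A word can qualify for more than one category. For example, `candidate_stylizations(Word("fly", energy=0.1))` returns both the animation and the visual type, so a later step (in the real system, presumably the LLM or the user in the editor) would still have to choose among them.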
Example Results
Lyric video results
🔊 Videos include audio
Hip-hop/Trap (multilingual) (ZOOM)
Rap/Rock (Lose Yourself)
Disco/Pop (Espresso)
Indie/Electronica (Fireflies)
Pop/R&B (7 Rings)
Trap/Rap (Money)
Rap/Hip-hop (portrait video, multilingual) (TEAM TOMODACHI)
More examples of creative stylizations
Failure Cases
🔊 Videos include audio
Multiple simultaneous vocal lines (texts overlapping each other)
Very fast rap (animation cut off early)
Dataset
We collected a dataset of 306 code-driven creative text animations to serve as inspiration for our generation pipeline.




