"T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs."

Shukang Yin et al. (2024)

Details and statistics

DOI: 10.48550/ARXIV.2411.19951

access: open

type: Informal or Other Publication

metadata version: 2025-01-03