Wenyi Yu

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024

Can Large Language Models Understand Spatial Audio?

Jun 12, 2024

An Improved Empirical Fisher Approximation for Natural Gradient Descent

Jun 10, 2024

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Mar 21, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models

Oct 20, 2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Oct 10, 2023

Connecting Speech Encoder and Large Language Model for ASR

Sep 26, 2023