Yuxuan Wang

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

Jun 25, 2024
Via arXiv

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Jun 24, 2024
Via arXiv

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024
Via arXiv

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Jun 19, 2024
Via arXiv

Bayesian Intervention Optimization for Causal Discovery

Jun 16, 2024
Via arXiv

Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

Jun 12, 2024
Via arXiv

Can Large Language Models Understand Spatial Audio?

Jun 12, 2024
Via arXiv

Progressive Confident Masking Attention Network for Audio-Visual Segmentation

Jun 04, 2024
Via arXiv

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024
Via arXiv

Human-Centered LLM-Agent User Interface: A Position Paper

May 19, 2024
Via arXiv