Lu Lu

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Jul 05, 2024, via arXiv

A Comprehensive Solution to Connect Speech Encoder and Large Language Model for ASR

Jun 25, 2024, via arXiv

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024, via arXiv

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Jun 19, 2024, via arXiv

Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

Jun 12, 2024, via arXiv

Can Large Language Models Understand Spatial Audio?

Jun 12, 2024, via arXiv

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Jun 04, 2024, via arXiv

Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research

May 14, 2024, via arXiv

Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences

May 08, 2024, via arXiv

SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Mar 04, 2024, via arXiv