Starred repositories
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Music repair method to convert lossy MP3 compressed music to lossless music.
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
openvpi / DiffSinger
Forked from MoonInTheRiver/DiffSinger
An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
Dockerized Facebook Demucs library, packaged to make it easy to run
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Keep track of big models in audio domain, including speech, singing, music etc.
multi-task and multi-track music transcription for everyone
A curated list of awesome articles, tutorials, libraries, webpages, etc.
You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Robust Singing Voice Transcription and MIDI Extraction
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]
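The first Chinese-language tutorial on visual SLAM theory and practice, implemented simply and from scratch in Python. Covers ORB feature extraction, epipolar geometry, visual odometry back-end optimization, and real-time 3D map reconstruction. An easy SLAM practical tutorial (Python). Image processing, Otsu binarization. More tutorials are on my CSDN blog.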
BPM detection for audio files (currently just .wav). Takes in the whole file, and prints out the BPM.
Omniscient Mozart, being able to transcribe everything in the music, including vocal, drum, chord, beat, instruments, and more.
A CNN which converts piano audio to a simplified MIDI format