Stars
A bit of code written on the side while reading <Parsing Techniques -- A Practical Guide>, purely for practice.
Chinese translation of Bjarne Stroustrup's HOPL4 paper
This repository contains code to run (i.e., train, perform inference with, and evaluate) a diarization method called EEND-vector-clustering.
VB Diarization with Eigenvoice and HMM Priors, refactored
Variational Bayes HMM over x-vectors diarization
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset
In defence of metric learning for speaker recognition
Simple, online, and realtime tracking of multiple objects in a video sequence.
Sequence modeling benchmarks and temporal convolutional networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
PyTorch code for "Rethinking CNN Models for Audio Classification"
CNN-based audio classifier in PyTorch (LeNet / VGG / ResNet)
You can find the speech algorithms you want here
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Multimodal speaker diarization using a pre-trained audio-visual synchronization model
Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
Tiny-ImageNet classifier using PyTorch
Audio-visual voice activity detection using deep learning
A PyTorch implementation of End-to-End Neural Diarization
Extracts Transcript and Summary (Abstractive and Extractive) from the AMI Meeting Corpus
A curated list of Multimodal Related Research.
[CVPR'19] [PyTorch] Gated Spatio Temporal Energy Graph
Memory Enhanced Global-Local Aggregation for Video Object Detection, CVPR2020