Video captioning using SCN-LSTM models with S2VT baseline
Updated Mar 15, 2023 - Python
For understanding how the transformer works.
Data collection and automatic labeling for dense video captioning models
This repository provides a tool for automatically generating subtitles for video content, improving accessibility and viewer experience by adding captions quickly and accurately.
Multimodal Video Captioning project for the Natural Language Processing course at Tsinghua University, spring 2021
Frame-by-Frame Multi-object Tracking Guided Information Augmentation For Video Captioning
Video captioning | Video2Description
Auto-caption program that generates word-by-word captions on a green-screen video
Video captioning with LSTM RNN and Transformer networks on MSVD and MSR-VTT using attributes and SVOS
ZerolanCore integrates many open-source, locally deployable AI models and aims to cover large language models (LLM), automatic speech recognition (ASR), text-to-speech (TTS), image captioning, optical character recognition (OCR), video captioning, and more.
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
An encoder-decoder deep learning model (with or without an attention mechanism) that takes an Arabic sign-language video as input and outputs its translation as text.
AI-based video summarizer with captioning.
This project utilizes advanced deep learning techniques to automatically generate contextually relevant captions for videos by extracting spatial and temporal features, while incorporating Gaussian attention to focus on important regions. This enhances video indexing, retrieval, and accessibility for visually impaired individuals.
S2VT with Attention
[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Master Thesis on Multimodal Video Captioning, done at Huawei's Research Center in Amsterdam.
Official code for Global Semantic Descriptors for Zero-Shot Action Recognition (IEEE Signal Processing Letters 2022)
Second-place solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2022 workshop)