Toward Long Form Audio-Visual Video Understanding
Recommendations
Event-centric multi-modal fusion method for dense video captioning
Abstract: Dense video captioning aims to automatically describe several events that occur in a given video, which most state-of-the-art models accomplish by locating and describing multiple events in an untrimmed video. Despite much progress in ...
The DIRAC AWEAR audio-visual platform for detection of unexpected and incongruent events
ICMI '08: Proceedings of the 10th international conference on Multimodal interfaces. It is of prime importance in everyday human life to cope with and respond appropriately to events that are not foreseen by prior experience. Machines to a large extent lack the ability to respond appropriately to such inputs. An important class of ...
Understanding video events: a survey of methods for automatic interpretation of semantic occurrences in video
Understanding video events, i.e., the translation of low-level content in video sequences into high-level semantic concepts, is a research topic that has received much interest in recent years. Important applications of this paper include smart ...
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Research-article
Funding Sources
- National Natural Science Foundation of China
- Public Computing Cloud, Renmin University of China