DOI: 10.1145/3503161.3547976

Partially Relevant Video Retrieval

Published: 10 October 2022

Abstract

Current methods for text-to-video retrieval (T2VR) are trained and tested on video-captioning oriented datasets such as MSVD, MSR-VTT and VATEX. A key property of these datasets is that videos are assumed to be temporally pre-trimmed and short, while the provided captions describe the gist of the video content well. Consequently, for a given paired video and caption, the video is supposed to be fully relevant to the caption. In reality, however, as queries are not known a priori, pre-trimmed video clips may not contain sufficient content to fully meet a query. This suggests a gap between the literature and the real world. To fill the gap, we propose in this paper a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR). An untrimmed video is considered partially relevant with respect to a given textual query if it contains a moment relevant to the query. PRVR aims to retrieve such partially relevant videos from a large collection of untrimmed videos. PRVR differs from single video moment retrieval and video corpus moment retrieval, as the latter two aim to retrieve moments rather than untrimmed videos. We formulate PRVR as a multiple instance learning (MIL) problem, in which a video is viewed simultaneously as a bag of video clips and a bag of video frames. Clips and frames represent video content at different temporal scales. We propose a Multi-Scale Similarity Learning (MS-SL) network that jointly learns clip-scale and frame-scale similarities for PRVR. Extensive experiments on three datasets (TVR, ActivityNet Captions, and Charades-STA) demonstrate the viability of the proposed method. We also show that our method can be used to improve video corpus moment retrieval.
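The MIL formulation in the abstract can be sketched compactly. The following is an illustrative NumPy example, not the authors' MS-SL implementation: a video is treated both as a bag of frame vectors and as a bag of clip vectors obtained by mean-pooling consecutive frames at several temporal scales, and the video-query similarity fuses the best-matching (key) clip with the best-matching (key) frame. The fusion weight `alpha` and the scale set `(2, 4)` are made-up illustration values, and cosine similarity stands in for the learned similarity branches.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between each row of `a` and the vector `b`.
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b) + 1e-8)
    return a @ b

def multi_scale_clips(frames, scales=(2, 4)):
    # Build clip embeddings by mean-pooling runs of consecutive frames at
    # several temporal scales (a rough stand-in for the clip-scale branch).
    clips = []
    n = len(frames)
    for s in scales:
        for start in range(0, n - s + 1):
            clips.append(frames[start:start + s].mean(axis=0))
    return np.stack(clips)

def prvr_similarity(query, frames, alpha=0.7):
    # MIL-style similarity: the video is a bag of clips and a bag of frames;
    # score it by its best (key) clip and best (key) frame, then fuse.
    # `alpha` is an illustrative fusion weight, not a learned value.
    clip_sim = cosine(multi_scale_clips(frames), query).max()
    frame_sim = cosine(frames, query).max()
    return alpha * clip_sim + (1 - alpha) * frame_sim
```

Under this sketch, an untrimmed video containing one query-relevant moment scores higher than a video with no relevant content, even when most of its frames are irrelevant, which is the partial-relevance behaviour PRVR targets.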

Supplementary Material

MP4 File (MM22-fp0929.mp4)
This video accompanies the paper "Partially Relevant Video Retrieval", accepted at ACM MM 2022. To close the gap between the conventional text-to-video retrieval (T2VR) task in the literature and the real world, the paper proposes a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR). The video starts with the motivation of the paper, then covers related work, the method, and the experiments in turn. If you are interested in our research, please visit the paper's homepage at http://danieljf24.github.io/prvr/ for the full paper and source code.





Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. multiple instance learning
  2. partially relevant
  3. video representation learning
  4. video-text retrieval

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • NSFC
  • Public Welfare Technology Research Project of Zhejiang Province

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%



Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 22
Reflects downloads up to 10 Nov 2024


Cited By

  • (2024) Exploiting Instance-level Relationships in Weakly Supervised Text-to-Video Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20(10), 1-21. DOI: 10.1145/3663571. Online publication date: 12-Sep-2024.
  • (2024) A New Framework for Evaluating Faithfulness of Video Moment Retrieval against Multiple Distractors. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2869-2878. DOI: 10.1145/3627673.3679838. Online publication date: 21-Oct-2024.
  • (2024) Weakly Supervised Video Moment Retrieval via Location-irrelevant Proposal Learning. Companion Proceedings of the ACM on Web Conference 2024, 1595-1603. DOI: 10.1145/3589335.3651942. Online publication date: 13-May-2024.
  • (2024) Toward Video Anomaly Retrieval From Video Anomaly Detection: New Benchmarks and Model. IEEE Transactions on Image Processing 33, 2213-2225. DOI: 10.1109/TIP.2024.3374070. Online publication date: 2024.
  • (2024) Emotional Video Captioning With Vision-Based Emotion Interpretation Network. IEEE Transactions on Image Processing 33, 1122-1135. DOI: 10.1109/TIP.2024.3359045. Online publication date: 2024.
  • (2024) Video Corpus Moment Retrieval via Deformable Multigranularity Feature Fusion and Adversarial Training. IEEE Transactions on Circuits and Systems for Video Technology 34(8), 6686-6698. DOI: 10.1109/TCSVT.2023.3294567. Online publication date: Aug-2024.
  • (2024) Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16551-16560. DOI: 10.1109/CVPR52733.2024.01566. Online publication date: 16-Jun-2024.
  • (2024) Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13569-13580. DOI: 10.1109/CVPR52733.2024.01288. Online publication date: 16-Jun-2024.
  • (2024) Hierarchical matching and reasoning for multi-query image retrieval. Neural Networks 173, 106200. DOI: 10.1016/j.neunet.2024.106200. Online publication date: May-2024.
  • (2023) Feature Enhancement and Foreground-Background Separation for Weakly Supervised Temporal Action Localization. Proceedings of the 5th ACM International Conference on Multimedia in Asia, 1-7. DOI: 10.1145/3595916.3626423. Online publication date: 6-Dec-2023.
