- research-article, October 2024
Instance-aware Fine-grained Micro-action Recognition
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11320–11326
https://doi.org/10.1145/3664647.3688976
Micro-action involves low-amplitude movement of human body, which brings challenges to common action recognition. This paper focuses on the extremely small region of human body as well as the severe long-tail distribution in micro-action recognition. An ...
- research-article, October 2024
Modeling Event-level Causal Representation for Video Classification
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 3936–3944
https://doi.org/10.1145/3664647.3681547
Classifying videos differs from that of images in the need to capture the information on what has happened, instead of what is in the frames. Conventional methods typically follow the data-driven approach, which uses transformer-based attention models to ...
- short-paper, July 2024
Detecting and Explaining Emotions in Video Advertisements
SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 2734–2738
https://doi.org/10.1145/3626772.3657664
The use of video advertisements is a common marketing strategy in today's digital age. Extensive research is conducted by companies to comprehend the emotions conveyed in video advertisements, as they play a crucial role in crafting memorable ...
- research-article, August 2024
Deep Learning-based Grading Model for Middle School Physical Fitness Test of Table Tennis
MIDA '24: Proceedings of the 2024 International Conference on Machine Intelligence and Digital Applications, Pages 401–405
https://doi.org/10.1145/3662739.3669986
Concerns over the physical health of middle school students have prompted the inclusion of physical fitness tests into high school entrance examinations. Table tennis skills assessments, in particular, are popular among students due to their engaging ...
- research-article, June 2024, Best Paper
Identification of Speaker Roles and Situation Types in News Videos
ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval, Pages 506–514
https://doi.org/10.1145/3652583.3658101
The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and ...
- research-article, January 2024
Effectiveness of deep learning techniques in TV programs classification: A comparative analysis
Integrated Computer-Aided Engineering (ICAE), Volume 31, Issue 4, Pages 439–453
https://doi.org/10.3233/ICA-240740
In the application areas of streaming, social networks, and video-sharing platforms such as YouTube and Facebook, along with traditional television systems, programs’ classification stands as a pivotal effort in multimedia content management. Despite ...
- research-article, October 2023
Exploring Motion Cues for Video Test-Time Adaptation
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 1840–1850
https://doi.org/10.1145/3581783.3612153
Test-time adaptation (TTA) aims at boosting the generalization capability of a trained model by conducting self-/un-supervised learning during testing in real-world applications. Though TTA on image-based tasks has seen significant progress, TTA ...
- Article, July 2023
MAF: Multimodal Auto Attention Fusion for Video Classification
Advances and Trends in Artificial Intelligence. Theory and Applications, Pages 253–264
https://doi.org/10.1007/978-3-031-36819-6_22
Video classification is a complex task that involves analyzing audio and video signals using deep neural models. To reliably classify these signals, researchers have developed multimodal fusion techniques that combine audio and video data into ...
- poster, June 2023
Convolutional Method for Modeling Video Temporal Context Effectively in Transformer
SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Pages 1205–1208
https://doi.org/10.1145/3555776.3578481
Video understanding remains a challenging task because video understanding models have many parameters to be trained and should capture detailed spatiotemporal contexts in video effectively. Recent methods have typically employed 3D convolution modules ...
- research-article, February 2023
Deep Unsupervised Key Frame Extraction for Efficient Video Classification
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 19, Issue 3, Article No. 119, Pages 1–17
https://doi.org/10.1145/3571735
Video processing and analysis have become an urgent task, as a huge amount of videos (e.g., YouTube, Hulu) are uploaded online every day. The extraction of representative key frames from videos is important in video processing and analysis since it ...
- research-article, June 2023
Introduction of a Method for Systematic Surface Defect Classification on Virtual Car Body Parts
ICIEAEU '23: Proceedings of the 2023 10th International Conference on Industrial Engineering and Applications, Pages 295–301
https://doi.org/10.1145/3587889.3588214
This paper proposes a method to classify systematic surface defects on virtual car body parts through a CRNN. Systematic surface defects occur during the manufacturing process of outer surface parts. Those defects get classified by auditors according ...
- research-article, October 2022
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 5773–5782
https://doi.org/10.1145/3503161.3547908
Video transformer naturally incurs a heavier computation burden than a static vision transformer, as the former processes T times longer sequence than the latter under the current attention of quadratic complexity (T²N²). The existing works treat the ...
- research-article, October 2022
Hierarchical Hourglass Convolutional Network for Efficient Video Classification
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 5880–5891
https://doi.org/10.1145/3503161.3547841
Videos naturally contain dynamic variation over the temporal axis, which will result in the same visual clues (e.g., semantics, objects) changing their scale, position, and perspective patterns between adjacent frames. A primary trend in video CNN is ...
- research-article, October 2022
Transformer Video Classification algorithm based on video token-to-token
ICCCM '22: Proceedings of the 10th International Conference on Computer and Communications Management, Pages 118–124
https://doi.org/10.1145/3556223.3556241
The expression of video content presents the combination of audio-visual aspects. How to effectively combine audio and video features and generate robust content representation is still a problem to be explored. In this paper, we propose a multi-modal ...
- research-article, October 2022
Improving Identity-Relevant Deepfake Video Detection in Real-World with Adversarial Data Augmentation
ICMSSP '22: Proceedings of the 2022 7th International Conference on Multimedia Systems and Signal Processing, Pages 14–18
https://doi.org/10.1145/3545822.3545826
Recently, GAN-based deepfake videos frequently appeared on video websites, which has a bad impact on the credibility of the videos. Due to the variety of forgery algorithms, the classification methods trained on large-scale datasets often have poor ...
- short-paper, October 2021
Rethinking the Impacts of Overfitting and Feature Quality on Small-scale Video Classification
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 4760–4764
https://doi.org/10.1145/3474085.3479226
While Transformers have yielded impressive results for video classification on large datasets recently, simpler models without the transformer architecture can be promising for small datasets. In this paper, we propose three major techniques to improve ...
- short-paper, October 2021
NJU MCG - Sensetime Team Submission to Pre-training for Video Understanding Challenge Track II
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 4799–4802
https://doi.org/10.1145/3474085.3479221
This paper presents the method that underlies our submission to the Pre-training for Video Understanding Challenge Track II. We follow the basic pipeline of temporal segment networks [20] and further improve its performance in several aspects. ...
- research-article, October 2021
Token Shift Transformer for Video Classification
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 917–925
https://doi.org/10.1145/3474085.3475272
Transformer achieves remarkable successes in understanding 1 and 2-dimensional signals (e.g., NLP and Image Content Understanding). As a potential alternative to convolutional neural networks, it shares merits of strong interpretability, high ...
- research-article, October 2021
When Video Classification Meets Incremental Classes
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 880–889
https://doi.org/10.1145/3474085.3475265
With the rapid development of social media, tremendous videos with new classes are generated daily, which raise an urgent demand for video classification methods that can continuously update new classes while maintaining the knowledge of old videos with ...
- research-article, October 2021
STRNet: Triple-stream Spatiotemporal Relation Network for Action Recognition
International Journal of Automation and Computing (SPIJAC), Volume 18, Issue 5, Pages 718–730
https://doi.org/10.1007/s11633-021-1289-9
Learning comprehensive spatiotemporal features is crucial for human action recognition. Existing methods tend to model the spatiotemporal feature blocks in an integrate-separate-integrate form, such as appearance-and-relation network (ARTNet) and ...