- research-article, December 2024
P2FTrack: Multi-Object Tracking with Motion Prior and Feature Posterior
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 21, Issue 1, Article No. 22, Pages 1–22. https://doi.org/10.1145/3700443
Multiple object tracking (MOT) has emerged as a crucial component of the rapidly developing field of computer vision. However, existing multi-object tracking methods often overlook the relationship between features and motion, hindering the ability to strike a ...
- short-paper, December 2024
TraMSR: Transformer and Mamba based Practical Speech Super-Resolution for Mobile Wearables
ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, Pages 1686–1688. https://doi.org/10.1145/3636534.3697460
Speech super-resolution techniques offer a promising solution to enhance audio quality in wearable devices, particularly when addressing the challenges of reduced sampling rates necessitated by battery life constraints and network instability. However, ...
- research-article, November 2024
TSFormer: Tracking Structure Transformer for Image Inpainting
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 12, Article No. 381, Pages 1–23. https://doi.org/10.1145/3696452
Recent studies have shown that image structure can significantly facilitate image inpainting. However, current approaches mostly explore structure prior without considering its guidance to texture reconstruction, leading to performance degradation. To ...
- research-article, November 2024
Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 12, Article No. 374, Pages 1–20. https://doi.org/10.1145/3695877
Category-level pose estimation is proposed to predict the 6D pose of objects under a specific category and has wide applications in fields such as robotics, virtual reality, and autonomous driving. With the development of VR/AR technology, pose estimation ...
- research-article, November 2024
Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims
Kevin Meng, Damian Jimenez, Jacob Daniel Devasier, Sai Sandeep Naraparaju, Fatma Arslan, Daniel Obembe, Chengkai Li
ACM Transactions on Intelligent Systems and Technology (TIST), Volume 15, Issue 6, Article No. 120, Pages 1–25. https://doi.org/10.1145/3689212
This article presents the latest developments to ClaimBuster’s claim-spotting model, which tackles the critical task of identifying check-worthy claims from large streams of information. We introduce the first adversarially regularized, transformer-based ...
- research-article, November 2024
PhysFiT: Physical-aware 3D Shape Understanding for Finishing Incomplete Assembly
ACM Transactions on Graphics (TOG), Volume 44, Issue 1, Article No. 5, Pages 1–16. https://doi.org/10.1145/3702226
Understanding the part composition and structure of 3D shapes is crucial for a wide range of 3D applications, including 3D part assembly and 3D assembly completion. Compared to 3D part assembly, 3D assembly completion is more complicated, which involves ...
- research-article, November 2024
Time-Series Prediction Algorithm for Boiler Power Generation Steam Temperature
IOTMMIM '24: Proceedings of the First International Workshop on IoT Datasets for Multi-modal Large Model, Pages 27–36. https://doi.org/10.1145/3698385.3699874
In the boiler power generation scenario, multi-modal data can be collected from the system, and the main steam temperature may be the most important factor, as it influences the efficiency and safety of the whole power generation system. In this paper, a time ...
- short-paper, November 2024
T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation
SIGSPATIAL '24: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Pages 569–572. https://doi.org/10.1145/3678717.3691271
Trajectory similarity computation is crucial for analyzing movement patterns in applications like traffic management and wildlife tracking. Recent self-supervised learning methods such as contrastive learning have made advancements in trajectory ...
- research-article, October 2024
Spatial and Channel Squeeze & Excitation in Adapting Vision Transformers for Temporal Action Localization
McGE '24: Proceedings of the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Pages 20–25. https://doi.org/10.1145/3688867.3690169
Transformer-based methods have achieved impressive performance on temporal action localization (TAL). Although this achievement is attributed to the multi-headed self-attention (MSA) mechanism, there is still a lack of systematic understanding. ...
- research-article, October 2024
Less is More: Adaptive Feature Selection and Fusion for Eye Contact Detection
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11390–11396. https://doi.org/10.1145/3664647.3688987
Detecting eye contact is essential for embodied robots to engage in natural interactions with humans, enhancing the intuitiveness and comfort of these exchanges. However, eye contact detection often presents a significant challenge due to a variety of ...
- research-article, October 2024
Towards Engagement Prediction: A Cross-Modality Dual-Pipeline Approach using Visual and Audio Features
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11383–11389. https://doi.org/10.1145/3664647.3688986
Engagement estimation is crucial for advancing natural human-computer interaction, allowing artificial agents to dynamically adjust their responses based on user engagement levels and creating more intuitive and immersive experiences. Despite ...
- research-article, October 2024
HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6093–6102. https://doi.org/10.1145/3664647.3681641
Expressive Human Mesh Recovery (HMR) involves reconstructing the 3D human body, including hands and face, from RGB images. It is difficult because humans are highly deformable, and hands are small and frequently occluded. Recent approaches have attempted ...
- research-article, October 2024
Open-Vocabulary Audio-Visual Semantic Segmentation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7533–7541. https://doi.org/10.1145/3664647.3681586
Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on the closed-set assumption and only identify pre-defined categories from training data, lacking the ...
- research-article, October 2024
Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 5102–5111. https://doi.org/10.1145/3664647.3681576
The prevalence of multimedia applications has led to increased concerns and demand for auto face retouching. Face retouching aims to enhance portrait quality by removing blemishes. However, the existing auto-retouching methods rely heavily on a large ...
- research-article, October 2024
Traj2Former: A Local Context-aware Snapshot and Sequential Dual Fusion Transformer for Trajectory Classification
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8053–8061. https://doi.org/10.1145/3664647.3681340
The wide use of mobile devices has led to a proliferation of extensive trajectory data, rendering trajectory classification increasingly vital and challenging for downstream applications. Existing deep learning methods offer powerful feature ...
- research-article, October 2024
Rethinking the Implicit Optimization Paradigm with Dual Alignments for Referring Remote Sensing Image Segmentation
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 2031–2040. https://doi.org/10.1145/3664647.3681318
Referring Remote Sensing Image Segmentation (RRSIS) is a challenging task that aims to identify specific regions in aerial images that are relevant to given textual conditions. Existing methods tend to adopt the paradigm of implicit optimization, ...
- research-article, October 2024
AVHash: Joint Audio-Visual Hashing for Video Retrieval
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 2370–2378. https://doi.org/10.1145/3664647.3681266
Video hashing is a technique for encoding videos into binary vectors, facilitating efficient video storage and high-speed computation. Current approaches to video hashing predominantly utilize sequential frame images to produce semantic binary codes. ...
- research-article, October 2024
Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4841–4850. https://doi.org/10.1145/3664647.3681246
Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive ...
- research-article, October 2024
Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6851–6859. https://doi.org/10.1145/3664647.3681187
The burgeoning field of text-to-music generation models has shown great promise in their ability to generate high-quality music aligned with users' textual descriptions. These models effectively capture abstract/global musical features such as style and ...