Keyword: transformer : Search

research-article

P2FTrack: Multi-Object Tracking with Motion Prior and Feature Posterior

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 21, Issue 1, Article No.: 22, Pages 1–22, https://doi.org/10.1145/3700443

Multiple object tracking (MOT) has emerged as a crucial component of the rapidly developing field of computer vision. However, existing multi-object tracking methods often overlook the relationship between features and motion, hindering the ability to strike a ...

short-paper

Open Access

TraMSR: Transformer and Mamba based Practical Speech Super-Resolution for Mobile Wearables

ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, Pages 1686–1688, https://doi.org/10.1145/3636534.3697460

Speech super-resolution techniques offer a promising solution to enhance audio quality in wearable devices, particularly when addressing the challenges of reduced sampling rates necessitated by battery life constraints and network instability. However, ...

research-article

TSFormer: Tracking Structure Transformer for Image Inpainting

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 12, Article No.: 381, Pages 1–23, https://doi.org/10.1145/3696452

Recent studies have shown that image structure can significantly facilitate image inpainting. However, current approaches mostly explore structure prior without considering its guidance to texture reconstruction, leading to performance degradation. To ...

research-article

Category-Level Pose Estimation and Iterative Refinement for Monocular RGB-D Image

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 12, Article No.: 374, Pages 1–20, https://doi.org/10.1145/3695877

Category-level pose estimation is proposed to predict the 6D pose of objects under a specific category and has wide applications in fields such as robotics, virtual reality, and autonomous driving. With the development of VR/AR technology, pose estimation ...

research-article

Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 15, Issue 6, Article No.: 120, Pages 1–25, https://doi.org/10.1145/3689212

This article presents the latest developments to ClaimBuster’s claim-spotting model, which tackles the critical task of identifying check-worthy claims from large streams of information. We introduce the first adversarially regularized, transformer-based ...

research-article

PhysFiT: Physical-aware 3D Shape Understanding for Finishing Incomplete Assembly

ACM Transactions on Graphics (TOG), Volume 44, Issue 1, Article No.: 5, Pages 1–16, https://doi.org/10.1145/3702226

Understanding the part composition and structure of 3D shapes is crucial for a wide range of 3D applications, including 3D part assembly and 3D assembly completion. Compared to 3D part assembly, 3D assembly completion is more complicated, which involves ...

research-article

Open Access

Hybrid Prompt Learning for Generating Justifications of Security Risks in Automation Rules

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 15, Issue 5, Article No.: 103, Pages 1–26, https://doi.org/10.1145/3675401

Trigger-action platforms (TAPs) enable users without programming experience to personalize the behavior of Internet of Things applications and services through IF-THEN rules. Unfortunately, the arbitrary connection of smart devices and online services, ...

research-article

Time-Series Prediction Algorithm for Boiler Power Generation Steam Temperature

IOTMMIM '24: Proceedings of the First International Workshop on IoT Datasets for Multi-modal Large Model, Pages 27–36, https://doi.org/10.1145/3698385.3699874

In a boiler power generation scenario, multi-modal data can be collected from the system, where main steam temperature may be the most important factor, as it influences the efficiency and safety of the whole power generation system. In this paper a time ...

short-paper

T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation

SIGSPATIAL '24: Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Pages 569–572, https://doi.org/10.1145/3678717.3691271

Trajectory similarity computation is crucial for analyzing movement patterns in applications like traffic management and wildlife tracking. Recent self-supervised learning methods such as contrastive learning have made advancements in trajectory ...

research-article

Spatial and Channel Squeeze & Excitation in Adapting Vision Transformers for Temporal Action Localization

McGE '24: Proceedings of the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, Pages 20–25, https://doi.org/10.1145/3688867.3690169

Transformer-based methods have achieved impressive performance on temporal action localization (TAL). Although this achievement is attributed to the multiheaded self-attention (MSA) mechanism, there is still a lack of systematic understanding. ...

research-article

Less is More: Adaptive Feature Selection and Fusion for Eye Contact Detection

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11390–11396, https://doi.org/10.1145/3664647.3688987

Detecting eye contact is essential for embodied robots to engage in natural interactions with humans, enhancing the intuitiveness and comfort of these exchanges. However, eye contact detection often presents a significant challenge due to a variety of ...

research-article

Open Access

Towards Engagement Prediction: A Cross-Modality Dual-Pipeline Approach using Visual and Audio Features

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 11383–11389, https://doi.org/10.1145/3664647.3688986

Engagement estimation is crucial for advancing natural human-computer interaction, allowing artificial agents to dynamically adjust their responses based on user engagement levels and creating more intuitive and immersive experiences. Despite ...

research-article

Open Access

HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6093–6102, https://doi.org/10.1145/3664647.3681641

Expressive Human Mesh Recovery (HMR) involves reconstructing the 3D human body, including hands and face, from RGB images. It is difficult because humans are highly deformable, and hands are small and frequently occluded. Recent approaches have attempted ...

research-article

Open-Vocabulary Audio-Visual Semantic Segmentation

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 7533–7541, https://doi.org/10.1145/3664647.3681586

Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on the closed-set assumption and only identify pre-defined categories from training data, lacking the ...

research-article

Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 5102–5111, https://doi.org/10.1145/3664647.3681576

The prevalence of multimedia applications has led to increased concerns and demand for auto face retouching. Face retouching aims to enhance portrait quality by removing blemishes. However, the existing auto-retouching methods rely heavily on a large ...

research-article

Traj2Former: A Local Context-aware Snapshot and Sequential Dual Fusion Transformer for Trajectory Classification

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 8053–8061, https://doi.org/10.1145/3664647.3681340

The wide use of mobile devices has led to a proliferation of extensive trajectory data, rendering trajectory classification increasingly vital and challenging for downstream applications. Existing deep learning methods offer powerful feature ...

research-article

Rethinking the Implicit Optimization Paradigm with Dual Alignments for Referring Remote Sensing Image Segmentation

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 2031–2040, https://doi.org/10.1145/3664647.3681318

Referring Remote Sensing Image Segmentation (RRSIS) is a challenging task that aims to identify specific regions in aerial images that are relevant to given textual conditions. Existing methods tend to adopt the paradigm of implicit optimization, ...

research-article

AVHash: Joint Audio-Visual Hashing for Video Retrieval

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 2370–2378, https://doi.org/10.1145/3664647.3681266

Video hashing is a technique of encoding videos into binary vectors, facilitating efficient video storage and high-speed computation. Current approaches to video hashing predominantly utilize sequential frame images to produce semantic binary codes. ...

research-article

Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 4841–4850, https://doi.org/10.1145/3664647.3681246

Advances in computer vision research enable human-like high-dimensional perceptual induction over analogical visual reasoning problems, such as Raven's Progressive Matrices (RPMs). In this paper, we propose a Hierarchical Perception and Predictive ...

research-article

Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6851–6859, https://doi.org/10.1145/3664647.3681187

The burgeoning field of text-to-music generation models has shown great promise in their ability to generate high-quality music aligned with users' textual descriptions. These models effectively capture abstract/global musical features such as style and ...
