- Article, December 2024
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Abstract: This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way. Unlike previous methods, Wear-Any-Way is a customizable solution. Besides generating high-fidelity results, our method allows users to precisely manipulate ...
- Article, October 2024
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
- Chen Ju,
- Haicheng Wang,
- Haozhe Cheng,
- Xu Chen,
- Zhonghua Zhai,
- Weilin Huang,
- Jinsong Lan,
- Shuai Xiao,
- Bo Zheng
Abstract: Vision-Language Large Models (VLMs) have recently become the primary backbone of AI due to their impressive performance. However, their high computation costs, i.e., throughput and delay, impede their potential in real-world scenarios. To achieve ...
- research-article, May 2024
Enhancing Cross-Domain Click-Through Rate Prediction via Explicit Feature Augmentation
WWW '24: Companion Proceedings of the ACM Web Conference 2024, Pages 423–432, https://doi.org/10.1145/3589335.3648341
Cross-domain CTR (CDCTR) prediction is an important research topic that studies how to leverage meaningful data from a related domain to help CTR prediction in the target domain. Most existing CDCTR works design implicit ways to transfer knowledge across ...
- research-article, May 2024
AttrSeg: open-vocabulary semantic segmentation via attribute decomposition-aggregation
NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Article No.: 450, Pages 10258–10270
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time. Recent works explore vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical ...
- research-article, November 2023
Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization
IEEE Transactions on Multimedia (TOM), Volume 25, Pages 6688–6701, https://doi.org/10.1109/TMM.2022.3213478
Weakly-supervised temporal action localization aims to localize actions from untrimmed long videos with only video-level category labels. Most previous methods ignore the incompleteness issue of Class Activation Sequences (CAS), suffering from trivial ...
- Article, October 2022
Prompting Visual-Language Models for Efficient Video Understanding
Abstract: Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing a remarkable ability for “zero-shot” generalisation. This paper presents a simple but ...
- research-article, October 2022
Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 3742–3753, https://doi.org/10.1145/3503161.3548317
We present a simple yet effective self-supervised framework for audio-visual representation learning to localize the sound source in videos. To understand what enables the learning of useful representations, we systematically investigate the effects of data ...
- research-article, July 2022
MePark: Using Meters as Sensors for Citywide On-Street Parking Availability Prediction
IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 23, Issue 7, Pages 7244–7257, https://doi.org/10.1109/TITS.2021.3067675
Real-time parking availability prediction is of great value for optimizing on-street parking resource utilization and improving traffic conditions, while the high cost of existing parking availability sensing systems has limited their large-...
- Article, April 2023
The Introduction of Positive Position in “VWish + VPneg” and Its Pragmatic Consequences
Abstract: When verbs expressing a wish are followed by a negative clause, the positive attitude within these verbs is often suppressed. The sentence cannot express a “pure” wish, but rather expresses concern or a curse. In interactive communication, negation always ...
- Article, October 2021
Unsupervised Domain Adaption via Similarity-Based Prototypes for Cross-Modality Segmentation
Domain Adaptation and Representation Transfer, and Affordable Healthcare and AI for Resource Diverse Global Health, Pages 133–143, https://doi.org/10.1007/978-3-030-87722-4_13
Abstract: Deep learning models have achieved great success on various vision challenges, but a well-trained model would face drastic performance degradation when applied to unseen data. Since the model is sensitive to domain shift, unsupervised domain ...
- research-article, December 2020
D2Park: Diversified Demand-aware On-street Parking Guidance
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 4, Issue 4, Article No.: 163, Pages 1–25, https://doi.org/10.1145/3432214
To address the increasingly serious parking problem, numerous mobile apps have emerged to help drivers find a convenient parking spot with various auxiliary information. However, the phenomenon of "multiple cars chasing the same spot" still exists, ...
- Article, August 2020
Bottom-Up Temporal Action Localization with Mutual Regularization
Abstract: Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve evaluating the frame-level ...