
Introduction to the Special Section on Learning Representations, Similarity, and Associations in Dynamic Multimedia Environments

Published: 06 January 2023
    In recent years, with the widespread availability of digital sensors (e.g., cameras) and the increasing need for urban artificial intelligence applications, the ability to learn the representations, similarities, and associations of multimedia data in dynamic environments has become critically important in many multimedia applications. The goal is to design flexible learning machines that learn environmentally robust descriptors of multimedia data and model the complex relationships among them in challenging application scenarios, benefiting diverse tasks such as visual object re-identification, cross-modal retrieval, and human pose estimation. The aim of this Special Section on “Learning Representations, Similarities, and Associations in Dynamic Multimedia Environments” is to bring academic researchers and industry developers together to share recent advances and future trends in representation/similarity learning and the association of complex multimedia data.
    The Special Section attracted 25 submissions, and after rigorous review, six papers were accepted for publication. Two of these papers address person re-identification; the remaining four address cross-modal matching, human pose estimation, few-shot classification, and compatible representation learning, respectively. These papers bring novel algorithms, insights, and meaningful discussions to the tasks they study.
    In the article entitled “Rank-in-Rank Loss for Person Re-identification”, Xu et al. propose a Differentiable Retrieval-Sort Loss (DRSL) to optimize the re-ID model. Because ranking and sorting operations are non-differentiable and non-convex, DRSL is formulated so that it can still be optimized through automatic differentiation and backpropagation. DRSL not only maintains the inter-class distance distribution but also preserves the intra-class similarity structure through angle constraints.
    In the article entitled “3D Skeleton and Two Streams Approach to Person Re-identification Using Optimized Region Matching”, Han et al. propose a 3D skeleton and two-stream approach for person re-ID. The first stream uses the 3D skeleton for background filtering and region segmentation, and the second stream uses a Siamese network to extract global descriptors. Finally, the two streams are fused to improve distance learning with an optimized region matching strategy.
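    The obstacle that motivates DRSL in Xu et al.'s article is that a rank is a piecewise-constant function of the similarity scores, so its gradient is zero almost everywhere. One common workaround, shown here only as a generic illustration and not as the DRSL formulation (which this introduction does not spell out), replaces the indicator 1[s_j > s_i] in the rank with a temperature-scaled sigmoid. The function names `soft_rank` and `hard_rank`, the temperature `tau`, and the example scores are hypothetical choices for this sketch.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_rank(scores, tau=0.01):
    """Smooth approximation of 1-based descending ranks (hypothetical helper).

    The hard rank of item i is 1 + sum_{j != i} 1[s_j > s_i]. Replacing the
    indicator with sigmoid((s_j - s_i) / tau) makes the rank differentiable
    in the scores; a smaller tau gives a closer approximation.
    """
    ranks = []
    for i, s_i in enumerate(scores):
        r = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                r += _sigmoid((s_j - s_i) / tau)
        ranks.append(r)
    return ranks


def hard_rank(scores):
    """Exact 1-based descending ranks (ties broken by index), for comparison."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0] * len(scores)
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks


if __name__ == "__main__":
    sims = [0.9, 0.2, 0.6, 0.4]  # hypothetical query-gallery similarities
    print(hard_rank(sims))
    print([round(r, 3) for r in soft_rank(sims)])
```

    With a small temperature the relaxed ranks track the exact ones, while a larger temperature trades accuracy for smoother gradients; in either case the ranks still sum to n(n+1)/2, because each pair contributes sigmoid(x) + sigmoid(-x) = 1.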
    In the article entitled “Guided Graph Attention Learning for Video-Text Matching”, Li et al. propose a Guided Graph Attention Learning (GGAL) model that enhances video embedding learning by capturing important region-level semantic concepts within the spatial-temporal space. GGAL builds connections between object regions and performs hierarchical graph reasoning on both frame-level and video-level region graphs, using the global context to guide attention learning over this hierarchical graph topology. As a result, the learned video embedding can be better aligned with text captions.
    In the article entitled “GLPose: Global-Local Representation Learning for Human Pose Estimation”, Jiao et al. propose a global-local enhanced pose estimation (GLPose) network to tackle the challenging task of multi-frame human pose estimation. The GLPose framework consists of two modules: a feature processing module that conditionally incorporates global semantic information and local visual context to generate a robust human representation, and a feature enhancement module that mines complementary information from this aggregated representation to enhance keyframe features for precise estimation.
    In the article entitled “Revisiting Local Descriptor for Improved Few-Shot Classification”, He et al. propose a Dense Classification and Attentive Pooling (DCAP) method for few-shot visual object classification. Specifically, DCAP formulates meta-learning as a two-stage training paradigm: a dense classification pre-training stage reduces the semantic discrepancy among local descriptors, and an attentive pooling strategy in the meta-finetuning stage selects the more informative local descriptors for few-shot classification.
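    The idea of attentive pooling over local descriptors can be sketched as follows. This is a minimal illustration under stated assumptions, not DCAP itself: the scoring rule (cosine similarity of each descriptor to the image's mean descriptor), the softmax temperature, and the nearest-prototype classifier are all hypothetical choices, as are the names `attentive_pool` and `classify`.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Cosine similarity with a small epsilon to avoid division by zero.
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)) + 1e-12)


def softmax(xs, temperature=1.0):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(descriptors, temperature=0.1):
    """Pool an image's local descriptors into one vector (hypothetical rule).

    Each descriptor is scored by its cosine similarity to the mean
    descriptor, so descriptors that agree with the image's dominant content
    receive more weight than outliers such as background patches.
    """
    dim = len(descriptors[0])
    mean = [sum(d[k] for d in descriptors) / len(descriptors) for k in range(dim)]
    weights = softmax([cosine(d, mean) for d in descriptors], temperature)
    pooled = [sum(w * d[k] for w, d in zip(weights, descriptors)) for k in range(dim)]
    return pooled, weights


def classify(query_descriptors, support):
    """Nearest-prototype classification.

    `support` maps each label to a list of images, where each image is a
    list of local descriptors.
    """
    query, _ = attentive_pool(query_descriptors)
    prototypes = {}
    for label, images in support.items():
        pooled = [attentive_pool(img)[0] for img in images]
        dim = len(pooled[0])
        prototypes[label] = [sum(p[k] for p in pooled) / len(pooled) for k in range(dim)]
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


if __name__ == "__main__":
    support = {
        "cat": [[[1.0, 0.0], [0.9, 0.1]]],
        "dog": [[[0.0, 1.0], [0.1, 0.9]]],
    }
    print(classify([[0.95, 0.05], [1.0, 0.0]], support))
```

    In this toy setting an outlying descriptor receives a smaller attention weight than the descriptors that match the rest of the image, which is the intuition behind selecting the more informative local descriptors.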
    In the article entitled “CL2R: Compatible Lifelong Learning Representations”, Biondi et al. propose a method that partially mimics natural intelligence to address the problem of learning compatible representations in a lifelong setting. The authors identify stationarity as the property the feature representation must hold to achieve compatibility, and they propose a novel training procedure that encourages local and global stationarity of the learned representation. Owing to stationarity, the statistical properties of the learned features do not change over time, making them interoperable with previously learned features.
    In closing, the guest editors would like to thank all the authors for their significant contributions to this Special Section, and the reviewers for respecting deadlines and providing constructive reviews. We are also grateful to the Editor-in-Chief, Abdulmotaleb El Saddik, and the Information Director, Mohammad Anwar Hossain, for their support. We hope this Special Section will inspire further research and development ideas for learning representations, similarity, and associations in dynamic multimedia environments.
    Xun Yang
    University of Science and Technology of China, China
    Liang Zheng
    Australian National University, Australia
    Elisa Ricci
    University of Trento, Italy
    Meng Wang
    Hefei University of Technology, China
    Guest Editors

                Published In

                ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 18, Issue 2s
                June 2022
                383 pages
                ISSN:1551-6857
                EISSN:1551-6865
                DOI:10.1145/3561949
                Editor: Abdulmotaleb El Saddik

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                Published in TOMM Volume 18, Issue 2s

                Qualifiers

                • Introduction
                • Refereed
