- research-article, June 2024
Document-level Relation Extraction with Progressive Self-distillation
ACM Transactions on Information Systems (TOIS), Volume 42, Issue 6, Article No.: 143, Pages 1–34, https://doi.org/10.1145/3656168
Document-level relation extraction (RE) aims to simultaneously predict relations (including no-relation cases denoted as NA) between all entity pairs in a document. It is typically formulated as a relation classification task with entities pre-detected in ...
- research-article, January 2024
Exploring Visual Relationships via Transformer-based Graphs for Enhanced Image Captioning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 5, Article No.: 133, Pages 1–23, https://doi.org/10.1145/3638558
Image captioning (IC), bringing vision to language, has drawn extensive attention. A crucial aspect of IC is the accurate depiction of visual relations among image objects. Visual relations encompass two primary facets: content relations and structural ...
- research-article, January 2024
Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 4, Article No.: 104, Pages 1–24, https://doi.org/10.1145/3633334
Automatic live video commenting is getting increasing attention due to its significance in narration generation, topic explanation, etc. However, the diverse sentiment consideration of the generated comments is missing from current methods. Sentimental ...
- research-article, October 2023
Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, Pages 4828–4837, https://doi.org/10.1145/3581783.3611703
Image-text matching, as a fundamental cross-modal task, bridges vision and language. The key challenge lies in accurately learning the semantic similarity of these two heterogeneous modalities. To determine the semantic similarity between visual and ...
- Article, September 2023
Contour-Augmented Concept Prediction Network for Image Captioning
Artificial Neural Networks and Machine Learning – ICANN 2023, Sep 2023, Pages 180–191, https://doi.org/10.1007/978-3-031-44210-0_15
Semantic information in images is essential for image captioning. However, previous works leverage the pre-trained object detector to mine semantics in an image, making the model unable to accurately capture visual semantics, and further making ...
- research-article, June 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 45, Issue 6, June 2023, Pages 7123–7141, https://doi.org/10.1109/TPAMI.2022.3223908
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. However, how ...
- research-article, February 2023
GH-DDM: the generalized hybrid denoising diffusion model for medical image generation
Multimedia Systems (MUME), Volume 29, Issue 3, Jun 2023, Pages 1335–1345, https://doi.org/10.1007/s00530-023-01059-0
Deep-learning-based medical imaging plays a pivotal role in modern healthcare while suffering from the data scarcity bottleneck, since obtaining sufficient high-quality data in the medical imaging area is difficult and expensive. To alleviate this ...
- research-article, January 2023
Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching
IEEE Transactions on Multimedia (TOM), Volume 25, 2023, Pages 1320–1332, https://doi.org/10.1109/TMM.2022.3141603
Image-text matching, as a fundamental cross-modal task, bridges the gap between vision and language. The core is to accurately learn semantic alignment to find relevant shared semantics in image and text. Existing methods typically attend to all fragments ...
- research-article, December 2022
Intra-Class Adaptive Augmentation With Neighbor Correction for Deep Metric Learning
IEEE Transactions on Multimedia (TOM), Volume 25, 2023, Pages 7758–7771, https://doi.org/10.1109/TMM.2022.3227414
Deep metric learning aims to learn an embedding space, where semantically similar samples are close together and dissimilar ones are repelled against. To explore more hard and informative training signals for augmentation and generalization, recent ...
- research-article, October 2022
Background Layout Generation and Object Knowledge Transfer for Text-to-Image Generation
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, Pages 4327–4335, https://doi.org/10.1145/3503161.3548154
Text-to-Image generation (T2I) aims to generate realistic and semantically consistent images according to the natural language descriptions. Built upon the recent advances in generative adversarial networks (GANs), existing T2I models have made great ...
- research-article, October 2022
Fine-tuning with Multi-modal Entity Prompts for News Image Captioning
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, Pages 4365–4373, https://doi.org/10.1145/3503161.3547883
News Image Captioning aims to generate descriptions for images embedded in news articles, including plentiful real-world concepts, especially about named entities. However, existing methods are limited in the entity-level template. Not only is it labor-...
- research-article, October 2022
DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, October 2022, Pages 4345–4354, https://doi.org/10.1145/3503161.3547881
Text-to-image generation aims at generating realistic images which are semantically consistent with the given text. Previous works mainly adopt the multi-stage architecture by stacking generator-discriminator pairs to engage multiple adversarial ...
- research-article, July 2022
Semantically Similarity-Wise Dual-Branch Network for Scene Graph Generation
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 32, Issue 7, July 2022, Pages 4573–4583, https://doi.org/10.1109/TCSVT.2021.3130197
Scene graph generation aims to detect visual entities and relationships between them from an image. The object-level visual information is of vital importance for predicting accurate relationships. However, most existing methods essentially encode visual ...
- research-article, July 2022
Self-Supervised Synthesis Ranking for Deep Metric Learning
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 32, Issue 7, July 2022, Pages 4736–4750, https://doi.org/10.1109/TCSVT.2021.3124908
The core purpose of deep metric learning is to construct an embedding space, where objects belonging to the same class are gathered together and the ones from different classes are pushed apart. Most existing approaches typically insist to inter-class ...
- research-article, June 2022
Weakly Supervised Pediatric Bone Age Assessment Using Ultrasonic Images via Automatic Anatomical RoI Detection
ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval, June 2022, Pages 647–653, https://doi.org/10.1145/3512527.3531436
Bone age assessment (BAA) is vital in pediatric clinical diagnosis. Existing deep learning methods predict bone age based on Regions of Interest (RoIs) detection or segmentation of hand radiograph, which requires expensive annotations. Limitations of ...
- research-article, May 2022
Multi-task hourglass network for online automatic diagnosis of developmental dysplasia of the hip
World Wide Web (WWWJ), Volume 26, Issue 2, Mar 2023, Pages 539–559, https://doi.org/10.1007/s11280-022-01051-0
Developmental dysplasia of the hip (DDH) is one of the most common diseases in children. Due to the experience-requiring medical image analysis work, online automatic diagnosis of DDH has intrigued the researchers. Traditional implementation of ...
- research-article, May 2022
Joint Local Correlation and Global Contextual Information for Unsupervised 3D Model Retrieval and Classification
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 32, Issue 5, May 2022, Pages 3265–3278, https://doi.org/10.1109/TCSVT.2021.3099496
Unsupervised 3D model analysis has attracted tremendous attentions with the increasing growth of 3D model data and the extensive human annotations. Many effective methods have been designed to address the 3D model analysis with labeled information, while ...
- research-article, January 2022
Focus Your Attention: A Focal Attention for Multimodal Learning
IEEE Transactions on Multimedia (TOM), Volume 24, 2022, Pages 103–115, https://doi.org/10.1109/TMM.2020.3046855
The key point in multimodal learning is to learn semantic alignment that finds the correspondence between sub-elements of instances from different modality data. Attention mechanism has shown its power in semantic alignment learning as it enables to ...
- research-article, January 2022
Task-Adaptive Attention for Image Captioning
IEEE Transactions on Circuits and Systems for Video Technology (IEEETCSVT), Volume 32, Issue 1, Jan. 2022, Pages 43–51, https://doi.org/10.1109/TCSVT.2021.3067449
Attention mechanisms are now widely used in image captioning models. However, most attention models only focus on visual features. When generating syntax related words, little visual information is needed. In this case, these attention models could ...
- research-article, October 2021
Mask and Predict: Multi-step Reasoning for Scene Graph Generation
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, October 2021, Pages 4128–4136, https://doi.org/10.1145/3474085.3475545
Scene Graph Generation (SGG) aims to parse the image as a set of semantics, containing objects and their relations. Currently, the SGG methods only stay at presenting the intuitive detection in the image, such as the triplet "logo on board". Intuitively,...