
Cross-Modal Hybrid Feature Fusion for Image-Sentence Matching

Published: 12 November 2021

Abstract

Image-sentence matching is a challenging task at the intersection of language and vision, which aims to measure the similarity between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into a common space to calculate the image-sentence similarity. However, the similarity obtained by these methods may be coarse because (1) an intermediate common space is introduced to implicitly match the heterogeneous features of images and sentences at the global level, and (2) only the inter-modality relations between images and sentences are captured while the intra-modality relations are ignored. To overcome these limitations, we propose a novel Cross-Modal Hybrid Feature Fusion (CMHF) framework that directly learns the image-sentence similarity by fusing multimodal features with both inter- and intra-modality relations incorporated. It robustly captures high-level interactions between visual regions in images and words in sentences, using flexible attention mechanisms to generate effective attention flows within and across the two modalities. A structured objective with a ranking-loss constraint is formulated in CMHF to learn the image-sentence similarity from the fused fine-grained features of different modalities, bypassing the use of an intermediate common space. Extensive experiments and comprehensive analysis on two widely used datasets, Microsoft COCO and Flickr30K, show the effectiveness of the hybrid feature fusion framework, with the proposed CMHF method achieving state-of-the-art matching performance.
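To make the described approach concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation) of the two ingredients the abstract names: intra- and inter-modality attention flows over region and word features, and a hinge-based ranking loss on a directly predicted image-sentence similarity score, with no intermediate common space. The class and function names (HybridFusionScorer, ranking_loss) and all dimensions are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridFusionScorer(nn.Module):
    """Fuses region and word features and predicts a scalar image-sentence similarity."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Intra-modality attention: regions attend to regions, words attend to words.
        self.intra_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.intra_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Inter-modality attention: each modality attends to the other.
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Similarity head on the fused features (no shared embedding space).
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, regions, words):
        # regions: (B, R, dim) visual region features; words: (B, W, dim) word features.
        v, _ = self.intra_img(regions, regions, regions)  # intra-modality flow (visual)
        t, _ = self.intra_txt(words, words, words)        # intra-modality flow (textual)
        v2, _ = self.txt2img(v, t, t)                     # regions attend to words
        t2, _ = self.img2txt(t, v, v)                     # words attend to regions
        fused = torch.cat([v2.mean(dim=1), t2.mean(dim=1)], dim=-1)
        return self.score(fused).squeeze(-1)              # (B,) similarity scores


def ranking_loss(model, regions, words, margin=0.2):
    """Hinge ranking loss: matched pairs should outscore mismatched pairs by a margin."""
    pos = model(regions, words)                           # aligned image-sentence pairs
    neg = model(regions, words.roll(shifts=1, dims=0))    # shuffled (negative) sentences
    return F.relu(margin - pos + neg).mean()


# Toy usage with random features standing in for detected regions and encoded words.
model = HybridFusionScorer()
regions = torch.randn(8, 36, 256)
words = torch.randn(8, 12, 256)
loss = ranking_loss(model, regions, words)
loss.backward()

Note that the negatives here are formed by shuffling sentences within the batch, a simple stand-in for the harder negative sampling that ranking losses in this literature typically rely on.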

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 4
November 2021
529 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3492437

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2021
Accepted: 01 March 2021
Revised: 01 February 2021
Received: 01 November 2020
Published in TOMM Volume 17, Issue 4

Author Tags

  1. Image-sentence matching
  2. multimodal feature fusion
  3. cross-modal retrieval
  4. attention mechanism

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Fundamental Research Funds for the Central Universities
  • Sichuan Science and Technology Program, China
