DOI: 10.1145/3404555.3404610 · ICCAI '20 conference proceedings · Research article

Generalization or Instantiation?: Estimating the Relative Abstractness between Images and Text

Published: 20 August 2020

Abstract

Learning from multi-modal data is very common in current data mining and knowledge management applications. However, the information imbalance between modalities poses challenges for many multi-modal learning tasks, such as cross-modal retrieval, image captioning, and image synthesis. Understanding the cross-modal information gap is an important foundation for designing models and choosing evaluation criteria for those applications. For text and image data in particular, existing studies have proposed abstractness as a measure of this information imbalance. They evaluate the abstractness disparity by training a classifier on manually annotated multi-modal sample pairs. However, these methods ignore the impact of the intra-modal relationship on the inter-modal abstractness; moreover, the annotation process is labor-intensive, and its quality cannot be guaranteed. To evaluate the text-image relationship more comprehensively and reduce the cost of evaluation, we propose the relative abstractness index (RAI), which measures the abstractness of a sample according to its certainty in differentiating items of the other modality. In addition, we propose a cycled generation model to compute RAI values between images and text. In contrast to existing works, the proposed index better describes the image-text information disparity, and its computation requires no annotated training samples.
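The abstract does not give the RAI formula, but the core idea ("the certainty of differentiating the items of another modality") can be illustrated with a hypothetical certainty score: given a sample's similarity scores to a set of candidates from the other modality, a peaked softmax distribution means the sample pins down its match (low abstractness), while a flat one means it is ambiguous (high abstractness). The function below is a minimal sketch under that assumption, not the authors' actual computation, which uses a cycled generation model.

```python
import math

def differentiation_certainty(similarities):
    """Hypothetical certainty score for one item, given its similarity
    scores to candidate items of the other modality.

    Certainty is 1 minus the normalized entropy of the softmax over
    the scores: a peaked distribution (one clear match) yields a value
    near 1, a flat distribution (no preference) yields 0.
    """
    m = max(similarities)                      # shift for numerical stability
    exps = [math.exp(s - m) for s in similarities]
    z = sum(exps)
    probs = [e / z for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(similarities))  # entropy of a uniform distribution
    return 1.0 - entropy / max_entropy

# A caption clearly matching one image is a confident, less abstract
# sample; a caption equally similar to every image is more abstract.
confident = differentiation_certainty([9.0, 1.0, 1.0, 1.0])
uncertain = differentiation_certainty([2.0, 2.0, 2.0, 2.0])
```

Under this reading, a uniform score vector gives certainty exactly 0 (entropy equals its maximum), and increasingly peaked vectors approach 1.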



Published In

ICCAI '20: Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence
April 2020
563 pages
ISBN:9781450377089
DOI:10.1145/3404555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Multi-modality
  2. generative adversarial networks
  3. image-text relationship
  4. relative abstractness

Qualifiers

  • Research-article
  • Research
  • Refereed limited

