Article

A text-to-picture synthesis system for augmenting communication

Authors:

Andrew B. Goldberg,

Mohamed Eldawy,

Charles R. Dyer,

Bradley Strock

AAAI'07: Proceedings of the 22nd national conference on Artificial intelligence - Volume 2

Pages 1590 - 1595

Published: 22 July 2007

Abstract

We present a novel Text-to-Picture system that synthesizes a picture from general, unrestricted natural language text. The process is analogous to Text-to-Speech synthesis, but with pictorial output that conveys the gist of the text. Our system integrates multiple AI components, including natural language processing, computer vision, computer graphics, and machine learning. We present an integration framework that combines these components by first identifying informative and 'picturable' text units, then searching for the most likely image parts conditioned on the text, and finally optimizing the picture layout conditioned on both the text and image parts. The effectiveness of our system is assessed in two user studies using children's books and news articles. Experiments show that the synthesized pictures convey as much information about children's stories as the original artists' illustrations, and much more information about news articles than their original photos alone. These results suggest that Text-to-Picture synthesis has great potential in augmenting human-computer and human-human communication modalities, with applications in education and health care, among others.
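As a reading aid, the three stages named in the abstract (selecting picturable text units, choosing image parts for them, and laying out the picture) can be sketched as a small pipeline. This is only a toy illustration, not the authors' method: the lexicon `PICTURABLE`, the index `IMAGE_INDEX`, the file names, and every function below are hypothetical stand-ins, and the lookups and even spacing used here replace the learned models and layout optimization that the abstract describes.

```python
from dataclasses import dataclass

# Hypothetical toy data: a tiny "picturable" lexicon and a fake image index.
PICTURABLE = {"girl", "ball", "dog", "park"}
IMAGE_INDEX = {
    "girl": ["girl_01.png", "girl_02.png"],
    "ball": ["ball_01.png"],
    "dog": ["dog_01.png"],
}


@dataclass
class Placement:
    word: str
    image: str
    x: float  # horizontal position in [0, 1]
    y: float  # vertical position in [0, 1]


def select_units(text):
    """Stage 1 (stand-in): keep words found in the toy picturable lexicon."""
    seen, units = set(), []
    for token in text.lower().split():
        word = token.strip(".,;:!?'\"")
        if word in PICTURABLE and word not in seen:
            seen.add(word)
            units.append(word)
    return units


def select_images(units):
    """Stage 2 (stand-in): take the first indexed image for each unit, if any."""
    return [(w, IMAGE_INDEX[w][0]) for w in units if w in IMAGE_INDEX]


def layout(pairs):
    """Stage 3 (stand-in): space images evenly left to right in text order."""
    n = len(pairs)
    return [
        Placement(w, img, (i + 1) / (n + 1), 0.5)
        for i, (w, img) in enumerate(pairs)
    ]


def text_to_picture(text):
    """Chain the three stand-in stages: text units, image parts, layout."""
    return layout(select_images(select_units(text)))


if __name__ == "__main__":
    for p in text_to_picture("The girl threw the ball to the dog in the park."):
        print(p)
```

Each stage consumes only the output of the previous one, which mirrors the conditioning order in the abstract: images depend on the selected text, and the layout depends on both.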


Published In

AAAI'07: Proceedings of the 22nd national conference on Artificial intelligence - Volume 2

July 2007

1990 pages

ISBN: 9781577353232

Editor:
Anthony Cohn
University of Leeds

Sponsors

Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press
