10.5555/1619797.1619900
Article

A text-to-picture synthesis system for augmenting communication

Published: 22 July 2007

Abstract

We present a novel Text-to-Picture system that synthesizes a picture from general, unrestricted natural language text. The process is analogous to Text-to-Speech synthesis, but with pictorial output that conveys the gist of the text. Our system integrates multiple AI components, including natural language processing, computer vision, computer graphics, and machine learning. We present an integration framework that combines these components by first identifying informative and 'picturable' text units, then searching for the most likely image parts conditioned on the text, and finally optimizing the picture layout conditioned on both the text and image parts. The effectiveness of our system is assessed in two user studies using children's books and news articles. Experiments show that the synthesized pictures convey as much information about children's stories as the original artists' illustrations, and much more information about news articles than their original photos alone. These results suggest that Text-to-Picture synthesis has great potential in augmenting human-computer and human-human communication modalities, with applications in education and health care, among others.
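The three stages named in the abstract (selecting picturable text units, finding image parts for them, and laying the parts out) can be illustrated with a toy pipeline. This is only a sketch under loud assumptions, not the authors' method: word frequency stands in for picturability, a hand-made dictionary stands in for image search, and even horizontal spacing stands in for layout optimization. Every name here (`select_keywords`, `find_images`, `layout`, `text_to_picture`, `Placement`, `STOPWORDS`) is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical stopword list. The abstract does not say how picturable
# units are identified; plain word frequency is only a stand-in here.
STOPWORDS = {"a", "an", "the", "and", "to", "of", "in", "on", "is", "was"}


@dataclass
class Placement:
    keyword: str
    image: str
    x: float  # horizontal position in [0, 1]
    y: float  # vertical position in [0, 1]


def select_keywords(text, k=3):
    """Stage 1 (stand-in): pick the k most frequent non-stopwords."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def find_images(keywords, library):
    """Stage 2 (stand-in): look each keyword up in a toy image library."""
    return {kw: library[kw] for kw in keywords if kw in library}


def layout(images):
    """Stage 3 (stand-in): space the images evenly along a horizontal line."""
    n = len(images)
    return [
        Placement(kw, img, x=(i + 1) / (n + 1), y=0.5)
        for i, (kw, img) in enumerate(images.items())
    ]


def text_to_picture(text, library, k=3):
    """Chain the three stand-in stages into one call."""
    return layout(find_images(select_keywords(text, k), library))


if __name__ == "__main__":
    story = "The girl rides the bus. The girl waves to the dog on the bus."
    toy_library = {"girl": "girl.png", "bus": "bus.png", "dog": "dog.png"}
    for placement in text_to_picture(story, toy_library):
        print(placement)
```

Each stage is a separate function so that any one stand-in could be swapped for a stronger component without touching the others, which mirrors the integration framing in the abstract.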


Published In

AAAI'07: Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
July 2007
1990 pages
ISBN: 9781577353232

Sponsors

• Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Cited By

• (2023) Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-20. 10.1145/3544548.3581566. Online publication date: 19-Apr-2023
• (2018) Videolization. Multimedia Tools and Applications 77(1):567-595. 10.1007/s11042-016-4275-4. Online publication date: 1-Jan-2018
• (2016) A Supervised Approach for Text Illustration. Proceedings of the 24th ACM international conference on Multimedia, 217-221. 10.1145/2964284.2967214. Online publication date: 1-Oct-2016
• (2016) Visualizing Natural Language Descriptions. ACM Computing Surveys 49(1):1-34. 10.1145/2932710. Online publication date: 29-Jun-2016
• (2016) Linking socially contributed media with events. Multimedia Systems 22(4):433-442. 10.1007/s00530-014-0436-3. Online publication date: 1-Jul-2016
• (2016) Chat with illustration. Multimedia Systems 22(1):5-16. 10.1007/s00530-014-0371-3. Online publication date: 1-Feb-2016
• (2015) High-level semantic image annotation based on hot Internet topics. Multimedia Tools and Applications 74(6):2055-2084. 10.1007/s11042-013-1742-z. Online publication date: 1-Mar-2015
• (2013) picoTrans. ACM Transactions on Interactive Intelligent Systems 3(1):1-31. 10.1145/2448116.2448121. Online publication date: 24-Apr-2013
• (2012) Enabling the discovery of digital cultural heritage objects through Wikipedia. Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 94-100. 10.5555/2390357.2390370. Online publication date: 24-Apr-2012
• (2012) Kinect-based visual communication system. Proceedings of the 4th International Conference on Internet Multimedia Computing and Service, 55-59. 10.1145/2382336.2382353. Online publication date: 9-Sep-2012
