Article

Parsing natural scenes and natural language with recursive neural networks

Authors:

Richard Socher,

Cliff Chiung-Yu Lin,

Christopher D. Manning

ICML'11: Proceedings of the 28th International Conference on Machine Learning

Pages 129 - 136

Published: 28 June 2011

Abstract

Recursive structure is commonly found in inputs of different modalities, such as natural scene images and natural language sentences. Discovering this recursive structure helps us not only to identify the units that an image or sentence contains but also to understand how they interact to form a whole. We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure both in complex scene images and in sentences. The same algorithm can be used both to provide a competitive syntactic parser for natural language sentences from the Penn Treebank and to outperform alternative approaches for semantic scene segmentation, annotation and classification. For segmentation and annotation, our algorithm obtains new state-of-the-art performance on the Stanford background dataset (78.1%). The features from the image parse tree outperform Gist descriptors for scene classification by 4%.
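The abstract names the ingredients (a recursive neural network, structure recovery, and a max-margin objective) without giving formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a parent vector is computed as tanh(W [left; right] + b), that a merge is scored by a dot product with a vector `w_score`, that a sentence-like input only allows neighbouring items to merge, and that the tree is built greedily. The function names `compose`, `score`, `greedy_parse`, and `margin_loss`, and the fixed margin of 1.0, are hypothetical choices made for illustration.

```python
import math
import random


def compose(W, b, left, right):
    """Assumed composition: parent = tanh(W [left; right] + b)."""
    children = left + right  # list concatenation, length 2n
    return [math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
            for row, bias in zip(W, b)]


def score(w_score, parent):
    """Assumed merge score: dot product of w_score with the parent vector."""
    return sum(w * x for w, x in zip(w_score, parent))


def greedy_parse(leaves, W, b, w_score):
    """Repeatedly merge the highest-scoring pair of neighbouring nodes.

    Returns (tree, total_score), where tree is a nested tuple of leaf indices.
    """
    nodes = [(i, vec) for i, vec in enumerate(leaves)]
    total = 0.0
    while len(nodes) > 1:
        candidates = []
        for i in range(len(nodes) - 1):
            parent = compose(W, b, nodes[i][1], nodes[i + 1][1])
            candidates.append((score(w_score, parent), i, parent))
        best, i, parent = max(candidates, key=lambda c: c[0])
        # Replace the two merged nodes with their parent.
        nodes[i:i + 2] = [((nodes[i][0], nodes[i + 1][0]), parent)]
        total += best
    return nodes[0][0], total


def margin_loss(gold_score, best_wrong_score, margin=1.0):
    """Hinge-style max-margin loss: zero once the gold tree wins by `margin`."""
    return max(0.0, best_wrong_score + margin - gold_score)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 3  # dimensionality of every node vector
    W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * n)] for _ in range(n)]
    b = [0.0] * n
    w_score = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    leaves = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(4)]
    tree, total = greedy_parse(leaves, W, b, w_score)
    print(tree, total)
```

For scene images the neighbourhood relation would come from adjacent image segments rather than from linear order, so the candidate loop would range over an adjacency structure instead of consecutive positions. The parameters here are random; training them against the margin loss is outside the scope of this sketch.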

Published In

ICML'11: Proceedings of the 28th International Conference on Machine Learning

June 2011

1216 pages

ISBN:9781450306195

Sponsors

NSF: National Science Foundation
Xerox
Microsoft Research: Microsoft Research
Yahoo!
Amazon: Amazon.com

Publisher

Omnipress

Madison, WI, United States
