DOI: 10.5555/3104482.3104499
Article

Parsing natural scenes and natural language with recursive neural networks

Published: 28 June 2011

Abstract

Recursive structure is commonly found in inputs of different modalities, such as natural scene images and natural language sentences. Discovering this recursive structure helps us not only to identify the units that an image or sentence contains but also to understand how they interact to form a whole. We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure in both complex scene images and sentences. The same algorithm can be used both to provide a competitive syntactic parser for natural language sentences from the Penn Treebank and to outperform alternative approaches for semantic scene segmentation, annotation and classification. For segmentation and annotation, our algorithm obtains a new level of state-of-the-art performance on the Stanford background dataset (78.1%). The features from the image parse tree outperform Gist descriptors for scene classification by 4%.
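The recursive composition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a one-dimensional sequence of leaf vectors (as for a sentence), a single tanh composition layer, a linear scoring vector, and greedy merging of the highest-scoring adjacent pair. The names `make_params`, `compose`, `score`, and `greedy_parse` are hypothetical, the parameters are random rather than learned, and the max-margin training objective is not shown, so this is an illustration of the general idea rather than the authors' implementation.

```python
import math
import random


def make_params(dim, seed=0):
    """Random toy parameters (assumption: a real model would learn these)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
    b = [0.0] * dim
    w_score = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return W, b, w_score


def compose(left, right, W, b):
    """Parent vector p = tanh(W [left; right] + b)."""
    children = left + right  # concatenate the two child vectors
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
        for row, bias in zip(W, b)
    ]


def score(parent, w_score):
    """Scalar plausibility score of a candidate parent node."""
    return sum(w * x for w, x in zip(w_score, parent))


def greedy_parse(leaves, params):
    """Merge the best-scoring adjacent pair until one root node remains.

    Returns (root_vector, tree), where tree is a leaf index or a
    (left_subtree, right_subtree) pair.
    """
    W, b, w_score = params
    nodes = [(vec, i) for i, vec in enumerate(leaves)]
    while len(nodes) > 1:
        candidates = []
        for i in range(len(nodes) - 1):
            parent = compose(nodes[i][0], nodes[i + 1][0], W, b)
            candidates.append((score(parent, w_score), i, parent))
        _, i, parent = max(candidates, key=lambda c: c[0])
        nodes[i:i + 2] = [(parent, (nodes[i][1], nodes[i + 1][1]))]
    return nodes[0]


if __name__ == "__main__":
    toy_leaves = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
    root_vec, tree = greedy_parse(toy_leaves, make_params(3))
    print(tree)
```

For images, the candidate pairs would come from neighbouring regions rather than neighbouring positions in a sequence; the sketch covers only the sequence case.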

Published In

ICML'11: Proceedings of the 28th International Conference on Machine Learning
June 2011
1216 pages
ISBN:9781450306195

Sponsors

  • NSF: National Science Foundation
  • Xerox
  • Microsoft Research
  • Yahoo!
  • Amazon: Amazon.com

Publisher

Omnipress

Madison, WI, United States

Qualifiers

  • Article

Cited By

  • (2022) Efficient comparison of sentence embeddings. Proceedings of the 12th Hellenic Conference on Artificial Intelligence (1-6). DOI: 10.1145/3549737.3549752. Online publication date: 7-Sep-2022
  • (2022) Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing 22:2 (1-27). DOI: 10.1145/3543826. Online publication date: 27-Dec-2022
  • (2022) Systematic literature review of arabic aspect-based sentiment analysis. Journal of King Saud University - Computer and Information Sciences 34:9 (6524-6551). DOI: 10.1016/j.jksuci.2022.07.001. Online publication date: 1-Oct-2022
  • (2022) Recurrent Greedy Parsing with Neural Networks. Machine Learning and Knowledge Discovery in Databases (130-144). DOI: 10.1007/978-3-662-44851-9_9. Online publication date: 10-Mar-2022
  • (2021) Integrating tree path in transformer for code representation. Proceedings of the 35th International Conference on Neural Information Processing Systems (9343-9354). DOI: 10.5555/3540261.3540976. Online publication date: 6-Dec-2021
  • (2021) Deeply shared filter bases for parameter-efficient convolutional neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems (7397-7408). DOI: 10.5555/3540261.3540827. Online publication date: 6-Dec-2021
  • (2021) Searching a database of source codes using contextualized code search. Proceedings of the VLDB Endowment 13:10 (1765-1778). DOI: 10.14778/3401960.3401972. Online publication date: 10-Mar-2021
  • (2021) Database Principles and Challenges in Text Analysis. ACM SIGMOD Record 50:2 (6-17). DOI: 10.1145/3484622.3484624. Online publication date: 31-Aug-2021
  • (2021) Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval. Proceedings of the 29th ACM International Conference on Multimedia (5185-5193). DOI: 10.1145/3474085.3475634. Online publication date: 17-Oct-2021
  • (2021) Dialogue History Matters! Personalized Response Selection in Multi-Turn Retrieval-Based Chatbots. ACM Transactions on Information Systems 39:4 (1-25). DOI: 10.1145/3453183. Online publication date: 17-Aug-2021
