DOI: 10.1145/3123266.3123294
research-article

Improving Event Extraction via Multimodal Integration

Published: 19 October 2017

Abstract

In this paper, we focus on improving Event Extraction (EE) by combining visual knowledge with the words and phrases in text documents. We first discover visual patterns from large-scale text-image pairs in a weakly supervised manner, and then propose a multimodal event extraction algorithm in which the event extractor is jointly trained on textual features and visual patterns. Extensive experiments on benchmark data sets show that the proposed multimodal EE method significantly improves event extraction, with absolute F-score gains of 7.1% on event trigger labeling and 8.5% on event argument labeling.
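The abstract does not specify how textual features and visual patterns are combined. As a minimal sketch, assuming simple early fusion (appending a per-word visual-pattern vector to the word's textual features) and a binary logistic-regression trigger classifier trained on both modalities at once, joint training might look like the following. The names `fuse`, `train_logreg`, `predict`, and the word-to-vector `patterns` table are hypothetical illustrations, not the authors' method, features, or data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for visual patterns mined from text-image pairs:
# each word maps to a small vector. This is NOT the paper's representation.
VisualPatterns = Dict[str, List[float]]


def fuse(text_feats: Sequence[float], word: str,
         patterns: VisualPatterns, visual_dim: int) -> List[float]:
    """Early fusion: append the word's visual vector to its textual features.

    Words with no mined visual pattern get an all-zero visual vector.
    """
    visual = patterns.get(word, [0.0] * visual_dim)
    return list(text_feats) + list(visual)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int],
                 epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Binary logistic regression (trigger vs. non-trigger) fit with per-example SGD.

    Textual and visual dimensions share one weight vector, so weights for
    both modalities are learned together in a single model.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad = p - yi  # derivative of the log loss w.r.t. the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    # True means "labeled as an event trigger".
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


if __name__ == "__main__":
    # Toy data: two candidate words with identical textual features;
    # only the (hypothetical) visual vectors tell them apart.
    patterns = {"attack": [1.0], "table": [0.0]}
    words = ["attack", "table"]
    labels = [1, 0]  # 1 = event trigger
    X = [fuse([1.0], wd, patterns, visual_dim=1) for wd in words]
    w, b = train_logreg(X, labels)
    for wd, x in zip(words, X):
        print(wd, predict(w, b, x))
```

In this toy run both words share the same textual feature, so a text-only classifier would receive identical inputs and could not separate them; the appended visual dimension is what lets the model distinguish the trigger from the non-trigger. Real trigger labeling assigns event types rather than a yes/no decision, and the paper's actual features and training objective are likely different.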

Information

Published In

ACM Conferences
MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. event extraction
  2. multimodal approach
  3. natural language processing
  4. visual pattern discovery

Qualifiers

  • Research-article

Conference

MM '17: ACM Multimedia Conference
October 23-27, 2017
Mountain View, California, USA

Acceptance Rates

MM '17 paper acceptance rate: 189 of 684 submissions (28%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 76
  • Downloads (last 6 weeks): 10
Reflects downloads up to 24 Dec 2024

Citations

Cited By

View all
  • (2024) Prompt-Enhanced Prototype Framework for Few-shot Event Detection. 2024 International Joint Conference on Neural Networks (IJCNN), 1-7. DOI: 10.1109/IJCNN60899.2024.10651359. Online publication date: 30-Jun-2024
  • (2023) Event-Centric Temporal Knowledge Graph Construction: A Survey. Mathematics 11:23, 4852. DOI: 10.3390/math11234852. Online publication date: 2-Dec-2023
  • (2023) A Survey on Multimodal Knowledge Graphs: Construction, Completion and Applications. Mathematics 11:8, 1815. DOI: 10.3390/math11081815. Online publication date: 11-Apr-2023
  • (2023) Role Knowledge Prompting for Document-Level Event Argument Extraction. Applied Sciences 13:5, 3041. DOI: 10.3390/app13053041. Online publication date: 27-Feb-2023
  • (2023) Few-shot Domain-Adaptative Visually-fused Event Detection from Text. 2023 26th International Conference on Information Fusion (FUSION), 1-8. DOI: 10.23919/FUSION52260.2023.10224213. Online publication date: 28-Jun-2023
  • (2023) Dependency-based BERT for Chinese Event Argument Extraction. ACM Transactions on Asian and Low-Resource Language Information Processing 22:12, 1-21. DOI: 10.1145/3633306. Online publication date: 21-Nov-2023
  • (2023) Training Multimedia Event Extraction With Generated Images and Captions. Proceedings of the 31st ACM International Conference on Multimedia, 5504-5513. DOI: 10.1145/3581783.3612526. Online publication date: 26-Oct-2023
  • (2023) Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling. Proceedings of the 31st ACM International Conference on Multimedia, 5281-5291. DOI: 10.1145/3581783.3612096. Online publication date: 26-Oct-2023
  • (2023) Multimodal Chinese Event Extraction on Text and Audio. 2023 International Joint Conference on Neural Networks (IJCNN), 1-8. DOI: 10.1109/IJCNN54540.2023.10191258. Online publication date: 18-Jun-2023
  • (2023) Multimodal Event Classification in Social Media. Neural Information Processing, 338-350. DOI: 10.1007/978-981-99-8178-6_26. Online publication date: 30-Nov-2023