research-article

Improving Event Extraction via Multimodal Integration

Authors:

Spencer Whitehead,

Shih-Fu Chang

MM '17: Proceedings of the 25th ACM international conference on Multimedia

Pages 270 - 278

https://doi.org/10.1145/3123266.3123294

Published: 19 October 2017

Abstract

In this paper, we focus on improving Event Extraction (EE) by incorporating visual knowledge with words and phrases from text documents. We first discover visual patterns from large-scale text-image pairs in a weakly-supervised manner and then propose a multimodal event extraction algorithm where the event extractor is jointly trained with textual features and visual patterns. Extensive experimental results on benchmark data sets demonstrate that the proposed multimodal EE method can achieve significantly better performance on event extraction: absolute 7.1% F-score gain on event trigger labeling and 8.5% F-score gain on event argument labeling.
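
To make the reported metric concrete, the following is a minimal sketch, not the authors' evaluation code, of how an absolute F-score gain can be computed for event trigger labeling. The tuple layout (sentence id, token offset, event type), the event type names, and the toy predictions are hypothetical; only the standard definitions of precision, recall, and F1 are assumed.

```python
def f_score(predicted, gold):
    """Return (precision, recall, F1) for predicted items against gold items.

    An item counts as correct only if it matches a gold item exactly,
    i.e. same position and same event type.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def absolute_gain(f_new, f_base):
    """Absolute gain is the plain difference of two F-scores, not a ratio."""
    return f_new - f_base


# Hypothetical toy triggers: (sentence id, token offset, event type).
gold = {(0, 3, "Attack"), (1, 5, "Die"), (2, 1, "Transport"), (3, 2, "Meet")}
text_only = {(0, 3, "Attack"), (1, 5, "Injure"), (2, 1, "Transport")}
multimodal = {(0, 3, "Attack"), (1, 5, "Die"), (2, 1, "Transport")}

_, _, f_base = f_score(text_only, gold)
_, _, f_new = f_score(multimodal, gold)
print(f"text-only F1:  {f_base:.3f}")
print(f"multimodal F1: {f_new:.3f}")
print(f"absolute gain: {absolute_gain(f_new, f_base):.3f}")
```

On this toy data the F1 rises from 4/7 to 6/7, an absolute gain of 2/7. The figures in the abstract are differences of this kind, measured in F-score percentage points on the benchmark data sets rather than as a relative improvement.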

Index Terms

Improving Event Extraction via Multimodal Integration
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
    2. Natural language processing
      1. Information extraction

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia

October 2017

2028 pages

ISBN:9781450349062

DOI:10.1145/3123266

General Chairs: Qiong Liu (FXPAL, USA), Rainer Lienhart (Universität Augsburg, Germany), Haohong Wang (TCL America, USA)

Program Chairs: Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan), Susanne Boll (University of Oldenburg, Germany), Phoebe Chen (La Trobe University, Australia), Gerald Friedland (Lawrence Livermore National Lab, USA), Jia Li (Google, USA), Shuicheng Yan (Qihoo 360, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '17

Sponsor:

SIGMM

MM '17: ACM Multimedia Conference

October 23 - 27, 2017

Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
