Experimenting with Unsupervised Multilingual Event Detection in Historical Newspapers

Boros, Emanuela; Cabrera-Diego, Luis Adrián; Doucet, Antoine

doi:10.1007/978-3-031-21756-2_15

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13636))

Included in the following conference series:

International Conference on Asian Digital Libraries

857 Accesses

Abstract

To prevent historical knowledge’s fading, research in event detection could facilitate access to digitized collections. In this paper, we propose a method for annotating multilingual historical documents for event detection in an unsupervised manner by leveraging entities and semantic notions of event types. We automatically annotate the documents by relying on dependency parse trees and automatic semantic mapping to event-based frames, with a focus on the multilingual transfer between frames and candidate events. The documents are afterward verified by native speakers, Digital Humanities researchers. We also report on experimental results of event detection in historical newspapers with a state-of-the-art model. We demonstrate that our approach allows for easy language adaptation by presenting two study cases with knowledge extracted from German newspapers from 1911 to 1933 regarding events surrounding International Women’s Day and from French newspapers between 1900 and 1944 related to the abolition of guillotine executions in France. Our preliminary findings show that this type of approach could alleviate the need for manual annotation by also providing a practical course of action toward unsupervised event detection from multilingual digitized and historical documents.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 79.99; Price excludes VAT (USA)

Softcover Book: USD 99.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection

A Survey of Multilingual Event Extraction from Text

Construction of Event Annotation Corpus for Political News Texts

Notes

1.
https://www.newseye.eu/.
2.
https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/english-events-guidelines-v5.4.3.pdf.
3.
An example of the Execution frame can be viewed at Framenet2 website.
4.
We chose movements, conflictual events, and membership in organizations.
5.
We used spaCy 3.1+ [23] with the model xx_ent_wiki_sm https://spacy.io/models/xx.
6.
We are aware that BERT was trained for representing true sentences rather than pseudo-sentences. However, we consider that BERT might generate an embedding that represents the context in which all the event triggers are frequently used.
7.
As we use multilingual BERT, even when the training is English, the model should be able to predict events in other languages in a zero-shot manner.
8.
Translation: In order to bring about this desired state, we send our greetings to our sisters all over the world and call on them to demonstrate together with us against the continuation of the war on International Women’s Day.
9.
This threshold was chosen experimentally after we verified the dismissed articles.

References

Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley frameNet project. In: 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 1, pp. 86–90 (1998)
Google Scholar
Bedi, H., Patil, S., Hingmire, S., Palshikar, G.: Event timeline generation from history textbooks. In: Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017), pp. 69–77 (2017)
Google Scholar
Boros, E., et al.: Alleviating digitization errors in named entity recognition for historical documents. In: Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 431–441. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.conll-1.35
Boros, E., et al.: Robust named entity recognition and linking on historical multilingual documents. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Working Notes. Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum. CEUR-WS (2020)
Google Scholar
Boros, E., Moreno, J.G., Doucet, A.: Event detection with entity markers. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds.) ECIR 2021. LNCS, vol. 12657, pp. 233–240. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72240-1_20
Chapter Google Scholar
Boschee, E., Natarajan, P., Weischedel, R.: Automatic extraction of events from open source text for predictive forecasting. In: Subrahmanian, V. (ed.) Handbook of Computational Approaches to Counterterrorism, pp. 51–67. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-5311-6_3
Boschetti, F., et al.: Computational analysis of historical documents: an application to Italian war bulletins in World War I and II. In: Workshop on Language resources and technologies for processing and linking historical documents and archives (LRT4HDA 2014), pp. 70–75. ELRA (2014)
Google Scholar
Bronstein, O., Dagan, I., Li, Q., Ji, H., Frank, A.: Seed-based event trigger labeling: how far can event descriptions get us? In: ACL, vol. 2, pp. 372–376 (2015)
Google Scholar
Chen, Y., Xu, L., Liu, K., Zeng, D., Zhao, J.: Event extraction via dynamic multi-pooling convolutional neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1, pp. 167–176 (2015)
Google Scholar
Cybulska, A., Vossen, P.: Historical event extraction from text. In: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 39–43 (2011)
Google Scholar
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Ehrmann, M., Romanello, M., Bircher, S., Clematide, S.: Introducing the CLEF 2020 HIPE shared task: named entity recognition and linking on historical newspapers. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12036, pp. 524–532. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45442-5_68
Chapter Google Scholar
Ehrmann, M., Romanello, M., Doucet, A., Clematide, S.: Introducing the HIPE 2022 shared task: named entity recognition and linking in multilingual historical documents. In: Hagen, M., et al. (eds.) ECIR 2022. LNCS, vol. 13186, pp. 347–354. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99739-7_44
Chapter Google Scholar
Ehrmann, M., Romanello, M., Flückiger, A., Clematide, S.: Overview of CLEF HIPE 2020: named entity recognition and linking on historical newspapers. In: Arampatzis, A., et al. (eds.) CLEF 2020. LNCS, vol. 12260, pp. 288–310. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58219-7_21
Chapter Google Scholar
Ehrmann, M., Romanello, M., Najem-Meyer, S., Doucet, A., Clematide, S.: Overview of HIPE-2022: named entity recognition and linking in multilingual historical documents. In: Barrón-Cedeño, A., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. LNCS , vol. 13390. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13643-6_26
Fellbaum, C.: Wordnet. In: Poli, R., Healy, M., Kameas, A. (eds) Theory and Applications of Ontology: Computer Applications, pp. 231–243. Springer, Dordrecht (2010). https://doi.org/10.1007/978-90-481-8847-5_10
Feng, X., Huang, L., Tang, D., Ji, H., Qin, B., Liu, T.: A language-independent neural network for event detection. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (vol. 2: Short Papers), vol. 2, pp. 66–71 (2016)
Google Scholar
Filatova, E., Hatzivassiloglou, V.: Event-based extractive summarization (2004)
Google Scholar
Grishman, R., Sundheim, B.: Message understanding conference-6: a brief history. In: COLING 1996, pp. 466–471 (1996)
Google Scholar
Hamdi, A., Jean-Caurant, A., Sidere, N., Coustaty, M., Doucet, A.: An analysis of the performance of named entity recognition over OCRed documents. In: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 333–334. IEEE, Illinois, USA (2019)
Google Scholar
Hong, Y., Zhang, J., Ma, B., Yao, J., Zhou, G., Zhu, Q.: Using cross-entity inference to improve event extraction. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-vol. 1, pp. 1127–1136. Association for Computational Linguistics (2011)
Google Scholar
Hong, Y., Zhou, W., Zhang, J., Zhou, G., Zhu, Q.: Self-regulation: employing a generative adversarial network to improve event detection. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (vol. 1: Long Papers), pp. 515–526 (2018)
Google Scholar
Honnibal, M., Montani, I., Van Landeghem, S., Boyd, A.: spaCy: industrial-strength natural language processing in python (2020). https://doi.org/10.5281/zenodo.1212303
Huang, R., Riloff, E.: Peeling back the layers: detecting event role fillers in secondary contexts. In: ACL 2011, pp. 1137–1147 (2011)
Google Scholar
Ide, N., Woolner, D.: Exploiting semantic web technologies for intelligent access to historical documents. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04). European Language Resources Association (ELRA), Lisbon, Portugal (2004). https://www.lrec-conf.org/proceedings/lrec2004/pdf/248.pdf
Jean-Caurant, A., Doucet, A.: Accessing and investigating large collections of historical newspapers with the NewsEye platform. In: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, pp. 531–532 (2020)
Google Scholar
Li, Q., Ji, H., Huang, L.: Joint event extraction via structured prediction with global features. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (vol. 1: Long Papers), pp. 73–82. Association for Computational Linguistics, Sofia, Bulgaria (2013). https://www.aclweb.org/anthology/P13-1008
Li, W., Cheng, D., He, L., Wang, Y., Jin, X.: Joint event extraction based on hierarchical event schemas from FrameNet. IEEE Access 7, 25001–25015 (2019)
Article Google Scholar
Liu, J., Chen, Y., Liu, K., Bi, W., Liu, X.: Event extraction as machine reading comprehension. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1641–1651 (2020)
Google Scholar
Liu, M., Li, W., Wu, M., Lu, Q.: Extractive summarization based on event term clustering. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pp. 185–188 (2007)
Google Scholar
Liu, S., et al.: Leveraging FrameNet to improve automatic event detection (2016)
Google Scholar
Liu, S., Chen, Y., Liu, K., Zhao, J.: Exploiting argument information to improve event detection via supervised attention mechanisms. In: 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), pp. 1789–1798. Vancouver, Canada (2017)
Google Scholar
Liu, S., et al.: Exploiting argument information to improve event detection via supervised attention mechanisms (2017)
Google Scholar
Miller, D., Boisen, S., Schwartz, R., Stone, R., Weischedel, R.: Named entity extraction from noisy input: speech and OCR. In: Proceedings of the sixth conference on Applied natural language processing, pp. 316–324. Association for Computational Linguistics, Seattle, Washington, USA (2000)
Google Scholar
Mutuvi, S., Doucet, A., Odeo, M., Jatowt, A.: Evaluating the impact of ocr errors on topic modeling. In: Dobreva, M., Hinze, A., Žumer, M. (eds.) ICADL 2018. LNCS, vol. 11279, pp. 3–14. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04257-8_1
Chapter Google Scholar
Nguyen, T.H., Cho, K., Grishman, R.: Joint event extraction via recurrent neural networks. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 300–309 (2016)
Google Scholar
Nguyen, T.H., Grishman, R.: Event detection and domain adaptation with convolutional neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (vol. 2: Short Papers), pp. 365–371. Association for Computational Linguistics, Beijing, China (2015). https://doi.org/10.3115/v1/P15-2060
Oberbichler, S., et al.: Integrated interdisciplinary workflows for research on historical newspapers: perspectives from humanities scholars, computer scientists, and librarians. J. Assoc. Inf. Sci. Technol. 73(2), 225–239 (2021)
Google Scholar
Riloff, E.: Automatically generating extraction patterns from untagged text. In: AAAI1996, pp. 1044–1049 (1996)
Google Scholar
Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains. Artif. Intell. 85(1), 101–134 (1996)
Article Google Scholar
Rodriquez, K.J., Bryant, M., Blanke, T., Luszczynska, M.: Comparison of named entity recognition tools for raw OCR text. In: Jancsary, J. (ed.) 11th Conference on Natural Language Processing, KONVENS 2012, Empirical Methods in Natural Language Processing, 19–21 Sept 2012. Scientific series of the ÖGAI, vol. 5, pp. 410–414. ÖGAI, Wien, Österreich, Vienna, Austria (2012). https://www.oegai.at/konvens2012/proceedings/60_rodriquez12w/
Rovera, M., Nanni, F., Ponzetto, S.P.: Event-Based access to historical Italian war memoirs. J. Comput. Cult. Heritage 14(1), 1-23 (2021). https://doi.org/10.1145/3406210
Saurí, R., Knippen, R., Verhagen, M., Pustejovsky, J.: Evita: a robust event recognizer for QA systems. In: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp. 700–707. Association for Computational Linguistics, Vancouver, British Columbia, Canada (2005). https://aclanthology.org/H05-1088
Shaw, R.B.: Events and periods as concepts for organizing historical knowledge. University of California, Berkeley (2010)
Google Scholar
Sprugnoli, R.: Event Detection and Classification for the Digital Humanities, Ph. D. thesis, University of Trento (2018)
Google Scholar
van Strien, D., Beelen, K., Ardanuy, M.C., Hosseini, K., McGillivray, B., Colavizza, G.: Assessing the impact of OCR quality on downstream NLP tasks. In: ICAART 2020 - Proceedings of the 12th International Conference on Agents and Artificial Intelligence vol. 1, pp. 484–496 (2020)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Google Scholar
Walker, C., Stephanie, S., Julie, M., Kazuaki, M.: ACE 2005 multilingual training corpus. Linguistic Data Consortium, Technical report (2005)
Google Scholar
Yang, S., Feng, D., Qiao, L., Kan, Z., Li, D.: Exploring pre-trained language models for event extraction and generation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5284–5294 (2019)
Google Scholar
Yangarber, R., Grishman, R., Tapanainen, P., Huttunen, S.: Automatic acquisition of domain knowledge for information extraction. In: 18th International Conference on Computational Linguistics (COLING 2000), pp. 940–946 (2000)
Google Scholar
Zhang, T., Ji, H., Sil, A.: Joint entity and event extraction with generative adversarial imitation learning. Data Intell. 1(2), 99–120 (2019)
Article Google Scholar

Download references

Acknowledgments

This work has been supported by the European Union’s Horizon 2020 research and innovation program under grants 770299 (NewsEye) and 825153 (Embeddia). Also, it has been supported by the ANNA (2019-1R40226) and TERMITRAD (2020–2019-8510010) projects funded by the Nouvelle-Aquitaine Region, France.

Author information

Authors and Affiliations

University of La Rochelle, L3i, F-17000, La Rochelle, France
Emanuela Boros, Luis Adrián Cabrera-Diego & Antoine Doucet
Jus Mundi, F-75008, Paris, France
Luis Adrián Cabrera-Diego

Authors

Emanuela Boros
View author publications
You can also search for this author in PubMed Google Scholar
Luis Adrián Cabrera-Diego
View author publications
You can also search for this author in PubMed Google Scholar
Antoine Doucet
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Emanuela Boros .

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Yuen-Hsien Tseng
Doshisha University, Kyoto, Japan
Marie Katsurai
VNU University of Engineering and Technology, Hanoi, Vietnam
Hoa N. Nguyen

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Boros, E., Cabrera-Diego, L.A., Doucet, A. (2022). Experimenting with Unsupervised Multilingual Event Detection in Historical Newspapers. In: Tseng, YH., Katsurai, M., Nguyen, H.N. (eds) From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries. ICADL 2022. Lecture Notes in Computer Science, vol 13636. Springer, Cham. https://doi.org/10.1007/978-3-031-21756-2_15

Download citation

DOI: https://doi.org/10.1007/978-3-031-21756-2_15
Published: 07 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21755-5
Online ISBN: 978-3-031-21756-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Experimenting with Unsupervised Multilingual Event Detection in Historical Newspapers

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection

A Survey of Multilingual Event Extraction from Text

Construction of Event Annotation Corpus for Political News Texts

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Experimenting with Unsupervised Multilingual Event Detection in Historical Newspapers

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

CHED: A Cross-Historical Dataset with a Logical Event Schema for Classical Chinese Event Detection

A Survey of Multilingual Event Extraction from Text

Construction of Event Annotation Corpus for Political News Texts

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation