Abstract
Protest event databases are key sources for sociologists studying the dynamics and properties of collective action. This paper describes a finite-state approach to collecting protest event features from short texts (news lead sentences) in several European languages (Bulgarian, French, Polish, Russian, Spanish, Swedish) using the General Architecture for Text Engineering (GATE). We present the results of an evaluation of annotation performance.
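The abstract only names the approach. As a rough illustration of what a finite-state extraction step of this kind does, the Python sketch below first runs a gazetteer lookup over a lead sentence and then applies a regular-expression rule to the resulting sequence of annotation types. This loosely mirrors a GATE gazetteer followed by a pattern grammar. It is not the authors' implementation: the gazetteer entries, the annotation type names (`Actor`, `EventTrigger`, `Location`) and the rule's window sizes are hypothetical, and actual GATE pipelines run in Java.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy gazetteer (English, lowercase keys). The paper's resources
# cover Bulgarian, French, Polish, Russian, Spanish and Swedish instead.
GAZETTEER = {
    "protest": "EventTrigger",
    "rally": "EventTrigger",
    "strike": "EventTrigger",
    "demonstrated": "EventTrigger",
    "students": "Actor",
    "workers": "Actor",
    "madrid": "Location",
    "warsaw": "Location",
}

# One-character symbol per annotation type, so a rule over annotation
# sequences can be written as an ordinary (finite-state) regular expression.
SYMBOL = {"Actor": "A", "EventTrigger": "E", "Location": "L", "Token": "T"}

# Hypothetical rule: an Actor, up to 3 other tokens, an EventTrigger,
# up to 4 other tokens, then a Location.
RULE = re.compile(r"(A)T{0,3}(E)T{0,4}(L)")

TOKEN_RE = re.compile(r"\w+")  # str patterns match Unicode word characters


@dataclass
class Annotation:
    type: str
    start: int
    end: int
    text: str


def gazetteer_lookup(sentence: str) -> list[Annotation]:
    """Tokenize and label each token with a gazetteer type or plain 'Token'."""
    return [
        Annotation(GAZETTEER.get(m.group().lower(), "Token"),
                   m.start(), m.end(), m.group())
        for m in TOKEN_RE.finditer(sentence)
    ]


def extract_event(sentence: str) -> Optional[dict[str, str]]:
    """Return actor/event/location features for the first rule match, if any."""
    anns = gazetteer_lookup(sentence)
    # Each annotation maps to exactly one symbol, so string indices in
    # `symbols` coincide with indices into `anns`.
    symbols = "".join(SYMBOL[a.type] for a in anns)
    match = RULE.search(symbols)
    if match is None:
        return None
    return {
        "actor": anns[match.start(1)].text,
        "event": anns[match.start(2)].text,
        "location": anns[match.start(3)].text,
    }


if __name__ == "__main__":
    print(extract_event("Thousands of students protest in Madrid against tuition fees"))
```

On "Thousands of students protest in Madrid against tuition fees" the rule yields actor `students`, event `protest` and location `Madrid`, while a sentence with no gazetteer hits returns `None`. Encoding each annotation as a single character keeps the rule a plain regular expression, and therefore a finite-state matcher. The trade-off is that it ignores token features such as part of speech or morphology, which real grammars for inflected languages like Polish or Russian would likely need.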
Supported in part by the Government of the Russian Federation, Grant 074-U01.
Notes
1. Event/Location: Dataset In A Box, Linux-Option: http://openeventdata.github.io/eldiablo/.
2. World-Wide Integrated Crisis Early Warning System: http://www.lockheedmartin.com/us/products/W-ICEWS/iData.html.
3. Rosette Linguistics Platform: http://www.basistech.com/text-analytics/rosette/.
4. General Architecture for Text Engineering: http://gate.ac.uk.
5. Python-based Scrapy crawling framework: http://scrapy.org.
6. Java Annotation Patterns Language:
7. DBpedia: http://dbpedia.org.
8. Protégé ontology editor: http://protege.stanford.edu.
9.
10. Linked Data Gazetteer PR: https://confluence.ontotext.com/display/SWS/Linked+Data+Gazetteer+PR.
11.
12. The Stockholm Tagger: http://www.ling.su.se/english/nlp/tools/stagger/stagger-the-stockholm-tagger-1.98986.
13. BWP Gazetteer: https://sourceforge.net/projects/bwp-gazetteer/.
14.
15.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Danilova, V., Popova, S., Alexandrov, M. (2016). Multilingual Protest Event Data Collection with GATE. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds.) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol. 9612. Springer, Cham. https://doi.org/10.1007/978-3-319-41754-7_10
DOI: https://doi.org/10.1007/978-3-319-41754-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41753-0
Online ISBN: 978-3-319-41754-7
eBook Packages: Computer Science, Computer Science (R0)