DOI: 10.1145/3155902.3155903

Named entity similarity computation: The case of social event entities

Published: 30 November 2017

Abstract

We consider events as complex named entities, recursively composed of simple named entities (people, places, organizations, dates, etc.) and/or other complex named entities. We have developed a processing chain for extracting, indexing, and searching for social events in Web pages. In this context, we propose a generic similarity computation function applicable to any type of complex named entity. In this paper, we implement this function using three approaches, which we describe and then evaluate on a set of social events.
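The recursive structure described in the abstract can be illustrated with a small sketch. It is not one of the paper's three approaches, which the abstract does not detail. It assumes, purely for illustration, that a complex entity is a set of named components, that simple entities are compared by normalized string similarity (Python's `difflib`), and that component scores are combined by a weighted mean. The names `SimpleEntity`, `ComplexEntity`, `similarity`, and the `weights` parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, Optional, Union


@dataclass
class SimpleEntity:
    """A simple named entity: a type label (e.g. "place") and its surface text."""
    kind: str
    text: str


@dataclass
class ComplexEntity:
    """A complex named entity: named components, each simple or itself complex."""
    kind: str
    parts: Dict[str, "Entity"] = field(default_factory=dict)


Entity = Union[SimpleEntity, ComplexEntity]


def simple_similarity(a: SimpleEntity, b: SimpleEntity) -> float:
    """Stand-in measure (an assumption): case-insensitive string ratio in [0, 1]."""
    if a.kind != b.kind:
        return 0.0
    return SequenceMatcher(None, a.text.lower(), b.text.lower()).ratio()


def similarity(a: Entity, b: Entity,
               weights: Optional[Dict[str, float]] = None) -> float:
    """Recursive similarity: weighted mean of component similarities.

    Components present in only one entity contribute a score of 0.
    Components without an explicit weight default to a weight of 1.
    """
    if isinstance(a, SimpleEntity) and isinstance(b, SimpleEntity):
        return simple_similarity(a, b)
    if isinstance(a, ComplexEntity) and isinstance(b, ComplexEntity):
        names = set(a.parts) | set(b.parts)
        if not names:
            return 1.0 if a.kind == b.kind else 0.0
        weights = weights or {}
        total = 0.0
        score = 0.0
        for name in names:
            w = weights.get(name, 1.0)
            total += w
            if name in a.parts and name in b.parts:
                # Recurse: a component may be simple or complex.
                score += w * similarity(a.parts[name], b.parts[name], weights)
        return score / total if total else 0.0
    # A simple entity is never matched against a complex one in this sketch.
    return 0.0


concert_a = ComplexEntity("event", {
    "place": SimpleEntity("place", "Heidelberg"),
    "date": SimpleEntity("date", "2017-11-30"),
})
concert_b = ComplexEntity("event", {
    "place": SimpleEntity("place", "Heidelberg"),
})
```

Here `similarity(concert_a, concert_b)` averages a perfect match on `place` with a missing `date`. A weighted mean is only one possible aggregation; any per-type measure (for example, a geographic distance for places) could replace `simple_similarity` without changing the recursion.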


Published In

GIR'17: Proceedings of the 11th Workshop on Geographic Information Retrieval
November 2017
64 pages
ISBN:9781450353380
DOI:10.1145/3155902
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Event Extraction
  2. Event Similarity Computation
  3. Information retrieval
  4. Named Entity Similarity Computation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

GIR'17
GIR'17: 11th Workshop on Geographic Information Retrieval
November 30 - December 1, 2017
Heidelberg, Germany

Acceptance Rates

GIR'17 Paper Acceptance Rate 11 of 13 submissions, 85%;
Overall Acceptance Rate 46 of 61 submissions, 75%
