DOI: 10.1145/3289600.3291020
research-article

Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles

Published: 30 January 2019

Abstract

A supervised method relies on simple, lightweight features to distinguish Wikipedia articles that are classes (e.g., Shield volcano) from other articles (e.g., Kilauea). The features are lexical or semantic in nature. Experimental results in multiple languages and over multiple evaluation sets show that the proposed method outperforms previous work.
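
As a rough illustration of what a supervised classifier over lightweight lexical features might look like, the sketch below trains a tiny logistic regression on article titles. It is not the paper's method: the feature set (`later_token_capitalized`, `single_token`), the function names, the hyperparameters, and the toy training titles other than Shield volcano and Kilauea are all assumptions made for this example.

```python
import math

# Toy training set (hypothetical, except the two titles named in the abstract).
# Label 1 = class article, 0 = any other article.
TOY_EXAMPLES = [
    ("Shield volcano", 1),
    ("Lava dome", 1),
    ("Cinder cone", 1),
    ("Kilauea", 0),
    ("Mauna Loa", 0),
    ("Mount Etna", 0),
]


def lexical_features(title):
    """Hypothetical title-only features; not the paper's feature set."""
    tokens = title.split()
    return {
        "bias": 1.0,
        # Wikipedia titles capitalize later words mainly for proper nouns.
        "later_token_capitalized": float(any(t[0].isupper() for t in tokens[1:])),
        "single_token": float(len(tokens) == 1),
    }


def predict_proba(weights, feats):
    """Logistic model: estimated probability that the article is a class."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Plain stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for title, label in examples:
            feats = lexical_features(title)
            err = label - predict_proba(weights, feats)
            for name, value in feats.items():
                weights[name] = weights.get(name, 0.0) + lr * err * value
    return weights


if __name__ == "__main__":
    w = train(TOY_EXAMPLES)
    for title in ("Shield volcano", "Kilauea", "Mount Fuji"):
        print(title, round(predict_proba(w, lexical_features(title)), 3))
```

Two title features cannot separate single-word classes from single-word instances, which is one reason a realistic system would need richer lexical and semantic evidence than this sketch uses.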


Published In

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
January 2019
874 pages
ISBN:9781450359405
DOI:10.1145/3289600
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classes
  2. knowledge acquisition
  3. open-domain information extraction
  4. semantics
  5. topic classification

Qualifiers

  • Research-article

Conference

WSDM '19

Acceptance Rates

WSDM '19 Paper Acceptance Rate 84 of 511 submissions, 16%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%
