DOI: 10.1145/2433396.2433441
research-article

Wiki3C: exploiting wikipedia for context-aware concept categorization

Published: 04 February 2013

Abstract

Wikipedia is an important human-generated knowledge base containing over 21 million articles organized by millions of categories. In this paper, we exploit Wikipedia for a new text mining task: Context-aware Concept Categorization, in which concepts are categorized according to their context. We exploit the article link features and category structure of Wikipedia and introduce Wiki3C, an unsupervised, domain-independent, context-based approach to concept categorization. Within this approach, we investigate two strategies for selecting and filtering the Wikipedia articles that represent a category, and we employ a probabilistic model to compute the semantic relatedness between two Wikipedia concepts. Experimental evaluation on manually labeled ground truth shows that Wiki3C achieves a noticeable improvement over baselines that do not consider contextual information.
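
To make the task concrete, the following is a minimal sketch of context-aware categorization on toy data. It is not the Wiki3C implementation: the link sets, the category names, and the functions `relatedness` and `categorize` are all hypothetical, and a simple Jaccard overlap of link sets stands in for the probabilistic relatedness model mentioned in the abstract.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy data: outgoing links of a few Wikipedia-like articles.
LINKS: Dict[str, Set[str]] = {
    "Apple Inc.": {"IPhone", "Steve Jobs", "Computer"},
    "IPhone": {"Apple Inc.", "Smartphone", "Steve Jobs"},
    "Microsoft": {"Computer", "Bill Gates", "Software"},
    "Banana": {"Fruit", "Plant", "Food"},
    "Orange (fruit)": {"Fruit", "Citrus", "Food"},
}

# Hypothetical candidate categories for the ambiguous concept "Apple",
# each represented by a small set of member articles.
CATEGORIES: Dict[str, Set[str]] = {
    "Technology companies": {"Apple Inc.", "Microsoft"},
    "Fruits": {"Banana", "Orange (fruit)"},
}


def relatedness(a: str, b: str) -> float:
    """Jaccard overlap of link sets (assumed stand-in, not the paper's model)."""
    # Each article is added to its own link set so direct links count as overlap.
    links_a = LINKS.get(a, set()) | {a}
    links_b = LINKS.get(b, set()) | {b}
    return len(links_a & links_b) / len(links_a | links_b)


def categorize(context: Iterable[str], categories: Dict[str, Set[str]]) -> str:
    """Pick the category whose member articles are, on average, most related to the context."""
    context = list(context)

    def score(members: Set[str]) -> float:
        pairs = [(m, c) for m in members for c in context]
        if not pairs:
            return 0.0
        return sum(relatedness(m, c) for m, c in pairs) / len(pairs)

    return max(categories, key=lambda name: score(categories[name]))


if __name__ == "__main__":
    # The same concept falls into different categories depending on its context.
    print(categorize(["IPhone"], CATEGORIES))
    print(categorize(["Food"], CATEGORIES))
```

The sketch only illustrates the idea that the chosen category depends on the surrounding concepts; the article selection and filtering strategies described in the abstract are not modeled here.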

    Published In

    WSDM '13: Proceedings of the sixth ACM international conference on Web search and data mining
    February 2013
    816 pages
    ISBN:9781450318693
    DOI:10.1145/2433396

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. context-aware concept categorization
    2. text mining
    3. wikipedia

    Qualifiers

    • Research-article

    Conference

    WSDM 2013

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Cited By

    • (2019) Building concept maps by adapting semantic distance metrics to Wikipedia. Education for Information 35:3 (209-240). DOI: 10.3233/EFI-190279. Online publication date: 2-Aug-2019
    • (2015) Dynamic Facet Hierarchy Constructing for Browsing Web Search Results Efficiently. Proceedings of the 28th International Conference on Current Approaches in Applied Artificial Intelligence - Volume 9101 (293-304). DOI: 10.1007/978-3-319-19066-2_29. Online publication date: 10-Jun-2015
    • (2013) Entity extraction, linking, classification, and tagging for social media. Proceedings of the VLDB Endowment 6:11 (1126-1137). DOI: 10.14778/2536222.2536237. Online publication date: 1-Aug-2013
    • (2013) A framework for benchmarking entity-annotation systems. Proceedings of the 22nd international conference on World Wide Web (249-260). DOI: 10.1145/2488388.2488411. Online publication date: 13-May-2013
    • (2013) Extracting Fine-Grained Entities Based on Coordinate Graph. Natural Language Processing and Information Systems (367-371). DOI: 10.1007/978-3-642-38824-8_40. Online publication date: 2013
