poster

Supervised identification and linking of concept mentions to a domain-specific ontology

Authors:

Gabor Melli,

Martin Ester

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

Pages 1717 - 1720

https://doi.org/10.1145/1871437.1871712

Published: 26 October 2010

Abstract

We propose SDOI, a pipelined supervised learning approach to the task of interlinking the concepts mentioned within a document to the concepts within an ontology. Concept mention identification is performed by training a sequential tagging model. Each identified concept mention is then associated with a set of candidate ontology concepts, along with a feature vector that combines features proposed in the literature with novel ones drawn from new data sources, such as the training corpus itself. An iterative algorithm is defined for handling collective features. We show a lift in performance over applicable baselines both in identifying the concept mentions within the 139 KDD-2009 conference paper abstracts and in linking these concept mentions to a domain-specific ontology for the field of data mining. Additional experiments on 22 ICDM-2009 abstracts suggest that the trained models are portable, both in terms of accuracy and in their ability to reduce annotation time.
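The abstract does not spell out the pipeline's internals, so the following is only a minimal Python sketch of the general shape it describes: decode a sequence tagger's output into mention spans, then link each mention to one of its candidate concepts, revisiting the choices iteratively so that collective features (which depend on the concepts currently assigned to other mentions) can be taken into account. Every name here (`bio_to_spans`, `link_iteratively`, `local_score`, `relatedness`) is hypothetical, the B/I/O tag scheme is an assumption, and the two scoring functions are hand-written stand-ins for trained models. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Decode B/I/O tags (assumed tagger output) into (start, end) token spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:          # a new mention begins right after another
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))       # the current mention ends here
            start = None
    if start is not None:                  # mention running to the end of the text
        spans.append((start, len(tags)))
    return spans


def link_iteratively(
    candidates: Dict[str, List[str]],
    local_score: Callable[[str, str], float],
    relatedness: Callable[[str, str], float],
    max_iters: int = 10,
) -> Dict[str, str]:
    """Pick one concept per mention, re-scoring with collective evidence."""
    # Bootstrap pass: use only features local to each (mention, concept) pair.
    assignment = {
        m: max(cs, key=lambda c: local_score(m, c))
        for m, cs in candidates.items() if cs
    }
    for _ in range(max_iters):
        changed = False
        for m, cs in candidates.items():
            if not cs:
                continue
            others = [assignment[o] for o in assignment if o != m]

            def total(c: str) -> float:
                # Collective feature: mean relatedness to concepts currently
                # assigned to the other mentions in the document.
                coll = (sum(relatedness(c, o) for o in others) / len(others)
                        if others else 0.0)
                return local_score(m, c) + coll

            best = max(cs, key=total)
            if best != assignment[m]:
                assignment[m] = best
                changed = True
        if not changed:                    # stop once the assignment is stable
            break
    return assignment


# Toy data (invented for illustration only).
demo_candidates = {"tree": ["BotanicalTree", "DecisionTree"],
                   "pruning": ["TreePruning"]}
demo_local = {("tree", "BotanicalTree"): 0.6, ("tree", "DecisionTree"): 0.5,
              ("pruning", "TreePruning"): 1.0}
demo_related = {("DecisionTree", "TreePruning"): 1.0}

demo_links = link_iteratively(
    demo_candidates,
    lambda m, c: demo_local.get((m, c), 0.0),
    lambda a, b: demo_related.get((a, b), 0.0),
)
print(demo_links)
```

In the toy data, the local score alone prefers the wrong sense of "tree"; the collective term, driven by the neighbouring mention "pruning", flips the choice on the first iteration and the loop then stops because nothing else changes.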

Index Terms

Supervised identification and linking of concept mentions to a domain-specific ontology
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

October 2010

2036 pages

ISBN: 9781450300995

DOI: 10.1145/1871437

General Chair: Jimmy Huang (York University, Canada)

Program Chairs: Nick Koudas (University of Toronto, Canada), Gareth Jones (Dublin City University, Ireland), Xindong Wu (University of Vermont, USA), Kevyn Collins-Thompson (Microsoft Research, USA), Aijun An (York University, Canada)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '10: International Conference on Information and Knowledge Management

October 26 - 30, 2010

Toronto, ON, Canada

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
