Semantic annotation, indexing, and retrieval

Published: 01 December 2004

Abstract

The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, linked to formal knowledge about the world. We claim that the Semantic Web, at its current stage of development, is in critical need of metadata generation and usage schemata that are specific, well-defined, and easy to understand. This paper introduces our vision for a holistic architecture for semantic annotation, indexing, and retrieval of documents with regard to extensive semantic repositories. A system called KIM, which implements this concept, is presented in brief and is used for evaluation and demonstration. A particular schema for semantic annotation with respect to real-world entities is proposed. The underlying philosophy is that practical semantic annotation is impossible without some particular knowledge modelling commitments. Our understanding is that a system for such semantic annotation should be based upon a simple model of real-world entity classes, complemented with extensive instance knowledge. To ensure the efficiency, ease of sharing, and reusability of the metadata, we introduce an upper-level ontology (of about 250 classes and 100 properties), which starts with some basic philosophical distinctions and then goes down to the most common entity types (people, companies, cities, etc.). It thus encodes many domain-independent commonsense concepts and allows straightforward domain-specific extensions. On the basis of the ontology, a large-scale knowledge base of entity descriptions is bootstrapped, and then further extended and maintained. Currently, the knowledge base usually scales between 10^5 and 10^6 descriptions. Finally, this paper presents a semantically enhanced information extraction system, which provides automatic semantic annotation with references to classes in the ontology and to instances.
The system has been running over a continuously growing document collection (currently about 0.5 million news articles), so it has been under constant testing and evaluation for some time now. On the basis of these semantic annotations, we perform semantic-based indexing and retrieval, where users can mix traditional information retrieval (IR) queries with ontology-based ones. We argue that such large-scale, fully automatic methods are essential for transforming the current, largely textual web into a Semantic Web.
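
The pipeline described above (annotation against ontology classes and instances, then retrieval that mixes keyword and ontology-based constraints) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy class hierarchy, the `kb:` identifiers, the alias lists, and the function names are hypothetical and are not taken from KIM or its ontology, and simple alias matching stands in for the information extraction system.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: class -> parent class (None marks the root).
ONTOLOGY = {
    "Entity": None,
    "Agent": "Entity",
    "Person": "Agent",
    "Organization": "Agent",
    "Company": "Organization",
    "Location": "Entity",
    "City": "Location",
}

# Hypothetical instance knowledge: entity id -> (class, known aliases).
KNOWLEDGE_BASE = {
    "kb:Sofia": ("City", ["Sofia"]),
    "kb:AcmeCorp": ("Company", ["Acme Corp"]),
    "kb:JohnSmith": ("Person", ["John Smith", "J. Smith"]),
}


def is_subclass(cls, ancestor):
    """Walk up the hierarchy; True if cls equals ancestor or lies below it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ONTOLOGY[cls]
    return False


def annotate(text):
    """Return sorted (start, end, entity_id, class) tuples for alias matches."""
    annotations = []
    for entity_id, (cls, aliases) in KNOWLEDGE_BASE.items():
        for alias in aliases:
            for m in re.finditer(r"\b" + re.escape(alias) + r"\b", text):
                annotations.append((m.start(), m.end(), entity_id, cls))
    return sorted(annotations)


def build_index(docs):
    """Build a keyword index and an entity index over the same documents."""
    keyword_index = defaultdict(set)  # lowercased token -> document ids
    entity_index = defaultdict(set)   # entity id -> document ids
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            keyword_index[token].add(doc_id)
        for _, _, entity_id, _ in annotate(text):
            entity_index[entity_id].add(doc_id)
    return keyword_index, entity_index


def search(keyword_index, entity_index, keyword, entity_class):
    """Documents containing keyword AND mentioning an entity of entity_class."""
    keyword_hits = keyword_index.get(keyword.lower(), set())
    class_hits = set()
    for entity_id, doc_ids in entity_index.items():
        if is_subclass(KNOWLEDGE_BASE[entity_id][0], entity_class):
            class_hits |= doc_ids
    return keyword_hits & class_hits


DOCS = {
    "d1": "Acme Corp opened a new office in Sofia.",
    "d2": "The office market is slowing down.",
    "d3": "John Smith visited Sofia last week.",
}
keyword_index, entity_index = build_index(DOCS)
# Keyword "office" restricted to documents that mention any Organization.
print(search(keyword_index, entity_index, "office", "Organization"))
```

In this sketch the query for "office" combined with the class `Organization` matches only `d1`: `d2` contains the keyword but no recognized entity, and `d3` mentions entities but not the keyword. Querying by a superclass such as `Agent` picks up both companies and people, which is the kind of generalization an ontology-based query allows over plain keyword search.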

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

1. Information retrieval
2. Semantic annotation
3. Semantic metadata

Cited By

• (2023) A Decentralized Architecture for Semantic Interoperability of Personal Dental Data Based on FHIR. International Journal on Semantic Web & Information Systems 19:1 (1-16). doi:10.4018/IJSWIS.333633. Online publication date: 14-Nov-2023
• (2022) Semantic Knowledge Graphs for the News: A Review. ACM Computing Surveys 55:7 (1-38). doi:10.1145/3543508. Online publication date: 15-Dec-2022
• (2022) Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19. Companion Proceedings of the Web Conference 2022 (740-750). doi:10.1145/3487553.3524675. Online publication date: 25-Apr-2022
• (2021) The impact of semantic annotation techniques on content-based video lecture recommendation. Journal of Information Science 47:6 (740-752). doi:10.1177/0165551520931732. Online publication date: 1-Dec-2021
• (2021) Marketing Communications and the Semantic Web: Theoretical Intersections and Practical Implications. Companion Proceedings of the Web Conference 2021 (717-718). doi:10.1145/3442442.3453704. Online publication date: 19-Apr-2021
• (2019) A linked open data framework to enhance the discoverability and impact of culture heritage. Journal of Information Science 45:6 (756-766). doi:10.1177/0165551518812658. Online publication date: 1-Dec-2019
• (2019) Unsupervised Approaches for Textual Semantic Annotation, A Survey. ACM Computing Surveys 52:4 (1-45). doi:10.1145/3324473. Online publication date: 30-Aug-2019
• (2019) Hybrid molecule-based information retrieval. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (808-815). doi:10.1145/3297280.3297358. Online publication date: 8-Apr-2019
• (2018) GeoHbbTV. Multimedia Tools and Applications 77:21 (28023-28048). doi:10.5555/3287850.3287900. Online publication date: 1-Nov-2018
• (2018) Semantic Web Services Discovery. International Journal on Semantic Web & Information Systems 14:4 (57-72). doi:10.4018/IJSWIS.2018100103. Online publication date: 1-Oct-2018
