
An Ontology-Driven Annotation of Data Tables

  • Conference paper
Web Information Systems Engineering – WISE 2007 Workshops (WISE 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4832))

Abstract

This paper deals with the integration of data extracted from the web into an existing data warehouse indexed by a domain ontology. We are especially interested in data tables extracted from scientific publications found on the web. We propose a way to annotate data tables from the web according to a given domain ontology, and we present the different steps of our annotation process. The columns of a web data table are first segregated according to whether they represent numeric or symbolic data. Then, we annotate the numeric (resp. symbolic) columns with their corresponding numeric (resp. symbolic) type found in the ontology. Our approach combines different pieces of evidence from the column contents and from the column title to find the best corresponding type in the ontology. The relations represented by the web data table are recognized using both the table title and the types of the columns that were previously annotated. We give experimental results of our annotation process, our application domain being food microbiology.
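
The first step of the process, separating numeric from symbolic columns, could be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the majority-vote rule, the 0.5 threshold, the number pattern, the function names, and the sample table are all hypothetical.

```python
import re

# Assumption: a cell is "numeric" if it is a plain number, optionally in
# scientific notation (e.g. "4", "25.5", "-2", "1e3"). The abstract does not
# specify the actual criterion.
_NUMBER = re.compile(r"^[+-]?(\d+([.,]\d+)?|[.,]\d+)([eE][+-]?\d+)?$")


def is_numeric_cell(cell: str) -> bool:
    """Return True if the stripped cell text looks like a single number."""
    return bool(_NUMBER.match(cell.strip()))


def classify_column(cells, threshold=0.5):
    """Label a column "numeric" if more than `threshold` of its non-empty
    cells look like numbers, otherwise "symbolic" (hypothetical rule)."""
    non_empty = [c for c in cells if c.strip()]
    if not non_empty:
        return "symbolic"
    numeric = sum(is_numeric_cell(c) for c in non_empty)
    return "numeric" if numeric / len(non_empty) > threshold else "symbolic"


def segregate_columns(table):
    """Map each column title to "numeric" or "symbolic".

    `table` is a dict from column title to a list of cell strings.
    """
    return {title: classify_column(cells) for title, cells in table.items()}


# Made-up table in the spirit of the food microbiology domain.
table = {
    "Microorganism": ["Listeria monocytogenes", "Salmonella", "E. coli"],
    "Temperature (°C)": ["4", "25.5", "37"],
    "pH": ["5.5", "n.d.", "7"],
}
kinds = segregate_columns(table)
```

A majority vote is used here so that a column with a few non-numeric placeholders (such as "n.d.") is still treated as numeric; the later steps described above, matching columns to ontology types and recognizing relations, would then operate on each group separately.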

Editor information

Mathias Weske, Mohand-Saïd Hacid, Claude Godart

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hignette, G., Buche, P., Dibie-Barthélemy, J., Haemmerlé, O. (2007). An Ontology-Driven Annotation of Data Tables. In: Weske, M., Hacid, MS., Godart, C. (eds) Web Information Systems Engineering – WISE 2007 Workshops. WISE 2007. Lecture Notes in Computer Science, vol 4832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77010-7_4

  • DOI: https://doi.org/10.1007/978-3-540-77010-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77009-1

  • Online ISBN: 978-3-540-77010-7

  • eBook Packages: Computer Science, Computer Science (R0)
