Towards a Wrapper-Driven Ontology-Based Framework for Knowledge Extraction

Sun, Jigui; Bai, Xi; Li, Zehai; Che, Haiyan; Liu, Huawen

doi:10.1007/978-3-540-76719-0_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4798)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Because Web resources are formatted in diverse ways for human viewing, information extraction from them is often not accurate enough, and information extracted by traditional techniques is inconvenient for users to query. This paper proposes WebKER, a wrapper-driven system that extracts knowledge from Chinese Web pages based on domain ontologies. Wrappers are first learned through suffix arrays. A novel approach based on HowNet is then proposed to automatically align the raw data extracted by the wrappers. Knowledge is then generated and described with Resource Description Framework (RDF) statements and, after being merged, is added to the Knowledge Base (KB). A prototype of WebKER has been implemented. The experiments report the performance of the system and compare querying information stored in the KB with querying information extracted by traditional techniques; the results indicate the superiority of our system. In addition, the evaluation of the outstanding wrapper and the method for merging knowledge are presented.
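
The abstract states only that wrappers are learned through suffix arrays; it does not describe the algorithm. As a rough illustration of why suffix arrays suit this task, the sketch below finds the longest repeated run of HTML tags in a page, since repeated tag patterns often mark data records. It is a minimal sketch under stated assumptions, not WebKER's method: the function names, the tag-token representation, and the sample page are all hypothetical, and the naive construction is chosen for brevity rather than efficiency.

```python
from typing import List, Sequence, Tuple


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return the start indices of all suffixes of `tokens` in sorted order.

    Naive construction that compares whole suffixes; adequate for a short
    illustration, not for large pages.
    """
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count how many leading tokens two sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_repeated_pattern(tokens: Sequence[str]) -> Tuple[str, ...]:
    """Return the longest token run that occurs at least twice.

    Suffixes that share a long prefix are adjacent in the suffix array,
    so it is enough to compare each suffix with its sorted neighbour.
    """
    sa = build_suffix_array(tokens)
    best: Tuple[str, ...] = ()
    for prev, cur in zip(sa, sa[1:]):
        k = common_prefix_len(tokens[prev:], tokens[cur:])
        if k > len(best):
            best = tuple(tokens[cur:cur + k])
    return best


# Hypothetical tag sequence of a page that lists two records.
page_tags = ["<ul>", "<li>", "<b>", "</b>", "</li>",
             "<li>", "<b>", "</b>", "</li>", "</ul>"]

print(longest_repeated_pattern(page_tags))
# → ('<li>', '<b>', '</b>', '</li>')
```

The repeated pattern recovered here would serve only as a candidate record template; how the system actually derives and selects wrappers is described in the full paper, not in this abstract.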

Author information

Authors and Affiliations

College of Computer Science and Technology, Jilin University, Changchun 130012, China
Jigui Sun, Xi Bai, Zehai Li, Haiyan Che & Huawen Liu
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun 130012, China
Jigui Sun, Xi Bai, Zehai Li, Haiyan Che & Huawen Liu

Editor information

Zili Zhang, Jörg Siekmann

About this paper

Cite this paper

Sun, J., Bai, X., Li, Z., Che, H., Liu, H. (2007). Towards a Wrapper-Driven Ontology-Based Framework for Knowledge Extraction. In: Zhang, Z., Siekmann, J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76719-0_25

DOI: https://doi.org/10.1007/978-3-540-76719-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76718-3
Online ISBN: 978-3-540-76719-0
eBook Packages: Computer Science, Computer Science (R0)
