Abstract
Web pages from a web site can often be associated with concepts in an ontology, and pairs of web pages can be associated with relationships between those concepts. With such associations, web pages can be searched, browsed, or even reorganized based on their concept and relationship labels. In this paper, we investigate the problem of extracting link information of relationship instances from a web site. We define the notion of a link chain and formulate the link chain extraction problem. We propose an extraction method based on sequential covering to solve this problem, present the method, and report experiments that evaluate its performance. We have applied the method to extract link chain information from the Yahoo! Movie Web Site with very promising results.
This work is partially supported by the SingAREN21 research grant M48020004.
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Naing, M.M., Lim, E.-P., Goh, D.H.-L. (2003). On Extracting Link Information of Relationship Instances from a Web Site. In: Jeckle, M., Zhang, L.-J. (eds) Web Services - ICWS-Europe 2003. Lecture Notes in Computer Science, vol 2853. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39872-1_17
DOI: https://doi.org/10.1007/978-3-540-39872-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20125-0
Online ISBN: 978-3-540-39872-1
eBook Packages: Springer Book Archive