Abstract
The World Wide Web is a vast store of knowledge and information, but querying it and retrieving exactly the information one needs is not straightforward. Languages and tools are required for accessing, extracting, transforming, and syndicating that information. The Web should serve not only human readers but also machine-to-machine communication. This calls for powerful, user-friendly tools, based on expressive languages, that extract and integrate information from different Web sources or, more generally, from heterogeneous sources. The tutorial introduces the Web technologies required in this context and presents approaches and techniques used in information extraction and integration. Sample applications from several domains motivate the topics discussed, and the tutorial illustrates how to provide data instances for the Semantic Web.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Baumgartner, R., Eiter, T., Gottlob, G., Herzog, M., Koch, C. (2005). Information Extraction for the Semantic Web. In: Eisinger, N., Małuszyński, J. (eds) Reasoning Web. Lecture Notes in Computer Science, vol 3564. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11526988_8
DOI: https://doi.org/10.1007/11526988_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27828-3
Online ISBN: 978-3-540-31675-6
eBook Packages: Computer Science (R0)