Abstract
The Web has established itself as the largest public data repository ever available. Even though the vast majority of information on the Web is formatted to be easily readable by the human eye, “meaningful information” is still largely inaccessible to computer applications. In this paper, we present automated algorithms that gather metadata and instance information by utilizing global regularities on the Web and incorporating contextual information. Our system is distinguished by the fact that it does not require domain-specific engineering. Experimental evaluations were successfully performed on the TAP knowledge base and on the faculty-course home pages of computer science departments, containing 16,861 Web pages.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelgi, F., Vadrevu, S., Davulcu, H. (2005). Improving Web Data Annotations with Spreading Activation. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, JY., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2005. WISE 2005. Lecture Notes in Computer Science, vol 3806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581062_8
DOI: https://doi.org/10.1007/11581062_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30017-5
Online ISBN: 978-3-540-32286-3
eBook Packages: Computer Science, Computer Science (R0)