Abstract
Backed by major Web players, schema.org is the latest broad initiative for structuring Web information. Unfortunately, a representative analysis on a corpus of 733 million Web documents shows that, a year after its introduction, only 1.56% of documents featured any schema.org annotations. A probable reason is that providing annotations is quite tiresome, which hinders widespread adoption. Even state-of-the-art tools like Google’s Structured Data Markup Helper offer only limited support here. In this paper we propose SASS, a system for automatically finding high-quality schema suggestions for page content, to ease the annotation process. SASS intelligently blends supervised machine learning techniques with simple user feedback. Moreover, additional support features for binding attributes to values reduce the necessary effort even further. We show that SASS is superior to current tools for schema.org annotations.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Homoceanu, S., Geilert, F., Pek, C., Balke, WT. (2014). Any Suggestions? Active Schema Support for Structuring Web Information. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_17
DOI: https://doi.org/10.1007/978-3-319-05813-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05812-2
Online ISBN: 978-3-319-05813-9
eBook Packages: Computer Science (R0)