Abstract
The quality of data is a key factor that determines the performance of information systems, in particular with regard to (1) the number of exceptions in the execution of business processes and (2) the quality of decisions based on the output of the respective information system. Recently, the Semantic Web and Linked Data activities have started to provide substantial data resources that may be used for real business operations. Hence, it will soon be critical to manage the quality of such data. Unfortunately, we can observe a wide range of data quality problems in Semantic Web data. In this paper, we (1) evaluate how the state of the art in data quality research fits the characteristics of the Web of Data, (2) describe how the SPARQL query language and the SPARQL Inferencing Notation (SPIN) can be used to identify data quality problems in Semantic Web data automatically, entirely within the Semantic Web technology stack, and (3) evaluate our approach.
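To make point (2) more concrete, here is a minimal, stdlib-only Python sketch. It is not the authors' implementation. It mimics SPARQL basic-graph-pattern matching over an in-memory list of triples and uses that matcher for two common data quality rules: a missing mandatory value and a syntax violation. The `ex:` vocabulary, the data, and the helper names (`match`, `missing_value`, `syntax_violation`) are hypothetical. In an actual deployment, each rule would be written as a SPARQL query and, with SPIN, attached to a class as a constraint. The docstrings show roughly what such queries might look like.

```python
import re
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

# A triple is (subject, predicate, object); terms are plain strings here.
Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical example data: ex:p2 lacks a name and has an 8-digit EAN.
TRIPLES: List[Triple] = [
    ("ex:p1", "rdf:type", "ex:Product"),
    ("ex:p1", "ex:name", "Camera"),
    ("ex:p1", "ex:ean", "4006381333931"),
    ("ex:p2", "rdf:type", "ex:Product"),
    ("ex:p2", "ex:ean", "40063813"),
]


def match(patterns: Sequence[Triple], triples: Sequence[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all triple patterns,
    i.e. a naive evaluation of a SPARQL basic graph pattern.
    Terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


def missing_value(triples: Sequence[Triple], cls: str, prop: str) -> List[str]:
    """Instances of `cls` that have no value at all for `prop`.
    Comparable to a SPIN constraint on the class such as
    ASK WHERE { FILTER NOT EXISTS { ?this ex:name ?v } },
    where ?this is bound to each instance in turn."""
    violations = []
    for b in match([("?this", "rdf:type", cls)], triples):
        if next(match([(b["?this"], prop, "?v")], triples), None) is None:
            violations.append(b["?this"])
    return violations


def syntax_violation(triples: Sequence[Triple], prop: str,
                     pattern: str) -> List[Tuple[str, str]]:
    """(subject, value) pairs whose `prop` value does not fully match `pattern`.
    Comparable to
    SELECT ?s ?v WHERE { ?s ex:ean ?v FILTER (!regex(str(?v), "^[0-9]{13}$")) }"""
    regex = re.compile(pattern)
    return [(b["?s"], b["?v"])
            for b in match([("?s", prop, "?v")], triples)
            if not regex.fullmatch(b["?v"])]


print(missing_value(TRIPLES, "ex:Product", "ex:name"))    # ['ex:p2']
print(syntax_violation(TRIPLES, "ex:ean", r"[0-9]{13}"))  # [('ex:p2', '40063813')]
```

The checks return the offending resources instead of a single boolean, so individual records can be flagged for cleansing. This choice is illustrative. A plain SPIN ASK constraint only signals that a violation exists, while a CONSTRUCT-based constraint can describe it in more detail.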
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fürber, C., Hepp, M. (2010). Using SPARQL and SPIN for Data Quality Management on the Semantic Web. In: Abramowicz, W., Tolksdorf, R. (eds) Business Information Systems. BIS 2010. Lecture Notes in Business Information Processing, vol 47. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12814-1_4
DOI: https://doi.org/10.1007/978-3-642-12814-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12813-4
Online ISBN: 978-3-642-12814-1
eBook Packages: Computer Science, Computer Science (R0)