Dimensional Clustering of Linked Data: Techniques and Applications

  • Chapter
Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8990)

Abstract

The plurality and heterogeneity of linked data features call for appropriate solutions for accurate matching and clustering. In this paper, we propose a dimensional clustering approach that provides (i) the capability to select the set of features to use for data matching and clustering, packaged into a so-called thematic dimension, and (ii) the capability to make explicit the cause of similarity that generates each cluster. As a further contribution, we present ensemble techniques for combining different single-dimension cluster sets into a multi-dimensional view of the considered linked data. Finally, we discuss applications to linked data summarization and exploration.
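
The idea of combining single-dimension cluster sets can be illustrated with a minimal sketch. The ensemble techniques of the chapter are not described on this page, so the code below swaps in a generic co-association (evidence accumulation) scheme: two items are grouped when enough dimensions cluster them together, and the supporting dimensions are kept as the explicit cause of similarity. The function names, the `min_dimensions` threshold, and the sample dimensions and items are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_association(cluster_sets):
    """Map each item pair to the set of dimensions that cluster the pair together."""
    support = defaultdict(set)
    for dimension, clusters in cluster_sets.items():
        for cluster in clusters:
            for a, b in combinations(sorted(cluster), 2):
                support[(a, b)].add(dimension)
    return dict(support)


def consensus_clusters(cluster_sets, min_dimensions):
    """Group items connected by pairs that co-occur in at least min_dimensions dimensions."""
    support = co_association(cluster_sets)
    items = {i for clusters in cluster_sets.values() for c in clusters for i in c}
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dimensions in support.items():
        if len(dimensions) >= min_dimensions:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for item in items:
        groups[find(item)].add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical single-dimension cluster sets (assumed data, not from the chapter).
cluster_sets = {
    "geography": [{"Rome", "Milan"}, {"Paris"}],
    "population": [{"Rome", "Paris"}, {"Milan"}],
    "culture": [{"Rome", "Milan", "Paris"}],
}
print(co_association(cluster_sets)[("Milan", "Rome")])
print(consensus_clusters(cluster_sets, min_dimensions=2))
```

Raising `min_dimensions` makes the combined view stricter: with the sample data, a threshold of 3 leaves every item in its own group, because no pair is clustered together in all three dimensions.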

Notes

  1. For the sake of readability, only a subset of the available properties is reported (http://www.dbpedia.org).

  2. More technical details about the construction of linked data items from the RDF statements of a repository \(\mathcal{R}\) are provided in [5].

  3. Since \(\text{ldi-match}^{\mathcal{D}}(ldi_i, ldi_j) = \text{ldi-match}^{\mathcal{D}}(ldi_j, ldi_i)\), we define \(\sigma M\) and \(\pi M\) as upper triangular matrices.

  4. A detailed presentation of summarization techniques is out of the scope of this work. Here, we outline how to generate a summary-view over a cluster set \(CL\). For the interested reader, a more technical presentation of cluster essential definition, proximity-link specification, and prominence value calculation is provided in [5].
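
The symmetry argument of note 3 can be sketched in a few lines: when a match function is symmetric, it is enough to compute and store the values for index pairs with i < j, and to read any other pair by swapping the indices. The code is an illustration only. The `jaccard` function is a stand-in for the chapter's ldi-match, and the item contents are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Symmetric toy similarity between two sets of feature values (stand-in for ldi-match)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def upper_triangular_matches(items, match):
    """Evaluate match only for i < j; all other cells keep the default 0.0."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = match(items[i], items[j])
    return matrix


def lookup(matrix, i, j):
    """Read the match value of two distinct items in either order, relying on symmetry."""
    if i == j:
        raise ValueError("the diagonal is not stored in this sketch")
    return matrix[min(i, j)][max(i, j)]


# Hypothetical items described by sets of feature values.
items = [{"a", "b"}, {"b", "c"}, {"a", "b"}]
matrix = upper_triangular_matches(items, jaccard)
print(lookup(matrix, 2, 0) == lookup(matrix, 0, 2))
```

For n items this evaluates the match function n(n-1)/2 times instead of n² times.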

References

  1. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)

  2. Bae, E., Bailey, J.: COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, pp. 53–62 (2006)

  3. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Heidelberg (2006)

  4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)

  5. Castano, S., Ferrara, A., Montanelli, S.: Thematic clustering and exploration of linked data. In: Ceri, S., Brambilla, M. (eds.) Search Computing. LNCS, vol. 7538, pp. 157–175. Springer, Heidelberg (2012)

  6. Drost, I., Bickel, S., Scheffer, T.: Discovering communities in linked data by multi-view clustering. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation, Magdeburg, Germany, pp. 342–349 (2005)

  7. Ferrara, A., Nikolov, A., Scharffe, F.: Data linking for the semantic web. Int. J. Semant. Web Inf. Syst. 7(3), 46–76 (2011)

  8. Ferrara, A., Genta, L., Montanelli, S.: Linked data classification: a feature-based approach. In: Proceedings of the 3rd EDBT International Workshop on Linked Web Data Management (LWDM 2013), Genova, Italy (2013)

  9. Giannakidou, E., Vakali, A.: Integrating web 2.0 data into linked open data cloud via clustering. In: Proceedings of the Workshop on Linked Data in the Future Internet at the Future Internet Assembly, Ghent, Belgium (2010)

  10. Goldberg, M.K., Hayvanovych, M., Magdon-Ismail, M.: Measuring similarity between sets of overlapping clusters. In: Proceedings of the IEEE SocialCom/PASSAT Conference, Minneapolis, Minnesota, USA, pp. 303–308 (2010)

  11. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2–3), 107–145 (2001)

  12. Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with semantic verification. J. Web Semant. 7(3), 235–251 (2009)

  13. Kailing, K., Kriegel, H.-P., Pryakhin, A., Schubert, M.: Clustering multi-represented objects with noise. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 394–403. Springer, Heidelberg (2004)

  14. Lu, Q., Conrad, J.G., Al-Kofahi, K., Keenan, W.: Legal document clustering with built-in topic segmentation. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK (2011)

  15. Minaei-Bidgoli, B., Topchy, A.P., Punch, W.F.: A comparison of resampling methods for clustering ensembles. In: Proceedings of the International Conference on Artificial Intelligence (IC-AI 2004), Las Vegas, Nevada, USA, pp. 939–945 (2004)

  16. Müller, E., Günnemann, S., Färber, I., Seidl, T.: Discovering multiple clustering solutions: grouping objects in different views of the data. In: Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE 2012), Washington, DC, USA, pp. 1207–1210 (2012)

  17. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

  18. Newman, M.E.J.: A measure of betweenness centrality based on random walks. Soc. Netw. 27(1), 39–54 (2005)

  19. Nguyen, X.V., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), Montreal, Quebec, Canada (2009)

  20. Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: Proceedings of the 6th ACM SIGKDD KDD-2000 Workshop on Text Mining, Boston, MA, USA (2000)

  21. Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

  22. Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25(3), 337–372 (2011)

  23. Verykios, V.S., Elmagarmid, A.K., Houstis, E.N.: Automating the approximate record-matching process. Inf. Sci. 126(1–4), 83–98 (2000)

  24. Wang, Z., Li, J., Zhao, Y., Setchi, R., Tang, J.: A unified approach to matching semantic data on the web. Knowl. Based Syst. 39, 173–184 (2013)

  25. Xu, R., Wunsch II, D.C.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

  26. Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)

  27. Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2(1), 718–729 (2009)

Author information

Corresponding author

Correspondence to Stefano Montanelli.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ferrara, A., Genta, L., Montanelli, S., Castano, S. (2015). Dimensional Clustering of Linked Data: Techniques and Applications. In: Hameurlain, A., Küng, J., Wagner, R., Bianchini, D., De Antonellis, V., De Virgilio, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX. Lecture Notes in Computer Science, vol 8990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46562-2_3

  • DOI: https://doi.org/10.1007/978-3-662-46562-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46561-5

  • Online ISBN: 978-3-662-46562-2

  • eBook Packages: Computer Science, Computer Science (R0)
