A similarity study of I/O traces via string kernels

The Journal of Supercomputing

Abstract

Understanding I/O for data-intensive applications is the foundation for optimizing them. Classifying applications according to their expressed I/O access pattern eases the analysis. An access pattern can be seen as the fingerprint of an application. In this paper, we address the classification of traces. First, we convert them into a weighted string representation. Because string objects can be easily compared using kernel methods, we explore their use for fingerprinting I/O patterns. To improve accuracy, we propose a novel string kernel function called the kast2 spectrum kernel. The similarity matrices, obtained by applying this kernel to a set of examples from a real application, were analyzed using kernel principal component analysis and hierarchical clustering. The evaluation showed that two out of four I/O access pattern groups were completely identified, while the other two groups formed a single cluster due to the intrinsic similarity of their members. The proposed strategy is a promising candidate for other similarity problems involving tree-like structured data.
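To make the idea of comparing traces as strings concrete, the following is a minimal sketch of a plain k-spectrum kernel over tokenized traces, followed by a cosine-normalized similarity matrix. It is not the kast2 spectrum kernel proposed in the paper, whose definition is not given in this abstract, and it ignores token weights. The function names, the token vocabulary, and the example traces are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def spectrum(tokens, k):
    """Count every contiguous run of k tokens in the sequence."""
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))


def spectrum_kernel(a, b, k):
    """Inner product of the two k-gram count vectors."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for k-grams that are absent from the second trace.
    return sum(count * sb[gram] for gram, count in sa.items())


def similarity_matrix(traces, k):
    """Cosine-normalized kernel matrix with entries in [0, 1]."""
    n = len(traces)
    self_k = [spectrum_kernel(t, t, k) for t in traces]
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = sqrt(self_k[i] * self_k[j])
            if denom:
                matrix[i][j] = spectrum_kernel(traces[i], traces[j], k) / denom
    return matrix


# Hypothetical tokenized traces (illustrative only, not data from the paper).
traces = [
    ["open", "write", "write", "write", "close"],
    ["open", "write", "write", "close"],
    ["open", "read", "seek", "read", "close"],
]
sim = similarity_matrix(traces, k=2)
for row in sim:
    print(["%.3f" % value for value in row])
```

In this toy input the two write-dominated traces share most of their 2-grams and score close to 1, while the read-dominated trace shares none with them and scores 0. A matrix of this kind is the sort of input that kernel principal component analysis or hierarchical clustering would then operate on.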

Acknowledgements

Raul Torres would like to acknowledge the financial support from the Colombian Administrative Department of Science, Technology and Innovation (Colciencias) as well as the mathematical advice received from Ruslan Krenzler.

Author information

Corresponding author

Correspondence to Raul Torres.

About this article

Cite this article

Torres, R., Kunkel, J.M., Dolz, M.F. et al. A similarity study of I/O traces via string kernels. J Supercomput 75, 7814–7826 (2019). https://doi.org/10.1007/s11227-018-2471-x

  • DOI: https://doi.org/10.1007/s11227-018-2471-x