Abstract
Parallel I/O access patterns act as fingerprints of a parallel program. In order to extract meaningful information from these patterns, they have to be represented appropriately. Due to the fact that string objects can be easily compared using Kernel Methods, a conversion to a weighted string representation is proposed in this paper, together with a novel string kernel function called Kast Spectrum Kernel. The similarity matrices, obtained after applying the mentioned kernel over a set of examples from a real application, were analyzed using Kernel Principal Component Analysis (Kernel PCA) and Hierarchical Clustering. The evaluation showed that 2 out of 4 I/O access pattern groups were completely identified, while the other 2 conformed a single cluster due to the intrinsic similarity of their members. The proposed strategy can be promisingly applied to other similarity problems involving tree-like structured data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Kunkel, J.M.: Simulating parallel programs on application and system level. Comput. Sci. Res. Dev. 28(2), 167–174 (2012)
Liu, Y., Gunasekaran, R., Ma, X.S., Vazhkudai, S.S.: Automatic identification of application I/O signatures from noisy server-side traces. In: Proceedings of the 12th USENIX Conference on File and Storage Technologies (FAST 2014), Santa Clara, pp. 213–228 (2014)
Kung, S.Y.: Kernel Methods and Machine Learning. Cambridge University Press, Cambridge (2014)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, New York (2004)
BakIr, G., Hofmann, T., Schölkopf, B., Smola, A.J., Taskar, B., Vishwanathan, S.V.N.: Predicting Structured Data. The MIT Press, Cambridge (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning - Data Mining, Inference. Springer Series in Statistics. Springer, New York (2009)
Schölkopf, B., Smola, A., Müller, K.-R.: Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) ICANN 1997. LNCS, vol. 1327, pp. 583–588. Springer, Heidelberg (1997). doi:10.1007/BFb0020217
Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels for structured data. In: Matwin, S., Sammut, C. (eds.) ILP 2002. LNCS (LNAI), vol. 2583, pp. 66–83. Springer, Heidelberg (2003). doi:10.1007/3-540-36468-4_5
Gärtner, T., Lloyd, J.W., Flach, P.A.: Kernels and Distances for Structured Data. Mach. Learn. 57(3), 205–232 (2004)
Haussler, D.: Convolution Kernels on Discrete Structures. Technical Report. University of California at Santa Cruz (1999)
Vishwanathan, S.V.N., Smola, A.J.: Fast kernels for string and tree matching. In: Advances in Neural Information Processing Systems 15, pp. 569–576 (2003)
Leslie, C., Eskin, E., Noble, W.S.: The spectrum kernel: a string kernel for SVM protein classification. In: Proceedings of the Pacific Symposium on Biocomputing, vol. 7, pp. 566–575 (2002)
Kluge, M.: Comparison and End-to-End Performance Analysis of Parallel Filesystems. Ph.D. Thesis Dissertation. Technische Universität Dresden (2011)
Loewe, W., McLarty, T., Morrone, C.: IOR Benchmark (2012)
Fryxell, B., Olson, K., Ricker, P., Timmes, F.X., Zingale, M., Lamb, D.Q., MacNeice, P., Rosner, R., Truran, J.W., Tufo, H.: FLASH: an adaptive mesh hydrodynamics code for modeling astrophysical thermonuclear flashes. Astrophys. J. Suppl. Ser. 131(1), 273 (2000)
Madhyastha, T.M., Reed, D.A.: Learning to classify parallel input/output access patterns. IEEE Trans. Parallel Distrib. Syst. 13(8), 802–813 (2002)
Behzad B., Byna S., Prabhat and Snir, M.: Pattern-driven parallel I/O tuning. In: Proceedings of the 10th Parallel Data Storage Workshop, Austin, Texas, pp. 43–48 (2015)
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, SIGMOD 1998, Seattle, pp. 94–105 (1998)
Koller, R., Rangaswami, R.: I/O Deduplication: utilizing content similarity to improve I/O performance. ACM Trans. Storage (TOS) 6(3), 13:1–13:26 (2010)
Acknowledgements
Raul Torres would like to acknowledge the financial support from the Colombian Administrative Department of Science, Technology and Innovation (Colciencias) as well as the mathematical advisory received from Ruslan Krenzler.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Torres, R., Kunkel, J., Dolz, M.F., Ludwig, T. (2017). A Novel String Representation and Kernel Function for the Comparison of I/O Access Patterns. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2017. Lecture Notes in Computer Science(), vol 10421. Springer, Cham. https://doi.org/10.1007/978-3-319-62932-2_48
Download citation
DOI: https://doi.org/10.1007/978-3-319-62932-2_48
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62931-5
Online ISBN: 978-3-319-62932-2
eBook Packages: Computer ScienceComputer Science (R0)