Abstract
Present-day computing environments consist of many hardware and software components that interact with each other in complex ways. Hence, when such a system fails, it is very difficult to identify the exact module that caused the problem. In such a situation, an automated technique that can narrow the fault down to (at least) a set of modules that may be responsible for the failure would be very useful for support engineers. This paper takes an important step in that direction. We propose a graph-based troubleshooting methodology that explores the storage system (EMS) logs of around 4500 customer cases to troubleshoot customer problems. We provide support engineers with a ranked list of modules, which significantly narrows down the troubleshooting process for around 95% of cases with only a 10% false positive rate, whereas the competing baseline, MonitorRank, covers only 74% of cases with a 23% false positive rate.
Notes
- 1. Case resolution period is the duration between ‘Case opened date’ and ‘Case closed date’ in the customer support database.
- 2.
- 3.
- 4. Note that the case index is denoted by \(C_{k}\), whereas the cluster index is denoted by \(c_{k}\).
Acknowledgement
This work has been partially supported by the research project “Real Time Fault Prediction System (RFS)”, funded by NetApp Inc, Bangalore. The authors also extend their sincere thanks to Siddhartha Nandi from the Advanced Technology Group, NetApp, for facilitating this research.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Khatuya, S., Bakhshi, A., Basak, J., Ganguly, N., Mitra, B. (2018). GBTM: Graph Based Troubleshooting Method for Handling Customer Cases Using Storage System Log. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_31
DOI: https://doi.org/10.1007/978-3-319-93034-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93033-6
Online ISBN: 978-3-319-93034-3
eBook Packages: Computer Science, Computer Science (R0)