ConfSeer: leveraging customer support knowledge bases for automated misconfiguration detection

Published: 01 August 2015

Abstract

We introduce ConfSeer, an automated system that detects potential configuration issues or deviations from identified best practices by leveraging a knowledge base (KB) of technical solutions. The intuition is that these KB articles describe configuration problems and their fixes, so if the system can accurately understand them, it can automatically pinpoint both the errors and their resolutions. Unfortunately, finding an accurate match is difficult because (a) the KB articles are written in natural language text, and (b) configuration files typically contain a large number of parameters, each with a wide range of values. Thus, expert-driven manual troubleshooting is not scalable.
While several state-of-the-art techniques have been proposed for individual tasks such as keyword matching, concept determination and entity resolution, none offers a practical end-to-end solution for detecting problems in machine configurations. In this paper, we describe our experiences building ConfSeer using a novel combination of ideas from natural language processing, information retrieval and interactive learning. ConfSeer powers the recommendation engine behind Microsoft Operations Management Suite that proposes fixes for software configuration errors. The system has been running in production for about a year to proactively find misconfigurations on tens of thousands of servers. Our evaluation of ConfSeer against an expert-defined rule-based commercial system, an expert survey and web search engines shows that it achieves 80%-97.5% accuracy and incurs low runtime overheads.

    Published In

    Proceedings of the VLDB Endowment  Volume 8, Issue 12
    Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii
    August 2015
    728 pages
    ISSN:2150-8097
    Publisher

    VLDB Endowment

    Qualifiers

    • Research-article

    Cited By

    • (2021) Static detection of silent misconfigurations with deep interaction analysis. Proceedings of the ACM on Programming Languages 5:OOPSLA (1-30). DOI: 10.1145/3485517. Online publication date: 15-Oct-2021
    • (2021) Test-case prioritization for configuration testing. Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (452-465). DOI: 10.1145/3460319.3464810. Online publication date: 11-Jul-2021
    • (2021) Challenges and opportunities: an in-depth empirical study on configuration error injection testing. Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (478-490). DOI: 10.1145/3460319.3464799. Online publication date: 11-Jul-2021
    • (2020) PracExtractor. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (265-280). DOI: 10.5555/3489146.3489164. Online publication date: 15-Jul-2020
    • (2020) Testing configuration changes in context to prevent production failures. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (735-751). DOI: 10.5555/3488766.3488808. Online publication date: 4-Nov-2020
    • (2020) DeepTriage. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (3281-3289). DOI: 10.1145/3394486.3403380. Online publication date: 23-Aug-2020
    • (2018) Towards Effective Extraction and Linking of Software Mentions from User-Generated Support Tickets. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (2263-2271). DOI: 10.1145/3269206.3272026. Online publication date: 17-Oct-2018
    • (2017) Data-Driven Techniques in Computing System Management. ACM Computing Surveys 50:3 (1-43). DOI: 10.1145/3092697. Online publication date: 27-Jul-2017
    • (2016) Early detection of configuration errors to reduce failure damage. Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (619-634). DOI: 10.5555/3026877.3026925. Online publication date: 2-Nov-2016
