DOI: 10.5555/3488766.3488807

Automated reasoning and detection of specious configuration in large systems with symbolic execution

Published: 04 November 2020

Abstract

Misconfiguration is a major cause of system failures. Prior solutions focus on detecting invalid settings introduced by user mistakes. But another type of misconfiguration continues to haunt production services: specious configuration, meaning settings that are valid but lead to unexpectedly poor performance in production. Such misconfigurations are subtle, so even careful administrators may fail to foresee them.
We propose a tool called Violet to detect specious configuration. Our key observation is that a specious configuration causes some slow code path to be executed, but the bad performance effect is not always triggered. Violet therefore takes a novel approach: it uses selective symbolic execution to systematically reason about the performance effect of configuration parameters, their combined effect, and their relationship with input. Violet outputs a performance impact model for the automatic detection of poor configuration settings. We applied Violet to four large systems. To evaluate its effectiveness, we collected 17 real-world specious configuration cases; Violet detects 15 of them. Violet also identifies 11 unknown specious configurations.
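The idea of a performance impact model can be illustrated with a toy sketch. This is not Violet's implementation: brute-force enumeration over small finite domains stands in for selective symbolic execution, and the parameter names (`autocommit`, `flush_at_commit`), the cost units, the threshold, and every function name below are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]

# Hypothetical parameter domains for a toy system (not taken from the paper).
DOMAINS: Dict[str, List[int]] = {"autocommit": [0, 1], "flush_at_commit": [0, 1, 2]}


def toy_commit_cost(cfg: Config) -> int:
    """Assumed cost, in abstract I/O units, of one commit in the toy system."""
    cost = 1  # baseline: append to an in-memory log buffer
    if cfg["autocommit"] == 1:
        if cfg["flush_at_commit"] == 1:
            cost += 100  # slow path: synchronous flush on every commit
        elif cfg["flush_at_commit"] == 2:
            cost += 10  # intermediate path: write without a full flush
    return cost


@dataclass(frozen=True)
class ImpactRow:
    config: Tuple[Tuple[str, int], ...]  # (name, value) pairs sorted by name
    cost: int


def build_impact_model(cost_fn: Callable[[Config], int],
                       domains: Dict[str, List[int]]) -> List[ImpactRow]:
    """Enumerate every setting combination and record its cost.

    Enumeration replaces symbolic exploration here; it only works
    because the toy domains are tiny.
    """
    names = sorted(domains)
    rows = []
    for values in product(*(domains[n] for n in names)):
        cfg = dict(zip(names, values))
        rows.append(ImpactRow(tuple(cfg.items()), cost_fn(cfg)))
    return rows


def is_specious(model: List[ImpactRow], cfg: Config, threshold: float = 5.0) -> bool:
    """Flag a valid setting whose cost is at least `threshold` times the cheapest row."""
    key = tuple(sorted(cfg.items()))
    cost = next(r.cost for r in model if r.config == key)
    best = min(r.cost for r in model)
    return cost / best >= threshold


MODEL = build_impact_model(toy_commit_cost, DOMAINS)
```

In this sketch, `is_specious(MODEL, {"autocommit": 1, "flush_at_commit": 1})` is flagged, while the same `flush_at_commit` value with `autocommit` set to 0 is not. That mirrors the point that the slow path is reached only under a particular combination of settings.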


Cited By

  • (2023) When Database Meets New Storage Devices: Understanding and Exposing Performance Mismatches via Configurations. Proceedings of the VLDB Endowment 16:7 (1712-1725). DOI: 10.14778/3587136.3587145. Online publication date: 1-Mar-2023.
  • (2021) Static detection of silent misconfigurations with deep interaction analysis. Proceedings of the ACM on Programming Languages 5:OOPSLA (1-30). DOI: 10.1145/3485517. Online publication date: 15-Oct-2021.


Published In

OSDI'20: Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation
November 2020
1255 pages
ISBN: 978-1-939133-19-9

Sponsors

  • ORACLE
  • VMware
  • Google Inc.
  • Amazon
  • Microsoft

Publisher

USENIX Association

United States


Qualifiers

  • Research-article
