
Studying and detecting log-related issues

Published: 01 December 2018

Abstract

Logs capture valuable information throughout the execution of software systems. Researchers and practitioners rely heavily on the rich knowledge conveyed in logs to perform various tasks in both software development and operation. Log-related issues, such as missing or outdated information, may have a large impact on the users who depend on these logs. In this paper, we first perform an empirical study on log-related issues in two large-scale, open source software systems. We find that files with log-related issues have undergone statistically significantly more frequent prior changes and bug fixes. We also find that the developers who fix these log-related issues are often neither the ones who introduced the logging statements nor the owners of the methods containing them; without clear experts, maintaining logs is more challenging. Finally, we find that most defective logging statements remain unreported for a long period (median 320 days), but once reported, the issues are fixed quickly (median five days). Our empirical findings suggest the need for automated tools that can detect log-related issues promptly. We conduct a manual study and identify seven root causes of log-related issues. Based on these root causes, we develop an automated tool that detects four evident types of log-related issues. Our tool detects 75 existing inappropriate logging statements reported in 40 log-related issues. We also reported new issues found by our tool to developers, who accepted 38 previously unknown issues in the latest releases of the subject systems.
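The finding that files with log-related issues have more frequent prior changes is a two-sample comparison. The following is a minimal sketch, assuming a one-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and Cliff's delta as the effect size. Both are common choices for this kind of comparison, but the abstract does not state the exact procedure, and the per-file counts below are invented toy numbers, not the study's data.

```python
import math
from itertools import product


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """One-sided Wilcoxon rank-sum test of H1: values in x tend to exceed those in y.

    Uses the large-sample normal approximation without a tie correction,
    so p-values are only approximate for small or heavily tied samples.
    Returns (U statistic for x, one-sided p-value).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return u_x, p


def cliffs_delta(x, y):
    """Cliff's delta effect size: P(x > y) - P(x < y), in [-1, 1]."""
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


if __name__ == "__main__":
    # Toy prior-change counts per file (invented, not the study's data).
    with_log_issues = [12, 30, 18, 25, 40, 22, 15, 33]
    without_log_issues = [3, 8, 5, 12, 2, 7, 4, 9]
    u, p = rank_sum_test(with_log_issues, without_log_issues)
    delta = cliffs_delta(with_log_issues, without_log_issues)
    print(f"U = {u}, one-sided p = {p:.4f}, Cliff's delta = {delta:.3f}")
```

Reporting an effect size alongside the p-value matters here: with thousands of files, even a negligible difference can be statistically significant, while Cliff's delta indicates how large the difference actually is.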
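The ownership finding compares the developer who fixes a log-related issue with the developer who introduced the logging statement and with the owner of the enclosing method. Below is a minimal sketch of that comparison. It assumes that the method owner is the developer who authored the most prior changes to the method. This is one common ownership definition, but it is not necessarily the one the study uses. The `Change` record and the sample history are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Change:
    """One hypothetical commit that touched a method."""
    author: str
    introduced_log: bool = False  # True if this commit added the logging statement


def method_owner(history):
    """Author with the most changes to the method (ties: first author encountered)."""
    return Counter(c.author for c in history).most_common(1)[0][0]


def log_introducer(history):
    """Author of the first commit that added the logging statement, if any."""
    for c in history:
        if c.introduced_log:
            return c.author
    return None


def classify_fixer(history, fixer):
    """Label the fixer of a log-related issue relative to the introducer and owner."""
    is_introducer = fixer == log_introducer(history)
    is_owner = fixer == method_owner(history)
    if is_introducer and is_owner:
        return "introducer+owner"
    if is_introducer:
        return "introducer"
    if is_owner:
        return "owner"
    return "neither"


if __name__ == "__main__":
    history = [Change("alice", introduced_log=True), Change("bob"), Change("bob"), Change("carol")]
    for fixer in ("alice", "bob", "dave"):
        print(fixer, "->", classify_fixer(history, fixer))
```

Counting how often `classify_fixer` returns `"neither"` across all fixed issues gives the kind of proportion behind the observation that log fixes often come from developers without clear expertise in that code.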
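The abstract does not name the four issue types that the tool detects. The sketch below therefore illustrates only the general idea of automated checks on logging statements. It implements two hypothetical checks on Java source: a logger created with a class other than the enclosing class, and a level guard such as `isDebugEnabled()` that wraps a call at a different level. The checks, the regular expressions, and the sample class names are all assumptions chosen for illustration. A realistic detector would analyze a parsed AST rather than match text.

```python
import re

# Hypothetical check 1: logger obtained for a class other than the enclosing one,
# e.g. LogFactory.getLog(Other.class) or LoggerFactory.getLogger(Other.class).
CLASS_DECL = re.compile(r"\bclass\s+(\w+)")
LOGGER_DECL = re.compile(r"get(?:Log|Logger)\(\s*(\w+)\.class\s*\)")

# Hypothetical check 2: a level guard immediately followed by a call at another level.
GUARD = re.compile(
    r"if\s*\(\s*(\w+)\.is(Trace|Debug|Info|Warn|Error)Enabled\(\)\s*\)\s*\{?\s*"
    r"(\w+)\.(trace|debug|info|warn|error)\("
)


def check_logger_class(source):
    """Flag loggers whose class literal differs from the nearest preceding class declaration."""
    classes = [(m.start(), m.group(1)) for m in CLASS_DECL.finditer(source)]
    issues = []
    for m in LOGGER_DECL.finditer(source):
        preceding = [name for pos, name in classes if pos < m.start()]
        if preceding and preceding[-1] != m.group(1):
            issues.append(f"logger uses {m.group(1)}.class inside class {preceding[-1]}")
    return issues


def check_guard_level(source):
    """Flag guards like isDebugEnabled() that wrap a call at a different level."""
    issues = []
    for m in GUARD.finditer(source):
        guard_logger, guard_level, call_logger, call_level = m.groups()
        if guard_logger == call_logger and guard_level.lower() != call_level:
            issues.append(f"is{guard_level}Enabled() guard wraps {call_level}() call")
    return issues


# Hypothetical Java snippet containing both kinds of problems.
SAMPLE = """
public class OrderService {
  private static final Log LOG = LogFactory.getLog(PaymentService.class);

  void submit() {
    if (LOG.isDebugEnabled()) {
      LOG.info("Submitting order");
    }
  }
}
"""

if __name__ == "__main__":
    for issue in check_logger_class(SAMPLE) + check_guard_level(SAMPLE):
        print(issue)
```

Text-level checks like these are cheap enough to run on every commit. That speed fits the motivation above: defective logging statements tend to go unreported for a long time, even though they are quick to fix once someone notices them.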




Information

Published In

Empirical Software Engineering, Volume 23, Issue 6
December 2018
749 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 December 2018

Author Tags

  1. Empirical study
  2. Log
  3. Mining software repositories
  4. Software bug

Qualifiers

  • Article

Cited By

  • (2024) Segmentation of wood CT images for internal defects detection based on CNN. Computers and Electronics in Agriculture 224:C. DOI: 10.1016/j.compag.2024.109244. Online publication date: 1-Sep-2024.
  • (2023) LoGenText-Plus: Improving Neural Machine Translation Based Logging Texts Generation with Syntactic Templates. ACM Transactions on Software Engineering and Methodology 33:2, pp 1-45. DOI: 10.1145/3624740. Online publication date: 22-Dec-2023.
  • (2023) Adonis: Practical and Efficient Control Flow Recovery through OS-level Traces. ACM Transactions on Software Engineering and Methodology 33:1, pp 1-27. DOI: 10.1145/3607187. Online publication date: 4-Jul-2023.
  • (2023) On the Temporal Relations between Logging and Code. Proceedings of the 45th International Conference on Software Engineering, pp 843-854. DOI: 10.1109/ICSE48619.2023.00079. Online publication date: 14-May-2023.
  • (2023) Development of a method for processing log files using clustering. Soft Computing - A Fusion of Foundations, Methodologies and Applications 27:3, pp 1617-1628. DOI: 10.1007/s00500-022-07740-2. Online publication date: 1-Feb-2023.
  • (2022) QuLog. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp 275-286. DOI: 10.1145/3524610.3527906. Online publication date: 16-May-2022.
  • (2022) Automated evolution of feature logging statement levels using Git histories and degree of interest. Science of Computer Programming 214:C. DOI: 10.1016/j.scico.2021.102724. Online publication date: 1-Feb-2022.
  • (2022) Studying logging practice in test code. Empirical Software Engineering 27:4. DOI: 10.1007/s10664-022-10139-0. Online publication date: 7-Apr-2022.
  • (2022) The sense of logging in the Linux kernel. Empirical Software Engineering 27:6. DOI: 10.1007/s10664-022-10136-3. Online publication date: 6-Aug-2022.
  • (2021) A Survey of Software Log Instrumentation. ACM Computing Surveys 54:4, pp 1-34. DOI: 10.1145/3448976. Online publication date: 3-May-2021.