Using finite-state models for log differencing

Authors:

Shahar Maoz

ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 49 - 59

https://doi.org/10.1145/3236024.3236069

Published: 26 October 2018

Abstract

Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. Then, the difference between the logs is the main target of interest.

In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails.
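The core comparison described above can be illustrated with a minimal sketch. It is not the authors' 2KDiff or nKDiff implementation: it builds no finite-state model and only computes the sets of length-k event sequences that the abstract refers to. The function names, the log representation (a list of traces, each a list of event names), and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def k_sequences(log, k):
    """Collect every window of k consecutive events that occurs in a log.

    Assumed representation: a log is a list of traces, and a trace is a
    list of event names.
    """
    seqs = set()
    for trace in log:
        for i in range(len(trace) - k + 1):
            seqs.add(tuple(trace[i:i + k]))
    return seqs


def two_log_diff(log_a, log_b, k):
    """Return the k-sequences found only in log_a and only in log_b."""
    a, b = k_sequences(log_a, k), k_sequences(log_b, k)
    return a - b, b - a


def many_log_labels(logs, k):
    """Map each k-sequence to the set of log names that contain it."""
    owners = defaultdict(set)
    for name, log in logs.items():
        for seq in k_sequences(log, k):
            owners[seq].add(name)
    return dict(owners)


# Hypothetical logs from two versions of a system.
log_v1 = [["open", "read", "close"], ["open", "write", "close"]]
log_v2 = [["open", "read", "close"], ["open", "read", "write", "close"]]

only_v1, only_v2 = two_log_diff(log_v1, log_v2, k=2)
labels = many_log_labels({"v1": log_v1, "v2": log_v2}, k=2)
common = {seq for seq, names in labels.items() if names == {"v1", "v2"}}
```

With these sample logs, `only_v1` holds the pair `("open", "write")`, `only_v2` holds `("read", "write")`, and `common` holds the three pairs shared by both logs. The algorithms in the paper go further by presenting such differences on top of k-Tails models rather than as raw sets.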

We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study with 60 participants, which demonstrated the effectiveness of the approach in log differencing tasks. We further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available as a prototype web application for experiments.

Index Terms

Using finite-state models for log differencing
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis

Published In

ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

October 2018

987 pages

ISBN:9781450355735

DOI:10.1145/3236024

General Chair: Gary T. Leavens (University of Central Florida, USA)

Program Chairs: Alessandro Garcia (PUC-Rio, Brazil), Corina S. Păsăreanu (NASA Ames Research Center, USA)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESEC/FSE '18

Sponsor:

SIGSOFT

ESEC/FSE '18: 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 4 - 9, 2018

Lake Buena Vista, FL, USA

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Citations

Cited By

Yu B, Yao J, Fu Q, Zhong Z, Xie H, Wu Y, Ma Y, He P, Roychoudhury A, Paiva A, Abreu R, Storey M (2024). Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1-13. Online publication date: 20-May-2024.
https://dl.acm.org/doi/10.1145/3597503.3623308

Batoun M, Sayagh M, Aghili R, Ouni A, Li H (2024). A literature review and existing challenges on software logging practices. Empirical Software Engineering 29:4. Online publication date: 18-Jun-2024.
https://doi.org/10.1007/s10664-024-10452-w

Wei X, Wang J, Sun C, Towey D, Zhang S, Zuo W, Yu Y, Ruan R, Song G (2024). Log-based anomaly detection for distributed systems: State of the art, industry experience, and open issues. Journal of Software: Evolution and Process. Online publication date: 7-Feb-2024.
https://doi.org/10.1002/smr.2650

Huo Y, Lee C, Su Y, Shan S, Liu J, Lyu M (2023). EvLog: Identifying Anomalous Logs over Software Evolution. 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pages 391-402. Online publication date: 9-Oct-2023.
https://doi.org/10.1109/ISSRE59848.2023.00018

Huo Y, Su Y, Lee C, Lyu M (2023). SemParser: A Semantic Parser for Log Analytics. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 881-893. Online publication date: May-2023.
https://doi.org/10.1109/ICSE48619.2023.00082

Huo Y, Li Y, Su Y, He P, Xie Z, Lyu M (2023). AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 497-509. Online publication date: 11-Sep-2023.
https://doi.org/10.1109/ASE56229.2023.00133

Yang N, Cuijpers P, Hendriks D, Schiffelers R, Lukkien J, Serebrenik A (2023). An interview study about the use of logs in embedded software engineering. Empirical Software Engineering 28:2. Online publication date: 11-Feb-2023.
https://dl.acm.org/doi/10.1007/s10664-022-10258-8

Korzeniowski L, Goczyla K (2022). Landscape of Automated Log Analysis: A Systematic Literature Review and Mapping Study. IEEE Access 10, pages 21892-21913. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3152549

Zhao N, Wang H, Li Z, Peng X, Wang G, Pan Z, Wu Y, Feng Z, Wen X, Zhang W, Sui K, Pei D, Spinellis D, Gousios G, Chechik M, Di Penta M (2021). An empirical investigation of practical log anomaly detection for online service systems. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1404-1415. Online publication date: 20-Aug-2021.
https://dl.acm.org/doi/10.1145/3468264.3473933

He S, He P, Chen Z, Yang T, Su Y, Lyu M (2021). A Survey on Automated Log Analysis for Reliability Engineering. ACM Computing Surveys 54:6, pages 1-37. Online publication date: 13-Jul-2021.
https://dl.acm.org/doi/10.1145/3460345
