DOI: 10.1145/3236024.3236069
research-article

Using finite-state models for log differencing

Published: 26 October 2018

    Abstract

    Much work has been published on extracting various kinds of models from logs that document the execution of running systems. In many cases, however, for example in the context of evolution, testing, or malware analysis, engineers are interested not only in a single log but in a set of several logs, each of which originated from a different set of runs of the system at hand. In such cases, the difference between the logs is the main target of interest.
    In this work we investigate the use of finite-state models for log differencing. Rather than comparing the logs directly, we generate concise models to describe and highlight their differences. Specifically, we present two algorithms based on the classic k-Tails algorithm: 2KDiff, which computes and highlights simple traces containing sequences of k events that belong to one log but not the other, and nKDiff, which extends k-Tails from one to many logs, and distinguishes the sequences of length k that are common to all logs from the ones found in only some of them, all on top of a single, rich model. Both algorithms are sound and complete modulo the abstraction defined by the use of k-Tails.
    We implemented both algorithms and evaluated their performance on mutated logs that we generated based on models from the literature. We conducted a user study with 60 participants that demonstrates the effectiveness of the approach in log differencing tasks. We further performed a case study to examine the use of our approach in malware analysis. Finally, we have made our work available for experimentation in a prototype web application.
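
The abstract describes 2KDiff as reporting sequences of k events that belong to one log but not the other, and nKDiff as separating the length-k sequences common to all logs from those found in only some of them. The Python sketch below illustrates only that k-sequence view of a difference. It is not the authors' implementation and builds no finite-state model. It assumes each log is a list of traces and each trace is a list of event names; the helper names `k_sequences`, `two_log_diff`, and `n_log_diff` and the example events are hypothetical. As a simplification, traces shorter than k contribute nothing here, whereas classic k-Tails also considers shorter futures at the end of a trace.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Trace = Sequence[str]    # assumed: one run of the system as a sequence of event names
Log = List[Trace]        # assumed: a log is a collection of traces
KSeq = Tuple[str, ...]   # a contiguous sequence of exactly k events


def k_sequences(log: Log, k: int) -> Set[KSeq]:
    """Collect every contiguous length-k event sequence occurring in any trace.

    Simplification of this sketch: traces shorter than k contribute nothing.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    seqs: Set[KSeq] = set()
    for trace in log:
        for i in range(len(trace) - k + 1):
            seqs.add(tuple(trace[i:i + k]))
    return seqs


def two_log_diff(log_a: Log, log_b: Log, k: int) -> Tuple[Set[KSeq], Set[KSeq]]:
    """Hypothetical two-log difference: (sequences only in log_a, sequences only in log_b)."""
    a, b = k_sequences(log_a, k), k_sequences(log_b, k)
    return a - b, b - a


def n_log_diff(logs: List[Log], k: int) -> Tuple[Set[KSeq], Dict[KSeq, FrozenSet[int]]]:
    """Hypothetical n-log difference.

    Returns the k-sequences common to all logs and, for every other sequence,
    the indices of the logs that contain it.
    """
    per_log = [k_sequences(log, k) for log in logs]
    if not per_log:
        return set(), {}
    common = set.intersection(*per_log)
    partial = {
        seq: frozenset(i for i, seqs in enumerate(per_log) if seq in seqs)
        for seq in set().union(*per_log) - common
    }
    return common, partial


if __name__ == "__main__":
    # Hypothetical example logs; the event names are made up for illustration.
    log_a = [["open", "read", "close"], ["open", "write", "close"]]
    log_b = [["open", "read", "close"], ["open", "read", "read", "close"]]
    only_a, only_b = two_log_diff(log_a, log_b, k=2)
    print("only in A:", sorted(only_a))
    print("only in B:", sorted(only_b))
```

With k = 2 on these example logs, the `write` path appears only in log A and the repeated `read` appears only in log B.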


    Information & Contributors


    Published In

    ESEC/FSE 2018: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    October 2018
    987 pages
    ISBN:9781450355735
    DOI:10.1145/3236024
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. log analysis
    2. model inference

    Qualifiers

    • Research-article

    Conference

    ESEC/FSE '18

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%


    Article Metrics

    • Downloads (last 12 months): 17
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 10 Aug 2024.

    Cited By

    • (2024) Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1-13. DOI: 10.1145/3597503.3623308. Online publication date: 20 May 2024.
    • (2024) A literature review and existing challenges on software logging practices. Empirical Software Engineering 29(4). DOI: 10.1007/s10664-024-10452-w. Online publication date: 18 Jun 2024.
    • (2024) Log-based anomaly detection for distributed systems: State of the art, industry experience, and open issues. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2650. Online publication date: 7 Feb 2024.
    • (2023) EvLog: Identifying Anomalous Logs over Software Evolution. 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pages 391-402. DOI: 10.1109/ISSRE59848.2023.00018. Online publication date: 9 Oct 2023.
    • (2023) SemParser: A Semantic Parser for Log Analytics. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 881-893. DOI: 10.1109/ICSE48619.2023.00082. Online publication date: May 2023.
    • (2023) AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 497-509. DOI: 10.1109/ASE56229.2023.00133. Online publication date: 11 Sep 2023.
    • (2023) An interview study about the use of logs in embedded software engineering. Empirical Software Engineering 28(2). DOI: 10.1007/s10664-022-10258-8. Online publication date: 11 Feb 2023.
    • (2022) Landscape of Automated Log Analysis: A Systematic Literature Review and Mapping Study. IEEE Access 10, pages 21892-21913. DOI: 10.1109/ACCESS.2022.3152549. Online publication date: 2022.
    • (2021) An empirical investigation of practical log anomaly detection for online service systems. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1404-1415. DOI: 10.1145/3468264.3473933. Online publication date: 20 Aug 2021.
    • (2021) A Survey on Automated Log Analysis for Reliability Engineering. ACM Computing Surveys 54(6), pages 1-37. DOI: 10.1145/3460345. Online publication date: 13 Jul 2021.
