research-article

A user-guided approach to program analysis

Authors:

Aditya V. Nori,

Mayur Naik

ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering

Pages 462-473

https://doi.org/10.1145/2786805.2786851

Published: 30 August 2015

Abstract

Program analysis tools often produce undesirable output due to various approximations. We present an approach, and a system called EUGENE, that lets user feedback guide such approximations toward producing the desired output. We formulate the problem of user-guided program analysis as solving a combination of hard rules and soft rules: hard rules capture soundness, while soft rules capture degrees of approximation and user preferences. Our technique solves the rules using an off-the-shelf solver in a manner that is sound (satisfies all hard rules), optimal (maximally satisfies the soft rules), and scales to real-world analyses and programs. We evaluate EUGENE on two different analyses with labeled output on a suite of seven Java programs ranging from 131 to 198 KLOC. We also report on a user study in which nine users employ EUGENE to guide an information-flow analysis on three Java micro-benchmarks. In our experiments, EUGENE significantly reduces the number of misclassified reports given limited amounts of feedback.
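The abstract does not spell out the encoding or name the solver. One standard way to read "hard rules plus weighted soft rules" is as weighted partial MaxSAT: find a truth assignment that satisfies every hard clause (soundness) and maximizes the total weight of satisfied soft clauses (optimality). The Python sketch below illustrates that reading under stated assumptions. The facts `reach_x`, `reach_y`, `alarm1`, `alarm2`, the rules, and the weights 3 and 10 are all hypothetical, and the brute-force `solve` function merely stands in for an off-the-shelf solver, which would be required at realistic scale.

```python
from itertools import product

# A clause is a list of literals; a literal is (variable, polarity).
# (v, True) means "v holds"; (v, False) means "v does not hold".


def satisfied(clause, assignment):
    """A clause holds if at least one of its literals holds."""
    return any(assignment[var] == polarity for var, polarity in clause)


def solve(variables, hard, soft):
    """Brute-force weighted partial MaxSAT (exponential; toy sizes only).

    Returns (assignment, weight) for an assignment that satisfies every hard
    clause and maximizes the summed weight of satisfied soft clauses, or
    (None, None) if the hard clauses are unsatisfiable.
    """
    best, best_weight = None, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not all(satisfied(c, assignment) for c in hard):
            continue  # soundness: hard rules may never be violated
        weight = sum(w for w, c in soft if satisfied(c, assignment))
        if best_weight is None or weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight


# Hypothetical toy analysis: all names, rules, and weights are illustrative.
VARS = ["reach_x", "reach_y", "alarm1", "alarm2"]

HARD = [
    [("reach_x", True)],                     # input fact
    [("reach_y", False), ("alarm1", True)],  # reach_y => alarm1
    [("reach_x", False), ("alarm2", True)],  # reach_x => alarm2
]

# Approximate rule reach_x => reach_y, kept soft (hypothetical weight 3).
APPROX = [(3, [("reach_x", False), ("reach_y", True)])]

# User feedback "alarm1 is a false alarm", soft (hypothetical weight 10).
FEEDBACK = [(10, [("alarm1", False)])]

before, _ = solve(VARS, HARD, APPROX)
after, _ = solve(VARS, HARD, APPROX + FEEDBACK)
print("without feedback:", [v for v in VARS if before[v]])
# -> without feedback: ['reach_x', 'reach_y', 'alarm1', 'alarm2']
print("with feedback:", [v for v in VARS if after[v]])
# -> with feedback: ['reach_x', 'alarm2']
```

Without feedback, the solver keeps the approximate rule and reports both alarms. Once the user marks `alarm1` as false, violating the approximate rule (cost 3) is cheaper than ignoring the feedback (cost 10), so `alarm1` disappears while `alarm2`, which follows from hard rules alone, is still reported. In this reading, feedback can suppress a report only by relaxing soft rules, never hard ones.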


Cited By

Zhang Y, Shi Y, Zhang X (2024). Learning Abstraction Selection for Bayesian Program Analysis. Proceedings of the ACM on Programming Languages 8:OOPSLA1 (954-982). Online publication date: 29-Apr-2024.
https://dl.acm.org/doi/10.1145/3649845
Mir A, Keshani M, Proksch S, Spinellis D, Constantinou E, Bacchelli A (2024). On the Effectiveness of Machine Learning-based Call Graph Pruning: An Empirical Study. Proceedings of the 21st International Conference on Mining Software Repositories (457-468). Online publication date: 15-Apr-2024.
https://dl.acm.org/doi/10.1145/3643991.3644897
Bhuiyan M, Chandra S, Blincoe K, Tonella P (2023). The Call Graph Chronicles: Unleashing the Power Within. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2210-2212). Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3617854



Information

Published In


ESEC/FSE 2015: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering

August 2015

1068 pages

ISBN: 9781450336758

DOI: 10.1145/2786805

General Chair: Elisabetta Di Nitto (Politecnico di Milano, Italy)

Program Chairs: Mark Harman (University College London, UK); Patrick Heymans (University of Namur, Belgium)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 August 2015



Qualifiers

Research-article


Conference

ESEC/FSE'15

Sponsor:

SIGSOFT

ESEC/FSE'15: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering

August 30 - September 4, 2015

Bergamo, Italy

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Bibliometrics & Citations

Article Metrics

Total Citations: 55
Total Downloads: 521
Downloads (last 12 months): 41
Downloads (last 6 weeks): 3

Reflects downloads up to 27 Jan 2025

Citations

Guo Z, Tan T, Liu S, Liu X, Lai W, Yang Y, Li Y, Chen L, Dong W, Zhou Y (2023). Mitigating False Positive Static Analysis Warnings: Progress, Challenges, and Opportunities. IEEE Transactions on Software Engineering 49:12 (5154-5188). Online publication date: Dec-2023.
https://doi.org/10.1109/TSE.2023.3329667
Wang X, Zhao L (2023). APICAD: Augmenting API Misuse Detection through Specifications from Code and Documents. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) (245-256). Online publication date: May-2023.
https://doi.org/10.1109/ICSE48619.2023.00032
Chen X, Wang B, Jin Z, Feng Y, Li X, Feng X, Liu Q (2023). Tabby: Automated Gadget Chain Detection for Java Deserialization Vulnerabilities. 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (179-192). Online publication date: Jun-2023.
https://doi.org/10.1109/DSN58367.2023.00028
Liu P, Lu Y, Yang W, Pan M (2023). VALAR: Streamlining Alarm Ranking in Static Analysis with Value-Flow Assisted Active Learning. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) (1940-1951). Online publication date: 11-Sep-2023.
https://doi.org/10.1109/ASE56229.2023.00098
Choi Y, Nam J (2023). WINE. Information and Software Technology 155:C. Online publication date: 1-Mar-2023.
https://dl.acm.org/doi/10.1016/j.infsof.2022.107109
Le-Cong T, Kang H, Nguyen T, Haryono S, Lo D, Le X, Huynh Q, Roychoudhury A, Cadar C, Kim M (2022). AutoPruner: transformer-based call graph pruning. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (520-532). Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3540250.3549175
Utture A, Liu S, Kalhauge C, Palsberg J, Dwyer M, Damian D, Zeller A (2022). Striking a balance. Proceedings of the 44th International Conference on Software Engineering (2043-2055). Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510166
