Programming distributed applications free from communication deadlocks and race conditions is com... more Programming distributed applications free from communication deadlocks and race conditions is complex. Preserving these properties when applications are updated at runtime is even harder. We present a choreographic approach for programming updatable, distributed applications. We define a choreography language, called Dynamic Interaction-Oriented Choreography (DIOC), that allows the programmer to specify, from a global viewpoint, which parts of the application can be updated. At runtime, these parts may be replaced by new DIOC fragments from outside the application. DIOC programs are compiled, generating code for each participant in a process-level language called Dynamic Process-Oriented Choreographies (DPOC). We prove that DPOC distributed applications generated from DIOC specifications are deadlock free and race free and that these properties hold also after any runtime update. We instantiate the theoretical model above into a programming framework called Adaptable Interaction-Ori...
Softwarewatermarking is a defence technique used to prevent software piracy by embedding a signat... more Softwarewatermarking is a defence technique used to prevent software piracy by embedding a signature in the code. When an illegal copy is made, the ownership can be claimed by extracting the signature. The signature has to be hidden inside the code and it has to be difficult for an attacker to detect, tamper or remove it. In this paper we show how the ability of the attacker to identify the signature can bemodelled in the framework of abstract interpretation as a completeness property. We view attackers as abstract interpreters that can precisely observe only the properties for which they are complete. In this setting, hiding a signature in the code corresponds to insert it in terms of a semantic property that can be retrieved only by attackers that are complete for it. Indeed, any abstract interpreter that is not complete for the property specifying the signature cannot detect, tamper or remove it.
Static and dynamic program analyses attempt to extract useful information on program’s behaviours... more Static and dynamic program analyses attempt to extract useful information on program’s behaviours. Static analysis uses an abstract model of programs to reason on their runtime behaviour without actually running them, while dynamic analysis reasons on a test set of real program executions. For this reason, the precision of static analysis is limited by the presence of false positives (executions allowed by the abstract model that cannot happen at runtime), while the precision of dynamic analysis is limited by the presence of false negatives (real executions that are not in the test set). Researchers have developed many analysis techniques and tools in the attempt to increase the precision of program verification. Software protection is an interesting scenario where programs need to be protected from adversaries that use program analysis to understand their inner working and then exploit this knowledge to perform some illicit actions. Program analysis plays a dual role in program ver...
Dynamic program analysis is extremely successful both in code debugging and in malicious code att... more Dynamic program analysis is extremely successful both in code debugging and in malicious code attacks. Fuzzing, concolic, and monkey testing are instances of the more general problem of analysing programs by dynamically executing their code with selected inputs. While static program analysis has a beautiful and well established theoretical foundation in abstract interpretation, dynamic analysis still lacks such a foundation. In this paper, we introduce a formal model for understanding the notion of precision in dynamic program analysis. It is known that in sound-by-construction static program analysis the precision amounts to completeness. In dynamic analysis, which is inherently unsound, precision boils down to a notion of coverage of execution traces with respect to what the observer (attacker or debugger) can effectively observe about the computation. We introduce a topological characterisation of the notion of coverage relatively to a given (fixed) observation for dynamic progra...
Dynamic languages often employ reflection primitives to turn dynamically generated text into exec... more Dynamic languages often employ reflection primitives to turn dynamically generated text into executable code at run-time. These features make standard static analysis extremely hard if not impossible because its essential data structures, i.e., the control-flow graph and the system of recursive equations associated with the program to analyse, are themselves dynamically mutating objects. We introduce SEA, an abstract interpreter for automatic sound string executability analysis of dynamic languages employing bounded (i.e, finitely nested) reflection and dynamic code generation. Strings are statically approximated in an abstract domain of finite state automata with basic operations implemented as symbolic transducers. SEA combines standard program analysis together with string executability analysis. The analysis of a call to reflection determines a call to the same abstract interpreter over a code which is synthesised directly from the result of the static string executability analy...
Journal of Computer Virology and Hacking Techniques, 2021
In the past few years, malware classification techniques have shifted from shallow traditional ma... more In the past few years, malware classification techniques have shifted from shallow traditional machine learning models to deeper neural network architectures. The main benefit of some of these is the ability to work with raw data, guaranteed by their automatic feature extraction capabilities. This results in less technical expertise needed while building the models, thus less initial pre-processing resources. Nevertheless, such advantage comes with its drawbacks, since deep learning models require huge quantities of data in order to generate a model that generalizes well. The amount of data required to train a deep network without overfitting is often unobtainable for malware analysts. We take inspiration from image-based data augmentation techniques and apply a sequence of semantics-preserving syntactic code transformations (obfuscations) to a small dataset of programs to generate a larger dataset. We then design two learning models, a convolutional neural network and a bi-directio...
Journal of Computer Virology and Hacking Techniques, 2021
Metamorphic malware are self-modifying programs which apply semantic preserving transformations t... more Metamorphic malware are self-modifying programs which apply semantic preserving transformations to their own code in order to foil detection systems based on signature matching. Metamorphism impacts both software security and code protection technologies: it is used by malware writers to evade detection systems based on pattern matching and by software developers for preventing malicious host attacks through software diversification. In this paper, we consider the problem of automatically extracting metamorphic signatures from the analysis of metamorphic malware variants. We define a metamorphic signature as an abstract program representation that ideally captures all the possible code variants that might be generated during the execution of a metamorphic program. For this purpose, we developed MetaSign: a tool that takes as input a collection of metamorphic code variants and produces, as output, a set of transformation rules that could have been used to generate the considered meta...
Obfuscation is the art of making code hard to reverse engineer and understand. In this paper, we ... more Obfuscation is the art of making code hard to reverse engineer and understand. In this paper, we propose a formal model for specifying and understanding the strength of obfuscating transformations with respect to a given attack model. The idea is to consider the attacker as an abstract interpreter willing to extract information about the program’s semantics. In this scenario, we show that obfuscating code is making the analysis imprecise, namely making the corresponding abstract domain incomplete. It is known that completeness is a property of the abstract domain and the program to analyse. We introduce a framework for transforming abstract domains, i.e., analyses, towards incompleteness. The family of incomplete abstractions for a given program provides a characterisation of the potency of obfuscation employed in that program, i.e., its strength against the attack specified by those abstractions. We show this characterisation for known obfuscating transformations used to inhibit pr...
Programming distributed applications free from communication deadlocks and race conditions is com... more Programming distributed applications free from communication deadlocks and race conditions is complex. Preserving these properties when applications are updated at runtime is even harder. We present a choreographic approach for programming updatable, distributed applications. We define a choreography language, called Dynamic Interaction-Oriented Choreography (DIOC), that allows the programmer to specify, from a global viewpoint, which parts of the application can be updated. At runtime, these parts may be replaced by new DIOC fragments from outside the application. DIOC programs are compiled, generating code for each participant in a process-level language called Dynamic Process-Oriented Choreographies (DPOC). We prove that DPOC distributed applications generated from DIOC specifications are deadlock free and race free and that these properties hold also after any runtime update. We instantiate the theoretical model above into a programming framework called Adaptable Interaction-Ori...
Softwarewatermarking is a defence technique used to prevent software piracy by embedding a signat... more Softwarewatermarking is a defence technique used to prevent software piracy by embedding a signature in the code. When an illegal copy is made, the ownership can be claimed by extracting the signature. The signature has to be hidden inside the code and it has to be difficult for an attacker to detect, tamper or remove it. In this paper we show how the ability of the attacker to identify the signature can bemodelled in the framework of abstract interpretation as a completeness property. We view attackers as abstract interpreters that can precisely observe only the properties for which they are complete. In this setting, hiding a signature in the code corresponds to insert it in terms of a semantic property that can be retrieved only by attackers that are complete for it. Indeed, any abstract interpreter that is not complete for the property specifying the signature cannot detect, tamper or remove it.
Static and dynamic program analyses attempt to extract useful information on program’s behaviours... more Static and dynamic program analyses attempt to extract useful information on program’s behaviours. Static analysis uses an abstract model of programs to reason on their runtime behaviour without actually running them, while dynamic analysis reasons on a test set of real program executions. For this reason, the precision of static analysis is limited by the presence of false positives (executions allowed by the abstract model that cannot happen at runtime), while the precision of dynamic analysis is limited by the presence of false negatives (real executions that are not in the test set). Researchers have developed many analysis techniques and tools in the attempt to increase the precision of program verification. Software protection is an interesting scenario where programs need to be protected from adversaries that use program analysis to understand their inner working and then exploit this knowledge to perform some illicit actions. Program analysis plays a dual role in program ver...
Dynamic program analysis is extremely successful both in code debugging and in malicious code att... more Dynamic program analysis is extremely successful both in code debugging and in malicious code attacks. Fuzzing, concolic, and monkey testing are instances of the more general problem of analysing programs by dynamically executing their code with selected inputs. While static program analysis has a beautiful and well established theoretical foundation in abstract interpretation, dynamic analysis still lacks such a foundation. In this paper, we introduce a formal model for understanding the notion of precision in dynamic program analysis. It is known that in sound-by-construction static program analysis the precision amounts to completeness. In dynamic analysis, which is inherently unsound, precision boils down to a notion of coverage of execution traces with respect to what the observer (attacker or debugger) can effectively observe about the computation. We introduce a topological characterisation of the notion of coverage relatively to a given (fixed) observation for dynamic progra...
Dynamic languages often employ reflection primitives to turn dynamically generated text into exec... more Dynamic languages often employ reflection primitives to turn dynamically generated text into executable code at run-time. These features make standard static analysis extremely hard if not impossible because its essential data structures, i.e., the control-flow graph and the system of recursive equations associated with the program to analyse, are themselves dynamically mutating objects. We introduce SEA, an abstract interpreter for automatic sound string executability analysis of dynamic languages employing bounded (i.e, finitely nested) reflection and dynamic code generation. Strings are statically approximated in an abstract domain of finite state automata with basic operations implemented as symbolic transducers. SEA combines standard program analysis together with string executability analysis. The analysis of a call to reflection determines a call to the same abstract interpreter over a code which is synthesised directly from the result of the static string executability analy...
Journal of Computer Virology and Hacking Techniques, 2021
In the past few years, malware classification techniques have shifted from shallow traditional ma... more In the past few years, malware classification techniques have shifted from shallow traditional machine learning models to deeper neural network architectures. The main benefit of some of these is the ability to work with raw data, guaranteed by their automatic feature extraction capabilities. This results in less technical expertise needed while building the models, thus less initial pre-processing resources. Nevertheless, such advantage comes with its drawbacks, since deep learning models require huge quantities of data in order to generate a model that generalizes well. The amount of data required to train a deep network without overfitting is often unobtainable for malware analysts. We take inspiration from image-based data augmentation techniques and apply a sequence of semantics-preserving syntactic code transformations (obfuscations) to a small dataset of programs to generate a larger dataset. We then design two learning models, a convolutional neural network and a bi-directio...
Journal of Computer Virology and Hacking Techniques, 2021
Metamorphic malware are self-modifying programs which apply semantic preserving transformations t... more Metamorphic malware are self-modifying programs which apply semantic preserving transformations to their own code in order to foil detection systems based on signature matching. Metamorphism impacts both software security and code protection technologies: it is used by malware writers to evade detection systems based on pattern matching and by software developers for preventing malicious host attacks through software diversification. In this paper, we consider the problem of automatically extracting metamorphic signatures from the analysis of metamorphic malware variants. We define a metamorphic signature as an abstract program representation that ideally captures all the possible code variants that might be generated during the execution of a metamorphic program. For this purpose, we developed MetaSign: a tool that takes as input a collection of metamorphic code variants and produces, as output, a set of transformation rules that could have been used to generate the considered meta...
Obfuscation is the art of making code hard to reverse engineer and understand. In this paper, we ... more Obfuscation is the art of making code hard to reverse engineer and understand. In this paper, we propose a formal model for specifying and understanding the strength of obfuscating transformations with respect to a given attack model. The idea is to consider the attacker as an abstract interpreter willing to extract information about the program’s semantics. In this scenario, we show that obfuscating code is making the analysis imprecise, namely making the corresponding abstract domain incomplete. It is known that completeness is a property of the abstract domain and the program to analyse. We introduce a framework for transforming abstract domains, i.e., analyses, towards incompleteness. The family of incomplete abstractions for a given program provides a characterisation of the potency of obfuscation employed in that program, i.e., its strength against the attack specified by those abstractions. We show this characterisation for known obfuscating transformations used to inhibit pr...
Uploads
Papers by Mila Preda