This section reviews previous SCA-based disassemblers and presents an overview of the proposed approach.
2.1 Related Work
Various SCA-based methods exist for recovering information about target processes on embedded systems. Code-monitoring with SCA is most often used to identify fixed instruction sequences, separate basic blocks, and predict control flow [
1,
2] based on some
a priori knowledge of an evaluated benchmark. Using SCA to disassemble individual instructions from an arbitrary unknown code as in References [
4–
8] is far more challenging in part because each instruction impacts a multitude of architectural blocks differently. Disassemblers can be compared based on their success rates and their acquisition costs. While the success rate is simply the ratio of correctly identified instructions to the total number of executed instructions, the acquisition cost is a function of the number of sensor configurations used during profiling
\({N}_{{\rm{pc}}}\) , the number of instantiations performed to characterize each instruction
\({\bar{N}}_{{\rm{inst}}}\) , and the number of samples collected for each of these measurements
\({N}_t\) . The acquisition cost in this work only accounts for samples stored post measurement collection and does not quantify repeated measurements and averaging performed by the oscilloscope software.
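For illustration, the stored-sample count per instruction can be sketched as the product of these three factors (a hypothetical helper written for this review, not a formula from the cited works):

```python
# Hypothetical helper for illustration only: stored samples per instruction
# as the product of the three factors named in the text.
def acquisition_cost(n_pc, n_inst_avg, n_t):
    """Sensor configurations x instantiations per instruction x samples."""
    return n_pc * n_inst_avg * n_t

# Example: a single-probe setup (N_pc = 1) that instantiates each
# instruction 3,000 times and stores 500 samples per measurement.
print(acquisition_cost(1, 3000, 500))  # 1500000
```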
Instruction disassembly based on coarse-grained EM or power SCA setups [
4–
6] uses a single sensor configuration (
\({N}_{{\rm{pc}}} = 1\)) and requires significant post-processing of the signals measured as the DUT executes an extensive set of test instructions. In Reference [
4], a power SCA-based disassembler, using
principal component analysis (PCA) for feature selection and a multivariate Gaussian classifier, was proposed to evaluate a small instruction set (
\(N = 33\) ). It correctly recognized ∼71% and ∼51% of instructions in test code and application benchmarks, respectively. The method in Reference [
4] assumes some
a priori knowledge of the code, however, as it applies hidden Markov models to blocks of the executed code. In Reference [
6], a coarse-grained EM SCA-based disassembler, using PCA with frequency-domain signals for feature selection and AdaBoost, support vector machine, and other methods for classification, was proposed. It was able to distinguish two instructions with a 100% success rate. Unfortunately, the method's performance for the remaining instructions was not evaluated in Reference [
6]. A larger instruction set (
\(N > 100\) ) was evaluated in Reference [
5] with a power SCA-based disassembler, using
Kullback–Leibler (KL) divergence for feature selection and quadratic discriminant analysis for classification. The method disassembled a test code with ∼99% success rate. Although Reference [
5] used hierarchical classification, included an extra method to improve success rates for application benchmarks, and recovered two instructions implemented in one such code with a 92% success rate, the method was not evaluated comprehensively on real-world application benchmarks. In Reference [
27], an instruction disassembler targeting a Cortex M0 processor was proposed, implementing KL divergence for feature selection and classification algorithms demonstrated in Reference [
5], further enhanced with models based on a multi-layer perceptron and a convolutional neural network. While the method recognized ∼99% and ∼88% of instructions in test code and application benchmarks, respectively, the disassembly was limited to a small subset of the full instruction set (
\(N = 17\) ).
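The coarse-grained pipelines above share a common shape: a dimensionality-reduction step followed by a statistical classifier. A minimal numpy sketch of a PCA + multivariate-Gaussian pipeline in that spirit (synthetic traces stand in for real measurements; this is not the cited implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "traces" for two instruction classes (20 time samples each);
# real traces would come from the power/EM measurements described above.
X0 = rng.normal(0.0, 1.0, size=(100, 20))
X1 = rng.normal(1.0, 1.0, size=(100, 20))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Feature selection: PCA via SVD, keeping k principal components.
k = 5
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T

# Multivariate Gaussian classifier: fit a mean/covariance per class and
# pick the class with the highest log-likelihood.
def fit(Z, y):
    return {c: (Z[y == c].mean(axis=0), np.cov(Z[y == c], rowvar=False))
            for c in np.unique(y)}

def log_lik(z, mu, cov):
    d = z - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d + logdet)

params = fit(Z, y)
pred = np.array([max(params, key=lambda c: log_lik(z, *params[c])) for z in Z])
print((pred == y).mean())  # success rate on this easy synthetic data
```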
Instruction disassembly based on fine-grained EM SCA was demonstrated in References [
7,
8]. A small instruction set (
\(N = 33\) ) was evaluated in Reference [
7] using linear discriminant analysis for feature selection and a
k-Nearest Neighbor algorithm for classification. While the disassembler recognized ∼96% of the instructions in a test code and ∼88% of them in application benchmarks, the approach in Reference [
7] is an invasive method that requires decapsulation of the DUT to constrain the search space of configurations during feature selection. A similar fine-grained setup in Reference [
8] targeted a slightly larger instruction set (
\(N = 50\) ) by performing bit-level disassembly of opcodes, training quadratic discriminant analysis-based classifiers to identify individual bit transitions as instructions are pre-fetched. Although the disassembler recognized 95% of instructions in test codes, it was not evaluated on real benchmarks.
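As a toy illustration of the bit-level idea, consider a single opcode bit whose leakage grows when it toggles during pre-fetch (a hypothetical Hamming-distance-style model; the cited work trains quadratic-discriminant classifiers rather than the simple threshold used here):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy bit-level model: leakage of one opcode bit during pre-fetch is
# higher when the bit toggles (illustrative assumption, not measured data).
toggles = rng.integers(0, 2, size=500)           # 1 = bit transition
leakage = 0.8 * toggles + rng.normal(0, 0.3, size=500)

# A per-bit "classifier" can be as simple as a learned threshold.
threshold = leakage.mean()
pred = (leakage > threshold).astype(int)
print((pred == toggles).mean())  # per-bit recovery rate on toy data
```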
While the methods proposed in References [
4–
8,
27] (Table
1) have very high success rates when disassembling test codes that follow the same structure/template as the profiling codes they use to select features, their success rates either decrease markedly or are unknown when disassembling application benchmarks; moreover, the methods in References [
4,
6,
7,
27], which were developed and tested with only a limited number of instructions, may not scale well as
\(N\) , the instruction set's size, increases. Another issue common to the methods in References [
4–
8] is that they do not elaborate on the disassembly of conditional branches; such branches require careful consideration during both phases of disassembly, as they enable detecting possible transitions to different parts of the code and evaluating control flow for comprehensive disassembly. Finally, the methods in References [
4–
7] extensively instantiate instructions with randomized operands, in different sequences, and so on; they instantiate each instruction from 200 [
6] to 3,000 [
5] times. These methods cannot be directly extended to fine-grained EM SCA, because their acquisition costs would be infeasibly high, especially if the number of possible instructions and measurement configurations is large. By contrast, our proposed method aims to (i) improve the success rate of disassembly for application codes, (ii) identify if branches were taken/not taken during execution, and (iii) maintain a feasible acquisition cost even for large instruction sets and high-resolution EM probing.
2.2 Proposed Approach
As mentioned in the Introduction, the proposed method consists of two phases (Figure
2). In the
feature-selection phase, EM fields emanating from the DUT are collected for all instructions by designing and using profiling codes that instantiate each instruction for multiple specific machine states, chosen according to the HW leakage model [
9,
15]. The signals are collected with all measurement configurations in a 5D search space consisting of the probe location, probe orientation, and time interval. Next, the min-max bounds of signals—directly probed fields, as well as differential signals derived from them—are found for each instruction, and these signal envelopes are compiled within a hierarchical database. The database stores for each instruction—at the bottom stage of the hierarchy—real-valued envelopes that are multivariate functions of the measurement configuration, i.e., they are functions of five variables. For the upper stages of the hierarchy, instructions are grouped using certain instruction attributes (Figure
1), and the database is compiled bottom-up, i.e., the envelopes for the instruction classes in the upper stages are constructed using envelopes for instruction classes compiled in the lower stages.
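The bottom-up envelope construction can be sketched as follows (toy data for a single measurement configuration; the instruction names and the ALU/MEM grouping are illustrative stand-ins, not the paper's hierarchy or attributes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy traces for one measurement configuration:
# instantiations x time samples per instruction (names illustrative).
traces = {
    "ADD": rng.normal(0.0, 1.0, size=(8, 50)),
    "SUB": rng.normal(0.1, 1.0, size=(8, 50)),
    "LDR": rng.normal(1.0, 1.2, size=(8, 50)),
}

# Bottom stage: per-instruction min-max envelopes over the instantiations.
envelopes = {op: (t.min(axis=0), t.max(axis=0)) for op, t in traces.items()}

# Upper stage: merge child envelopes bottom-up for each instruction class,
# so a class envelope bounds every instruction it contains.
groups = {"ALU": ["ADD", "SUB"], "MEM": ["LDR"]}
group_envelopes = {}
for g, ops in groups.items():
    lo = np.min([envelopes[op][0] for op in ops], axis=0)
    hi = np.max([envelopes[op][1] for op in ops], axis=0)
    group_envelopes[g] = (lo, hi)
print(group_envelopes["ALU"][0].shape)  # (50,)
```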
Once the database is constructed, it is used to identify optimal measurement configurations and features for binary classification. During feature selection, the envelopes for each instruction class are compared pairwise (one at a time) to those of other classes at the same stage; the comparison identifies
\(M\) configurations where the pair's signal envelopes are most distant, i.e., the optimal values of the five variables for distinguishing the two classes. The signals obtained with the optimal measurement configurations (i.e., the selected features), along with the corresponding envelopes of the two classes, are recorded for use in the next phase. In the
classification phase, signals measured while the DUT executes arbitrary codes are categorized hierarchically starting from the top stage. At each stage, candidate classes are identified given the class selected in the previous stage, using binary classification with majority voting [
5].
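The staged classification can be sketched as follows (toy class names and pre-computed votes; in the actual method each vote would come from a binary classifier comparing measured signals against the recorded envelopes):

```python
from collections import Counter

# Toy two-stage hierarchy: stage 1 picks a class group, stage 2 picks an
# instruction inside the winning group (names are illustrative).
hierarchy = {"ALU": ["ADD", "SUB"], "MEM": ["LDR", "STR"]}

def majority_vote(votes):
    """Winner across the M feature-wise decisions for one class pair."""
    return Counter(votes).most_common(1)[0][0]

def classify(candidates, pairwise_votes):
    # Round-robin: each candidate scores one point per pairwise win.
    scores = {c: 0 for c in candidates}
    for votes in pairwise_votes.values():
        scores[majority_vote(votes)] += 1
    return max(scores, key=scores.get)

# Stage 1: ALU vs. MEM decided by M = 5 feature-wise votes.
stage1 = classify(["ALU", "MEM"],
                  {("ALU", "MEM"): ["ALU", "ALU", "MEM", "ALU", "MEM"]})
# Stage 2: only children of the stage-1 winner remain candidates.
stage2 = classify(hierarchy[stage1],
                  {("ADD", "SUB"): ["ADD", "SUB", "ADD"]})
print(stage1, stage2)  # ALU ADD
```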