Design and Implementation of A Quantum Compiler
ABSTRACT
We present a compiler for programming quantum architectures based on the Quantum Random Access Machine
(QRAM) model. The QRAM model consists of a classical subsystem responsible for generating the quantum
operations that are executed on a quantum subsystem. The compiler can also be applied to trade studies
for optimizing the reliability and latency of quantum programs and to determine the required error correction
resources. We use the Bacon-Shor [[9, 1, 3]] quantum error correcting code as an example quantum program that
can be processed and analyzed by the compiler.
1. INTRODUCTION
In this paper, we present a compiler designed for the Quantum Random Access Machine (QRAM)1 model, which
is an extension of the classical random access machine.2 The QRAM model has become a widely accepted model
for quantum computers, with several recently published examples.3, 4 What has not been discussed, however,
is a clear description of how these architectures can be programmed. The compiler is a tool that allows us to
program these architectures and also to analyze the resources necessary for implementing large-scale quantum
applications using these architectures. As the design of these QRAM-based architectures and physical quantum
components continues to improve, such an analysis tool can help engineers bridge the gap between higher-level
system requirements and low-level device capabilities.
[Figure 1 diagram: the Classical Subsystem (Hardware/Software) sends a sequence of physical quantum operations to the Quantum Subsystem (Quantum Hardware), which returns measurement results.]
As shown in Figure 1, the QRAM model consists of a classical subsystem responsible for generating sequences
of quantum operations to be executed on a separate quantum subsystem. The classical subsystem does all
compilation of the high-level quantum program and any classical pre-processing and post-processing computations
needed by quantum algorithms. The quantum subsystem provides quantum hardware resources to the system in
the form of individually addressable physical qubits and logic gates. The classical subsystem communicates with
the quantum subsystem by processing and analyzing the results of quantum measurements of individual qubits.
The compiler is a software program that is stored and executed by the classical subsystem. Its purpose is to
automate the process of transforming a quantum program written in a high-level computer programming language
to the physical operations that are to be executed in the quantum subsystem through several transformation
stages. Automating this process makes it feasible to model a full-scale quantum application (complete with
fault-tolerant quantum error correction): such a model requires orchestrating many millions of
qubit interactions at each step in the transformation of the program and cannot be developed by hand. In
addition to mapping a quantum program to a device technology, the compiler is designed for several types of
analyses, including:
Quantum Information and Computation VIII, edited by Eric J. Donkor, Andrew R. Pirich, Howard E. Brandt,
Proc. of SPIE Vol. 7702, 77020S · © 2010 SPIE · CCC code: 0277-786X/10/$18 · doi: 10.1117/12.852548
Figure 2. The three quantum compiler components: Front End (pre-compiler), Assembler, and the Back End (assembly
legalization) and the quantum program compilation flow
• Analyze the reliability and latency of the program execution in order to understand the design trade-offs
at each stage of the compilation process that allow us to maximize reliability and minimize latency
• Quantify the impact of including quantum error correction in terms of the number of resources required
and overall system performance
• Calculate the fault-tolerant threshold failure probability for quantum programs written for fault-tolerance.
The networks for quantum error correcting codes are examples of such programs.
• Analyze top-level algorithm execution requirements on the quantum circuit to understand how they
translate into specific performance requirements on the physical components
• Quantify the performance of the physical implementation and how this performance might be optimized.
The performance parameters of interest are resource requirements, execution time and circuit failure
probability.
The quantum compiler design follows the standard design of classical compilers,5 which have two well-defined
components: a front end and a back end (see Cooper et al.,6 page 5, for definitions).∗ This design allows us
to take advantage of existing classical compiler techniques and algorithms during each transformation stage of
the quantum program. Figure 2 illustrates the compilation flow as well as the compiler components. As can
be seen in this figure, we have added an additional component, the Assembler. The assembler is responsible
for assembling a representation of the program using a universal set of logic gates that can be implemented
fault-tolerantly. The function of each of the three compiler components can be summarized as follows:
1. Front End (Pre-compiler): The pre-compiler’s function is to translate a human-readable high-level
specification of a quantum circuit into a machine-readable intermediate representation. The intermediate
representation allows initial device-independent analysis and optimization of the circuit. Part of the analysis
performed by our compiler in the front end is to determine if error correction is necessary.
2. Assembler: After the pre-compiler, the program enters the assembler, whose function is to map the
intermediate representation of the circuit into an equivalent low-level representation composed of a universal set
of logic gates that can be implemented fault-tolerantly. The low-level representation generated
by the assembler is the quantum assembly code, which is analogous to classical assembly code and can be sent to
the back end for legalization.
∗
We note that the organization of our compiler follows loosely the conceptual quantum compiler described by K. Svore,
et al. [7], and for consistency, we have adopted many of the same abbreviations.
The remainder of the paper is organized as follows. Section 2 describes the compiler front end, including the
high-level programming language. This section also includes the example implementation of the [[9, 1, 3]] Bacon-
Shor quantum error correcting code.8, 9 Section 3 briefly describes the assembler and the application of the
Solovay-Kitaev algorithm to decompose an arbitrary single-qubit quantum gate into a sequence of quantum gates
which can be implemented fault-tolerantly. Section 4 describes the back end legalization and implementation on
a target physical architecture. Section 5 provides our conclusions.
[Figure 3(a) circuit diagram: quantum register A with gates U1, U2, U3, and U4; quantum register B with gate U3 and measurement; classical register C holding the measurement results c1, c2, c3.]

[Figure 3(b) Qapp listing:]

    start_main:
      qreg A(4);
      qreg B(3);
      creg C(3);
      call_function U1 {A};
      call_function U2 {A};
      call_function U3 {A[3:4], B};
      call_function U4 {A};
      call_function MeasureB {B, C};
    end_function;

    start_function U2 {qreg Q(4)}:
      bit (x=1);
      while (x <= 4)
        qgate U2( Q[x] );
        alu x = x + 1;
      end_while
    end_function;
Figure 3. (a) Example quantum circuit, quantum registers, and classical registers; (b) Implementation of the given circuit
in Qapp
An example quantum program, based on the circuit model, is illustrated in Figure 3(a). Each line corresponds
to a specific qubit whose state is modified through time (left to right) according to the unitary gates Ui. Figure
3(a) also shows the two primary data types of the Qapp language: quantum registers (qreg) for storage and
grouping of qubits and classical registers (creg) for storing the results of quantum measurement. The third data
type in Qapp is a classical bit type (bit), which can be any integer. The classical bits are used for control of
purely classical constructs (such as “while” loops) that can be inserted in Qapp to make it easier to represent
larger quantum circuits.
The Qapp parser does not have the ability to automatically synthesize arbitrary N-qubit quantum operations
into a set of discrete low-level quantum gates. Instead, we use the fact that any N-qubit gate can be decomposed
The method for addressing qubits in Qapp is motivated by the fact that in many cases a given quantum gate may
operate on multiple qubits during the same time step (see Figure 3(a) for an example of the application of U2
and the measurement gates). To make this property of quantum circuits natural for the programmer to express
in Qapp, the Qapp syntax allows the addressing of any subset of qubits from any given quantum register to be
done within the declaration of a single quantum gate. Table 2, for example, lists all possible ways to address
qubits when applying the single-qubit H gate on two quantum registers, A and B. Note that single-qubit gates
can have an unlimited number of operands.
Addressing qubits within two-qubit gates is nearly identical to the addressing of qubits within single-qubit
gates. The difference is that two-qubit gates can only have two operands, while single qubit gates can have an
unlimited number of operands. Also, both operands within a two-qubit gate must refer to the same number of
qubits. The first operand refers to the control qubits, while the second operand must refer to an equal number
of target qubits. This is best demonstrated by the example shown in Figure 4, which illustrates a few simple
ways to make a 4-qubit cat-state and the corresponding Qapp syntax for the cnot gates.
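The two-operand rule described above can be illustrated with a short Python sketch that expands a range-style two-qubit gate into individual gates. This is not the Qapp parser: the helper names (expand_operand, expand_two_qubit_gate) are hypothetical, and the sketch assumes that A[m:n] denotes the inclusive range of qubits m through n, as the statement qgate CX(A[1:3], A[2:4]) in Figure 4 suggests.

```python
import re

# Assumed operand form: REG[i] or REG[i:j] (inclusive range). This is an
# illustration only; the full Qapp addressing syntax is richer than this.
_OPERAND = re.compile(r"^\s*([A-Za-z_]\w*)\[(\d+)(?::(\d+))?\]\s*$")


def expand_operand(text):
    """Return the list of (register, index) pairs named by one operand."""
    match = _OPERAND.match(text)
    if match is None:
        raise ValueError(f"unsupported operand: {text!r}")
    name, first = match.group(1), int(match.group(2))
    last = first if match.group(3) is None else int(match.group(3))
    if last < first:
        raise ValueError(f"empty range in operand: {text!r}")
    return [(name, i) for i in range(first, last + 1)]


def expand_two_qubit_gate(gate, control_text, target_text):
    """Pair the i-th control qubit with the i-th target qubit."""
    controls = expand_operand(control_text)
    targets = expand_operand(target_text)
    if len(controls) != len(targets):
        raise ValueError("both operands must refer to the same number of qubits")
    return [(gate, c, t) for c, t in zip(controls, targets)]


# The chain of cnot gates written as CX(A[1:3], A[2:4]) in Figure 4.
cat_state_cnots = expand_two_qubit_gate("CX", "A[1:3]", "A[2:4]")
for gate, control, target in cat_state_cnots:
    print(gate, control, target)
```

Under these assumptions the single statement expands into three cnot gates, each controlled on qubit i and targeting qubit i+1, and a mismatch in operand sizes is rejected before any gate is emitted.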
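[Figure 4 circuit diagrams: circuits on qubits 1 to 4 of quantum register A, using H and cnot gates to make a cat-state. The Qapp statements shown with the circuits are:]
qgate CX (A[1], A[3]);
qgate CX (A[1:1:1], A[2:3:4]);
qgate CX (A[1-4], A[2-5]);
qgate CX(A[1:3], A[2:4]);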
Figure 4. Different circuits for making a cat-state and corresponding Qapp syntax for the cat-state cnot gates
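[Figure 5 circuit diagram. Left: the four 9-qubit registers DC(9), AC(9), DT(9), and AT(9), with “X EC” and “Z EC” blocks. Right: detail on qubits DC1 to DC3 and AC1 to AC3, including an H gate, annotated “repeat for DC7, DC8, DC9”.]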
Figure 5. The circuit for a logical cnot gate using the Bacon-Shor [[9, 1, 3]] error correcting code (The boxes “X EC” and
“Z EC” denote error correction of Pauli X errors and Pauli Z errors, respectively)
The logical cnot circuit that we analyze is shown in Figure 5. The circuit consists of four 9-qubit quantum
registers. The registers denoted with DC and DT are the data logical qubits, where one is the control qubit
and the other one is the target qubit. The registers denoted with AC and AT are the corresponding auxiliary
†
The threshold is the maximum failure probability allowed per physical gate in order for error correction to be effective.
Figure 6. The Qapp representation of the circuit shown in Figure 5, and the transformation to QIR and subsequent
analysis
qubits used during the [[9, 1, 3]] error correcting process. Each error correction step is divided into a correction
of Pauli X errors and Pauli Z errors (denoted with “X EC” and “Z EC”, respectively). The right-hand-side of
Figure 5 shows the detailed circuit for the correction of Pauli X errors. Aliferis, et al., [21] provide a detailed
description of the difference between fault-tolerant Pauli X correction and Pauli Z correction for codes like the
[[9, 1, 3]] code.
Figure 6 shows the Qapp representation of the circuit in Figure 5, where the functions for correcting Pauli
Z errors have been omitted to save space. Note that the “main” function corresponds to the left-hand-side
of Figure 5 and the functions “PrepareForXCorrect” and “XCorrect” correspond to the right-hand-side of
Figure 5. Figure 6 also shows an example transformation of part of the Qapp program through the pre-compiler
and the resulting analysis. The right-hand-side of Figure 6 starts with the QIR representation of the circuit
described in the “PrepareForXCorrect” Qapp function. The QIR code is then mapped to a Directed Acyclic
Graph (DAG), allowing us to identify all data dependencies between each of the QIR instructions. In reality, the
compiler generates a DAG for the entire application, which exposes data dependencies between all instructions.
Each node in the DAG corresponds to a QIR instruction and an edge between two nodes denotes a dependency
between two instructions. When generating the DAG, dependencies are calculated in two ways: (1) if the two
instructions can be described by two unitary matrices, then the pre-compiler checks if the matrices commute;
(2) if one (or both) of the instructions cannot be described by a unitary matrix, the pre-compiler checks for any
shared data qubits or classical bits.
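The two dependency rules can be sketched in Python as follows. This is a simplified illustration, not the pre-compiler's implementation: the Instr record and the function names are hypothetical, and the commutation test is applied only to single-qubit unitaries acting on the same qubit. Unitaries acting on disjoint qubits commute trivially, and every other overlapping pair is conservatively treated as dependent.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Set, Tuple

Matrix = Tuple[Tuple[complex, complex], Tuple[complex, complex]]


@dataclass(frozen=True)
class Instr:
    """Hypothetical stand-in for one QIR instruction."""
    name: str
    qubits: Tuple[str, ...]
    cbits: Tuple[str, ...] = ()
    matrix: Optional[Matrix] = None  # None for non-unitary instructions


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def commute(a: Matrix, b: Matrix, tol: float = 1e-12) -> bool:
    ab, ba = matmul(a, b), matmul(b, a)
    return all(abs(ab[i][j] - ba[i][j]) <= tol for i in range(2) for j in range(2))


def depends(a: Instr, b: Instr) -> bool:
    shares = bool(set(a.qubits) & set(b.qubits)) or bool(set(a.cbits) & set(b.cbits))
    if not shares:
        return False  # nothing in common, so no dependency
    both_unitary = a.matrix is not None and b.matrix is not None
    if both_unitary and len(a.qubits) == 1 and a.qubits == b.qubits:
        return not commute(a.matrix, b.matrix)  # rule (1): commutation check
    return True  # rule (2), and the conservative fallback of this sketch


def build_dag(program: List[Instr]) -> Dict[int, Set[int]]:
    """Map each instruction index to the later instructions that depend on it."""
    dag: Dict[int, Set[int]] = {i: set() for i in range(len(program))}
    for j in range(len(program)):
        for i in range(j):
            if depends(program[i], program[j]):
                dag[i].add(j)
    return dag


h = 1 / sqrt(2)
program = [
    Instr("Z", ("q0",), matrix=((1, 0), (0, -1))),
    Instr("S", ("q0",), matrix=((1, 0), (0, 1j))),
    Instr("H", ("q0",), matrix=((h, h), (h, -h))),
    Instr("measure", ("q0",), cbits=("c0",)),
]
dag = build_dag(program)
print(dag)
```

In this example Z and S commute, so no edge joins them and a scheduler would be free to reorder them, while H and the measurement depend on every earlier instruction on the same qubit.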
3. ASSEMBLER
The input to the assembler is the QIR representation of the program. The function of the assembler is to assemble a
representation composed of gates which can be implemented fault-tolerantly. We refer to the resulting assembler
generated code as Qasm (for quantum assembly) and note that Qasm is simply an extension of QIR, since QIR
is also an assembly-like language. In fact, if error correction is not required, or the QIR program is composed
entirely of stabilizer gates,22 then the assembler component is skipped and transformation of the program is
passed on directly to the back end. Stabilizer gates include the Hadamard gate, the cnot gate, the Pauli X,
Y, Z gates, the single-qubit S gate, and quantum measurement. Fault-tolerant implementations exist for all
stabilizer gates, so the assembler does not need to process them. Another gate for which the
assembler is not needed is the single-qubit T gate, which can also be implemented fault-tolerantly.21
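The decision of whether the assembler stage can be skipped can be written as a small predicate. The sketch below is only an illustration of the rule stated above: the gate-name strings and the function name assembler_needed are assumptions, not identifiers taken from the compiler.

```python
# Gate names are illustrative labels for the gates listed in the text.
STABILIZER_GATES = frozenset({"H", "CX", "X", "Y", "Z", "S", "measure"})

# The T gate is not a stabilizer gate, but it also needs no assembler work.
PASS_THROUGH_GATES = STABILIZER_GATES | {"T"}


def assembler_needed(gate_names, error_correction_required):
    """Return True if some gate must be rewritten into fault-tolerant gates."""
    if not error_correction_required:
        return False  # without error correction the assembler is skipped
    return any(name not in PASS_THROUGH_GATES for name in gate_names)


print(assembler_needed(["H", "CX", "measure"], True))  # stabilizer-only program
print(assembler_needed(["H", "Rz", "CX"], True))       # contains a rotation
```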
The QIR gates which need to be processed by the assembler are rotations about the ẑ-axis by an
arbitrary angle of θ radians (the single-qubit Rz(θ) gate and the two-qubit CRz(θ) gate) and the two-qubit
swap gate. The swap gate applied to two qubits Q1 and Q2 is simply converted to three cnot gates as follows:
swap (Q1,Q2); = CX(Q1,Q2); CX(Q2,Q1); CX(Q1,Q2);
Similarly, the two-qubit CRz(θ) gates can be decomposed
into a sequence of three Rz(θ/2) gates and a cnot gate, as shown by Nielsen and Chuang [23], Chapter
4. The conversion of the Rz(θ) rotations into fault-tolerant gates is more difficult. This is implemented in two
different ways, depending on the value of θ:
1. If θ = π/2^k, where k is a positive integer, then the assembler employs a recursive decomposition of Rz(π/2^k)
for which the base case is the S gate. The recursive implementation of the single-qubit Rz(π/2^k) on qubit
Q is illustrated as a quantum circuit in Figure 7. The gates in Figure 7 denoted with A(π/2^k) are used to
prepare an auxiliary qubit into the state Rz(π/2^k)|+⟩.21, 24
2. If θ is not of the form π/2^k, but is arbitrary, the compiler approximates a single-qubit Rz(θ) gate using
O(log^3.97(1/ε)) gates from the set {H, T, S} by the Solovay-Kitaev theorem.25 The Solovay-Kitaev
approximation error (ε) is equivalent to a small rotation error applied to the qubit. To compute the desired
sequence required to approximate an Rz(θ) gate, the compiler employs the algorithm of Dawson and Nielsen
[26].
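The conversion of the swap gate into three cnot gates, mentioned before the list above, can be checked with a few lines of Python. Both swap and cnot permute the computational basis states, so it is enough to compare their action on the four two-qubit basis states; agreement on a basis implies that the two operators are equal. The function names are illustrative only.

```python
from itertools import product


def cx(state, control, target):
    """Apply a cnot to a tuple of classical bits (one computational basis state)."""
    bits = list(state)
    if bits[control]:
        bits[target] ^= 1
    return tuple(bits)


def swap_via_cx(state, q1, q2):
    """swap(Q1,Q2) written as CX(Q1,Q2); CX(Q2,Q1); CX(Q1,Q2)."""
    for control, target in ((q1, q2), (q2, q1), (q1, q2)):
        state = cx(state, control, target)
    return state


# Compare against a direct swap on every two-qubit basis state.
swap_is_correct = all(
    swap_via_cx((a, b), 0, 1) == (b, a) for a, b in product((0, 1), repeat=2)
)
print(swap_is_correct)
```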
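[Figure 7 circuit diagram: recursive implementation of Rz(π/2^k) on qubit Q, with S and X gates on Q and auxiliary qubits initialized to |0⟩ and prepared by A(π/2^k), A(π/2^(k-1)), ..., A(π/4).]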
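The recursion behind the Rz(π/2^k) construction can be illustrated with a small state-vector calculation. The sketch below uses the standard ancilla-assisted scheme, which may differ in detail from the circuit of Figure 7: an auxiliary qubit is prepared in Rz(θ)|+⟩, a cnot is applied with the data qubit as control and the auxiliary qubit as target, and the auxiliary qubit is measured. Outcome 0 leaves Rz(θ) applied to the data; outcome 1 leaves Rz(−θ), which is repaired by applying Rz(2θ) = Rz(π/2^(k−1)) recursively. The function names and the convention Rz(θ) = diag(e^(−iθ/2), e^(+iθ/2)) are assumptions of this sketch.

```python
import cmath
import math
from itertools import product


def rz_diag(theta):
    """Diagonal of Rz(theta) = diag(exp(-i*theta/2), exp(+i*theta/2))."""
    return (cmath.exp(-0.5j * theta), cmath.exp(0.5j * theta))


def rz_by_ancilla(psi, k, outcomes):
    """Apply Rz(pi/2^k) to the state psi = (a, b), given ancilla measurement outcomes."""
    theta = math.pi / 2 ** k
    lo, hi = rz_diag(theta)
    a, b = psi
    if k == 1:
        # Base case: Rz(pi/2) equals the S gate up to a global phase.
        return (lo * a, hi * b)
    if next(outcomes) == 0:
        return (lo * a, hi * b)  # ancilla measured 0: Rz(theta) was applied
    # Ancilla measured 1: Rz(-theta) was applied, so correct with Rz(2*theta).
    return rz_by_ancilla((hi * a, lo * b), k - 1, outcomes)


psi = (0.6, 0.8j)
k = 4
lo, hi = rz_diag(math.pi / 2 ** k)
expected = (lo * psi[0], hi * psi[1])

# Every possible sequence of measurement outcomes must give the same state.
worst = 0.0
for bits in product((0, 1), repeat=k - 1):
    result = rz_by_ancilla(psi, k, iter(bits))
    worst = max(worst, abs(result[0] - expected[0]), abs(result[1] - expected[1]))
print(worst < 1e-12)
```

The check confirms that the output state is independent of the measurement record, and that the recursion terminates at the S-gate base case after at most k−1 corrections.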
[Figure 8 screen capture; labeled elements: ions, ballistic channel, trap zones.]
Figure 8. A screen capture of the back end’s visual interpretation of the ion-trap technology
An abstraction of the ion-trap device architecture is shown in Figure 8, which is an actual screen capture of
the back end’s visualization utility of the circuit given in Figure 5 for correcting Pauli X errors. Note that Figure
8 shows nine qubits for the data (corresponding to the register DC in Figure 5) and nine qubits for the ancilla
(corresponding to the register AC in Figure 5). As shown in Figure 8, ions are stored in designated trapping
zones and are ballistically shuttled from one region to another via the drawn ballistic channels. For two-qubit
gates, ions are moved to the same trapping zone and the vibrational modes provide the mechanism for qubit
interaction.
The QPR code, derived from the trapped-ion ISA, consists of the instructions shown in Table 3. Note that
the ion-trap technology does not allow direct implementation of cnot gates. All cnot gates must be converted
The QPR analysis, and the properties of the quantum program that it derives, are the same as
those for the QIR circuits described in Section 2.2. Because this analysis uses QPR, the
results are technology-specific: they take into account the underlying device characteristics. When scheduling the
QPR program, the back end employs a priority-based scheduling algorithm, which dynamically updates the
instruction priorities based on the communication requirements for the ions at each computational step during
the scheduling process.33 The duration of each QPR gate is expressed in terms of computational steps (without
a specific time unit per step), where the number of computational steps per gate is left up to the user to define.
The default configuration is set to a single computational step for each type of gate. In reality, the ratio of
the duration of a controlled-Z gate to that of a single-qubit gate is approximately three to one,30 so one possible
configuration is to assign one computational step to each single-qubit gate and three steps to each two-qubit
gate. The duration of two-qubit gates, however, also depends on the number of movement gates necessary to
bring the participating qubits together.
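The effect of the user-defined gate durations on the execution length can be illustrated with a minimal as-soon-as-possible scheduler. This is a deliberately simplified sketch, not the priority-based scheduler used by the back end: it ignores ion movement, it assumes every qubit stays active for the whole run, and the gate names and the function asap_schedule are hypothetical.

```python
def asap_schedule(program, durations):
    """Return (execution length, wait steps) for a list of (gate, qubits) pairs."""
    free_at = {}  # qubit -> first computational step at which it is idle
    busy = {}     # qubit -> number of steps spent inside gates
    for gate, qubits in program:
        start = max(free_at.get(q, 0) for q in qubits)
        finish = start + durations[gate]
        for q in qubits:
            free_at[q] = finish
            busy[q] = busy.get(q, 0) + durations[gate]
    length = max(free_at.values(), default=0)
    # A qubit "waits" in every step in which no gate acts on it.
    waits = sum(length - steps for steps in busy.values())
    return length, waits


program = [
    ("H", ("q0",)),
    ("CZ", ("q0", "q1")),
    ("H", ("q1",)),
    ("H", ("q2",)),
]

default_result = asap_schedule(program, {"H": 1, "CZ": 1})
three_to_one_result = asap_schedule(program, {"H": 1, "CZ": 3})
print(default_result, three_to_one_result)
```

Even in this toy case, moving from the default one-step durations to the three-to-one configuration lengthens the schedule and increases the number of wait steps, which is the kind of trade-off the back end analysis quantifies.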
When looking at the QPR analysis results made available by the back end, it is easy to see why even a
single logical cnot gate is difficult to analyze by hand. Assuming the default gate duration, the QPR program
has an execution length of 292 computational time steps and contains 344 quantum gates, 585 movement
operations, and as many as 4459 “wait” operations. The “move” operations are derived via an auto-generated
geometrical placement of the ion-trap qubits onto a 2-D device abstraction (see Figure 8). It is also possible to
provide a user-defined placement of the ions as input to the back end (in a text file) for a more customized analysis
of the circuit. The “wait” operations are only assigned when the qubit is active and no quantum operation is
being applied to it. A qubit is inactive after it has been measured and before its state is initialized to |0⟩ for
computation. The fault-tolerant threshold value for the logical cnot gate that corresponds to this particular
configuration is approximately 1.0 × 10^−6. This means that the physical gates must be manufactured such
that each gate has a failure probability of at most 1.0 × 10^−6. Recent experimental work with trapped ions32 has
indicated that the contribution of the “wait” operations to the fault-tolerant threshold is negligible. In this case,
the fault-tolerant threshold calculated by the back end becomes approximately 9.0 × 10^−6, which is a significant
increase of the maximum allowed gate failure probability.
REFERENCES
[1] Knill, E., “Conventions for quantum pseudocode,” Los Alamos National Laboratories, Technical Report
LAUR-96-2724 (1996).
[2] Cook, S. A. and Reckhow, R. A., “Time-bounded random access machines,” Journal of Computer Systems
Science 7, 354–375 (1972).
[3] Metodi, T. S., Thaker, D. D., Cross, A. W., Chong, F. T., and Chuang, I. L., “A quantum logic array mi-
croarchitecture: Scalable quantum data movement and computation,” Proceedings of the 38th International
Symposium on Microarchitecture (MICRO-38) (2005).
[4] Fowler, A. G., Thompson, W. F., Yan, Z., Stephens, A. M., Plourde, B. L. T., and Wilhelm, F. K., “Long-
range coupling and scalable architecture for superconducting flux qubits,” Phys. Rev. B. 76(174507) (2007).
[5] Aho, A. V., Lam, M. S., Sethi, R., and Ullman, J. D., [Compilers, Principles, Techniques, and Tools],
Addison Wesley, 2 ed. (August 2006).
[6] Cooper, K. D. and Torczon, L., [Engineering a Compiler], Morgan Kaufmann Publishers, 5 ed. (March 2008).
[7] Svore, K., Cross, A., Aho, A., Chuang, I., and Markov, I., “Toward a software architecture for quantum
computing design tools,” In Proceedings of the Workshop on Quantum Programming Languages (QPL)
(2004).
[8] Poulin, D., “Stabilizer formalism for operator quantum error correction,” Phys. Rev. Lett. 95(230504) (2005).
[9] Bacon, D., “Operator quantum error correcting subsystems for self-correcting quantum memories,” Phys.
Rev. A 73(102340) (2006).
[10] Barenco, A., Bennett, C. H., Cleve, R., DiVincenzo, D. P., Margolus, N., Shor, P., Sleator, T., Smolin, J.,
and Weinfurter, H., “Elementary gates for quantum computation,” Phys. Rev. A. 52(3457) (1995).
[11] Deutsch, D., “Quantum theory, the Church-Turing principle and the universal quantum computer,”
Proceedings of the Royal Society of London A-400, 97–117 (1985).
[12] Shor, P. W., “Fault-tolerant quantum computation,” in Proc. 37th Symp. on Foundations of Computer
Science, IEEE Computer Society Press , 55–65 (1996).
[13] Steane, A. M., “Efficient fault-tolerant quantum computing,” Phys. Rev. Lett. 78, 2252–2255 (1997).
[14] Wootters, W. and Zurek, W., “A single quantum cannot be cloned,” Nature 299, 802–803 (1982).
[15] Maslov, D., Falconer, S. M., and Mosca, M., “Quantum circuit placement: Optimizing qubit-to-qubit inter-
actions through mapping quantum circuits into a physical experiment,” E-Print arXiv:quant-ph/0703256v1
(2007).