Depth-Optimized Reversible Circuit Synthesis
arXiv:1208.5425v1 [quant-ph] 27 Aug 2012
Mona Arabzadeh, Morteza Saheb Zamani, Mehdi Sedighi,
Mehdi Saeedi
Abstract In this paper, simultaneous reduction of circuit depth and synthesis cost of reversible circuits in
quantum technologies with limited interaction is addressed. We developed a cycle-based synthesis algorithm
which uses negative controls and limited distance between gate lines. To improve circuit depth, a new
parallel structure is introduced in which before synthesis a set of disjoint cycles are extracted from the input
specification and distributed into some subsets. The cycles of each subset are synthesized independently on
different sets of ancillae. Accordingly, each disjoint set can be synthesized by different synthesis methods.
Our analysis shows that the best worst-case synthesis cost of reversible circuits in the linear nearest neighbor
architecture is improved by the proposed approach. Our experimental results reveal the effectiveness of the
proposed approach to reduce cost and circuit depth for several benchmarks.
Keywords Reversible logic · Synthesis · Linear nearest neighbor architecture · Circuit depth
1 Introduction
Boolean reversible circuits have attracted attention as components in several quantum algorithms including
Shor’s quantum factoring [1] and stabilizer circuits [2]. In recent years, considerable efforts have been
made to synthesize a Boolean reversible function by a set of quantum gates [3].
The proposed technologies for quantum computing suffer from practical limitations for implementation.
For example, popular quantum technologies allow computation on a few qubits in a linear nearest neighbor
(LNN) architecture where only adjacent qubits can interact [4]. Additionally, physical qubits are fragile and
can hold their states only for a limited time, called the coherence time [5]. To reflect technological constraints
in the synthesis stage, different technology-specific cost metrics have been introduced.
– Two-qubit cost is the number of two-qubit gates of any type and the number of one-qubit gates (reported
separately) in a given circuit. The number of two-qubit gates for an n-qubit Toffoli gate (for n ≥ 3)
is estimated as 10n − 25 [6]. Quantum cost (QC) is the number of NOT, CNOT, controlled-V and
controlled-V† gates required to implement a given reversible function.
– Interaction cost is the distance between gate qubits for any two-qubit gate. Quantum circuit technologies
with 1D, 2D and 3D interactions exist [4]. Interaction cost for a circuit is calculated by a summation
over the interaction costs of its gates.
– Number of ancillae and garbage qubits reflect the limited number of qubits in the current quantum technologies.
– Depth is the largest number of elementary gates on any path from inputs to outputs in a circuit. Reducing
circuit depth helps a computation finish within the coherence time.
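The metrics above can be illustrated on a toy gate list. The following is a minimal sketch, not the authors' tooling: the gate encoding (a tuple of qubit indices), the function names, and the convention that adjacent qubits have interaction distance 0 are all assumptions made for illustration.

```python
from typing import List, Tuple

Gate = Tuple[int, ...]  # qubit indices an elementary gate acts on (illustrative encoding)

def toffoli_two_qubit_estimate(n: int) -> int:
    """Estimated two-qubit gate count of an n-qubit Toffoli gate, 10n - 25 for n >= 3 [6]."""
    if n < 3:
        raise ValueError("the estimate is stated only for n >= 3")
    return 10 * n - 25

def interaction_cost(circuit: List[Gate]) -> int:
    """Sum of qubit distances over two-qubit gates; adjacent qubits cost 0 (assumed convention)."""
    return sum(abs(g[0] - g[1]) - 1 for g in circuit if len(g) == 2)

def depth(circuit: List[Gate]) -> int:
    """Each gate is placed one layer after the latest layer already used on any of its qubits."""
    last = {}
    for g in circuit:
        layer = 1 + max(last.get(q, 0) for q in g)
        for q in g:
            last[q] = layer
    return max(last.values(), default=0)

circuit = [(0, 1), (2, 3), (1, 2), (0, 3), (2,)]
print(depth(circuit))             # 3
print(interaction_cost(circuit))  # 2
```

Gates on disjoint qubits share a layer, which is why five gates fit into depth 3 here.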
Synthesis of reversible Boolean circuits has an exponential search space. Consequently, many heuristic
algorithms have been proposed to consider the effects of quantum cost and two-qubit cost in the synthesis stage [7-10]. Additionally, several post-process optimization methods have been developed to improve
A preliminary and partial version of this paper was presented at the 2011 International Workshop on Logic and Synthesis,
San Diego, USA.
M. Arabzadeh, M. Saheb Zamani, M. Sedighi, M. Saeedi
Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.
E-mail: {m.arabzadeh, szamani, msedighi, msaeedi}@aut.ac.ir
M. Saeedi is currently with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA,
USA 90089-2562. E-mail: msaeedi@usc.edu
quantum cost [8, 11, 6], interaction cost [12, 13], and depth [14]. However, the number of algorithms which
consider different parameters simultaneously — the focus of this work — is very limited.
Besides technological limitations, studying theoretical aspects of circuits with either limited interactions
among qubits of gates or limited depth attracts interest in complexity theory. For example, $NC^i$ is the
class of decision problems solvable by a uniform family of Boolean circuits with polynomial size, depth of
$O(\log^i n)$ and fan-in 2. QNC is the class of constant-depth quantum circuits without fanout gates [15].
In this paper, a synthesis algorithm for Boolean reversible circuits is proposed which uses a cycle-based
strategy to synthesize circuits for the LNN architecture. The proposed technique leads to improved synthesis
costs as compared to the best prior methods for several benchmarks. Moreover, a parallel structure for
reversible Boolean circuits is presented which significantly reduces circuit depth with 2n ancillae. Overall,
our circuits can be considered as depth-optimized reversible circuits for the LNN architecture.
This paper is organized as follows. Basic concepts are introduced in Section 2. Related synthesis and
post-process optimization methods are reviewed in Section 3. The proposed cycle-based synthesis algorithm
for the LNN architecture is described in Section 4. Section 5 presents a parallel structure to reduce circuit
depth. Experimental results are reported in Section 6, and Section 7 concludes the paper.
2 Basic Concepts
In this section, preliminary concepts are briefly introduced. Further background can be found in [3].
Permutation Function. Let B be any set and define f : B → B as a one-to-one and onto transition
function. The function f is a permutation function, as applying f to B leads to a set with the same elements
of B and probably in a different order. If B = {1, 2, 3, ..., m}, there exist two elements bi and bj belonging
to B such that f (bi ) = bj . A k-cycle with length k is denoted as (b1 , b2 , ..., bk ) which means that f (b1 ) =
b2 , f (b2 ) = b3 , ..., and f (bk ) = b1 . A given k-cycle (b1 , b2 , ..., bk ) could be written in different ways, such as
(b2 , b3 , ...bk , b1 ). Cycles c1 and c2 are called disjoint if they have no common members. Any permutation can
be written uniquely, except for the order, as a product of disjoint cycles. If two cycles c1 and c2 are disjoint,
they commute, i.e., c1 c2 = c2 c1 . A cycle of length two is called a transposition. A cycle or a permutation
is called even (odd) if it can be written as an even (odd) number of transpositions. A k-cycle is even
(odd) when k is odd (even).
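As a small illustration of these definitions, the sketch below extracts the disjoint cycles of a permutation and derives its parity from the fact that a k-cycle is a product of k − 1 transpositions. The list encoding `perm[x] = f(x)` over {0, ..., m − 1} and the function names are illustrative choices, not notation from the paper.

```python
def disjoint_cycles(perm):
    """Return the nontrivial disjoint cycles of a permutation given as a list (perm[x] = f(x))."""
    seen, cycles = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:  # fixed points are omitted
            cycles.append(tuple(cycle))
    return cycles

def is_even(perm):
    """A k-cycle is a product of k - 1 transpositions, so parity is the sum of (k - 1) mod 2."""
    return sum(len(c) - 1 for c in disjoint_cycles(perm)) % 2 == 0

perm = [1, 2, 0, 4, 3, 5]  # f(0)=1, f(1)=2, f(2)=0, f(3)=4, f(4)=3, f(5)=5
print(disjoint_cycles(perm))  # [(0, 1, 2), (3, 4)]
print(is_even(perm))          # False
```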
Reversible Function. An n-input, n-output, fully specified Boolean function $f : B^n \to B^n$ over variables
X = {x0 , ..., xn−1 } is called reversible if it maps each input pattern to a unique output pattern. Each
reversible function can be considered as a permutation function. The added lines to a circuit are called
ancillae and typically start out with a 0 or 1.
Reversible Gate. An n-input, n-output gate is reversible if it realizes a reversible function. A multiple-control
Toffoli gate can be written as $C^m$NOT(C; t), where C = {i1 , . . . , im } is the set of control lines, t = {j}
with C ∩ t = ∅ is the target line and 0 ≤ i, j ≤ n − 1. A control line may be positive (negative), which means
that if its value is one (zero), the value of the target is inverted. For m=0 and m=1, the gates are called NOT
(N) and CNOT (C), respectively. For m=2, the gate is called $C^2$NOT or Toffoli (T). The SWAP(a,b) gate
exchanges the values of two qubits a and b, and can be constructed from three CNOT gates C(a,b)C(b,a)C(a,b).
The controlled-V (controlled-V†) gate changes the value of its target line using the transformation given by
the matrix V (V†) if the control line has the value 1.
$$V = \frac{1+i}{2}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix}, \qquad V^{\dagger} = \frac{1-i}{2}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$$
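Two facts from this paragraph can be checked numerically: the three-CNOT construction of SWAP, and that V is a square root of NOT with V† as its inverse, where V = ((1+i)/2)[[1, −i], [−i, 1]]. The sketch below uses plain Python lists and complex numbers; the helper names are illustrative.

```python
def cnot(bits, c, t):
    """Classical action of CNOT: the target bit is XORed with the control bit."""
    bits = list(bits)
    bits[t] ^= bits[c]
    return bits

def swap_via_cnots(bits, a, b):
    """SWAP(a,b) built as C(a,b) C(b,a) C(a,b)."""
    return cnot(cnot(cnot(bits, a, b), b, a), a, b)

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

V = [[(1 + 1j) / 2 * x for x in row] for row in [[1, -1j], [-1j, 1]]]
V_DAG = [[(1 - 1j) / 2 * x for x in row] for row in [[1, 1j], [1j, 1]]]
NOT = [[0, 1], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

print(close(matmul(V, V), NOT))           # True
print(close(matmul(V, V_DAG), IDENTITY))  # True
print(swap_via_cnots([1, 0], 0, 1))       # [0, 1]
```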
3 Related Work
In this section, we review prior synthesis and optimization techniques that are used in this paper.
In [16], an NCT-based synthesis method is proposed which decomposes a given cycle into a set of
transpositions. To implement an arbitrary transposition (a, b)(c, d) for distinct $a, b, c, d \neq 0, 2^i$, the
authors introduced three subcircuits, namely π, κ0 and $\pi^{-1}$ (the inverse of π), where the κ0 circuit,
$C^{n-2}$NOT($a_2, ..., a_{n-1}$; $a_0$), implements a fixed transposition $(2^n - 4, 2^n - 3)(2^n - 2, 2^n - 1)$. Accordingly, a synthesis algorithm was proposed to transform a, b, c and d to $2^n - 4$, $2^n - 3$, $2^n - 2$ and $2^n - 1$,
respectively. By cascading π, κ0 and $\pi^{-1}$, an arbitrary transposition can be implemented with quantum
cost 34n − 64.
The NCT-based synthesis method in [16] was extensively improved in [10], called the k-cycle method hereafter. In
the k-cycle method, a given cycle of length ≥ 6 is decomposed into a set of cycles of lengths < 6, called
elementary cycles. Next, a set of synthesis algorithms was proposed to synthesize different elementary cycles,
i.e., a pair of 2-cycles, a single 3-cycle, a pair of 3-cycles, a single 5-cycle, a pair of 5-cycles, a single 2-cycle
(4-cycle) followed by a single 4-cycle (2-cycle) and a pair of 4-cycles. Similar to [16], 0 and $2^i$ terms are fixed
before synthesis because their effect on the synthesis results is negligible [10]. NCT gates with positive
controls are used in both [16] and [10]. The effect of decomposition on the result of [10] was considered in
[17] where a cycle-assignment technique based on graph matching was proposed. The worst-case quantum
cost for synthesizing an arbitrary reversible function on n lines is $8.5n2^n + o(2^n)$ in [10].
Fig. 1 (a) 3-input reversible full adder with optimal depth 4 [14], (b) the circuit in (a) after inserting SWAP gates and (c)
reducing the number of SWAP gates by [12].
In [14], the authors introduced a post-process optimization algorithm to reduce the depth of a given
quantum circuit. To achieve this, a set of circuit templates (circuit identities) was proposed to reduce
quantum cost and circuit depth. The suggested templates are applied to change either gate locations or
control/target positions in a subcircuit to parallelize more gates. The introduced templates were used by a
greedy algorithm which starts from gate i and traverses the gates afterwards. At each step, the algorithm
moves gates to the left whenever possible and applies templates to check whether other gates can be moved to
the left. If no change is possible, it starts the same process from gate i + 1.
In [12], a synthesis flow was proposed to improve the interaction cost of a given quantum circuit. The
authors studied the exact synthesis of some small gates for the LNN architecture. The proposed optimal
circuits are used to simplify larger circuits. Besides, some circuit templates are introduced to reduce the
number of SWAP gates. Finally, local and global reordering of input qubits are considered to reorder gate
qubits for improving the interaction cost. The proposed techniques were consolidated in a unified design
flow to implement a given circuit with arbitrary interactions for architectures with limited interactions.
Fig. 1-a shows a 3-input full adder with depth 4 [14] and six elementary gates. Actually, depth 4 is optimal
since four qubits are involved in the fourth qubit [14]. Fig. 1-b shows the same circuit after inserting SWAP
gates to make the gate qubits adjacent with QC=24 and depth=23. Fig. 1-c illustrates the same circuit
after applying the method in [12] for reducing the number of SWAP gates where QC=18 and depth=17.
4 The Proposed Cycle-Based Synthesis Method for Interaction Cost
The main contribution of [10] is a cycle-based synthesis approach that considers quantum cost as its sole metric. The cycle-based method proposed in this section additionally considers another important implementation constraint, namely
interaction cost.
To do that, we improve the k-cycle method by using negative controls and adapting the synthesis algorithms
of elementary cycles to the LNN architecture. Particularly, two new elementary odd cycles, a 2-cycle and
a 4-cycle, are included to improve quantum cost. These odd cycles are synthesized as a pair of 2-cycles
and a pair of 4-cycles in [10] with one ancilla. Odd cycles need one ancilla in the NCT library for the
implementation [16]. In our experiments, we used this ancilla for the decomposition of complex gates into
elementary gates. Additionally, 0 and $2^i$ terms are not fixed before synthesis, to be used in the proposed
parallel structure as discussed in Section 5.
Negative controls can reduce the number of elementary gates in the κ0 , π and π −1 circuits both with
and without considering nearest neighbor restriction. Multiple-control Toffoli gates with at least one positive
control can be simulated as efficiently as complex Toffoli gates with only positive controls [14]. By using
Fig. 2 (a) The κ0(2,2) circuit in [16, 10]. (b) The proposed κ0(2,2) circuit. Each control at position i, 0 ≤ i ≤ n − 1, i ≠ k + 2,
is negative. (c) An example of the π circuit in [10]. a0 is used to control CNOTs in the first part. The second subcircuit is the
circuit in [10, Theorem 3.1]. (d) An example of the π circuit in the proposed method. Here, k=3. Refer to Table 1.
Algorithm 1: Gate selection in the π circuit
Input:
    L n-bit input terms; b(i,j) is the bit value at position j of the i-th input term.
    L n-bit κ0 terms; b^κ0(i,j) is the bit value at position j of the i-th κ0 term.
    Pivot is the boldfaced position in the intermediate terms in Table 1.
Output: The π circuit.
for i in 0 to L do
    if b(i,Pivot) ≠ 1 then
        Set b(i,Pivot) = 1 by either a CNOT or a Toffoli gate;
    end
    for j in 0 to Pivot do
        if b(i,j) ≠ b^κ0(i,j) then
            Find a position p: b(i,p) = 1, b(k,p) ≠ 1 (k < i), |p − j| is the minimum possible value, and p ≤ Pivot;
            Apply CNOT(p;j);
        end
    end
    for j in n−1 to Pivot+1 do
        if b(i,j) ≠ b^κ0(i,j) then
            Find a position p: b(i,p) = 1, b(k,p) ≠ 1 (k < i), |p − j| is the minimum possible value, and p ≥ Pivot;
            Apply CNOT(p;j);
        end
    end
end
CNOT and Toffoli gates with negative controls, one need not fix 0 and $2^i$ terms before synthesis, in contrast
to the methods in [16, 10].
Cycle Construction Length (CCL) is defined as the number of lines required to implement a given
cycle of length L. In theory, the minimum CCL is $\log_2 L$. To implement the elementary cycles by NCT
gates, at most two more lines are required in the proposed approach: one to avoid Toffoli gates without
any positive control in the κ0 circuit, and one to improve circuit cost in the π and $\pi^{-1}$ circuits. Accordingly, we
set CCL(2)=2, CCL(2,2)=4, CCL(3)=3, CCL(3,3)=5, CCL(4)=4, CCL(4,2)=5, CCL(4,4)=5, CCL(5)=5 and
CCL(5,5)=6. For an n-line circuit, the CCL lines required to construct a given cycle can be selected in
n × (n − 1) × ... × (n − CCL + 1) different ways. To improve interaction cost and depth, we place the selected
lines close to each other in the middle of the κ0 circuit at positions k, k ± 1 and k ± 2 for k = ⌊n/2⌋. Details
are discussed later.
To synthesize a given elementary cycle, one needs to change input terms into the terms specified by the
κ0 circuit. This is done by converting the input terms into intermediate terms specified by the π circuit.
Afterwards, the intermediate terms are transformed into κ0 terms by a few specific gates, called static gates.
In the proposed method, the control and target lines in the π circuit are selected such that interaction cost
can be reduced. Since κ0 cycles are constructed in the middle of the circuit and the intermediate terms are
designed with at least one “1”, as boldfaced in column Int. Terms in Table 1, it is possible to select control
and target lines of each gate with length ≤ ⌈(n − CCL)/2 + CCL⌉. Considering two SWAP gates with cost 6
leads to $QC_{LNN} \le 3(n + CCL)$ for each gate. To reduce circuit depth, the gates required to fix bit positions
in the first half and the second half are applied in parallel. Algorithm 1 provides the details.
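The quantities in the last two paragraphs are simple to evaluate for a concrete n. The sketch below is a worked arithmetic example only: the dictionary holds the CCL values set in the text, the number of ordered line selections is computed as the falling product of CCL factors starting at n, and the function names are illustrative.

```python
import math

# CCL values as set in the text, keyed by elementary-cycle type.
CCL = {(2,): 2, (2, 2): 4, (3,): 3, (3, 3): 5, (4,): 4,
       (4, 2): 5, (4, 4): 5, (5,): 5, (5, 5): 6}

def ordered_line_choices(n, ccl):
    """Number of ways to pick ccl distinct lines out of n when order matters."""
    return math.perm(n, ccl)

def max_gate_length(n, ccl):
    """Bound on the control-target distance in the pi circuit: ceil((n - CCL)/2 + CCL)."""
    return math.ceil((n - ccl) / 2 + ccl)

def qc_lnn_bound(n, ccl):
    """Per-gate LNN quantum-cost bound 3(n + CCL) quoted in the text."""
    return 3 * (n + ccl)

n = 8
ccl = CCL[(2, 2)]
print(ordered_line_choices(n, ccl))  # 1680
print(max_gate_length(n, ccl))       # 6
print(qc_lnn_bound(n, ccl))          # 36
```

For n = 8 and a pair of 2-cycles, a gate spans at most 6 lines, and 6 units of SWAP cost per unit of length gives the bound 36 = 3(8 + 4).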
The κ0(2,2) circuits in [16, 10] and the proposed κ0(2,2) circuit are shown in Fig. 2-a and Fig. 2-b,
respectively. Fig. 2-c illustrates one example of the π circuit in [10]. The input term is “11110111” which
should be changed to the second term in the κ0(2,2) circuit in [10], i.e., “11111101”. This is done by a circuit
with QC=16 and depth=11. In contrast, “11110111” should be changed to “00100100” in the proposed
method. Fig. 2-d shows the π circuit with QC=5 and depth=3 based on Algorithm 1.
4.1 Building Blocks
In this section, direct synthesis of the suggested elementary cycles, i.e., (2), (2,2), (3), (3,3), (4), (4,2), (4,4),
(5), (5,5), is discussed. Fig. 3 illustrates the κ0 circuits of all elementary cycles. We give a full description
of the synthesis method for a pair of 2-cycles first.
(2,2)-synthesis: To change (a, b)(c, d) to κ0(2,2) terms:
– At most n NOT gates can be used to convert a to “0...1000...0”. Other terms b, c, and d may be changed
to new terms b′, c′ and d′, respectively.
– At most one CNOT gate conditioned on either the i-th line i ≠ k + 2 (positive) or i = k + 2 (negative)
can be used to set the (k − 1)-th bit of b′. Next, at most n − 1 CNOT gates conditioned on the (k − 1)-th
bit can be applied to change the j-th bit of b′ (0 ≤ j ≤ n − 1, j ≠ k − 1) to “0...1001...0”. c′ and d′ may
be changed to new terms c′′ and d′′.
– At most one CNOT gate conditioned on either the i-th line i ≠ k + 2 (positive) or i = k + 2 (negative)
can be used to set the k-th bit of c′′. Next, at most n − 1 CNOT gates with positive control conditioned
on the k-th bit can be applied to change the j-th bit of c′′ (0 ≤ j ≤ n − 1, j ≠ k) to “0...1010...0”. The
last term d′′ may be changed to a new term d′′′.
– At most one CNOT gate conditioned on either the i-th line i ≠ k + 2 (positive) or i = k + 2 (negative) can
be used to set the (k + 1)-th bit of d′′′. Next, at most n − 1 CNOT gates with positive control conditioned
on the (k + 1)-th bit can be applied to change the j-th bit of d′′′ (0 ≤ j ≤ n − 1, j ≠ k + 2) to “0...1111...0”.
– A Toffoli gate conditioned on the (k − 1)-th and the k-th lines can be used to set the (k + 1)-th line.
Therefore, it changes “0...1111...0” to “0...1011...0”.
Note that converting each term does not corrupt the previously fixed terms. The same number of gates is
needed for the $\pi^{-1}$ circuit. Accordingly, a total of 8n + 22 elementary gates are required for the π
and $\pi^{-1}$ circuits. The κ0 circuit in Fig. 3-b implements $(2^{k+2}, 2^{k+2}+2^{k-1})(2^{k+2}+2^{k}, 2^{k+2}+2^{k}+2^{k-1})$ with
cost 24n − 88. Therefore, an arbitrary pair of 2-cycles (a, b)(c, d) can be implemented by at most 32n − 66
elementary gates.
Following the above discussion for the (2,2)-synthesis method, details for the synthesis of other elementary cycles are given in Table 1. In this table, subscripts in column Input Cycle(s) denote orders in
considering each term. Intermediate terms are represented by binary expansions with LSB on the right and
the underlined bit in the k-th position (k=⌊n/2⌋). The boldfaced “1” is Pivot in Algorithm 1 for each term.
The parenthesized pairs in column Max. Cost represent CNOT count with negative and positive controls,
respectively. The numbers given in column Terms for the κ0 circuit are bit positions with value “1” in binary
representation. Table 2 reports the resulting quantum cost of each elementary cycle. As can be seen, the
total number of elementary gates is improved by a linear factor in most cases. Considering the worst-case
cost of 3(n + CCL) for each gate in the π and $\pi^{-1}$ circuits in the LNN architecture and 6n − 12 elementary
gates (i.e., two chains of n − 2 SWAP gates) for the κ0 circuits leads to the results given in the Total Cost (LNN)
column of Table 2.
4.2 Worst-Case Analysis
In this section, an upper bound on the number of gates in the proposed cycle-based method is calculated. To
achieve this, let all terms of a truth table be involved in the input cycles to have a cycle with the maximum
length $2^n$ for an n-input/n-output function. To convert a cycle with length > 5 to a set of elementary cycles,
we may have some repeated terms in non-disjoint cycles. As such, $2^n + a_r$ shows the maximum number of
terms, where $a_r$ is the maximum number of repeated terms and can be estimated as $a_r = \frac{a_{r-1}+4}{5}$, $a_0 = \frac{2^n}{5}$,
which results in
$$a_r = \frac{2^n}{5} + \sum_{i=2}^{\log_5\left(\frac{2^n-5}{4}\right)} \frac{2^n+5^i-5}{5^i} = 2^{n-2} + \log_5\left(\frac{2^n-5}{4}\right) - \frac{9}{4}.$$
Theorem 1 discusses the maximum number of elementary gates in our approach.
Theorem 1 The maximum number of elementary gates for any permutation in the proposed approach is $9.4n2^n - 18.8 \cdot 2^n + o(n^2)$ and $42.4n^2 2^n + o(n^3)$ without and with considering interaction cost, respectively.
Table 1 Direct synthesis of elementary cycles. Subscripts in the input cycles denote the orders in considering each term.
The underlined bit is in the k-th position (k=⌊n/2⌋). The boldfaced “1” is Pivot in Algorithm 1. Numbers given for κ0 terms
are bit positions with “1” in the binary expansion. Max. Cost and Static Gate refer to the π or $\pi^{-1}$ circuit. Underlining and boldface are not preserved in this text rendering.

Input Cycle(s)                          | Int. Term       | Max. Cost   | Static Gate               | κ0 Term                        | Fig.
(a1, b2)                                | (0...10...0)     | n N         | -                         | (k + 1)                        | 3-a
                                        | (0...11...0)     | n(1,n-1) C  | -                         | (k − 1)(k + 1)                 |
(a1, b2)(c3, d4)                        | (0...1000...0)   | n N         | -                         | (k + 2)                        | 3-b
                                        | (0...1001...0)   | n(1,n-1) C  | -                         | (k + 2)(k − 1)                 |
                                        | (0...1010...0)   | n(1,n-1) C  | -                         | (k + 2)(k)                     |
                                        | (0...1111...0)   | n(1,n-1) C  | T(k − 1, k; k + 1)        | (k + 2)(k)(k − 1)              |
(a1, b2, c3)                            | (0...001...0)    | n N         | -                         | (k − 1)                        | 3-c
                                        | (0...101...0)    | n(1,n-1) C  | -                         | (k + 1)(k − 1)                 |
                                        | (0...111...0)    | n(1,n-1) C  | -                         | (k + 1)(k)(k − 1)              |
(a1, b2, c3)(d4, e5, f6)                | (0...00001...0)  | n N         | -                         | (k − 2)                        | 3-d
                                        | (0...00011...0)  | n(1,n-1) C  | -                         | (k − 1)(k − 2)                 |
                                        | (0...00111...0)  | n(1,n-1) C  | -                         | (k)(k − 1)(k − 2)              |
                                        | (0...10001...0)  | 1 T, n-1 C  | -                         | (k + 2)(k − 2)                 |
                                        | (0...11011...0)  | 1 T, n-1 C  | T(k − 1, k + 2; k + 1)    | (k + 2)(k − 1)(k − 2)          |
                                        | (0...11111...0)  | 1 T, n-1 C  | T(k, k + 2; k + 1)        | (k + 2)(k)(k − 1)(k − 2)       |
(a1, b2, c3, d4)                        | (0...1000...0)   | n N         | -                         | (k + 2)                        | 3-e
                                        | (0...1001...0)   | n(1,n-1) C  | -                         | (k − 1)(k + 2)                 |
                                        | (0...1010...0)   | n(1,n-1) C  | -                         | (k)(k + 2)                     |
                                        | (0...1111...0)   | n(1,n-1) C  | T(k − 1, k; k + 1)        | (k − 1)(k)(k + 2)              |
(a1, b2, c3, d4)(e5, f6)                | (0...10000...0)  | n N         | -                         | (k + 2)                        | 3-f
                                        | (0...10010...0)  | n(1,n-1) C  | -                         | (k + 2)(k − 1)                 |
                                        | (0...10100...0)  | n(1,n-1) C  | -                         | (k + 2)(k)                     |
                                        | (0...11110...0)  | n(1,n-1) C  | T(k − 1, k; k + 1)        | (k + 2)(k)(k − 1)              |
                                        | (0...10111...0)  | n(1,n-1) C  | -                         | (k + 2)(k)(k − 1)(k − 2)       |
                                        | (0...11011...0)  | 1 T, n-1 C  | T(k − 2, k′; k + 1)       | (k + 2)(k − 1)(k − 2)          |
(a1, b2, c3, d4)(e5, f6, g7, h8)        | (0...10000...0)  | n N         | -                         | (k + 2)                        | 3-g
                                        | (0...10001...0)  | n(1,n-1) C  | -                         | (k + 2)(k − 2)                 |
                                        | (0...10010...0)  | n(1,n-1) C  | -                         | (k + 2)(k − 1)                 |
                                        | (0...10111...0)  | n(1,n-1) C  | T(k − 2, k − 1; k)        | (k + 2)(k − 1)(k − 2)          |
                                        | (0...10100...0)  | n(1,n-1) C  | -                         | (k + 2)(k)                     |
                                        | (0...11101...0)  | 1 T, n-1 C  | T(k − 2, k; k + 1)        | (k + 2)(k)(k − 2)              |
                                        | (0...11110...0)  | 1 T, n-1 C  | T(k − 1, k; k + 1)        | (k + 2)(k)(k − 1)              |
                                        | (0...11111...0)  | n(1,n-1) C  | T(k − 2, k − 1, k; k + 1) | (k + 2)(k)(k − 1)(k − 2)       |
(a1, b4, c2, d3, e5)                    | (0...10000...0)  | n N         | -                         | (k + 2)                        | 3-h
                                        | (0...10001...0)  | n(1,n-1) C  | -                         | (k + 2)(k − 2)                 |
                                        | (0...10010...0)  | n(1,n-1) C  | -                         | (k + 2)(k − 1)                 |
                                        | (0...11011...0)  | n(1,n-1) C  | T(k − 2, k − 1; k + 1)    | (k + 2)(k − 1)(k − 2)          |
                                        | (0...10111...0)  | n(1,n-1) C  | -                         | (k + 2)(k)(k − 1)(k − 2)       |
(a1, b4, c2, d3, e10)(f5, g8, h6, i7, j9) | (0...100000...0) | n N         | -                         | (k + 2)                        | 3-i
                                        | (0...100001...0) | n(1,n-1) C  | -                         | (k + 2)(k − 3)                 |
                                        | (0...100010...0) | n(1,n-1) C  | -                         | (k + 2)(k − 2)                 |
                                        | (0...100111...0) | n(1,n-1) C  | T(k − 3, k − 2; k − 1)    | (k + 2)(k − 2)(k − 3)          |
                                        | (0...101000...0) | n(1,n-1) C  | -                         | (k + 2)(k)                     |
                                        | (0...111001...0) | 1 T, n-1 C  | T(k − 3, k; k + 1)        | (k + 2)(k)(k − 3)              |
                                        | (0...111010...0) | 1 T, n-1 C  | T(k − 2, k; k + 1)        | (k + 2)(k)(k − 2)              |
                                        | (0...111011...0) | n(1,n-1) C  | T(k − 2, k − 1, k; k + 1) | (k + 2)(k)(k − 2)(k − 3)       |
                                        | (0...101111...0) | n(1,n-1) C  | -                         | (k + 2)(k)(k − 1)(k − 2)(k − 3) |
                                        | (0...110111...0) | 1 T, n-1 C  | T(k − 1, k′; k + 1)       | (k + 2)(k − 1)(k − 2)(k − 3)   |
Table 2 Worst-case costs for elementary cycles.

      |        |         |           | The Proposed Method                                  | [10]
EC    | Length | κ0      | π, π^{-1} | Total Cost | Cost/Length | Total Cost (LNN)          | Total Cost | Cost/Length
(2)   | 2      | 24n-64  | 2n+2      | 28n-60     | 14n-30      | 145n^2-666n+772           | 34n-30     | 17n-15
(2,2) | 4      | 24n-88  | 4n+11     | 32n-66     | 8n-16.5     | 147n^2-791n+1100          | 34n-64     | 8.5n-16
(3)   | 3      | 24n-88  | 3n+4      | 30n-80     | 10n-26.7    | 146n^2-804n+1068          | 32n-82     | 10.7n-27.3
(3,3) | 6      | 24n-112 | 6n+26     | 36n-60     | 6n-10       | 149n^2-907n+1474          | 38n-46     | 6.3n-15.3
(4)   | 4      | 48n-152 | 4n+11     | 56n-130    | 14n-32.5    | 291n^2-1463n+1868         | 50n-84     | 12.5n-21
(4,2) | 6      | 36n-204 | 6n+14     | 48n-176    | 8n-29.4     | 221n^2-1615n+2483         | 50n-122    | 8.3n-20.3
(4,4) | 8      | 36n-204 | 8n+46     | 52n-112    | 6.5n-14     | 223n^2-1573n+2678         | 56n-126    | 7n-15.7
(5)   | 5      | 48n-166 | 5n+13     | 58n-140    | 11.6n-28    | 292n^2-1537n+2057         | 60n-130    | 12n-26
(5,5) | 10     | 36n-204 | 10n+57    | 56n-90     | 5.6n-9      | 225n^2-319n+2790          | 64n-54     | 6.4n-5.4
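The Total Cost column of Table 2 for the proposed method is the κ0 cost plus the cost of one π and one $\pi^{-1}$ circuit, as in the 24n − 88 + 2(4n + 11) = 32n − 66 computation for a pair of 2-cycles. The sketch below checks this relation for every row; the dictionary layout and helper names are illustrative, and each linear cost a·n + b is stored as the pair (a, b).

```python
# Table 2 entries for the proposed method; each linear cost a*n + b is stored as (a, b).
TABLE2 = {
    "(2)":   {"kappa0": (24, -64),  "pi": (2, 2),   "total": (28, -60)},
    "(2,2)": {"kappa0": (24, -88),  "pi": (4, 11),  "total": (32, -66)},
    "(3)":   {"kappa0": (24, -88),  "pi": (3, 4),   "total": (30, -80)},
    "(3,3)": {"kappa0": (24, -112), "pi": (6, 26),  "total": (36, -60)},
    "(4)":   {"kappa0": (48, -152), "pi": (4, 11),  "total": (56, -130)},
    "(4,2)": {"kappa0": (36, -204), "pi": (6, 14),  "total": (48, -176)},
    "(4,4)": {"kappa0": (36, -204), "pi": (8, 46),  "total": (52, -112)},
    "(5)":   {"kappa0": (48, -166), "pi": (5, 13),  "total": (58, -140)},
    "(5,5)": {"kappa0": (36, -204), "pi": (10, 57), "total": (56, -90)},
}

def consistent(row):
    """Check total = kappa0 + 2 * pi, coefficient by coefficient."""
    k, p, t = row["kappa0"], row["pi"], row["total"]
    return (k[0] + 2 * p[0], k[1] + 2 * p[1]) == t

def evaluate(cost, n):
    """Evaluate a stored linear cost (a, b) at a given number of lines n."""
    return cost[0] * n + cost[1]

print(all(consistent(row) for row in TABLE2.values()))  # True
print(evaluate(TABLE2["(2,2)"]["total"], 8))            # 190
```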
Fig. 3 The κ0 circuit structures for different elementary cycles, panels (a) to (i). The circuit structures for cycles (2,2), (3), (3,3), (4,2),
(4,4), (5), and (5,5) are similar to those proposed in [10]. The new circuits for (2) and (4), together with the application of negative
controls and the revised terms in the κ0 circuits, improve quantum cost and interaction cost.
Proof In Table 2, the column Cost/Length gives the cost needed for setting a term in each elementary
cycle. To calculate the maximum cost, suppose at most one 3-cycle, one 4-cycle and one 5-cycle are included,
which can be synthesized by the related synthesis algorithms. All other terms are supposed to be synthesized
as pairs of 2-cycles. Note that the number of elementary gates for fixing terms in a pair of 2-cycles is greater
than for any other pair (see Table 2). The repeated terms in non-disjoint 5-cycles are synthesized by the
(5,5)-cycle synthesis method.
Accordingly, we will have $3 \times \text{Cost/Length}_{3} + 4 \times \text{Cost/Length}_{4} + 5 \times \text{Cost/Length}_{5} + (2^n - 12) \times \text{Cost/Length}_{2,2} + a_r \times \text{Cost/Length}_{5,5}$, which leads to $9.4n2^n - 18.8 \times 2^n + 2.8n^2 + 43.5n - 152.1$ elementary gates in the worst case with arbitrary interaction and $42.4n^2 2^n + 11.3n^3 + 288.2n^2$ with limited interaction.
The worst-case quantum cost of [10] is $51n^2 2^n$ for architectures with limited interaction.
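The arbitrary-interaction bound can be sanity-checked numerically. The sketch below evaluates the sum from the proof using the Cost/Length entries of Table 2 and the closed form $a_r = 2^{n-2} + \log_5((2^n - 5)/4) - 9/4$ from Section 4.2, then compares it with the two dominant terms $9.4n2^n - 18.8 \cdot 2^n$. The function names are illustrative, and the comparison is only approximate because the table entries are rounded.

```python
import math

# Cost/Length entries of Table 2 (proposed method) as functions of n.
COST_PER_TERM = {
    "(3)":   lambda n: 10 * n - 26.7,
    "(4)":   lambda n: 14 * n - 32.5,
    "(5)":   lambda n: 11.6 * n - 28,
    "(2,2)": lambda n: 8 * n - 16.5,
    "(5,5)": lambda n: 5.6 * n - 9,
}

def repeated_terms(n):
    """Closed form for the number of repeated terms a_r (requires 2**n > 5)."""
    return 2 ** (n - 2) + math.log((2 ** n - 5) / 4, 5) - 9 / 4

def worst_case_sum(n):
    """The sum used in the proof of Theorem 1, with arbitrary interaction."""
    c = COST_PER_TERM
    return (3 * c["(3)"](n) + 4 * c["(4)"](n) + 5 * c["(5)"](n)
            + (2 ** n - 12) * c["(2,2)"](n)
            + repeated_terms(n) * c["(5,5)"](n))

def dominant_terms(n):
    """Leading part of the bound stated in Theorem 1."""
    return 9.4 * n * 2 ** n - 18.8 * 2 ** n

n = 20
gap = abs(worst_case_sum(n) - dominant_terms(n)) / dominant_terms(n)
print(gap < 1e-3)  # True
```

The leading coefficient 9.4 arises as 8 (from the pairs of 2-cycles) plus 5.6/4 = 1.4 (from the repeated terms, of which there are about $2^{n-2}$).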
5 Synthesis with Parallel Structure
In this section, a parallel circuit structure is introduced for reversible logic that can be used to considerably
reduce circuit depth of reversible circuits in most cases. The general idea is to copy input lines into k sets
of zero-initialized ancillae, divide the input specification into k sets of disjoint cycles and then synthesize
each set independently by using the prepared ancillae. The final results can be recovered by several CNOTs.
It should be mentioned that adding ancillae has been previously used for quantum cost reduction in the
synthesis and optimization methods [9, 6]. In the proposed method, ancillae are used for the purpose of
depth reduction without considerable overhead on quantum cost, thanks to the specific form of input
representation, i.e., cycles. Note that each cycle can be synthesized by a different synthesis method.
Fig. 4 (a) The input storing block with linear depth. (b) An alternative circuit structure with improved interaction cost
and linear depth. (c) A logarithmic-depth circuit structure.
Input Storing Block. Copying an arbitrary quantum state is not possible in general but a Boolean value
can be copied into a zero-initialized ancilla by a CNOT gate conditioned on the main line and targeted
on the ancilla. For m n-line zero-initialized ancillae, the input storing block includes mn CNOT gates with
constant depth m. Fig. 4-a shows the input storing block for a circuit with n main lines and m n-line ancillae.
The interaction cost can be calculated as n(n − 1)(1 + 2 + ... + m − 1) = (1/2)nm(n − 1)(m − 1). Fig. 4-b
illustrates another circuit structure with improved interaction cost, mn(n − 1). Circuit depth in Fig. 4-a can
be improved from a linear factor to a logarithmic factor O(log m) [15] as shown in Fig. 4-c. Thus, interaction
cost can be calculated as $n(n-1)\sum_{i=0}^{\log_2 m - 1} 2^{2i} = (1/2)n(n-1)(m^2-2)$.
Output Restoring Block. Since each subcircuit implements a set of disjoint cycles, for a given input
combination, only one circuit (active) produces the results and the outputs of other subcircuits (inactive)
are the same as the inputs. The number of inactive subcircuits is equal to the number of n-line ancillae
registers, which is even. As such, XORing (by CNOT) the outputs of all subcircuits on the main lines cancels
inputs and restores correct outputs at the main lines. Overall, for m n-line ancillae and m+1 sets of disjoint
cycles, mn CNOTs with depth m are sufficient. Fig. 5-a illustrates the output restoring block for m n-line
ancillae with interaction cost nm(n − 1)(m − 1). A CNOT circuit with a common target can be implemented
with logarithmic depth [15] as illustrated in Fig. 5-b for n=4 and m=4. In this case, interaction cost is
$n(n-1)\sum_{i=0}^{\log_2 m - 1}(2^i - 1)\,2^{i+1} = (1/2)nm(n-1)(2m + \log_2 m + 2)$.
Theorem 2 Consider a given specification F on n lines written as a set of disjoint cycles C1 C2 ...Cm for an odd
m. Assume that subcircuit Li implements Ci . The specification F can be implemented with depth $O(\mathrm{depth}_{\max}(L_i))$
in the presence of m n-line ancillae.
Proof Copying the input lines to m − 1 n-line zero-initialized ancillae replicates inputs at the ancillae.
Disjoint cycles commute. Hence, each subcircuit can be implemented on one register independently. The
input storing/output restoring blocks have constant depth m. Therefore, circuit depth is dominated by the
maximum depth of all subcircuits.
Fig. 5 (a) The output restoring block with linear depth. (b) The output restoring block with logarithmic depth for four
main lines and four 4-line ancillae.
Fig. 6 An example of the proposed parallel cycle-based structure for a 4-line function, with the input storing block on the left and the output restoring block on the right.
A given specification may contain a set of disjoint cycles with exponential lengths, i.e., $O(2^n)$. In such
cases, circuit depth cannot be further improved by Theorem 2. However, as will be shown in Section 6,
circuit depth can be reduced considerably even with a small number of n-line ancillae. To efficiently employ
the result of Theorem 2, one needs to determine disjoint cycle sets.
Example 1 Assume that the input cycles (1,3) (7,10) (0,4) (6,15) (2,8) (5,13) are given for a circuit with 4 lines.
All cycles are elementary and no decomposition is required. Let two 4-line ancillae be available and each pair of
2-cycles be assigned to one set, i.e., (1,3) (7,10) to set #1, (0,4) (6,15) to set #2 and (2,8) (5,13) to set #3.
Applying the input storing block provides the input data on the added zero-initialized ancillae. Now, the proposed
method in Section 4 can be applied for each cycle pair which leads to three subcircuits. To combine the results, one
needs to add the output restoring block. Accordingly, total depth is equal to the maximum depth of the synthesized
subcircuits (i.e., 33) plus 4 (2 for each input storing/output restoring block). Fig. 6 illustrates the result.
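The behavior of the parallel structure in Example 1 can be simulated at the truth-table level. The sketch below does not synthesize any gates: each subcircuit is modeled as a lookup table for its set of cycles, the input storing block as a copy into each ancilla register, and the output restoring block as an XOR of every ancilla register onto the main register. All names are illustrative.

```python
def cycles_to_map(cycles, size):
    """Truth table (as a list) of the permutation given by a set of disjoint cycles."""
    perm = list(range(size))
    for cyc in cycles:
        for i, x in enumerate(cyc):
            perm[x] = cyc[(i + 1) % len(cyc)]
    return perm

SIZE = 16  # 4 lines
SETS = [
    [(1, 3), (7, 10)],   # set #1, main register
    [(0, 4), (6, 15)],   # set #2, first ancilla register
    [(2, 8), (5, 13)],   # set #3, second ancilla register
]
subcircuits = [cycles_to_map(s, SIZE) for s in SETS]
target = cycles_to_map([c for s in SETS for c in s], SIZE)

def parallel_eval(x):
    # Input storing block: CNOTs copy x into each zero-initialized ancilla register.
    registers = [x] + [0 ^ x for _ in subcircuits[1:]]
    # Every register runs its own subcircuit independently.
    outputs = [sub[r] for sub, r in zip(subcircuits, registers)]
    # Output restoring block: XOR each ancilla register onto the main register.
    main = outputs[0]
    for anc in outputs[1:]:
        main ^= anc
    return main

ok = all(parallel_eval(x) == target[x] for x in range(SIZE))
print(ok)  # True
```

For any input, at most one subcircuit changes its register; the unchanged copies of the input come in an even number and cancel under XOR, which is why the number of ancilla registers must be even.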
Cycle Distribution. Consider n elementary cycles and m register sets, including the input register. The
problem is to assign disjoint cycles into different registers such that the total depth of the circuit in each
register is minimized and the depths of the registers are almost equal. To achieve this goal, we modeled the
cycle distribution problem as the bin packing problem1 with a few exceptions. In our modeling, registers
are bins and cycles are objects. Each cycle is decomposed into a set of elementary cycles and cost values in
Table 2 are used as the weights of elementary cycles. If the input permutation is odd, the permutation in
one bin should be odd. Many heuristic algorithms have been developed to solve different variants of the bin
packing problem. Examples include first fit and best fit algorithms.
1 The bin packing problem is an NP-hard combinatorial problem in which objects of
different weights must be packed into a finite number of bins of capacity $W$ such that the number of used bins is
minimized. Given a bin of size $W$ and a list $w_1, \ldots, w_n$ of sizes of the items, one should find an integer $B$ and a $B$-partition
$S_1 \cup \cdots \cup S_B$ of $\{1, \ldots, n\}$ such that $\sum_{i \in S_k} w_i \le W$ for all $k = 1, \ldots, B$. A solution is optimal if it has minimal $B$.
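For reference, the classical first fit heuristic mentioned above can be sketched in a few lines of Python. This is the textbook version for fixed-capacity bins, not the distribution procedure of this paper; the weights and the capacity are hypothetical values chosen for illustration.

```python
def first_fit(weights, capacity):
    """Classical first fit: put each item into the first bin with enough room."""
    bins = []  # each bin is a list of item weights
    for w in weights:
        if w > capacity:
            raise ValueError("item larger than bin capacity")
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:
            # No existing bin can hold the item, so open a new one.
            bins.append([w])
    return bins


# Hypothetical item weights and capacity W = 10.
print(first_fit([4, 8, 1, 4, 2, 1], capacity=10))  # [[4, 1, 4, 1], [8, 2]]
```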
Table 3 Benchmark specifications before and after decomposition.

Benchmark functions, in row order: hwb8, hwb9, hwb10, hwb11, nth prime7, nth prime8, nth prime9, nth prime10, nth prime11.

# of Cycles, Before DCM
  EC:  48, 54, 228, 1, 1, -
  nEC: 16, 38, 26, 186, 5, 1, 3, 3, 6
# of Cycles, After DCM
  (2): 36, 152, 1, 2, 1, 2
  (3), (4), (5): 28, 16, 54, 76, 154, 186, 372, 3, 1, 28, 1, 62, 1, 125, 2, 253, 1, 507

               After DCM & DIST      Depth
Benchmark      set1   set2   set3    circ1    circ2    circ3    total
hwb8           26     28     26      1923     1995     1953     1999
hwb9           43     43     44      4344     4347     3988     4351
hwb10          101    103    102     9929     10058    9898     10062
hwb11          186    186    186     23862    23826    23827    23866
nth prime7     19     6      8       1519     390      734      1523
nth prime8     63     -      -       5852     -        -        5852
nth prime9     122    4      2       13783    393      346      13787
nth prime10    96     85     75      12115    10947    9329     12119
nth prime11    315    36     159     46888    5470     23765    46892
To solve the problem, a best fit algorithm is developed which sorts the c elementary cycles according to their
maximum synthesis costs and proceeds one cycle at a time. To distribute cycles, the first cycle is selected
and temporarily assigned to bin i for 1 ≤ i ≤ m. Then, the total cost is calculated among all the bins and
the cycle is permanently assigned to the bin which results in the lowest total cost. In the case of a tie, the
bins are selected in sequence. The algorithm continues until all the cycles are assigned. Sorting takes
O(c log c) time, and each of the c cycles is tried in m bins with the cost evaluated over all m bins, so the total
time complexity is $O(c \log c) + O(cm^2)$. At the end, the algorithm checks the permutation of each bin to
make sure that at most one bin has an odd permutation. Odd permutations need one ancilla in the NCT
library [16]. If more than one bin is found with an odd permutation (called an odd bin), the algorithm moves
the smallest odd cycle of the odd bin with maximum depth to the odd bin with the minimum depth. This
can take O(m) time. After the changes, the involved bins should have even permutations. This process is
continued until at most one bin with an odd permutation exists, which occurs when the input permutation
is odd and at least m even permutations exist to fill all the bins. Altogether, the whole process has a time
complexity of $O(c \log c) + O(cm^2)$.
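The distribution procedure described above can be sketched in Python as follows. Several points are assumptions made for this sketch rather than details given in the paper: cycles are sorted in descending cost order, the "total cost" of a tentative assignment is read as the largest bin load, and the cycle costs are hypothetical (the paper takes them from its Table 2). The names `Cycle` and `distribute` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    name: str
    cost: int   # hypothetical weight; the paper uses the costs of its Table 2
    odd: bool   # True if the cycle is an odd permutation


def is_odd(bin_cycles):
    """A bin is odd if it holds an odd number of odd cycles."""
    return sum(c.odd for c in bin_cycles) % 2 == 1


def distribute(cycles, m):
    """Greedy distribution of cycles over m bins, followed by parity repair."""
    bins = [[] for _ in range(m)]
    load = [0] * m
    # ASSUMPTION: most expensive cycle first.
    for cyc in sorted(cycles, key=lambda c: c.cost, reverse=True):
        # Tentatively place the cycle in bin i and score the result by the
        # largest bin load (our reading of "total cost among all the bins").
        def score(i):
            return max(load[j] + (cyc.cost if j == i else 0) for j in range(m))
        best = min(range(m), key=score)  # min() keeps the first bin on ties
        bins[best].append(cyc)
        load[best] += cyc.cost

    # Parity repair: pair up odd bins until at most one remains.
    while True:
        odd_bins = [i for i in range(m) if is_odd(bins[i])]
        if len(odd_bins) <= 1:
            break
        donor = max(odd_bins, key=lambda i: load[i])
        receiver = min((i for i in odd_bins if i != donor), key=lambda i: load[i])
        moved = min((c for c in bins[donor] if c.odd), key=lambda c: c.cost)
        bins[donor].remove(moved)
        bins[receiver].append(moved)
        load[donor] -= moved.cost
        load[receiver] += moved.cost
    return bins, load


# Hypothetical elementary cycles with made-up costs and parities.
cycles = [
    Cycle("c1", 9, False), Cycle("c2", 7, False), Cycle("c3", 6, False),
    Cycle("c4", 5, False), Cycle("c5", 4, True), Cycle("c6", 2, True),
]
bins, load = distribute(cycles, m=3)
print(load, [is_odd(b) for b in bins])
```

Each repair step moves one odd cycle between two odd bins, so both become even and the number of odd bins drops by two per step. The repair can unbalance the loads, which is why the smallest odd cycle is the one moved.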
6 Experimental Results
The proposed cycle-based synthesis method for the LNN architecture and the suggested parallel structure
for reversible logic synthesis were implemented in C++ and all of the experiments were performed on
an Intel Pentium IV 2.5GHz computer with 4GB memory. To evaluate the proposed synthesis method,
some of the reversible benchmark functions from [18] were synthesized. The selection criteria for these
benchmarks will be discussed later in this section and their specifications are given in Table 3 before and
after decomposition. The decomposition approach of [10] is used in our method to decompose the input
cycles into the proposed elementary cycles. The numbers of elementary cycles (EC) and non-elementary
cycles (nEC) of each benchmark are reported in this table. After decomposition, all cycles are elementary
with length < 6. Note that [10] proposes the best prior synthesis algorithm for medium-size hwbN and N-th
prime functions if no ancilla is available [18]. While hwbN functions can be implemented with a polynomial
cost $O(n \log^2 n)$ if a logarithmic number of garbage bits ⌈log n⌉ + 1 is available [18], the proposed approach
is more general and can be applied to many reversible functions.
To evaluate the proposed parallel structure, the cycle-based algorithm of Section 4 was used for synthesizing each subset. Since the number of signals is limited in the current quantum technologies, the minimum
number of ancillae (2 n-line registers) was used. Therefore, the number of input cycles should be >3 to
have at least one cycle in each subset. In our experiments, the results of [14] were used for decomposing
multiple-control Toffoli gates and calculating quantum cost for the gates with negative controls. Besides,
the two-qubit cost model of [6] is used for evaluating the results. A naive SWAP insertion method and the
method of [12] were used to evaluate the results for the LNN architecture. For the naive method, move
and delete rules were applied on the synthesized circuits to remove redundant gates. To estimate circuit
depth, the greedy level compaction algorithm of [14] was implemented without applying the templates.
Table 4 and Table 5 report the quantum cost (QC), the two-qubit cost (2-qubit) and the depth (Depth)
for the synthesized circuits without and with limited interaction. Since [10] does not target the limited
interaction in the LNN architecture, we used the method of [12] on the results of [10] and ours to insert
SWAP gates. Runtime of [10] and our method is less than one minute for the selected benchmarks. In
the proposed method, this time includes the time required for applying the distribution procedure in the
parallel structure and the time required for synthesis and applying the move and delete rules. In the parallel
structure, due to the qubit reordering in [12], at most 3n(3n − 1) SWAP gates are used between the input
storing block, the subsets and the output restoring block to order lines.
Table 4 Comparison of the proposed approach and prior best results. #A is the number of ancillae. R and P are used for
regular and parallel structures, respectively. The resulting circuits are available at http://ceit.aut.ac.ir/~arabzadeh/results/
and may be viewed with RCViewer+ [19].

                           The Proposed Method       [10]                      Improvement (%)
Benchmark    n   R/P  #A   QC      2-qubit  Depth    QC      2-qubit  Depth    QC     2-qubit  Depth
hwb8         8   R    -    6686    4468     5622     6940    5348     5442     3.6    16.4     -3.3
hwb8         8   P    16   6964    4730     1999     6940    5348     5442     -0.3   11.5     63.2
hwb9         9   R    -    14474   10382    12054    16173   12479    12472    10.5   16.8     3.3
hwb9         9   P    18   15262   10764    4351     16173   12479    12472    5.6    13.7     65.1
hwb10        10  R    -    35298   23584    29751    35618   25453    27812    0.8    7.3      -6.9
hwb10        10  P    20   35890   23874    10062    35618   25453    27812    -0.7   6.2      63.8
hwb11        11  R    -    86864   65260    71418    90745   71175    69763    4.2    8.3      -2.3
hwb11        11  P    22   87234   65442    23866    90745   71175    69763    3.8    8.0      65.7
nth prime7   7   R    -    2888    2296     2473     3172    2841     2514     8.9    19.1     1.6
nth prime7   7   P    14   3100    2398     1523     3172    2841     2514     2.2    15.5     39.4
nth prime8   8   R    -    7016    5624     5852     7618    6622     5793     7.9    15.0     -1.0
nth prime9   9   R    -    16820   11907    14285    17975   14076    13941    6.4    15.4     -2.4
nth prime9   9   P    18   17507   12053    13787    17975   14076    13941    2.6    14.3     1.1
nth prime10  10  R    -    38843   27743    31924    40301   31841    31254    3.6    12.8     -2.1
nth prime10  10  P    20   39317   27933    12119    40301   31841    31254    2.4    12.2     61.2
nth prime11  11  R    -    92863   67401    75668    95433   75474    72934    2.6    10.6     -3.7
nth prime11  11  P    22   93389   67677    46892    95433   75474    72934    2.1    10.3     35.7
Average          R                                                             5.4    13.6     -1.9
Average          P                                                             2.2    11.5     49.4
Table 5 Comparison of the proposed approach and the one in [10] with the nearest neighbor limitation. The improvement column compares the results after applying [12] on both methods. The resulting circuits are available at
http://ceit.aut.ac.ir/~arabzadeh/results/ and may be viewed with RCViewer+ [19].

                           The Proposed Method                   [10]+[12]         Improvement (%)
                           +Naive            +[12]
Benchmark    n   R/P  #A   QC       Depth    QC       Depth      QC       Depth    QC     Depth
hwb8         8   R    -    36684    32313    31553    20940      36732    22720    14.0   7.8
hwb8         8   P    16   46788    14758    36045    9248       36732    22720    1.8    59.2
hwb9         9   R    -    87310    74676    77860    46958      91805    51181    15.1   8.2
hwb9         9   P    18   100228   31810    87389    19597      91805    51181    4.8    61.7
hwb10        10  R    -    279496   248524   202903   112623     228240   117893   11.1   4.4
hwb10        10  P    20   291014   89021    212616   41479      228240   117893   6.8    64.8
hwb11        11  R    -    682182   605294   562817   297986     611843   307114   8.0    2.9
hwb11        11  P    22   685944   205472   569876   104372     611843   307114   6.8    66.0
nth prime7   7   R    -    12264    10649    10922    9799       15356    10130    28.8   3.2
nth prime7   7   P    14   15106    7734     15897    6930       15356    10130    -3.5   31.5
nth prime8   8   R    -    35976    29975    30796    26920      42059    24574    26.7   -9.5
nth prime9   9   R    -    91984    76910    90511    54457      99003    55737    8.5    2.2
nth prime9   9   P    18   98686    76020    95362    54850      99003    55737    3.6    1.5
nth prime10  10  R    -    241538   199996   222865   124122     248901   137091   10.4   9.4
nth prime10  10  P    20   250526   79165    228777   49613      248901   137091   8.0    63.8
nth prime11  11  R    -    654910   577721   576047   308413     625320   324005   7.8    4.8
nth prime11  11  P    22   665132   361756   585165   195500     625320   324005   6.4    39.6
Average          R                                                                 14.6   3.8
Average          P                                                                 4.4    48.6

As can be seen in Table 5, the effect of the post-process method is more significant for [10] but altogether the results of the proposed LNN-based method are better than those of [10] after applying [12] on
both methods. Note that using negative controls does not increase the quantum cost. For odd
permutations, one more ancilla should be added. The two-qubit costs are compared in Table 4 and the
results show 13.6% and 11.5% improvement on average for the regular and parallel structures, respectively.
In the parallel structure, the average depth improvement of the N-th prime benchmarks is less than that of
the hwbN functions since the input cycles of the N-th prime functions are unstructured with different cycle lengths, which
results in unbalanced subsets after distribution. Input cycle distributions after decomposition (DCM) and
distribution (DIST) are reported in Table 3. For hwbN functions, applying the distribution method leads to
3 sets with almost the same numbers of elementary cycles. We report the circuit depth for each set along
with the total depth after considering the effect of input storing and output restoring blocks in this table. As
reported in Table 3, function nth prime8 has one disjoint input cycle. Accordingly, the resulting elementary
cycles should be assigned to one set by the proposed method.
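As a small worked check of how the reported numbers relate, the following Python snippet recomputes the total depth of hwb8 from its per-set depths in Table 3 and the corresponding depth improvement over [10] in Table 4. The overhead of 4 levels follows Example 1 (2 for each of the input storing and output restoring blocks). Computing the improvement as a relative reduction with respect to [10] is an assumption of this sketch; the paper does not state the formula explicitly.

```python
# Per-set depths of hwb8 from Table 3 (circ1, circ2, circ3).
circ_depths = [1923, 1995, 1953]

# 2 levels for the input storing block plus 2 for the output restoring block.
block_overhead = 4
total = max(circ_depths) + block_overhead

# Depth reported for [10] on hwb8 in Table 4.
baseline = 5442

# ASSUMPTION: improvement is the relative reduction with respect to [10].
improvement = 100 * (baseline - total) / baseline

print(total)                  # 1999
print(round(improvement, 2))  # 63.27
```

The result matches the total depth of 1999 in Table 3 and is consistent with the 63.2% depth improvement listed for the parallel structure in Table 4.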
In choosing the benchmark functions for this paper, the general guidelines presented
in [10] and [3] were followed. These guidelines indicate that cycle-based
methods produce significantly better results when the input function contains permutations without
regular patterns, such as the hwbN and N-th prime functions [10]. For this reason, only the results of these functions
are reported in this paper. As for other functions in [18], some are reported in [10] along with a discussion
on their suitability for the cycle-based approach (like Permanent). To avoid being repetitive, we did not
include this set in this paper. There are yet other benchmarks that include important arithmetic functions
like adders, multipliers and group arithmetic (e.g., in Galois Fields). Since the proposed cycle-based synthesis
method is a general synthesis approach, it may not produce interesting results compared to other approaches
specifically developed for those benchmark functions.
7 Conclusion
In this paper, a cycle-based synthesis approach is proposed to reduce logical depth for architectures with limited
interactions. The proposed method considers interaction cost and depth in addition to the traditional
quantum cost metric, taking a multi-objective view of synthesis. To achieve this, we redesigned the elementary cycles in [10] with negative controls
and limited interaction between gate lines. Moreover, a new parallel circuit structure was proposed for
reversible logic in the presence of several ancilla registers. This structure, which can also
be used with other synthesis methods, combined with the proposed cycle-based synthesis method for interaction
cost, forms our complete flow for depth-optimized reversible circuit synthesis.
A given permutation is written as a set of disjoint cycles to be used in the proposed parallel circuit
structure. Then, the resulting cycles are distributed among the available n-line registers based on the bin
packing problem. The cycles are then synthesized on the assigned registers independently. Our experiments
and analysis show the effectiveness of the proposed approach with and without the interaction cost limitations for the attempted benchmarks and in the worst case.
References
1. I. L. Markov and M. Saeedi. Constant-optimized quantum circuits for modular multiplication and exponentiation.
Quant. Inf. and Comput., 12(5&6):0361–0394, 2012.
2. S. Aaronson and D. Gottesman. Improved simulation of stabilizer circuits. Phys. Rev. A, 70:052328, 2004.
3. M. Saeedi and I. L. Markov. Synthesis and optimization of reversible circuits - a survey. ACM Computing Surveys,
e-print, arXiv:1110.2574, 2012.
4. D. Cheung, D. Maslov, and S. Severini. Translation techniques between quantum circuit architectures. In Workshop
on Quantum Information Processing, 2007.
5. R. Van Meter and M. Oskin. Architectural implications of quantum computing technologies. J. Emerg. Technol.
Comput. Syst., 2(1):31–63, 2006.
6. D. Maslov and M. Saeedi. Reversible circuit optimization via leaving the Boolean domain. IEEE Trans. on CAD,
30(6):806–816, 2011.
7. P. Gupta, A. Agrawal, and N. K. Jha. An algorithm for synthesis of reversible logic circuits. IEEE Trans. on CAD,
25(11):2317–2330, 2006.
8. D. Maslov, G. W. Dueck, and D. M. Miller. Techniques for the synthesis of reversible Toffoli networks. ACM Trans.
Des. Autom. Electron. Syst., 12(4):42, 2007.
9. R. Wille and R. Drechsler. BDD-based synthesis of reversible logic for large functions. Design Autom. Conf., pages
270–275, 2009.
10. M. Saeedi, M. Saheb Zamani, M. Sedighi, and Z. Sasanian. Reversible circuit synthesis using a cycle-based approach.
J. Emerg. Technol. in Comput. Syst., 6(4):1–26, December 2010.
11. D. M. Miller, R. Wille, and R. Drechsler. Reducing reversible circuit cost by adding lines. Int’l Symp. on Multiple-Valued
Logic, pages 217–222, 2010.
12. M. Saeedi, R. Wille, and R. Drechsler. Synthesis of quantum circuits for nearest neighbor architectures. Quant. Inf.
Proc., 10(3):355–377, 2011.
13. Y. Hirata, M. Nakanishi, S. Yamashita, and Y. Nakashima. An efficient conversion of quantum circuits to a linear
nearest neighbor architecture. Quant. Inf. and Comput., 11(1&2):0142–0166, 2011.
14. D. Maslov, G. W. Dueck, D. M. Miller, and C. Negrevergne. Quantum circuit simplification and level compaction.
IEEE Trans. on CAD, 27(3):436–444, March 2008.
15. C. Moore and M. Nilsson. Parallel quantum computation and quantum codes. SIAM Journal on Computing, 31:799–
815, 2001.
16. V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes. Synthesis of reversible logic circuits. IEEE Trans. on CAD,
22(6):710–722, June 2003.
17. M. Saeedi, M. Sedighi, and M. Saheb Zamani. A library-based synthesis methodology for reversible logic. Microelectron.
J., 41(4):185–194, Apr 2010.
18. D. Maslov. Reversible logic synthesis benchmarks page. http://webhome.cs.uvic.ca/~dmaslov, 2011.
19. M. Arabzadeh and M. Saeedi. RCviewer+, A viewer/analyzer for reversible and quantum circuits, version 2.41. Available
at http://ceit.aut.ac.ir/QDA/RCV.htm, 2011.