# Improving Synthesis of Reversible Circuits: Exploiting Redundancies in Paths and Nodes of QMDDs 

Alwin Zulehner and Robert Wille<br>Institute for Integrated Circuits, Johannes Kepler University Linz, Austria<br>\{alwin.zulehner, robert.wille\}@jku.at


#### Abstract

In recent years, reversible circuits have become an established emerging technology through their variety of applications. Since these circuits employ a completely different structure from conventional circuitry, dedicated functional synthesis algorithms have been proposed. Although scalability has been achieved by using approaches based on decision diagrams, the resulting circuits employ a significant complexity measured in terms of quantum cost. In this paper, we aim for a reduction of this complexity. To this end, we review QMDD-based synthesis. Based on that, we propose optimizations that allow for a substantial reduction of the quantum costs by jointly considering paths and nodes in the decision diagram that employ a certain redundancy. In fact, in our experimental evaluation, we observe substantial improvements of up to three orders of magnitudes in terms of runtime and up to six orders of magnitudes (a factor of one million) in terms of quantum cost.


## 1 Introduction

In the recent years, reversible circuits - circuits that additionally allow to compute the inputs from the outputs - have become an established field in research due to their characteristics regarding low power design (based on the seminal work of Landauer [5] and Bennett [3]) and their application in quantum computations [10] - a new computation paradigm that allows for solving certain tasks substantially faster. Most recently, reversible circuits also found their application in the design of on-chip interconnects [18, 20], encoders [21], and verification [1].

Reversible circuits employ a completely different structure compared to conventional circuits, because not only the outputs are determined by the inputs, but also vice versa. This leads to a structure where circuits consist of a set of circuit lines that are passed through a cascade of reversible gates and, hence, disallow direct feedback and fanout. Consequently, dedicated synthesis approaches that establish the desired unique mapping from inputs to outputs are required. Although a variety of synthesis approaches exist, functional synthesis approaches are of special interest, since they result in a circuit where the number of circuit lines is minimal. Over the years, several such synthesis approaches have been proposed, ranging from exact ones (i.e. the number of gates is minimal) [4] to heuristic ones based on truth tables [12, 6]. However, since their underlying data structure is exponential they suffer from limited scalability. Therefore, methods
based on a more compact representation of the function to be synthesized such as decision diagrams [16, 15] or Boolean satisfiability [13] have been proposed. They, in contrast, yield rather costly circuits.

In this paper, we focus on QMDD-based synthesis [16], one of the scalable synthesis approaches listed above. This synthesis approach is based on Quantum Multiple-Valued Decision Diagrams (QMDDs $[7,11]$ ), a decision diagram introduced for the compact representation of reversible and quantum computations. The main premise of QMDD-based synthesis is to employ reversible gates which transform each QMDD node to the identity. By doing that for all nodes of a QMDD (representing the function to be synthesized) in a breadth first fashion, a circuit realizing the given function results. However, during this process often equal cases are considered iteratively; for each, the same cascade of gates are applied - leading to a large number of redundant steps and, hence, gates.

In this work, we introduce a new QMDD-based synthesis approach which aims to consider those redundant cases only once. To this end, a scheme is employed which considers all paths to the currently considered node together and performs logic minimization to reduce their number. Furthermore, nodes which can be transformed to the identity with the same cascade of gates are considered jointly. This allows for a substantial reduction of gates. Experimental evaluations show that the proposed scheme improves the current state of the art QMDD-based synthesis by several orders of magnitudes in terms of runtime as well as in terms of quantum cost (an abstract measure of the complexity of the resulting circuit).

This paper is structured as follows. In Section 2, we briefly review reversible functions and their representation as well as reversible circuits. Based on that, we review QMDD-based synthesis in Section 3 and discuss the proposed optimizations in Section 4. In Section 5, we discuss the improvement that can be achieved by the proposed optimization, whereas Section 6 experimentally confirms these improvements. Section 7 concludes the paper.

## 2 Background

In this section we briefly recapitulate reversible functions including their efficient representation as well as reversible circuits.

### 2.1 Reversible Functions

A Boolean function is reversible if the mapping from inputs to outputs establishes a bijection, i.e. the inputs can also be computed from the outputs.

Definition 1. A Boolean function $f: \mathbb{B}^{m} \rightarrow \mathbb{B}^{n}$ is reversible iff $n=m$ and there exists a unique mapping from inputs to outputs and vice versa.

Besides truth tables, reversible functions can also be represented by means of a permutation matrix.

Definition 2. Consider a reversible Boolean function $f: \mathbb{B}^{n} \rightarrow \mathbb{B}^{n}$. Then, the permutation matrix of $f$ is a $2^{n} \times 2^{n}$ matrix with entries $m_{i, j}, 0 \leq i, j<2^{n}$ such that

$$
m_{i, j}= \begin{cases}1 & \text { if } f(j)=i \\ 0 & \text { otherwise }\end{cases}
$$

A 1-entry in the permutation matrix means that an input (column) is mapped to an output (row) by $f$. All other entries of the permutation matrix are zero. Since a permutation represents a unique mapping (i.e. a reversible function), it contains exactly one 1-entry in each column and in each row.

Example 1. Consider the reversible function depicted in Fig. 1a. The function is reversible because the number of inputs is equal to the number of outputs and each output pattern occurs exactly once. Consequently, the input can be uniquely determined having the output only. The permutation matrix of this reversible function is shown in Fig. 1b. Here, reversibility can easily be seen since each column as well as each row contains exactly one 1-entry.

| $x_{1}$ | $x_{2}$ | $x_{3}$ | $x_{1}^{\prime}$ | $x_{2}^{\prime}$ | $x_{3}^{\prime}$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 0 | 1 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 |

(a) Truth table

(b) Permutation matrix

Fig. 1. Representation of reversible functions

### 2.2 Quantum Multiple-Valued Decision Diagrams (QMDDs)

The description means for reversible functions discussed in the previous section is rather limited. Since the size of truth tables as well as the size of permutation matrices grows exponentially with the number of variables, a more scalable representation is desirable in order to represent large functions. However, permutation matrices can be represented more compactly - and with non-exponential space in many relevant cases - using so called Quantum Multiple-Valued Decision Diagrams (QMDDs [7, 11]). QMDDs were initially introduced to represent the
unitary matrices of quantum computations. Since quantum computations are reversible and the matrices are also of dimension $2^{n} \times 2^{n}$, QMDDs are perfectly suited for representing permutation matrices. For simplicity, we discuss only those aspects of QMDDs that are relevant for this paper and omit all quantum related issues.

The main idea of QMDDs is a recursive decomposition of a permutation matrix $M$ over its variables. A variable $x_{i}$ represents a mapping from the $i^{t h}$ input to the $i^{t h}$ output of the function. There exist four possible mappings of a variable, since an input may be mapped from 0 to 0 , from 0 to 1 , from 1 to 0 , or from 1 to 1 . Considering the most significant variable $x_{1}$ of the permutation matrix, these four different mappings exactly describe the four quadrants of the matrix, which we denote $M_{0 \rightarrow 0}, M_{1 \rightarrow 0}, M_{0 \rightarrow 1}$, and $M_{1 \rightarrow 1}$. This decomposition can be represented by a decision diagram node labeled $x_{1}$ with four outgoing edges, describing exactly those quadrants $M_{0 \rightarrow 0}, M_{1 \rightarrow 0}, M_{0 \rightarrow 1}$, and $M_{1 \rightarrow 1}$ from left to right.

This decomposition is recursively applied until a single entry is reached. Such an entry is then represented by a terminal. The compactness of QMDDs results - as for other types of decision diagrams - from sharing equal nodes (and thus representing equal sub-matrices). For simpler graphical visualization, we represent 0 -matrices (i.e. matrices containing 0 -entries only) with a 0 -stub, independent of their dimension.

Example 2. Consider again the permutation matrix depicted in Fig. 1b. The decomposition over its variables $x_{1}, x_{2}$, and $x_{3}$ yields the QMDD shown in Fig. 2. Note that the path highlighted in bold traverses the third edge of the node labeled $x_{1}$, the second edge of the node labeled $x_{2}$ and the first edge of the node labeled $x_{3}$. Consequently, this path describes a mapping of variable $x_{1}$ from 0 to 1 , a mapping of variable $x_{2}$ from 1 to 0 , and a mapping of variable $x_{3}$ from 0 to 0 , i.e. a mapping from input $x_{1} x_{2} x_{3}=010$ to output $x_{1} x_{2} x_{3}=100$.


Fig. 2. QMDD of the permutation matrix shown in Fig. 1b

For a formal definition of QMDDs, as well as manipulation algorithms such as matrix multiplication we refer to [11].

### 2.3 Reversible Circuits

The structure of reversible circuits is completely different to classical circuitry, because fanout and feedback are not directly allowed. A reversible circuit rather consists of a set of circuit lines (one for each variable of the realized reversible function) which are passed through a cascade of reversible gates. The values of the circuit lines may be changed in a reversible fashion by these gate or passed through unaltered. In this paper, we focus on Toffoli gates - a reversible gate that has been proven to be universal (i.e. all reversible functions can be realized using Toffoli gates only).

Definition 3. Consider a set $X=\left\{x_{1}, x_{2}, \ldots, x_{n}\right\}$ of $n$ circuit lines and $a$ sequence $G=g_{1}, g_{2}, \ldots, g_{k}$ of $k$ reversible gates. Then, the pair $C=(X, G)$ describes a reversible circuit. A Toffoli gate $g_{i}=\operatorname{TOF}\left(C_{i}, t_{i}\right)$ consists of a set $C_{i} \subseteq\left\{x_{i}^{-} \mid x_{i} \in X\right\} \cup\left\{x_{i}^{+} \mid x_{i} \in X\right\}$ of negative and positive control lines, and $a$ target line $t_{i}$. The Boolean value of the target line is inverted iff the Boolean value of all positive and negative control lines is 1 and 0, respectively. All circuit lines except the target line are passed unaltered through the gate.

Toffoli gates as defined above are self-inverse, i.e. applying a Toffoli gate twice results in the identity. For graphical visualization we use the symbols $\bullet, \bigcirc$, and $\oplus$ to represent a positive control line, a negative control line, and a target line, respectively.

Example 3. Consider the reversible circuit shown in Fig. 3. The circuit is composed of three Toffoli gates and is labeled with the intermediate values resulting when the input lines are assigned $x_{1} x_{2} x_{3}=010$. The first gate $g_{1}=$ $\operatorname{TOF}\left(\left\{x_{1}^{-}, x_{2}^{+}\right\}, x_{3}\right)$ inverts the value of target line $x_{3}$, because the negative control line $x_{1}$ is assigned 0 and the positive control line $x_{2}$ is assigned 1. Similarly, the second gate $g_{2}=\operatorname{TOF}\left(\left\{x_{3}^{+}\right\}, x_{1}\right)$ is activated, because the value of circuit line $x_{3}$ is now 1 . Consequently, the value of target line $x_{1}$ is changed to 1 . The last gate $g_{3}=\operatorname{TOF}\left(\left\{x_{2}^{+}, x_{3}^{-}\right\}, x_{1}\right)$ does not change the value of target line $x_{1}$, because the negative control line $x_{3}$ is assigned 1 . Therefore, all three circuit lines are passed through the gate unaltered in this case. Eventually, the circuit maps input $x_{1} x_{2} x_{3}=010$ to output $x_{1}^{\prime} x_{2}^{\prime} x_{3}^{\prime}=111$.


Fig. 3. Reversible circuit

The complexity of reversible circuits is usually measured in terms of quantum cost. These cost result from mapping the circuit to specific libraries of quantum gates. Two commonly used libraries for determining the quantum cost of a reversible circuit are the $N C V$ library [8] and the Clifford $+T$ library [2]. Here, the quantum cost are determined by the overall number of gates ( $N C V$-cost) or the length of the sequence of $T$-gates (and, therefore, denoted $T$-depth), respectively. As shown in Table 1, the $N C V$-cost as well as the $T$-depth of a Toffoli gate depends on the number of control lines. The quantum costs listed in Table 1 were determined using RevKit [14].

Table 1. Quantum cost of Toffoli gates

| Control lines | $N C V$-cost | $T$-depth |
| ---: | ---: | ---: |
| 1 | 1 | 0 |
| 2 | 5 | 3 |
| 3 | 20 | 12 |
| 4 | 40 | 24 |
| 5 | 60 | 36 |

Example 3 (continued). The NCV-cost and the T-depth of the circuit shown in Fig. 3 are 11 and 6, respectively.

## 3 QMDD-based Synthesis

In this section, we review QMDD-based synthesis (originally proposed in [16]). As discussed in Section 1, QMDD-based synthesis is a functional synthesis approach which yields a circuit with the minimal number of circuit lines. The main idea behind the algorithm is described as follows.

Assume the function to be synthesized is described by a permutation matrix $M$. Then, due to reversibility its inverse $M^{-1}$ exists, and their product $M \circ M^{-1}=I$ is the identity matrix. Consequently, if we find a cascade of reversible gates $G$ that transforms $M$ to the identity, we implicitly found a reversible circuit for $M^{-1}$. Reversing $G$ yields a reversible circuit that realizes $M$ (because the Toffoli gates are self-inverse). Therefore, an algorithm is required that transforms the QMDD representing $M$ to the identity. Since the identity matrix only maps input 0 to output 0 and input 1 to output 1 , the identity QMDD imposes the structure depicted in Fig. 4.

To obtain the desired identity QMDD, we traverse the QMDD in breadth-first manner and transform each node we encounter to the identity structure shown in Fig. 5 by applying Toffoli gates. To this end, the paths to the 1-terminal (called 1-paths in the following) through the second and third edge of the currently considered node have to be moved to the first and fourth edge, respectively. These 1-paths refer to the input of the encountered variables and therefore contain a


Fig. 4. Identity QMDD with $n$ variables


Fig. 5. Identity structure
negative literal $\bar{x}_{j}$ (positive literal $x_{j}$ ) whenever the first or third (second or fourth) edge of a node labeled $x_{j}$ is traversed.

In the following, we denote the sets of 1-paths through the first, second, third and fourth edge of the currently considered node by $P_{1}, P_{2}, P_{3}$, and $P_{4}$, respectively. Similarly, we refer to the sets of 0-paths (i.e. paths that terminate in a 0 -stub) with $\bar{P}_{1}, \bar{P}_{2}, \bar{P}_{3}$, and $\bar{P}_{4}$. Since the QMDD represents a permutation matrix, each column and each row of the matrix contains exactly one 1-entry. Therefore, the number of 0-paths in $\bar{P}_{1}\left(\bar{P}_{4}\right)$ is equal to the number of 1-paths in $P_{2}\left(P_{3}\right)$, i.e. $\left|\bar{P}_{1}\right|=\left|P_{2}\right|$. Moreover, $\bar{P}_{1}=P_{3}$ and $\bar{P}_{4}=P_{2}$.

Example 4. Consider the QMDD shown in Fig. 2 and assume that the root node is currently considered. The sets of 1-paths are $P_{1}=P_{4}=\left\{\bar{x}_{2} \bar{x}_{3}, \bar{x}_{2} x_{3}, x_{2} x_{3}\right\}$ and $P_{2}=P_{3}=\left\{x_{2} \bar{x}_{3}\right\}$.

To obtain the identity structure for the currently considered node labeled $x_{i}$, we swap the 1-paths in $P_{2}$ with the 0-paths in $\bar{P}_{1}$. This inherently swaps the 1-paths in $P_{3}$ with the 0-paths in $\bar{P}_{4}$. Swapping paths can be accomplished by applying Toffoli gates, since applying a Toffoli gate $\operatorname{TOF}\left(C, x_{i}\right)$ inverts the input of variable $x_{i}$ for all paths that are represented by $C$.

Example 4 (continued). The path $p=x_{2} \bar{x}_{3}$ is contained in the set $P_{2}$ of 1-paths through the second edge as well as in the set $\bar{P}_{1}$ of 0 -paths through the first edge. These two paths can be swapped by applying the Toffoli gate TOF $\left(\left\{x_{2}^{+}, x_{3}^{-}\right\}, x_{1}\right)$. This automatically swaps the 1-path in $P_{3}$ with the 0 -path in $\bar{P}_{4}$. The resulting QMDD is depicted in Fig. 6.


Fig. 6. QMDD resulting from transformation of the root node of the QMDD in Fig. 2

For a more formal description of how the currently considered node can be transformed to the desired identity structure we refer to [16], because understanding the main idea of QMDD-based synthesis (the breadth-first traversal of the nodes) is sufficient for the purpose of this paper.

If the currently considered node is not the root node of the QMDD, we have to ensure that no other node is affected by the applied gates. To this end, we add further control lines that describe the path to the currently considered node to each Toffoli gate that is applied. More precisely, if the path to the currently considered node traverses the first edge of another node (representing a mapping from 0 to 0 ), we add a negative control line for the corresponding variable. Analogously, we add a positive control line for the corresponding variable if the path traverses the fourth edge of that node ${ }^{1}$. If there exist $k$ such paths to the currently considered node, we have to replicate each Toffoli gate for each of those $k$ paths in order to eventually transform the currently considered node to the identity structure.

Example 5. Consider the QMDD depicted in Fig. 7 and assume that the node highlighted in blue is currently considered. This node can be transformed to the desired identity structure by applying a Toffoli gate with target line $x_{3}$. Since there exist two paths to this node, namely $\bar{x}_{1} x_{2}$ and $x_{1} \bar{x}_{2}$, we have to apply two gates $\operatorname{TOF}\left(\left\{x_{1}^{-} x_{2}^{+}\right\}, x_{3}\right)$ and $\operatorname{TOF}\left(\left\{x_{1}^{+} x_{2}^{-}\right\}, x_{3}\right)$ to eventually transform this node to the desired identity structure. The resulting circuit is shown in Fig. 8a.

## 4 Improving QMDD-based Synthesis

In the synthesis scheme originally proposed in [16] and reviewed above, the overall number of paths to the nodes of a certain variable grows exponentially with the number of variables above. This leads to a substantial number of gates with (partially) redundant sets of control lines. This poses a significant drawback to QMDD-based synthesis since

[^0]

Fig. 7. Paths to the currently considered node


Fig. 8. Gates required to transform the currently considered node from Fig. 7

- a significant number of gates is applied to transform each single node of the considered QMDD to the identity structure and
- the applied gates usually include rather large sets of control lines.

More precisely, the number of gates is heavily influenced by the number of paths to the currently considered node, because each gate that is required to transform the currently considered node to the identity has to be replicated for each path. Furthermore, the number of control lines of these Toffoli gates depend on the number of literals of the path from the root node to the currently considered node (because these literals are added to the Toffoli gates in form of control lines to ensure that no other node is affected). Since the overall costs of a reversible circuit depend on both, the total number of gates as well as the respective number of control lines, this makes circuits generated using QMDD-based synthesis rather expensive.

In order to address the problem, we propose two optimization techniques to reduce the cost of the circuits generated by QMDD-based synthesis, namely

1. a straightforward solution which performs logic minimization on the paths to the currently considered node to reduce the number of paths as well as the number of their literals and
2. a more elaborate approach which considers nodes that require the same sequence of Toffoli gates in order to get transformed to the identity structure jointly.

The straight forward solution utilizes logic minimization techniques to reduce the overall number of paths to the currently considered node as well as to reduce the overall number of literals in the paths. To this end, each path to the currently considered node is described as a product (conjunction) of its literals. Then, the exclusive sum (exclusive disjunction) of all these products is formed. The sum has to be exclusive, because applying a Toffoli gate an even times does not have any effect (since a Toffoli gate is self-inverse). The resulting Exclusive Sum of Products (ESoP) can be minimized using techniques such as proposed in [9]. Such a minimization reduces the overall number of products (and, hence, the number of paths and gate replications) as well as the number of literals in these products (and, hence, the number of control lines that have to be added to each gate).

Example 6. Consider again the QMDD depicted in Fig. 7 and assume that the node highlighted in blue is currently considered. There exist two paths to the currently considered node, namely $x_{1} \bar{x}_{2}$ and $\bar{x}_{1} x_{2}$. The ESoP of the two paths is then $x_{1} \bar{x}_{2} \oplus \bar{x}_{1} x_{2}$. Minimizing this ESoP yields $x_{1} \bar{x}_{2} \oplus \bar{x}_{1} x_{2}=x_{1} \oplus x_{2}$. Consequently, also the two paths $x_{1}$ and $x_{2}$ can be used to describe all paths to the currently considered node. The resulting gates are shown in Fig. 8b. Although the number of paths (and therefore the number of gates) did not change, the $N C V$-cost and $T$-depth are reduced from 10 to 2 and from 6 to 0 , respectively.

The more elaborated optimization approach aims to further reduce the cost of the circuits considering more than one node simultaneously. The general idea is based on the key observation that sometimes different QMDD nodes require the same sequence of Toffoli gates in order to get transformed to the identity. As described in Section 3, this sequence of Toffoli gates depends on the set $P_{1}$ of 1-paths through the first edge and the set $P_{2}$ of 1-paths through the second edge only. From these two sets, the other sets of 1-paths $\left(P_{3}=\bar{P}_{1}, P_{4}=\bar{P}_{2}\right)$ as well as the set of 0-paths can uniquely be determined. Cases frequently occur where nodes in the QMDD have equal sets of 1-paths $P_{1}$ and $P_{2}$, even though they are structurally different.

Example 7. Consider the QMDD depicted in Fig. 9. The root node already establishes the desired identity structure and the two nodes labeled $x_{2}$ (highlighted in blue) are structurally different. However, their sets of 1-paths are equal, i.e. $P_{1}=\left\{\bar{x}_{3}\right\}$ and $P_{2}=\left\{\bar{x}_{3}\right\}$ for both nodes. Consequently, both nodes can be transformed to the identity structure with the same sequence of Toffoli gates. One possible sequence is $\operatorname{TOF}\left(\left\{x_{2}^{+}\right\}, x_{3}\right), \operatorname{TOF}\left(\left\{x_{3}^{+}\right\}, x_{2}\right)$.

Without applying this scheme, the sequence of Toffoli gates has to be replicated twice - once for each node (including control line $x_{1}^{-}$for the left node and control line $x_{1}^{+}$for the right node). This resulting in the circuit shown in Fig. 10a. The gates $g_{1}$ and $g_{2}$ thereby transform the left node to the identity, whereas the gates $g_{3}$ and $g_{4}$ transform the right node to the identity.

Since QMDD nodes that employ an equal characteristic regarding their sets of 1-paths $P_{1}$ and $P_{2}$ can be transformed to the identity structure with the same


Fig. 9. QMDD nodes with equal sets $P_{1}$ and $P_{2}$


Fig. 10. Gates required to transform the nodes labeled $x_{2}$
sequence of Toffoli gates, they can considered jointly for synthesis purposes and thus processed together. To this end, we form the ESoP of the paths to all nodes with equal sets $P_{1}$ and $P_{2}$ and apply the logic minimization as described in the straight forward approach.

Example 7 (continued). Since both nodes labeled $x_{2}$ have equal sets of 1-paths $P_{1}$ and $P_{2}$, they can be considered jointly and processed together. The minimized ESoP of all paths to these nodes is $x_{1} \oplus \bar{x}_{1}=1$, a sum consisting of a single product without any literals. Therefore no additional control lines are required (all nodes labeled $x_{2}$ can be considered jointly). The resulting circuit that transforms all nodes labeled $x_{2}$ to the identity structure is shown in Fig. 10b. Compared to the gate sequence depicted in Fig. 10a we observe a reduction of the $N C V$-cost from 20 to 2 and a reduction of the $T$-depth from 12 to 0 .

## 5 Discussion

In this section, we briefly analyze the potential of the optimization scheme described above, i.e. we discuss how many nodes might be considered together in the best case. To this end, we assume that the nodes of $n$ variables are left to be processed, i.e. the currently considered nodes are sub-QMDDs with height $n$, and that the sequence of Toffoli gates that transforms these nodes to the identity structure is uniquely determined by the set of 1-paths $P_{1}$ and $P_{2}$.

First, we determine how many different sequences of Toffoli gates exist. Since we assume that the sequence of Toffoli gates is uniquely determined by $P_{1}$ and $P_{2}$, we analyze how many combinations of sets $P_{1}$ and $P_{2}$ exist: The QMDD of the currently considered node represents a $2^{n} \times 2^{n}$ permutation matrix (since all nodes above already employ the identity structure). Consequently, there must be exactly one 1-entry in each of the $2^{n}$ rows and in each of the $2^{n}$ columns. This means, that there must be exactly $2^{n-1} 1$-entries in the upper half of the matrix, i.e. $\left|P_{1}\right|+\left|P_{2}\right|=2^{n-1}$. These $2^{n-1} 1$-entries (1-paths) are arbitrarily distributed in the $2^{n}$ columns (in the sets $P_{1}$ and $P_{2}$ ). Consequently, there exist $\binom{2^{n}}{2^{n-1}}$ possibilities in which rows the 1-entries are located, i.e. possible different pairs of sets $\left(P_{1}, P_{2}\right)$. If we assume $n=2$, there exist $\binom{2^{2}}{2^{2-1}}=6$ different sequences that transform a currently considered node to the identity. These sequences as well as their corresponding sets $P_{1}$ and $P_{2}$ are depicted in Fig. 11.


Fig. 11. Sequences of Toffoli gates for $n=2$

As a second step, we analyze how many different sub-QMDDs with $n$ variables exist. Recall, that a QMDD composed of $n$ variables represents a permutation matrix of dimension $2^{n} \times 2^{n}$, i.e. a matrix that represents a permutation of $2^{n}$ elements. Since $2^{n}$ elements can be permuted in $2^{n}$ ! ways, there exist $2^{n}$ ! structurally different QMDDs with $n$ variables. Considering again that $n=2$, there exist $2^{2}!=4!=24$ structurally different QMDDs.

Having an arithmetic expression for the number of sequences as well as for the number of QMDDs allows one to analyze the potential of the proposed optimization. The resulting numbers for several values of $n$ are provided in Table 2 . As one can easily see, there are many more different QMDDs than sequences, because $\binom{2^{n}}{2^{n-1}} \ll 2^{n}$ !. Consider the case that the $n=3$ variables of the QMDD are not yet processed. In the worst case, the QMDD has 40320 nodes labeled with the currently considered variable. For each of those nodes, a sequence of Toffoli gates has to be determined. If we apply the proposed optimization (i.e. if we jointly consider nodes with equal sets of 1-paths), the number of nodes that have to be processed drops to 70 - reducing the computational effort by a factor of 576 . Furthermore, the logic minimization used to reduce the paths to the cur-
rently considered nodes is applied to a larger set of paths, which makes it more likely to obtain a more compact ESoP.

Table 2. Potential of the proposed optimization

| $n$ | No. sequences $\binom{2^{n}}{2^{n-1}}$ | No. QMDDs $2^{n}!$ |
| ---: | ---: | ---: |
| 2 | 6 | 24 |
| 3 | 70 | 40320 |
| 4 | 12870 | $2 \cdot 10^{13}$ |
| 5 | $6 \cdot 10^{8}$ | $2.6 \cdot 10^{35}$ |

Obviously, it is more likely that many nodes can be processed together if their currently considered variable is the label for a large number of nodes. Therefore, this optimization has a higher impact on large QMDDs than on small ones. Since we observed that large QMDDs tend to yield circuits with rather high quantum cost, we expect higher improvements for these cases.

## 6 Experimental Results

In this section, we evaluate the reduction of the quantum cost we achieve by applying the proposed optimizations to the QMDD-based synthesis algorithm. To this end, we have reimplemented the QMDD-based synthesis as originally proposed in [16] (including some minor optimizations regarding performance) in C++ using the QMDD package provided in [11] and the BDD package CUDD [17]. This implementation represents the current state-of-the-art and serves as baseline. Based on that, we have implemented the optimizations discussed in Section 4. In the following, Scheme $A$ denotes the optimization where the paths to the currently node are reduced using logic minimization ${ }^{2}$ and Scheme $B$ denotes the case if we additionally process nodes with equal sets of 1-paths jointly. As benchmarks served the reversible circuits provided at RevLib [19]. All experiments were conducted on a 4 GHz processor with 32 GB of memory running Linux 4.4.

Table 3 summarizes the experimental results. The first two columns list the name of the benchmark and the number of circuit lines $n$. Then, for each synthesis scheme, the runtime, the $N C V$-cost as well as the $T$-depth is listed. Finally, we list the reduction of the quantum cost for Scheme $A$ with respect to the baseline and for Scheme $B$ with respect to Scheme $A$. Since the improvement rates observed for $N C V$-cost and for $T$-depth are almost identical for each of the benchmarks (they deviate in a fraction of a percent only), we only list the improvement regarding $T$-depth in the last two columns of Table 3.

The obtained results clearly show a significant improvement in terms of runtime. While the original approach requires a significant amount of runtime for

[^1]some benchmarks (e.g. more than 1000 seconds for benchmarks sym9, rd84, and cycle10), the optimizations proposed in this paper allowed to synthesize all benchmarks within a few seconds (Schemes $A$ and $B$ ). Only one benchmark (cordic) required slightly more than a minute.

Furthermore, a substantial improvement in terms of quantum cost can be observed for the benchmarks. For all benchmarks that result in a circuit with a $T$-depth of more than half a million using the original approach, substantial improvements of several orders of magnitudes were determined. Consider for example the benchmarks plus127mod8192 and plus63mod8192. Performing logic optimizations on the paths to the currently considered node (i.e. Scheme $A$ ) already result in a reduction of the quantum cost by a factor of 145.21 . If we additionally transform nodes together that have equal sets of 1-paths (i.e. Scheme $B$ ), we get another improvement by a factor of 8029.35 and, hence, an overall improvement of six orders of magnitudes. On average, we observe an improvement by a factor of 4.22 for Scheme $A$ with respect to the original approach and an improvement by a factor of 5.35 of Scheme $B$ with respect to Scheme $A$. This results in an overall improvement by a factor of 22.57 .

## 7 Conclusion

In this paper, we reviewed the QMDD-based synthesis algorithm (proposed in [16]) for reversible circuits. Based on that review, we discovered cases that result in the same sequence of Toffoli gates, but are considered iteratively leading to a substantial overhead in the number of gates. To reduce the costs of the resulting circuits, we proposed optimizations that consider such redundant cases jointly during synthesis. Experimental evaluations show that substantial improvements to the current state-of-the-art can be achieved. More precisely, improvements of up to three orders of magnitudes were observed for the runtime and improvements up to six orders of magnitudes were observed regarding the quantum cost of the resulting circuits.

## Acknowledgements

This work has partially been supported by the European Union through the COST Action IC1405.

## References

1. L. G. Amarù, P. Gaillardon, R. Wille, and G. D. Micheli. Exploiting inherent characteristics of reversible circuits for faster combinational equivalence checking. In Design, Automation and Test in Europe, pages 175-180, 2016.
2. M. Amy, D. Maslov, M. Mosca, and M. Roetteler. A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. on CAD of Integrated Circuits and Systems, 32(6):818-830, 2013.
Table 3. T-depth improvements compared to the state-of-the-art

3. C. H. Bennett. Logical reversibility of computation. IBM J. Res. Dev, 17(6):525532, 1973.
4. D. Große, R. Wille, G. W. Dueck, and R. Drechsler. Exact multiple control Toffoli network synthesis with SAT techniques. IEEE Trans. on CAD, 28(5):703-715, 2009.
5. R. Landauer. Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5(3):183-191, 1961.
6. D. M. Miller, D. Maslov, and G. W. Dueck. A transformation based algorithm for reversible logic synthesis. In Design Automation Conf., pages 318-323, 2003.
7. D. M. Miller and M. A. Thornton. QMDD: A decision diagram structure for reversible and quantum circuits. In Int'l Symp. on Multi-Valued Logic, page 6, 2006.
8. D. M. Miller, R. Wille, and Z. Sasanian. Elementary quantum gate realizations for multiple-control Toffolli gates. In Int'l Symp. on Multi-Valued Logic, pages 288-293, 2011.
9. A. Mishchenko and M. Perkowski. Fast heuristic minimization of exclusive-sums-of-products. In Int'l Workshop on Applications of the Reed-Muller Expansion in Circuit Design, pages 242-250, 2001.
10. M. Nielsen and I. Chuang. Quantum Computation and Quantum Information. Cambridge Univ. Press, 2000.
11. P. Niemann, R. Wille, D. M. Miller, M. A. Thornton, and R. Drechsler. QMDDs: Efficient quantum function representation and manipulation. IEEE Trans. on CAD, 35(1):86-99, 2016.
12. V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes. Reversible logic circuit synthesis. In Int'l Conf. on CAD, pages 353-360, 2002.
13. M. Soeken, G. W. Dueck, and D. M. Miller. A fast symbolic transformation based algorithm for reversible logic synthesis. In Reversible Computation - 8th Int'l Conf., pages 307-321, 2016.
14. M. Soeken, S. Frehse, R. Wille, and R. Drechsler. RevKit: A toolkit for reversible circuit design. In Workshop on Reversible Computation, pages 69-72, 2010. RevKit is available at http://www.revkit.org.
15. M. Soeken, L. Tague, G. W. Dueck, and R. Drechsler. Ancilla-free synthesis of large reversible functions using binary decision diagrams. J. Symb. Comput., 73:1-26, 2016.
16. M. Soeken, R. Wille, C. Hilken, N. Przigoda, and R. Drechsler. Synthesis of reversible circuits with minimal lines for large functions. In ASP Design Automation Conf., pages 85-92, 2012.
17. F. Somenzi. CUDD: CU decision diagram package release 3.0. 0. 2015.
18. R. Wille, R. Drechsler, C. Osewold, and A. G. Ortiz. Automatic design of lowpower encoders using reversible circuit synthesis. In Design, Automation and Test in Europe, pages 1036-1041, 2012.
19. R. Wille, D. Große, L. Teuber, G. W. Dueck, and R. Drechsler. RevLib: an online resource for reversible functions and reversible circuits. In Int'l Symp. on MultiValued Logic, pages 220-225, 2008. RevLib is available at http://www.revlib.org.
20. R. Wille, O. Keszocze, S. Hillmich, M. Walter, and A. G. Ortiz. Synthesis of approximate coders for on-chip interconnects using reversible logic. In Design, Automation and Test in Europe, 2016.
21. A. Zulehner and R. Wille. Taking one-to-one mappings for granted: Advanced logic design of encoder circuits. In Design, Automation and Test in Europe, 2017.

[^0]:    ${ }^{1}$ A path to the currently considered node can only traverse the first or the fourth edge of other nodes, because they already establish the identity structure.

[^1]:    ${ }^{2}$ We utilized the methods available at RevKit [14] for logic minimization.

