
SAGA: Synthesis Augmentation with Genetic Algorithms for In-Memory Sequence Optimization

Andey Robins and Mike Borowczak Department of Electrical and Computer Engineering
University of Central Florida
Orlando, USA
{andey.robins, mike.borowczak}@ucf.edu
Abstract

The von Neumann architecture has a bottleneck which limits the speed at which data can be made available for computation. To combat this problem, novel paradigms for computing are being developed. One such paradigm, known as in-memory computing, interleaves computation with the storage of data within the same circuits. MAGIC, or Memristor Aided Logic, is an approach in which memory circuits physically perform computation through write operations to memory. Sequencing these operations is a computationally difficult problem, and the chosen sequence directly affects the cost of solutions using MAGIC-based in-memory computation. SAGA models the execution sequences as a topological sorting problem, which makes the optimization well-suited for genetic algorithms. We then detail the formation and implementation of these genetic algorithms and evaluate them over a number of open circuit implementations. The memory footprint needed for evaluating each of these circuits is decreased by up to 52% from existing, greedy-algorithm-based optimization solutions. Over the 10 benchmark circuits evaluated, these modifications lead to an overall improvement in the efficiency of in-memory circuit evaluation of 128% in the best case and 27.5% on average.

Index Terms:
Computer Aided Design, In-memory Computing, Machine Learning, Evolutionary Computation
©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

I Introduction

Data and the locations where it is used in computation are physically separated in modern computing architectures. Data is repeatedly moved physically nearer to where computations are performed before being used for whatever purpose it was retrieved. This paradigm of computer architecture has been instrumental in the development of modern computers and the ways we utilize them. As processing has sped up though, this transfer of information from storage to compute has formed a bottleneck which limits the computational speeds which can be achieved by state-of-the-art devices. Specialized processing units, such as Graphics Processing Units (GPUs) and other purpose-built hardware, often attempt to side-step the problem by increasing the throughput of information; however, the theoretical problem of moving data around is not solved by this approach; it is only mitigated in particular situations. Additionally, specialized processors are often tailored to a specific domain. As an example, GPUs excel at matrix and vector operations, but their strengths are under-utilized for highly sequential tasks.

An alternative paradigm to the von-Neumann architecture would be performing the computation at the same place the data is stored rather than transporting the data to another location for processing. This computing paradigm is aptly referred to as “processing in-memory” (PIM). Memristor Aided Logic (MAGIC) is an emerging PIM paradigm making use of parallel, write-based operations to perform calculations in-memory [1]. Updating the foundational architecture of a computing system has many abstract and novel challenges. One such problem is that performing a calculation requires scheduling operations for the computation; however, the number of in-memory cells needed to evaluate the function described is dependent on the scheduling order. State-of-the-art solutions model this dependence as a graph and seek to solve the scheduling challenge as a covering problem. In this work, we instead re-frame this problem as one of permutation, applying that idea to the development of a genetic algorithm (GA) which produces reductions in the memory footprint of execution up to 52.8% in the best case when compared to the prior state-of-the-art on standard benchmark circuits [2].

This work proposes an additional synthesis phase for area optimization of PIM circuits, referred to as SAGA or Synthesis Augmentation with Genetic Algorithms, which integrates a machine-learning framework for final optimizations and is organized as follows. Section II contextualizes this work by describing prior systems for optimizing the cost of MAGIC-based in-memory circuits before using them as a benchmark for comparison of memory footprints. Section III describes both the design of the evolutionary framework utilized to improve operation sequencing and also describes the experiments used to evolve more fit solutions. Section IV presents metrics and Section V discusses the improvement on benchmark circuits.

II Prior Work

SAGA seeks to combine GAs with a CAD workflow, and begins by contextualizing PIM computation as one of many emerging paradigms for computing. We discuss MAGIC, a specific PIM logic style, and recent frameworks for optimizing circuit synthesis for the technology underpinning it, before providing a brief description of the fundamental pieces of the GAs used.

II-A Emerging Paradigms

Limitations of contemporary computing architectures have led to the emergence of novel computing paradigms which work to overcome the limitations of Dennard scaling and the slowing of Moore’s Law [3, 4]. Quantum computing [5] may be the most well known alternative paradigm for computing, but others, though less well-known, approach the problem of novel architecture in various ways. Photonic computing [6], analog computing [7], and PIM [8] are examples of emerging solutions which all diverge from current architectures in meaningful ways.

As an emerging paradigm, in-memory computation can itself be subdivided into different categories. An entire field of study currently quantifies applications of analog circuits for performing computation [9]; and special focus has been recently paid towards applications of this domain for artificial intelligence applications [10, 11]. Digital in-memory computation though is preferable for many applications due to the approximations implicit to analog computation. These digital paradigms are often separated by whether they are parallel or non-parallel as well as if they are read or write based [12, 13]. SAGA specifically targets non-parallel operations using write-based PIM for area optimization.

II-B Processing In-Memory

MAGIC performs computation by loading a memristor with a logical value and applying voltage across the circuit. The memory value stored by this operation is then dependent on the value which was loaded, allowing for the construction of simple gates. By combining memristors, more complex circuits, such as NOR gates of various input sizes, are formed. Individual memristors are then wired into a grid structure known as a crossbar matrix [14] allowing for some CAD tools to make use of writes to evaluate multiple NOT operations in parallel.

Figure 1: Sub-figure (a) details an execution graph with memory cell costs depending on the evaluation order of the graph. Two possible orders exist which fit the constraints described in Section III-A. Processing vertex C before D has cost 3 while processing vertex D before C has cost 4 as illustrated in sub-figure (b).

With respect to MAGIC, application of formal design principles has continued with advances in technology mapping directly improving either the number of cycles used to complete an operation, the number of memory cells required for the operation, or both, which improves the efficiency of the PIM circuit overall [15]. Since evaluating a function using MAGIC requires the translation of a netlist into a sequence of in-memory memristor operations, this sequencing has significant effects on the amount of memory required to evaluate a function. Figure 1 illustrates this trade-off in the execution order. When the graph is processed in the ordering $A \rightarrow B \rightarrow C \rightarrow D \rightarrow E \rightarrow F$, only three memory cells are required. When the graph is processed $A \rightarrow B \rightarrow D \rightarrow C \rightarrow E \rightarrow F$, it requires four memory cells. For a graph with only six vertices, swapping the processing order of two nodes in the graph directly improves the efficiency without changing any behavior of the circuit.

A number of frameworks and processes for simplifying the execution sequences for MAGIC have been introduced. One of these early frameworks was the Simple synthesis tool [16]. This tool provided a framework for CAD workflows to be translated into the realm of in-memory computation. This process first used the tool abc to minimize circuits in a technology agnostic manner [17]. Then, the framework performed an additional round of technology dependent optimization to exploit the specific constraints of MAGIC systems resulting in nearly halving the best execution costs at the time.

Another approach attempting to develop a framework for this process applied a look-ahead operation to optimize the sequencing of operations [18]. Based on the fan-out for each gate, the algorithm proposed was able to improve the overall number of cycles required for an operation. Their work follows a similar synthesis flow to Simple by applying their optimization approach to the synthesis output of abc before final technology mapping. This work-flow, of offloading initial circuit simplification to abc, is a process mirrored by the process detailed in Section III.

Eventually, Simple was followed by Simpler which made use of a metric referred to as “cell usage” [2] which can be calculated from the output of abc. The cell usage of each gate enabled a search strategy which once again reduced the execution cost of in-memory circuits substantially. Across multiple metrics, there were improvements such as a 63x average increase in area efficiency, a 10% decrease in the number of operations, and a 5x decrease in the latency of parallel operations.

Figure 2: The synthesis flow from benchmark circuit specifications to genetically optimized execution sequences.

II-C Genetic Algorithms

Advances in the field of CAD and artificial intelligence (AI) have been tied historically [19], but the applications of AI and CAD to emerging computing paradigms has been more limited. Often the design and fabrication of other physical components through processes such as 3D printing [20] are the focus. Therefore, a secondary offering of this work is a demonstration of the feasibility of applying AI techniques within the CAD pipeline for circuit synthesis. GAs in particular have been applied to similar sequencing and scheduling problems in the CAD space [21], but not for PIM CAD.

GAs are an effective tool for problems where a large population of candidate solutions can be evaluated, and small changes between individuals can be quickly made. Sequence optimization problems are one such framing for which GAs are known to be effective. Historically, there has been more exploration of genetic evolution algorithms within the space of “pseudo-Boolean optimization problems,” but sequence optimization has also benefited from these GAs [22].

A GA is composed of four elements: a means by which a population of individuals can be evaluated and sorted by efficacy (evaluation), a method to select a subset of the population to seed the next population (selection), a means to combine individuals to create more (reproduction), and methods to re-integrate and update individuals in the population (replacement) [23]. These individual components lead to an emergent behavior of optimizing the fitness function used in evaluation to find the most fit individual. For evolutionary problems which can be modeled as graphs, such as sequencing operations, the relationships between descendants of nodes and the graph at large are significantly affected by mutation operators [24].

III Methods

The end-to-end process in SAGA duplicates the initial processes used by prior MAGIC synthesis tools like Simple and Simpler. This is detailed in Figure 2. SAGA introduces an additional step following this common synthesis pass which applies GAs to the domain of PIM circuit synthesis optimization for the first time to our knowledge. Since this optimization occurs at the end of netlist synthesis and before technology mapping, SAGA may be integrated into other in-memory synthesis workflows which begin with common logical formula specifications (e.g. BLIF, PLA, etc.).

III-A Problem Formulation

For a given graph, each topological sort of the vertices corresponds directly with a sequence of PIM operations. Depending on the sequence of these operations, a variable number of memory cells are required to include all of the intermediary products in the computation; and finding a sequence to optimize the number of cells is an NP-Hard problem [25]. For a given sequence, the fitness of that sequence is the number of memory cells its execution would require. The problem is then realized as a minimization of the fitness function. To validate the efficacy of SAGA, we test the null hypothesis $H_0$: There is no statistically significant difference in the circuit synthesis between Simpler and SAGA for the benchmark circuits.
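The objective described above can be written compactly as follows. This is a sketch only: the symbols $\sigma$, $\mathrm{pos}_\sigma$, $L_t$, and $f$ are notation introduced here for illustration, and the liveness rule (a cell is held until the last consumer of its value has executed, and circuit outputs are held to the end) is an assumption rather than a definition taken from the text.

```latex
\begin{aligned}
&G = (V, E), \qquad \sigma = (v_1, \dots, v_n) \ \text{a topological order of } V, \\
&L_t(\sigma) = \{ v_t \} \cup \{\, v_i : i < t,\ v_i \text{ is an output or }
   \exists\, (v_i, w) \in E \text{ with } \mathrm{pos}_\sigma(w) \ge t \,\}, \\
&f(\sigma) = \max_{1 \le t \le n} \lvert L_t(\sigma) \rvert, \qquad
 \sigma^{*} = \arg\min_{\sigma} f(\sigma).
\end{aligned}
```

Under this reading, $L_t(\sigma)$ is the set of values that must occupy a memory cell while the $t$-th operation executes, and the fitness $f(\sigma)$ is the peak size of that set over the whole sequence.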

Previous works have applied traditional CAD synthesis flows to optimizing this problem. They have applied tools like abc [17] to minimize the logical expressions of a function and applied greedy algorithms to cover the resulting graph with circuit components. Novel approaches to framing the problem, such as decomposing the calculations into larger intermediary components [15], provide ways to reduce higher-level problems to the technological tools currently available. By reformulating the synthesis problem, this work instead structures the problem as one which can be tackled with genetic evolution and then applies evolutionary algorithms as a final optimization step in the synthesis work flow.

A circuit specification is taken from RevLib [26] and loaded into abc [17]. The circuit is then internally reduced by abc and mapped to a NOR-Inverter Graph, or NIG, using a custom circuit library as in Simpler [2]. This graph is then parsed into the genetic evolution algorithm and the population is optimized using a generational GA. Finally, an optimized execution sequence is output by SAGA to be incorporated into the final technology mapping and execution of the in-memory function. We describe the process by which the netlist produced by abc is translated to the GA domain and the genetic operators utilized in SAGA. Finally, we describe the configuration of the experimental population and stopping conditions for synthesis. To evaluate $H_0$, we employ a one-tailed t-test to determine if the differences in circuit efficiency are statistically significant with significance levels set at $\alpha = 0.05$.

III-B Netlist Transcription

Benchmark circuits from RevLib are initially specified in Programmable Logic Array (PLA) form [26]. The PLA encoding is then converted, synthesized and mapped to a minimal technology library of inverter and 2-input NOR gates by abc. The resulting NIG is then converted into a DAG with edges oriented in the direction of data flow. The graph is topologically sorted into a list of vertices corresponding to an execution sequence for the gates in the circuit using a breadth-first traversal of the DAG beginning with all input vertices being visited first. Other approaches in the literature perform their optimizations at this sorting phase; this work uses a computationally cheap sorting which produces a sub-optimal solution, and that solution is used to initialize the population for genetic optimization.
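The breadth-first seeding step can be sketched as follows. This is a minimal illustration, not the SAGA implementation: the function name `initial_sequence`, the adjacency-dictionary representation, and the six-vertex graph are all assumptions made for the example.

```python
from collections import deque


def initial_sequence(children, inputs):
    """Return a breadth-first topological order of a DAG.

    children: dict mapping every vertex to the list of vertices it feeds.
    inputs:   vertices with no predecessors, visited first.
    """
    # Count how many predecessors of each vertex are still unvisited.
    indegree = {v: 0 for v in children}
    for v in children:
        for w in children[v]:
            indegree[w] += 1

    queue = deque(inputs)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in children[v]:
            indegree[w] -= 1
            # A gate becomes ready once all of its fan-in has been emitted.
            if indegree[w] == 0:
                queue.append(w)

    if len(order) != len(children):
        raise ValueError("graph has a cycle or unreachable vertices")
    return order


# Hypothetical six-vertex graph: a and b are inputs, f is the output.
dag = {"a": ["c", "d"], "b": ["d"], "c": ["e"],
       "d": ["e", "f"], "e": ["f"], "f": []}
seed = initial_sequence(dag, inputs=["a", "b"])
print(seed)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

The traversal only guarantees a valid order, not a small memory footprint, which matches the role of the seed sequence described above.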

III-C Genetic Operators

Crossover is performed with a single-point, order-based crossover [27]. One chromosome is read up until the crossover point; then, beginning at the crossover point, all sequence values are read and added if they are not already on the new chromosome. Mutation is performed by selecting a point on the sequence and swapping it with another value which preserves the validity of the sequence. Determining valid swap candidates can be done by filtering out all nodes which are directly dependent upon or descended from the currently mutating vertex. This can be efficiently computed once, as the relationship of vertices in the graph doesn’t change during genetic evolution.
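The two operators can be sketched as below. The names `order_crossover`, `related`, `is_valid`, and `swap_mutation`, along with the example graph, are hypothetical. Two assumptions are made beyond the text: the tail of the child is filled from the second parent, and the mutated sequence is re-checked after the swap, since filtering out ancestors and descendants alone does not rule out conflicts with vertices lying between the two swapped positions.

```python
import random


def order_crossover(parent_a, parent_b, point):
    """Single-point, order-based crossover of two topological orders."""
    head = parent_a[:point]
    seen = set(head)
    # Fill the tail with the vertices still missing, in parent_b's order.
    return head + [v for v in parent_b if v not in seen]


def related(parents):
    """Map each vertex to the set of its ancestors and descendants."""
    ancestors = {}

    def collect(v):
        if v not in ancestors:
            found = set()
            for p in parents[v]:
                found.add(p)
                found |= collect(p)
            ancestors[v] = found
        return ancestors[v]

    for v in parents:
        collect(v)
    rel = {v: set(anc) for v, anc in ancestors.items()}
    for v, anc in ancestors.items():
        for a in anc:
            rel[a].add(v)  # v is a descendant of a
    return rel


def is_valid(order, parents):
    """True if every vertex appears after all of its parents."""
    position = {v: i for i, v in enumerate(order)}
    return all(position[p] < position[v] for v in order for p in parents[v])


def swap_mutation(order, parents, rel, rng):
    """Swap one vertex with an unrelated vertex, keeping the order valid."""
    i = rng.randrange(len(order))
    candidates = [j for j, w in enumerate(order)
                  if j != i and w not in rel[order[i]]]
    rng.shuffle(candidates)
    for j in candidates:
        child = list(order)
        child[i], child[j] = child[j], child[i]
        if is_valid(child, parents):  # assumed extra check, see lead-in
            return child
    return list(order)  # no valid swap exists for this vertex


# Hypothetical graph given as vertex -> list of parents.
parents = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"],
           "e": ["c", "d"], "f": ["d", "e"]}
p1 = ["a", "b", "c", "d", "e", "f"]
p2 = ["b", "a", "d", "c", "e", "f"]
child = order_crossover(p1, p2, 2)
print(child)  # ['a', 'b', 'd', 'c', 'e', 'f']
mutant = swap_mutation(p1, parents, related(parents), random.Random(0))
```

Because the head of one valid order is closed under the parent relation, appending the remaining vertices in the other parent's relative order keeps the child a valid topological order. The `related` table is computed once per circuit, mirroring the observation that vertex relationships do not change during evolution.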

III-D Genetic Experiment

As SAGA presents an application of GAs as a final step in optimization, the online performance of the algorithm is of less importance than the final, offline fitness. As such, the hyperparameters are tuned with the mindset that larger populations which take longer to evaluate and combine are acceptable provided the algorithm produces execution sequences with minimal memory footprint. Additional work would be necessary to determine for which hyperparameters the synthesis time is minimized while preserving performance.

Populations of 2000 individuals are initially created with a 20% chance of a sequence being mutated once during each recombination. Sequences have their fitness evaluated by simulating each sequence in memory and recording the footprint. Simulation makes use of a mark-and-sweep algorithm to ensure that the minimal number of memory cells are utilized for each evaluation.
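A fitness evaluation of this kind can be sketched as follows. Note the substitution: this sketch tracks cell lifetimes by counting the remaining consumers of each value, which is a simpler stand-in for the mark-and-sweep pass named above. The function name `footprint`, the `outputs` parameter, and the seven-vertex graph are hypothetical.

```python
def footprint(order, parents, outputs=()):
    """Peak number of memory cells needed to execute `order`."""
    # Number of consumers of each value that have not executed yet.
    pending = {v: 0 for v in order}
    for v in order:
        for p in parents[v]:
            pending[p] += 1

    keep = set(outputs)  # output values are never released
    live = set()
    peak = 0
    for v in order:
        live.add(v)  # the result of v occupies a new cell
        peak = max(peak, len(live))
        for p in parents[v]:
            pending[p] -= 1
            if pending[p] == 0 and p not in keep:
                live.discard(p)  # last consumer ran, so the cell is reusable
    return peak


# Hypothetical graph: e combines a and b, f combines c and d, g combines e and f.
parents = {"a": [], "b": [], "c": [], "d": [],
           "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}
depth_first = ["a", "b", "e", "c", "d", "f", "g"]
breadth_first = ["a", "b", "c", "d", "e", "f", "g"]
print(footprint(depth_first, parents, outputs=["g"]))    # 4
print(footprint(breadth_first, parents, outputs=["g"]))  # 5
```

Both sequences evaluate the same function, yet finishing one branch before starting the other releases cells earlier and lowers the peak by one cell.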

Following fitness evaluation, the half of the population which are least fit are discarded. Individuals are then sorted according to their fitness, paired with another individual next to them in sorted order, and undergo the ordered crossover operation previously described. After crossover, each individual in the population will have the mutation operator applied with probability 20%. Generations are evaluated until $\epsilon$ generations pass with no improvement to the fitness of the best individual in the population. Multiple values of $\epsilon$ are evaluated and discussed in Table III. The best observed sequence across all runs is reported in Table I.
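The generational loop can be sketched as below. Several details are assumptions, because the text does not fix them: the initial population consists of copies of the seed sequence, each pair of survivors produces two children so that the population size stays constant (which requires a size divisible by four), and the best individual seen so far is tracked separately. The toy fitness function `displacement` and the operators in the usage example are placeholders on unconstrained permutations, not the circuit-specific operators.

```python
import random


def evolve(seed, fitness, crossover, mutate, pop_size=2000,
           mutation_rate=0.2, epsilon=50, rng=None):
    """Generational GA with truncation selection and an epsilon stall limit."""
    rng = rng or random.Random()
    population = [list(seed) for _ in range(pop_size)]
    best = list(seed)
    best_cost = fitness(best)
    stalled = 0
    while stalled < epsilon:
        # Evaluation and selection: keep the fitter half (lower cost is better).
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        # Reproduction: pair neighbours in sorted order, two children per pair.
        children = []
        for a, b in zip(survivors[0::2], survivors[1::2]):
            point = rng.randrange(1, len(a))
            children.append(crossover(a, b, point))
            children.append(crossover(b, a, point))
        # Replacement, then mutation with the configured probability.
        population = survivors + children
        population = [mutate(ind, rng) if rng.random() < mutation_rate else ind
                      for ind in population]
        # Stopping rule: count generations without a new best individual.
        candidate = min(population, key=fitness)
        cost = fitness(candidate)
        if cost < best_cost:
            best, best_cost, stalled = list(candidate), cost, 0
        else:
            stalled += 1
    return best, best_cost


# Toy usage on plain permutations (placeholder operators and fitness).
def order_crossover(a, b, point):
    head = a[:point]
    return head + [v for v in b if v not in head]


def swap(ind, rng):
    i, j = rng.sample(range(len(ind)), 2)
    out = list(ind)
    out[i], out[j] = out[j], out[i]
    return out


def displacement(ind):
    return sum(abs(v - i) for i, v in enumerate(ind))


seed = [5, 4, 3, 2, 1, 0]
best, cost = evolve(seed, displacement, order_crossover, swap,
                    pop_size=40, epsilon=20, rng=random.Random(1))
```

The defaults mirror the population size of 2000 and the 20% mutation probability stated above. The loop terminates because the stall counter only resets when the best cost strictly decreases.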

TABLE I: Performance comparison between this work and the best performing prior works. Efficiency is calculated according to the formula $\text{Efficiency} = 10^{6} \cdot Area^{-1} \cdot Cycles^{-1}$ [2]. Efficiency metrics have been rounded to the nearest whole number for display where greater is better.

| Benchmark | Simple MAGIC [16]: Cycles | Simple MAGIC [16]: Area | Simple MAGIC [16]: Efficiency | Simpler MAGIC [2]: Cycles | Simpler MAGIC [2]: Area | Simpler MAGIC [2]: Efficiency | SAGA [This Work]: $\epsilon$ | SAGA [This Work]: Cycles | SAGA [This Work]: Area | SAGA [This Work]: Efficiency | Improvement: Cycles | Improvement: Area | Improvement: Efficiency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5xp1 | 886 | 315 | 4 | 119 | 39 | 215 | 5000 | 160 | 29 | 216 | -34.5% | 25.6% | 0.47% |
| 9symml | n/a | n/a | n/a | 218 | 57 | 80 | 5000 | 306 | 57 | 57 | -40.4% | 0.0% | -28.8% |
| clip | 742 | 444 | 3 | 160 | 47 | 133 | 5000 | 233 | 40 | 107 | -45.6% | 14.9% | -19.5% |
| cm150a | 570 | 189 | 3 | 67 | 39 | 383 | 50 | 52 | 22 | 874 | 22.4% | 43.6% | 128% |
| cm162a | 530 | 186 | 9 | 64 | 35 | 446 | 50 | 87 | 20 | 575 | -35.9% | 42.9% | 28.9% |
| cm163a | 522 | 183 | 10 | 66 | 36 | 421 | 50 | 76 | 17 | 774 | -15.2% | 52.8% | 83.8% |
| misex1 | 1380 | 294 | 2 | 83 | 33 | 365 | 500 | 84 | 17 | 700 | -1.20% | 48.5% | 91.8% |
| parity | 1078 | 240 | 4 | 81 | 35 | 353 | 500 | 104 | 20 | 481 | -28.4% | 42.9% | 36.3% |
| sao2 | n/a | n/a | n/a | 128 | 53 | 147 | 5000 | 213 | 43 | 109 | -66.4% | 18.9% | -25.9% |
| x2 | 1404 | 168 | 4 | 73 | 33 | 415 | 5000 | 80 | 16 | 781 | -9.59% | 51.5% | 88.2% |
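As a worked check of the efficiency metric, the cm150a row of Table I can be recomputed as follows. The helper names `efficiency` and `improvement` are hypothetical. The efficiency improvement is taken here from the rounded efficiencies, which appears to match the tabulated values, although that reading is an inference.

```python
def efficiency(cycles, area):
    """Efficiency = 10^6 / (Area * Cycles); greater is better."""
    return 1e6 / (area * cycles)


def improvement(old, new, lower_is_better):
    """Percentage improvement of `new` relative to `old`."""
    change = (old - new) if lower_is_better else (new - old)
    return 100.0 * change / old


# cm150a: Simpler needs 67 cycles and 39 cells, SAGA needs 52 cycles and 22 cells.
simpler = round(efficiency(67, 39))
saga = round(efficiency(52, 22))
print(simpler, saga)                                # 383 874
print(round(improvement(67, 52, True), 1))          # 22.4 (cycles)
print(round(improvement(39, 22, True), 1))          # 43.6 (area)
print(round(improvement(simpler, saga, False), 1))  # 128.2 (efficiency)
```

For cycles and area a smaller value is better, so a reduction is reported as a positive improvement; for efficiency a larger value is better.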

IV Results

Unless otherwise specified, averages refer to the geometric mean of a value for all benchmarks. Compared to the previous state-of-the-art, using SAGA to optimize MAGIC-based circuits leads to area improvements of up to 52.8% and cycle improvements of up to 22.4% with overall improvements in the total efficiency of the benchmark circuits up to 128%. Based on the data collected, the null hypothesis, $H_0$ with an $\alpha = 0.05$ and a resulting p-value of 0.02934, is rejected. Thus the improvements from SAGA are significant.

TABLE II: The average percentage change of each statistic where negative values are decreases in the metric while positive are improvements.

| Statistic | Cycles | Area | Efficiency |
|---|---|---|---|
| Arithmetic Mean | -25.5% | 33.6% | 38.3% |
| Geometric Mean | -29.5% | 32.3% | 27.5% |
| Standard Deviation | 25.3% | 19.2% | 56.7% |
| 95% Confidence | (-41.1, -9.82) | (21.7, 45.5) | (3.17, 73.5) |
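The Efficiency column of Table II can be recomputed from the efficiency improvements listed in Table I, as sketched below. Two choices here are assumptions that happen to reproduce the tabulated figures: the confidence interval uses the normal approximation with $z = 1.96$, and the geometric mean is taken over the ratios $1 + \text{change}$ rather than over the raw percentages.

```python
import math
import statistics

# Efficiency improvements (%) from the last column of Table I.
eff_change = [0.47, -28.8, -19.5, 128, 28.9, 83.8, 91.8, 36.3, -25.9, 88.2]

mean = statistics.mean(eff_change)
sd = statistics.stdev(eff_change)  # sample standard deviation

# 95% interval under a normal approximation (assumed, see lead-in).
half_width = 1.96 * sd / math.sqrt(len(eff_change))
ci = (mean - half_width, mean + half_width)

# Geometric mean of the ratios (1 + change), converted back to a percentage.
ratios = [1 + c / 100 for c in eff_change]
geo = (math.prod(ratios) ** (1 / len(ratios)) - 1) * 100

print(f"mean={mean:.1f}% sd={sd:.1f}% geo={geo:.1f}%")
print(f"95% interval: ({ci[0]:.2f}, {ci[1]:.1f})")
```

Only the Efficiency column is recomputed here; the same procedure applies to the other columns.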

On average, the application of genetic evolution to the execution sequence was responsible for a 32.3% decrease in the number of memory cells required for each circuit compared to Simpler. Only one circuit, the 9symml benchmark, did not see a reduction in the number of memory cells required with this method. With regard to the number of cycles required for each circuit, no operations are performed in parallel in this work, unlike Simpler, which reduced the cycle count by executing NOT operations in parallel. As a result, the number of cycles executed for each circuit is 29.5% higher on average than it was with Simpler. Despite this, one benchmark had a reduction in the number of cycles required by synthesizing a smaller circuit at the abc phase of synthesis. In the worst case, cycle counts were increased by 66.4% compared to Simpler. The implications of using purely sequential operations within this work are explored in further detail in Section V. The overall efficiency improvements were 27.5% on average for the benchmark circuits. More complete statistics, including 95% confidence intervals on the impact of the changes, are presented in Table II.

A common limitation of GAs is that the computational effort required to generate highly optimized solutions can significantly exceed the effort required to find moderately optimized solutions. In an effort to demonstrate and quantify the impact of longer search time, the performance for various values of $\epsilon$ was also examined and is presented in Table III.

V Discussion

In contrast to one of the prior state-of-the-art works, no additional work is done to identify operations which can be performed in parallel. Even without these additional optimizations, the efficiency increased in the majority of cases and on average for the benchmark circuits. The memory footprint, the primary target of optimization in this work, was constant or improved in all benchmark cases.

Reasons for not including a parallel optimization process are three-fold. First, demonstrating that this technique is capable of improving the efficiency of PIM circuits with less specialized knowledge than hand-crafted CAD algorithms provides support to the argument that applying GAs for optimization of PIM operations is an effective choice. Second, modifying the fitness function to maximize efficiency rather than minimizing area could potentially improve the exact measures of efficiency, but would not fundamentally change the approach to the problem, which is the primary contribution of this work. Third, while NOT operations in the MAGIC paradigm can be simply parallelized due to the structure of MAGIC crossbars utilized in the literature, this structure may not necessarily fit all PIM operation sequencing problems; however, the genetic evolution approaches to evaluating this synthesis and integrating it into the design workflow could still be applied in a way to minimize the cost of physical implementation for subsequent technologies.

Due to the stage of the process in which this genetic evolution-backed optimization occurs, validation of the evolved sequence is a computationally cheap operation. In fact, it’s performed during fitness calculations to ensure solutions are valid throughout the search. This is accomplished by verifying the sequence can visit every node in the NIG without visiting a node whose parents have not been visited.
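The validity check described above amounts to a single pass over the sequence, as in the following sketch. The function name `is_valid_sequence` and the small example graph are hypothetical.

```python
def is_valid_sequence(order, parents):
    """True if `order` visits every vertex once, with parents before children."""
    if len(order) != len(parents) or set(order) != set(parents):
        return False  # a vertex is missing, unknown, or repeated
    visited = set()
    for v in order:
        # Reject a gate whose fan-in has not been evaluated yet.
        if any(p not in visited for p in parents[v]):
            return False
        visited.add(v)
    return True


# Hypothetical graph given as vertex -> list of parents.
nig = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(is_valid_sequence(["a", "b", "c", "d"], nig))  # True
print(is_valid_sequence(["a", "c", "b", "d"], nig))  # False
```

The check is linear in the number of vertices and edges, which is consistent with it being cheap enough to run inside every fitness evaluation.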

TABLE III: Performance comparison for various $\epsilon$. The percentage change from one value of $\epsilon$ to the next is presented in the $\Delta_{eff}$ column.

| Benchmark | Cycles | Area ($\epsilon = 50$) | Efficiency ($\epsilon = 50$) | Area ($\epsilon = 500$) | Efficiency ($\epsilon = 500$) | $\Delta_{eff}$ ($\epsilon = 500$) | Area ($\epsilon = 5000$) | Efficiency ($\epsilon = 5000$) | $\Delta_{eff}$ ($\epsilon = 5000$) |
|---|---|---|---|---|---|---|---|---|---|
| 5xp1 | 160 | 37 | 169 | 30 | 208 | 23.1% | 29 | 216 | 3.85% |
| 9symml | 306 | 66 | 50 | 63 | 52 | 4.00% | 57 | 54 | 9.52% |
| clip | 233 | 50 | 86 | 43 | 100 | 16.3% | 41 | 105 | 5% |
| cm150a | 52 | 22 | 874 | 22 | 874 | 0% | 22 | 874 | 0% |
| cm162a | 87 | 21 | 547 | 21 | 547 | 0% | 20 | 575 | 0% |
| cm163a | 76 | 17 | 774 | 17 | 774 | 0% | 17 | 774 | 4.76% |
| misex1 | 84 | 18 | 661 | 17 | 700 | 5.90% | 17 | 700 | 0% |
| parity | 104 | 25 | 385 | 20 | 481 | 24.9% | 20 | 481 | 0% |
| sao2 | 213 | 47 | 100 | 44 | 107 | 7% | 43 | 109 | 1.87% |
| x2 | 80 | 19 | 658 | 17 | 735 | 11.7% | 16 | 781 | 6.26% |

VI Conclusion

GAs are effective optimization tools at the final stage in the design process. Future work would be needed to assess the integration of these algorithms into earlier phases of the design workflow or the replacement of traditional design algorithms with machine learning algorithms such as this one. Furthermore, whether additional tuning of the evolution hyperparameters and the fitness function leads to further improvements is an open question which would require more statistical analyses than this work is able to present. Finally, while the time needed to synthesize an individual circuit is not a primary focus of this work, in a production setting, expeditious synthesis is a valuable property. Modifications to the synthesis flow which lead to faster convergence may be desirable for applications of this technology. Further work would be necessary to explore the impacts to synthesis time caused by modifications to the configuration of the genetic evolution workflow.

This work presents a workflow for applying genetic evolution algorithms to the synthesis of in-memory memristor-based operation sequences for realizing Boolean functions. The injection of machine learning algorithms into the CAD pipeline allows for final optimizations to be conducted beyond the current state-of-the-art, human-crafted algorithms, and the implementation is made available on GitHub [28]. By reformulating the circuit synthesis problem from the more common framing as a covering problem to a problem of sequence permutation, GAs shine as the machine learning tool capable of optimizing the final footprints of circuits. Using SAGA leads to reductions of 32% in the number of memory cells needed for in-memory computing and overall improvements in efficiency of 27% on average across the evaluated benchmark circuits.

Acknowledgment

This work was funded in part by the ORCGS fellowship at the University of Central Florida. Furthermore, this work extends prior work from a project for the course EEE 5336 on CAD of VLSI at the University of Central Florida in Fall of 2023.

References

  • [1] S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser, “Magic—memristor-aided logic,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 11, pp. 895–899, 2014.
  • [2] R. Ben-Hur, R. Ronen, A. Haj-Ali, D. Bhattacharjee, A. Eliahu, N. Peled, and S. Kvatinsky, “Simpler magic: Synthesis and mapping of in-memory logic executed in a single row to improve throughput,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 10, pp. 2434–2447, 2019.
  • [3] L. Johnsson and G. Netzer, “The impact of moore’s law and loss of dennard scaling: Are dsp socs an energy efficient alternative to x86 socs?” in Journal of Physics: Conference Series, vol. 762, no. 1.   IOP Publishing, 2016, p. 012022.
  • [4] W. Haensch, “Scaling is over—what now?” in 2017 75th Annual Device Research Conference (DRC).   IEEE, 2017, pp. 1–2.
  • [5] L. Gyongyosi and S. Imre, “A survey on quantum computing technology,” Computer Science Review, vol. 31, pp. 51–71, 2019.
  • [6] S. Xiang, Y. Han, Z. Song, X. Guo, Y. Zhang, Z. Ren, S. Wang, Y. Ma, W. Zou, B. Ma et al., “A review: Photonics devices, architectures, and algorithms for optical neural computing,” Journal of Semiconductors, vol. 42, no. 2, p. 023105, 2021.
  • [7] F. Zangeneh-Nejad, D. L. Sounas, A. Alù, and R. Fleury, “Analogue computing with metamaterials,” Nature Reviews Materials, vol. 6, no. 3, pp. 207–225, 2021.
  • [8] N. Verma, H. Jia, H. Valavi, Y. Tang, M. Ozatay, L.-Y. Chen, B. Zhang, and P. Deaville, “In-memory computing: Advances and prospects,” IEEE Solid-State Circuits Magazine, vol. 11, no. 3, pp. 43–55, 2019.
  • [9] T. Soliman, F. Müller, T. Kirchner, T. Hoffmann, H. Ganem, E. Karimov, T. Ali, M. Lederer, C. Sudarshan, T. Kämpfe et al., “Ultra-low power flexible precision fefet based analog in-memory computing,” in 2020 IEEE International Electron Devices Meeting (IEDM).   IEEE, 2020, pp. 29–2.
  • [10] A. Antolini, C. Paolino, F. Zavalloni, A. Lico, E. F. Scarselli, M. Mangia, F. Pareschi, G. Setti, R. Rovatti, M. L. Torres et al., “Combined hw/sw drift and variability mitigation for pcm-based analog in-memory computing for neural network applications,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 1, pp. 395–407, 2023.
  • [11] M. J. Rasch, C. Mackin, M. Le Gallo, A. Chen, A. Fasoli, F. Odermatt, N. Li, S. Nandakumar, P. Narayanan, H. Tsai et al., “Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators,” Nature Communications, vol. 14, no. 1, p. 5282, 2023.
  • [12] M. R. H. Rashed, S. Thijssen, S. K. Jha, F. Yao, and R. Ewetz, “Stream: Towards read-based in-memory computing for streaming based data processing,” in 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC).   IEEE, 2022, pp. 690–695.
  • [13] Y. Zha and J. Li, “Reconfigurable in-memory computing with resistive memory crossbar,” in 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).   IEEE, 2016, pp. 1–8.
  • [14] M. R. H. Rashed, S. K. Jha, and R. Ewetz, “Hybrid analog-digital in-memory computing,” in 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD).   IEEE, 2021, pp. 1–9.
  • [15] M. R. H. Rashed, S. Thijssen, S. K. Jha, and R. Ewetz, “Automated synthesis for in-memory computing,” in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD).   IEEE, 2023, pp. 1–9.
  • [16] R. B. Hur, N. Wald, N. Talati, and S. Kvatinsky, “Simple magic: Synthesis and in-memory mapping of logic execution for memristor-aided logic,” in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).   IEEE, 2017, pp. 225–232.
  • [17] Berkeley Logic Synthesis and Verification Group, “ABC: A System for Sequential Synthesis and Verification, Release 221019,” http://www.eecs.berkeley.edu/~alanmi/abc/, 221019.
  • [18] D. N. Yadav, P. L. Thangkhiew, and K. Datta, “Look-ahead mapping of boolean functions in memristive crossbar array,” Integration, vol. 64, pp. 152–162, 2019.
  • [19] S. Jiaoying, L. Feng, and Z. Ning, “Artificial intelligence in computer aided design,” Computers in industry, vol. 8, no. 4, pp. 277–282, 1987.
  • [20] B. R. Hunde and A. D. Woldeyohannes, “Future prospects of computer-aided design (cad)–a review from the perspective of artificial intelligence (ai), extended reality, and 3d printing,” Results in Engineering, vol. 14, p. 100478, 2022.
  • [21] G. Squillero, “Artificial evolution in computer aided design: from the optimization of parameters to the creation of assembly programs,” Computing, vol. 93, pp. 103–120, 2011.
  • [22] B. Doerr, Y. Ghannane, and M. I. Brahim, “Towards a stronger theory for permutation-based evolutionary algorithms,” in Proceedings of the Genetic and Evolutionary Computation Conference, 2022, pp. 1390–1398.
  • [23] S. Sivanandam and S. Deepa, Genetic algorithms.   Springer, 2008.
  • [24] B. Allen, A. Traulsen, C. E. Tarnita, and M. A. Nowak, “How mutation affects evolutionary games on graphs,” Journal of theoretical biology, vol. 299, pp. 97–105, 2012.
  • [25] R. Sethi, “Complete register allocation problems,” in Proceedings of the fifth annual ACM symposium on Theory of computing, 1973, pp. 182–195.
  • [26] R. Wille, D. Große, L. Teuber, G. W. Dueck, and R. Drechsler, “RevLib: An online resource for reversible functions and reversible circuits,” in Int’l Symp. on Multi-Valued Logic, 2008, pp. 220–225, RevLib is available at http://www.revlib.org.
  • [27] K. Deep and H. Mebrahtu, “New variations of order crossover for travelling salesman problem,” International Journal of Combinatorial Optimization Problems and Informatics, vol. 2, no. 1, pp. 2–13, 2011.
  • [28] A. Robins, “Saga,” https://github.com/andey-robins/saga, 2024.