Abstract
The combinatorial optimization problem (COP), which seeks the optimal solution in a discrete space, is fundamental to many fields. Unfortunately, many COPs are NP-complete, and the time required to solve them grows dramatically with the problem scale. Researchers therefore often prefer fast methods even if they are not exact, such as approximation algorithms, heuristic algorithms, and machine learning. Prior works proposed chaotic simulated annealing (CSA) based on the Hopfield neural network and achieved good performance. However, the computation pattern of CSA is unfriendly to current general-purpose processors, and no specialized hardware exists for it. To execute CSA efficiently, we propose a software and hardware co-design. On the software side, we quantize the weights and outputs with appropriate bit widths and modify the computations that are unsuitable for hardware implementation. On the hardware side, we design a specialized memristor-based processing-in-memory architecture named COPPER. COPPER can efficiently run the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER performs CSA remarkably well in terms of both speed and energy consumption.
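The transiently chaotic annealing dynamics that the abstract refers to can be sketched as follows. This is a minimal, illustrative implementation in the style of the 1995 Chen–Aihara formulation, not the paper's quantized version or the COPPER hardware mapping; all parameter values (`k`, `alpha`, `beta`, `z0`, `I0`, `eps`) and the toy two-variable problem are assumptions chosen for demonstration.

```python
import numpy as np

def csa_solve(W, I, steps=300, k=0.9, alpha=0.015,
              beta=0.01, z0=0.1, I0=0.65, eps=0.004):
    """Transiently chaotic simulated annealing (Chen-Aihara style sketch).

    Approximately minimizes the Hopfield energy E = -0.5 x^T W x - I^T x by
    iterating internal neuron states y with a self-feedback term of strength
    z that starts out strong (chaotic search) and decays toward zero
    (convergent gradient-like descent).
    """
    n = len(I)
    rng = np.random.default_rng(0)       # fixed seed for reproducibility
    y = rng.uniform(-1.0, 1.0, n) * 0.1  # small random internal states
    z = z0                               # chaotic self-feedback strength
    for _ in range(steps):
        # Logistic output written via tanh to avoid exp() overflow
        x = 0.5 * (1.0 + np.tanh(y / (2.0 * eps)))
        # Damped Hopfield drive plus decaying chaotic self-feedback
        y = k * y + alpha * (W @ x + I) - z * (x - I0)
        z *= (1.0 - beta)                # anneal the chaos away
    x = 0.5 * (1.0 + np.tanh(y / (2.0 * eps)))
    return (x > 0.5).astype(int)         # binarize the final outputs

# Toy 2-variable problem: negative coupling penalizes turning both neurons
# on, so the low-energy states are the one-hot assignments (0,1) and (1,0).
W = np.array([[0.0, -2.0],
              [-2.0, 0.0]])
I = np.array([1.0, 1.0])
print(csa_solve(W, I))
```

The decaying `z` term is what distinguishes CSA from plain Hopfield descent: while `z` is large, the map behaves chaotically and explores the state space; as `z` shrinks, the dynamics settle into a local minimum of the energy.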
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author information
Contributions
Qiankun WANG led the research and was mainly responsible for implementing the algorithm, designing the hardware, and drafting the paper. Xingchen LI provided design ideas and some data for the hardware part. Bingzhe WU organized the algorithm and pointed out the possibility of combining software and hardware. Ke YANG and Yuchao YANG laid the foundation for this research and provided some parameters for the algorithm. Wei HU provided the stability analysis of ReRAM and the latest research progress on ReRAM PIM macros. Guangyu SUN made many suggestions on the research, and revised and finalized the paper.
Compliance with ethics guidelines
Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, and Yuchao YANG declare that they have no conflict of interest.
Project supported by the National Natural Science Foundation of China (Nos. 61832020, 62032001, 92064006, and 62274036), the Beijing Academy of Artificial Intelligence (BAAI) of China, and the 111 Project of China (No. B18001)
Cite this article
Wang, Q., Li, X., Wu, B. et al. COPPER: a combinatorial optimization problem solver with processing-in-memory architecture. Front Inform Technol Electron Eng 24, 731–741 (2023). https://doi.org/10.1631/FITEE.2200463