Efficient Data Placement for Improving Data Access Performance on Domain-Wall Memory

Authors: Xianzhang Chen, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Chun Jason Xue, Yuangang Wang

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 24, Issue 10

Pages 3094 - 3104

https://doi.org/10.1109/TVLSI.2016.2537400

Published: 01 October 2016

Abstract

Domain-wall memory (DWM) is becoming an attractive candidate to replace traditional memories because of its high density, low power leakage, and low access latency. Data on DWM are accessed through shift operations that move the data stored on nanowires to the read/write ports. Because of this construction, data accesses on DWM exhibit varying access latencies. The data placement (DP) strategy therefore has a significant impact on the performance of data accesses on DWM. In this paper, we prove that the DP problem on DWM is nondeterministic polynomial time (NP)-complete. For DWMs organized in a single DWM block cluster (DBC), we present integer linear programming formulations that solve the problem optimally. We also propose an efficient single DBC placement (S-DBC-P) algorithm that exploits the benefits of multiple read/write ports and data locality. Compared with the sequential DP strategy, S-DBC-P reduces shift operations by 76.9% on average for eight-port DWMs. Furthermore, for the DP problem on DWMs organized in multiple DBCs, we develop an efficient multiple DBC placement (M-DBC-P) algorithm that utilizes the parallelism of DBCs. The experimental results show that M-DBC-P achieves a 90% performance improvement over the sequential DP strategy.
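To make the dependence of shift count on placement concrete, the following is a minimal sketch of a toy cost model, not the S-DBC-P or M-DBC-P algorithm from the paper. It assumes a single nanowire whose tape offset stays where the last access left it, fixed port positions, and unit cost per one-domain shift. The function name `count_shifts`, the example trace, and both placements are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def count_shifts(trace: Sequence[str],
                 placement: Dict[str, int],
                 ports: Sequence[int] = (0,)) -> int:
    """Count shift operations for an access trace on one nanowire (toy model)."""
    offset = 0  # assumed state: how far the tape is shifted from its rest position
    total = 0
    for item in trace:
        domain = placement[item]
        # Offsets that would align this domain with each port; take the nearest one.
        target = min((domain - port for port in ports),
                     key=lambda candidate: abs(candidate - offset))
        total += abs(target - offset)
        offset = target
    return total


# Hypothetical access trace and placements (data item -> domain index).
trace = ["a", "c", "a", "c", "b"]
sequential = {"a": 0, "b": 1, "c": 2}  # items laid out in declaration order
locality = {"a": 0, "c": 1, "b": 2}    # items accessed together placed adjacently

print(count_shifts(trace, sequential))                # 7
print(count_shifts(trace, locality))                  # 4
print(count_shifts(trace, sequential, ports=(0, 2)))  # 1
```

In this toy setting, placing the frequently alternating items next to each other lowers the shift count from 7 to 4 with one port, and adding a second port lowers it further, which mirrors the two effects (data locality and multiple ports) that the abstract says the placement algorithms exploit.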

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Volume 24, Issue 10

October 2016

180 pages

ISSN: 1063-8210

Copyright © 2016.

Publisher

IEEE Educational Activities Department

United States

Qualifiers

Research-article

Cited By

Xu R, Sha E, Zhuge Q, Song Y, Wang H, Shi L (2023). Optimizing Data Placement for Hybrid SRAM+Racetrack Memory SPM in Embedded Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:3 (847-859). doi:10.1109/TCAD.2022.3185548. Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1109/TCAD.2022.3185548
Khan A, Ollivier S, Hameed F, Castrillon J, Jones A (2023). DownShift: Tuning Shift Reduction With Reliability for Racetrack Memories. IEEE Transactions on Computers 72:9 (2585-2599). doi:10.1109/TC.2023.3257509. Online publication date: 1-Sep-2023
https://dl.acm.org/doi/10.1109/TC.2023.3257509
Hakert C, Khan A, Chen K, Hameed F, Castrillon J, Chen J (2023). ROLLED: Racetrack Memory Optimized Linear Layout and Efficient Decomposition of Decision Trees. IEEE Transactions on Computers 72:5 (1488-1502). doi:10.1109/TC.2022.3197094. Online publication date: 1-May-2023
https://dl.acm.org/doi/10.1109/TC.2022.3197094
Tárrega H, Valero A, Lorente V, Petit S, Sahuquillo J, Rauchwerger L, Cameron K, Nikolopoulos D, Pnevmatikatos D (2022). Fast-track cache. Proceedings of the 36th ACM International Conference on Supercomputing (1-12). doi:10.1145/3524059.3532383. Online publication date: 28-Jun-2022
https://dl.acm.org/doi/10.1145/3524059.3532383
Khan A, Goens A, Hameed F, Castrillon J, Di Natale G, Fummi F (2020). Generalized data placement strategies for racetrack memories. Proceedings of the 23rd Conference on Design, Automation and Test in Europe (1502-1507). doi:10.5555/3408352.3408693. Online publication date: 9-Mar-2020
https://dl.acm.org/doi/10.5555/3408352.3408693
Khan A, Rink N, Hameed F, Castrillon J (2020). Optimizing Tensor Contractions for Embedded Devices with Racetrack and DRAM Memories. ACM Transactions on Embedded Computing Systems 19:6 (1-26). doi:10.1145/3396235. Online publication date: 29-Sep-2020
https://dl.acm.org/doi/10.1145/3396235
Wu T, Chen X, Liu K, Xiao C, Liu Z, Zhuge Q, Sha E (2020). HydraFS: an efficient NUMA-aware in-memory file system. Cluster Computing 23:2 (705-724). doi:10.1007/s10586-019-02952-y. Online publication date: 1-Jun-2020
https://dl.acm.org/doi/10.1007/s10586-019-02952-y
Khan A, Hameed F, Bläsing R, Parkin S, Castrillon J (2019). ShiftsReduce. ACM Transactions on Architecture and Code Optimization 16:4 (1-23). doi:10.1145/3372489. Online publication date: 26-Dec-2019
https://dl.acm.org/doi/10.1145/3372489
Khan A, Rink N, Hameed F, Castrillon J, Chen J, Shrivastava A (2019). Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads. Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (5-18). doi:10.1145/3316482.3326351. Online publication date: 23-Jun-2019
https://dl.acm.org/doi/10.1145/3316482.3326351
Chen X, Sha E, Jiang W, Zhuge Q, Chen J, Qin J, Zeng Y (2016). The design of an efficient swap mechanism for hybrid DRAM-NVM systems. Proceedings of the 13th International Conference on Embedded Software (1-10). doi:10.1145/2968478.2968497. Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1145/2968478.2968497
