research-article

Efficient Data Placement for Improving Data Access Performance on Domain-Wall Memory

Published: 01 October 2016

Abstract

Domain-wall memory (DWM) is becoming an attractive candidate to replace traditional memories because of its high density, low leakage power, and low access latency. Data on DWM is accessed through shift operations that move data stored on nanowires to read/write ports. Because of this construction, the latency of an access depends on how far the data must be shifted to reach a port, so the data placement (DP) strategy has a significant impact on the performance of data accesses on DWM. In this paper, we prove that the DP problem on DWM is NP-complete. For DWMs organized as a single DWM block cluster (DBC), we present integer linear programming formulations that solve the problem optimally. We also propose an efficient single DBC placement (S-DBC-P) algorithm that exploits the benefits of multiple read/write ports and data locality. Compared with the sequential DP strategy, S-DBC-P reduces shift operations by 76.9% on average for eight-port DWMs. Furthermore, for the DP problem on DWMs organized as multiple DBCs, we develop an efficient multiple DBC placement (M-DBC-P) algorithm that utilizes the parallelism of DBCs. The experimental results show that M-DBC-P achieves a 90% performance improvement over the sequential DP strategy.
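To illustrate why placement changes the number of shifts, the following is a minimal sketch under a simplified cost model that is an assumption of this example, not the formulation used in the paper: a single nanowire is treated as a tape with integer positions, ports sit at fixed positions, and each access shifts the tape to whichever port is cheapest to reach from the current displacement. The function name `count_shifts`, the greedy port choice, and the sample access sequence are all hypothetical.

```python
def count_shifts(placement, accesses, ports):
    """Count shifts for an access sequence under a simplified tape model.

    placement: dict mapping each data item to its position on the nanowire
    accesses:  iterable of data items in access order
    ports:     list of fixed read/write port positions
    Assumption: the tape starts with displacement 0 and each access
    greedily uses the port that needs the fewest additional shifts.
    """
    offset = 0  # current tape displacement relative to its initial alignment
    total = 0
    for item in accesses:
        pos = placement[item]
        # Displacement (q - pos) aligns the item's position with port q.
        target = min((q - pos for q in ports), key=lambda d: abs(d - offset))
        total += abs(target - offset)
        offset = target
    return total


accesses = ["a", "d", "a", "d", "b", "c"]

# Sequential placement: items stored in declaration order.
sequential = {"a": 0, "b": 1, "c": 2, "d": 3}
# Locality-aware placement: items accessed together are stored adjacently.
locality = {"a": 0, "d": 1, "b": 2, "c": 3}

print(count_shifts(sequential, accesses, ports=[0]))     # 12
print(count_shifts(locality, accesses, ports=[0]))       # 5
print(count_shifts(sequential, accesses, ports=[0, 2]))  # 4
```

In this toy instance, both reordering the data and adding a second port reduce the shift count, which are the two effects the abstract attributes to S-DBC-P. The sketch ignores physical details such as the overhead domains needed to shift a nanowire without losing data.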

Published In

IEEE Transactions on Very Large Scale Integration (VLSI) Systems  Volume 24, Issue 10
October 2016
180 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 October 2016

Qualifiers

  • Research-article

Cited By

  • (2023) Optimizing Data Placement for Hybrid SRAM+Racetrack Memory SPM in Embedded Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:3 (847-859). DOI: 10.1109/TCAD.2022.3185548. Online publication date: 1-Mar-2023
  • (2023) DownShift: Tuning Shift Reduction With Reliability for Racetrack Memories. IEEE Transactions on Computers, 72:9 (2585-2599). DOI: 10.1109/TC.2023.3257509. Online publication date: 1-Sep-2023
  • (2023) ROLLED: Racetrack Memory Optimized Linear Layout and Efficient Decomposition of Decision Trees. IEEE Transactions on Computers, 72:5 (1488-1502). DOI: 10.1109/TC.2022.3197094. Online publication date: 1-May-2023
  • (2022) Fast-track cache. Proceedings of the 36th ACM International Conference on Supercomputing (1-12). DOI: 10.1145/3524059.3532383. Online publication date: 28-Jun-2022
  • (2020) Generalized data placement strategies for racetrack memories. Proceedings of the 23rd Conference on Design, Automation and Test in Europe (1502-1507). DOI: 10.5555/3408352.3408693. Online publication date: 9-Mar-2020
  • (2020) Optimizing Tensor Contractions for Embedded Devices with Racetrack and DRAM Memories. ACM Transactions on Embedded Computing Systems, 19:6 (1-26). DOI: 10.1145/3396235. Online publication date: 29-Sep-2020
  • (2020) HydraFS: an efficient NUMA-aware in-memory file system. Cluster Computing, 23:2 (705-724). DOI: 10.1007/s10586-019-02952-y. Online publication date: 1-Jun-2020
  • (2019) ShiftsReduce. ACM Transactions on Architecture and Code Optimization, 16:4 (1-23). DOI: 10.1145/3372489. Online publication date: 26-Dec-2019
  • (2019) Optimizing tensor contractions for embedded devices with racetrack memory scratch-pads. Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (5-18). DOI: 10.1145/3316482.3326351. Online publication date: 23-Jun-2019
  • (2016) The design of an efficient swap mechanism for hybrid DRAM-NVM systems. Proceedings of the 13th International Conference on Embedded Software (1-10). DOI: 10.1145/2968478.2968497. Online publication date: 1-Oct-2016
