CMP cache performance projection: accessibility vs. capacity

Published: 01 March 2007

Abstract

Efficiently utilizing on-chip storage space on Chip-Multiprocessors (CMPs) has become an important research topic. Tradeoffs between data accessibility and effective on-chip capacity have been studied extensively, but exploring a wide spectrum of the design space requires costly simulations. In this paper, we first develop an abstract model for understanding the performance impact of data replication. To overcome the lack of real-time interactions among multiple cores in the abstract model, we propose a global stack simulation strategy to study the performance of a variety of cache organizations on CMPs. The global stack logically incorporates a shared stack and per-core private stacks to collect shared/private reuse (stack) distances for every memory reference in a single simulation pass. With the collected reuse distances, performance in terms of hits/misses and average memory access times can be calculated for various cache organizations. Using a set of commercial multithreaded workloads, we verify the stack results against individual execution-driven simulations that consider realistic cache parameters and delays. The results show that stack simulations can accurately model the performance of various cache organizations. The single-pass stack simulation results demonstrate that the effectiveness of various techniques for optimizing CMP on-chip storage is closely related to the working sets of the workloads as well as to the total cache sizes.
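
The single-pass idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes fully associative LRU stacks, ignores coherence invalidations and timing, and every name in it (`touch`, `global_stack_pass`, `hit_count`, `average_access_time`, the latency values) is hypothetical. Each reference is looked up in one shared stack and in the issuing core's private stack, and the recorded depths are later compared against candidate cache sizes.

```python
def touch(stack, block):
    """Return the 1-based LRU depth of `block` in `stack` (None on first use),
    then move `block` to the top of the stack."""
    if block in stack:
        depth = stack.index(block) + 1
        stack.remove(block)
    else:
        depth = None
    stack.insert(0, block)
    return depth


def global_stack_pass(trace, num_cores):
    """Walk the trace of (core, block) references once and return one
    (shared_distance, private_distance) pair per reference."""
    shared = []                               # one stack seen by all cores
    private = [[] for _ in range(num_cores)]  # one stack per core
    return [(touch(shared, block), touch(private[core], block))
            for core, block in trace]


def hit_count(distances, capacity_blocks):
    """A reference hits in an LRU cache of `capacity_blocks` blocks when its
    stack distance is known and does not exceed that capacity."""
    return sum(1 for d in distances if d is not None and d <= capacity_blocks)


def average_access_time(hits, total, hit_latency, miss_penalty):
    """Hit latency plus miss rate times miss penalty (illustrative values only)."""
    return hit_latency + (total - hits) / total * miss_penalty


# Hypothetical five-reference trace from two cores.
trace = [(0, "A"), (1, "B"), (0, "A"), (1, "A"), (0, "B")]
dists = global_stack_pass(trace, num_cores=2)

# The same distances answer questions about several cache sizes without re-simulation.
shared_hits = hit_count([s for s, _ in dists], capacity_blocks=2)
private_hits = hit_count([p for _, p in dists], capacity_blocks=1)
```

In this toy trace, core 1's reference to block A misses in its private stack but is found in the shared stack, which is the kind of accessibility-versus-capacity difference the collected distances are meant to expose.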

Cited By

  • (2012) Survey of scheduling techniques for addressing shared resources in multicore processors. ACM Computing Surveys (CSUR) 45:1 (1-28). DOI: 10.1145/2379776.2379780. Online publication date: 7-Dec-2012
  • (2009) SHARP control. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (517-528). DOI: 10.1145/1669112.1669177. Online publication date: 12-Dec-2009
  • (2009) On chip cache quantitative optimization approach. Proceedings of the Second international conference on High Performance Computing and Applications (550-556). DOI: 10.1007/978-3-642-11842-5_77. Online publication date: 10-Aug-2009

Published In

ACM SIGARCH Computer Architecture News  Volume 35, Issue 1
March 2007
153 pages
ISSN:0163-5964
DOI:10.1145/1241601

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 2007
Published in SIGARCH Volume 35, Issue 1

Author Tags

  1. CMP caches
  2. data replication
  3. performance modeling and projection
  4. stack simulation

Qualifiers

  • Article
