Abstract
We describe the design of a parallel trace-driven cache simulator for evaluating different cache structures. As research in this area deepens, traditional simulation methods, which can only execute simulation operations in sequence, are no longer practical because of their long simulation times. An obvious way to achieve fast parallel simulation is to simulate the independent sets of a cache concurrently on different compute resources. We consider using a general-purpose GPU to accelerate cache simulation, exploiting set partitioning as the main source of parallelism. We show, however, that this technique is not efficient when only one cache configuration is simulated, because of the high correlation of activity between different sets. We therefore develop two techniques, trace sorting and the simulation of multiple configurations in a single pass, that take advantage of the full programmability offered by the Compute Unified Device Architecture (CUDA) on the GPU. Our experimental results show that the cache simulator on a GPU-CPU platform achieves a 2.44x performance improvement over the traditional sequential algorithm.
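The set-partitioning idea mentioned in the abstract can be illustrated with a minimal CPU-only sketch in Python. It is not the GCSim implementation: the LRU replacement policy, the function names (`simulate_set`, `simulate_sequential`, `simulate_partitioned`) and the toy trace are all assumptions made for illustration. The sketch splits an address trace by set index, simulates each set on its own sub-trace, and checks that the per-set miss counts add up to the result of an ordinary sequential simulation.

```python
from collections import OrderedDict, defaultdict


def simulate_set(blocks, ways):
    """Simulate one cache set with LRU replacement (an assumed policy).

    `blocks` is the sequence of block addresses that map to this set.
    Returns the number of misses.
    """
    lru = OrderedDict()  # keys are resident blocks, oldest first
    misses = 0
    for block in blocks:
        if block in lru:
            lru.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used block
            lru[block] = True
    return misses


def simulate_sequential(trace, line_size, num_sets, ways):
    """Reference simulation: walk the whole trace in order."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_size
        lru = sets[block % num_sets]
        if block in lru:
            lru.move_to_end(block)
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)
            lru[block] = True
    return misses


def simulate_partitioned(trace, line_size, num_sets, ways):
    """Split the trace by set index, then simulate each set independently.

    Each sub-trace touches only its own set, so the per-set simulations
    could be handed to separate workers (GPU threads in the paper's setting).
    Here they simply run one after another.
    """
    per_set = defaultdict(list)
    for addr in trace:
        block = addr // line_size
        per_set[block % num_sets].append(block)
    return {s: simulate_set(blocks, ways) for s, blocks in per_set.items()}


# Toy configuration: 16-byte lines, 2 sets, 2 ways (all values hypothetical).
trace = [0, 16, 32, 0, 48, 64, 16, 0]
per_set_misses = simulate_partitioned(trace, line_size=16, num_sets=2, ways=2)
total_misses = simulate_sequential(trace, line_size=16, num_sets=2, ways=2)
assert sum(per_set_misses.values()) == total_misses
```

Because every access touches exactly one set, the order of accesses within a set is all that matters for that set's hits and misses, which is why the partitioned and sequential results agree. The sketch says nothing about how evenly the work is spread across sets, which is the efficiency concern the abstract raises for the single-configuration case.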
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, H., Gao, X., Long, X., Wang, Z. (2009). GCSim: A GPU-Based Trace-Driven Simulator for Multi-level Cache. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_14
DOI: https://doi.org/10.1007/978-3-642-03644-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)