Abstract
Previous work has shown that the cache miss rate (CMR) of a program can be predicted for all of its inputs. However, most optimization techniques need more than the miss rate of the whole program: many of them benefit from knowing the miss rate of each execution phase of a program for all inputs.
In this paper, we describe a method that divides a program into phases, each with a regular locality pattern. Using a regression model, the method predicts the reuse signature, and from it the cache miss rate, of each phase for all inputs. We compare the predictions with actual measurements. On average, the prediction is over 98% accurate for a set of floating-point programs. The predicted CMR traces match the simulated ones despite dramatic fluctuations of the miss rate over time. This technique can be used to improve dynamic optimization, benchmarking, and compiler design.
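The step from a reuse signature to a miss rate can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a fully associative LRU cache, a single linear model d = a + b * n fitted to two training inputs, and a hypothetical phase (`phase_trace`) that sweeps an array of n blocks twice. All function names are illustrative.

```python
from collections import OrderedDict
from math import inf


def reuse_distances(trace):
    """Return the LRU stack distance of every access in `trace`.

    The distance is the number of distinct other blocks touched since the
    previous access to the same block; first-time accesses get `inf`.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for block in trace:
        if block in stack:
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            distances.append(inf)
            stack[block] = True
    return distances


def miss_rate(distances, cache_blocks):
    """Miss rate of a fully associative LRU cache of `cache_blocks` blocks."""
    misses = sum(1 for d in distances if d >= cache_blocks)
    return misses / len(distances)


def finite_mean(distances):
    """Mean of the finite reuse distances (cold accesses are skipped)."""
    finite = [d for d in distances if d != inf]
    return sum(finite) / len(finite)


def fit_linear(n1, d1, n2, d2):
    """Fit d = a + b * n through two (input size, distance) training points."""
    b = (d2 - d1) / (n2 - n1)
    a = d1 - b * n1
    return a, b


def phase_trace(n):
    """Hypothetical phase: two sweeps over an array of n blocks."""
    return list(range(n)) * 2


def predict_phase_miss_rate(n, model, cache_blocks):
    """Predict the miss rate of the hypothetical phase for input size n."""
    a, b = model
    # Assumed signature shape: n cold accesses, then n reuses at distance a + b*n.
    predicted = [inf] * n + [a + b * n] * n
    return miss_rate(predicted, cache_blocks)


if __name__ == "__main__":
    # Train on two small inputs, then predict a larger, unseen one.
    model = fit_linear(4, finite_mean(reuse_distances(phase_trace(4))),
                       8, finite_mean(reuse_distances(phase_trace(8))))
    print("predicted:", predict_phase_miss_rate(16, model, cache_blocks=16))
    print("measured: ", miss_rate(reuse_distances(phase_trace(16)), 16))
```

In this toy phase the reuse distance grows linearly with the input size, so two training runs are enough to recover the model. Real phases generally mix several patterns, which is why a single linear fit should be read only as an illustration.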
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shen, X., Zhong, Y., Ding, C. (2005). Phase-Based Miss Rate Prediction Across Program Inputs. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_5
DOI: https://doi.org/10.1007/11532378_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28009-5
Online ISBN: 978-3-540-31813-2
eBook Packages: Computer Science (R0)