research-article

Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps

Published: 01 January 1992

Abstract

A timestamp-based, software-assisted cache coherence scheme for shared-memory multiprocessors is proposed that requires no global communication to keep multiple private caches coherent. The scheme combines compile-time marking of references with hardware-based local incoherence detection: a cache entry that may be incoherent is detected and implicitly invalidated by comparing a clock (related to program flow) with a timestamp (related to the time the entry was updated in the cache). A performance comparison between the proposed scheme and other software-assisted schemes, based on trace-driven simulation using actual traces, indicates that the proposed scheme performs significantly better than previous software-assisted schemes, especially when the processors are carefully scheduled to maximize the reuse of cache contents. Because the scheme requires neither a shared resource nor global communication, it is scalable up to a large number of processors.
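The clock/timestamp comparison described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' design: it assumes one local clock per shared data structure, which every processor advances at an epoch boundary where compile-time analysis finds the structure may have been written; a write-through private cache; and a write that stamps the entry with clock + 1 so the writer's own copy survives the next clock advance. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of timestamp-based coherence. The names
# CacheEntry, TimestampCache, advance_clock, read, and write are
# illustrative and are not taken from the paper.


@dataclass
class CacheEntry:
    value: int
    timestamp: int  # local clock value stamped when the entry was filled or written


@dataclass
class TimestampCache:
    """Private cache of one processor; a key is (structure, index)."""
    memory: dict                                # shared memory, written through
    clocks: dict = field(default_factory=dict)  # structure -> local clock
    lines: dict = field(default_factory=dict)   # (structure, index) -> CacheEntry
    hits: int = 0
    misses: int = 0

    def advance_clock(self, structure):
        # Stands in for a compiler-inserted instruction at an epoch boundary
        # where `structure` may have been written. Purely local: no messages.
        self.clocks[structure] = self.clocks.get(structure, 0) + 1

    def read(self, structure, index):
        key = (structure, index)
        clock = self.clocks.get(structure, 0)
        entry = self.lines.get(key)
        if entry is not None and entry.timestamp >= clock:
            self.hits += 1
            return entry.value
        # Absent, or timestamp < clock: the entry is treated as possibly
        # stale (implicitly invalidated) and refetched from memory.
        self.misses += 1
        value = self.memory[key]
        self.lines[key] = CacheEntry(value, clock)
        return value

    def write(self, structure, index, value):
        key = (structure, index)
        self.memory[key] = value  # write-through is an assumption of this sketch
        # Stamp clock + 1 so the writer's own copy stays valid after the
        # clock advance at the end of this epoch.
        self.lines[key] = CacheEntry(value, self.clocks.get(structure, 0) + 1)


memory = {("A", 0): 10}
p0, p1 = TimestampCache(memory), TimestampCache(memory)

p0.read("A", 0); p1.read("A", 0)   # epoch 0: both miss, stamp 0
# A is not written in epoch 0, so no clock advances at this boundary.
p0.read("A", 0); p1.read("A", 0)   # epoch 1: both hit (reuse across epochs)
p0.write("A", 0, 42)               # epoch 2: only p0 writes A[0]
for p in (p0, p1):                 # boundary: A may have been written
    p.advance_clock("A")
r0 = p0.read("A", 0)               # epoch 3: p0 hits its own copy (1 >= 1)
r1 = p1.read("A", 0)               # p1's stamp 0 < 1, so it refetches 42
print(r0, r1, (p0.hits, p0.misses), (p1.hits, p1.misses))
```

In this toy run both processors read 42 in the final epoch, but only p1 pays a miss. Neither processor ever messages the other: each detects possible staleness by comparing its own clock with its own timestamp, which is the property the abstract credits for scalability.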


Cited By

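  • (2024) Cyclebite: Extracting Task Graphs From Unstructured Compute-Programs. IEEE Transactions on Computers 73:1, pp. 221-234. DOI 10.1109/TC.2023.3327504. Online publication date: 1-Jan-2024.
  • (2021) Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 350-365. DOI 10.1145/3466752.3480065. Online publication date: 18-Oct-2021.
  • (2018) Combining HW/SW mechanisms to improve NUMA performance of multi-GPU systems. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 339-351. DOI 10.1109/MICRO.2018.00035. Online publication date: 20-Oct-2018.
  • (2017) TC-Release++: An Efficient Timestamp-Based Coherence Protocol for Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems 28:11, pp. 3313-3327. DOI 10.1109/TPDS.2017.2719679. Online publication date: 1-Nov-2017.
  • (2016) Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures. Proceedings of the 2016 International Conference on Supercomputing, pp. 1-13. DOI 10.1145/2925426.2926270. Online publication date: 1-Jun-2016.
  • (2015) Fusion. ACM SIGARCH Computer Architecture News 43:3S, pp. 733-745. DOI 10.1145/2872887.2750421. Online publication date: 13-Jun-2015.
  • (2015) DeNovoSync. ACM SIGARCH Computer Architecture News 43:1, pp. 545-559. DOI 10.1145/2786763.2694356. Online publication date: 14-Mar-2015.
  • (2015) DeNovoSync. ACM SIGPLAN Notices 50:4, pp. 545-559. DOI 10.1145/2775054.2694356. Online publication date: 14-Mar-2015.
  • (2015) Fusion. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 733-745. DOI 10.1145/2749469.2750421. Online publication date: 13-Jun-2015.
  • (2015) DeNovoSync. Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 545-559. DOI 10.1145/2694344.2694356. Online publication date: 14-Mar-2015.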


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 3, Issue 1
January 1992
125 pages

Publisher

IEEE Press

Author Tags

1. cache contents reuse
2. buffer storage
3. clocks
4. compile-time marking
5. hardware-based local incoherence detection
6. multiple private caches
7. parallel programming
8. program flow
9. references
10. scalable cache coherence
11. shared memory multiprocessors
12. storage management
13. timestamps
14. trace-driven simulation


