
Predictability of load/store instruction latencies

Published: 01 December 1993

References

[1]
Alpha Architecture Handbook - Preliminary Edition. Digital Equipment Corporation, Maynard, MA, 1992.
[2]
G. R. Beck, D. W. L. Yen, and T. L. Anderson. The Cydra 5 mini-supercomputer: Architecture and implementation. The Journal of Supercomputing, 7(1/2):143-180, 1992.
[3]
D. Callahan, K. Kennedy, and A. Porterfield. Software prefetching. In Proc. of ASPLOS IV, pages 40-52, 1991.
[4]
D. Callahan and A. Porterfield. Data cache performance of supercomputer applications. In Supercomputing '90, pages 564-572, 1990.
[5]
P. Chang, S. Mahlke, W. Chen, and W. W. Hwu. Profile-guided automatic inline expansion for C programs. Software-Practice and Experience, 22(5):349-369, 1992.
[6]
T-F. Chen and J-L. Baer. Reducing memory latency via non-blocking and prefetching caches. In Proc. of ASPLOS V, pages 51-61, 1992.
[7]
W. Y. Chen, S. A. Mahlke, and W. W. Hwu. Tolerating first level memory access latency in high-performance systems. In Intl. Conf. on Parallel Processing, pages 1-36 - 1-43, 1992.
[8]
F. Chow and J. Hennessy. Register allocation by priority-based coloring. In Proc. of the 1984 Symp. on Compiler Construction, pages 222-232, 1984.
[9]
R. P. Colwell, R. P. Nix, J. J. O'Donnell, D. B. Papworth, and P. K. Rodman. A VLIW architecture for a trace scheduling compiler. IEEE Transactions on Computers, C-37(8):967-979, 1988.
[10]
J. A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, C-30(7):478-490, July 1981.
[11]
W. W. Hwu and P. P. Chang. Achieving high instruction cache performance with an optimizing compiler. In Proc. of 16th Intl. Symp. on Computer Architecture, pages 242-251, 1989.
[12]
W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery. The superblock: An effective technique for VLIW and superscalar compilation. J. of Supercomputing, 7(1/2):229-248, 1993.
[13]
V. Kathail, M. S. Schlansker, and B. R. Rau. HPL PlayDoh architecture specification: Version 1.0. Technical Report HPL-93-80, Hewlett-Packard Laboratories, 1993. in preparation.
[14]
D. R. Kerns and S. J. Eggers. Balanced scheduling: Instruction scheduling when memory latency is uncertain. In Proc. of the SIGPLAN '93 Conf. on Prog. Lang. Design and Implementation, pages 278-289, 1993.
[15]
A. C. Klaiber and H. M. Levy. Architecture for software controlled data prefetching. In Proc. of 18th Intl. Symp. on Computer Architecture, pages 43-63, 1991.
[16]
M. Martonosi, A. Gupta, and T. Anderson. MemSpy: Analyzing memory system bottlenecks in programs. In Proc. ACM SIGMETRICS Conf., pages 1-12, 1992.
[17]
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78-117, 1970.
[18]
S. McFarling. Program optimization for instruction caches. In Proc. of ASPLOS III, 1989.
[19]
D. M. McNiven. Reduction in Main Memory Traffic through the Efficient use of Local Memory. PhD thesis, University of Illinois, 1988.
[20]
T. C. Mowry, M. S. Lam, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In Proc. of ASPLOS V, pages 62-73, 1992.
[21]
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle. The Cydra 5 departmental supercomputer: Design philosophies, decisions and trade-offs. IEEE Computer, 22(1):12-35, 1989.
[22]
R. A. Sugumar and S. G. Abraham. Multiconfiguration simulation algorithms for the evaluation of computer architecture designs. Technical Report CSE-TR-173-93, CSE Division, University of Michigan, 1992.


Published In

MICRO 26: Proceedings of the 26th annual international symposium on Microarchitecture
December 1993
276 pages
ISBN:0818652802

Publisher

IEEE Computer Society Press

Washington, DC, United States


Qualifiers

  • Article

Conference

MICRO93
Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%


Cited By

  • (2014) A data cache with multiple caching strategies tuned to different types of locality. ACM International Conference on Supercomputing 25th Anniversary Volume, pages 217-226. doi:10.1145/2591635.2667170. Online publication date: 10-Jun-2014
  • (2012) Reducing off-chip memory traffic by selective cache management scheme in GPGPUs. Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, pages 110-119. doi:10.1145/2159430.2159443. Online publication date: 3-Mar-2012
  • (2008) Latency-tolerant software pipelining in a production compiler. Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, pages 104-113. doi:10.1145/1356058.1356073. Online publication date: 6-Apr-2008
  • (2007) Hybrid multi-core architecture for boosting single-threaded performance. ACM SIGARCH Computer Architecture News, 35(1):141-148. doi:10.1145/1241601.1241603. Online publication date: 1-Mar-2007
  • (2006) Decomposing memory performance. Proceedings of the 5th international symposium on Memory management, pages 95-103. doi:10.1145/1133956.1133970. Online publication date: 10-Jun-2006
  • (2006) Simple penalty-sensitive replacement policies for caches. Proceedings of the 3rd conference on Computing frontiers, pages 341-352. doi:10.1145/1128022.1128068. Online publication date: 3-May-2006
  • (2006) Enhancing last-level cache performance by block bypassing and early miss determination. Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture, pages 52-66. doi:10.1007/11859802_6. Online publication date: 6-Sep-2006
  • (2005) Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction. IEEE Transactions on Computers, 54(7):897-912. doi:10.1109/TC.2005.117. Online publication date: 1-Jul-2005
  • (2005) Performance of Runtime Optimization on BLAST. Proceedings of the international symposium on Code generation and optimization, pages 86-96. doi:10.1109/CGO.2005.25. Online publication date: 20-Mar-2005
  • (2004) Compiler orchestrated prefetching via speculation and predication. ACM SIGOPS Operating Systems Review, 38(5):189-198. doi:10.1145/1037949.1024416. Online publication date: 7-Oct-2004