DOI: 10.1145/195473.195569

The performance impact of flexibility in the Stanford FLASH multiprocessor

Published: 01 November 1994

Abstract

A flexible communication mechanism is a desirable feature in multiprocessors because it allows support for multiple communication protocols, expands performance monitoring capabilities, and leads to a simpler design and debug process. In the Stanford FLASH multiprocessor, flexibility is obtained by requiring all transactions in a node to pass through a programmable node controller, called MAGIC. In this paper, we evaluate the performance costs of flexibility by comparing the performance of FLASH to that of an idealized hardwired machine on representative parallel applications and a multiprogramming workload. To measure the performance of FLASH, we use a detailed simulator of the FLASH and MAGIC designs, together with the code sequences that implement the cache-coherence protocol. We find that for a range of optimized parallel applications the performance differences between the idealized machine and FLASH are small. For these programs, either the miss rates are small or the latency of the programmable protocol can be hidden behind the memory access time. For applications that incur a large number of remote misses or exhibit substantial hot-spotting, performance is poor for both machines, though the increased remote access latencies or the occupancy of MAGIC lead to lower performance for the flexible design. In most cases, however, FLASH is only 2%–12% slower than the idealized machine.

Published In

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
November 1994
341 pages
ISBN: 0897916603
DOI: 10.1145/195473
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ASPLOS94

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
