DOI: 10.5555/1898953.1899072

Compatible phase co-scheduling on a CMP of multi-threaded processors

Published: 25 April 2006

Abstract

The industry is rapidly moving towards the adoption of Chip Multi-Processors (CMPs) of Simultaneous Multi-Threaded (SMT) cores for general purpose systems. The most prominent use of such processors, at least in the near term, will be as job servers running multiple independent threads on the different contexts of the various SMT cores. In such an environment, the co-scheduling of phases from different threads plays a significant role in the overall throughput. Less throughput is achieved when phases from different threads that conflict for particular hardware resources are scheduled together, compared with the situation where compatible phases are co-scheduled on the same SMT core. Achieving the latter requires precise per-phase hardware statistics that the scheduler can use to rapidly identify possible incompatibilities among phases of different threads, thereby avoiding the potentially high performance cost of inter-thread contention.
In this paper, we devise phase co-scheduling policies for a dual-core CMP of dual-threaded SMT processors. We explore a number of approaches and find that the use of ready and in-flight instruction metrics permits effective co-scheduling of compatible phases among the four contexts. This approach significantly outperforms the worst static grouping of threads, and very closely matches the best static grouping, even outperforming it by as much as 7%.
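The abstract does not spell out the policy itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each of the four threads reports two per-phase counters (ready instructions and in-flight instructions) and that a simple product of those counters approximates contention between two threads on one SMT core. The function names, the `metrics` layout, and the scoring formula are all hypothetical. With four threads and two dual-threaded cores there are only three possible groupings, so the sketch enumerates them and picks the one with the lowest total score.

```python
def pairings(threads):
    """Yield the three ways to split four threads into two pairs."""
    first, others = threads[0], threads[1:]
    for partner in others:
        rest = tuple(t for t in others if t != partner)
        yield (first, partner), rest


def conflict(a, b, metrics):
    """Hypothetical contention score for two threads sharing one SMT core.

    metrics maps a thread name to (ready_instructions, in_flight_instructions).
    The product form is an assumption made for illustration only.
    """
    ready_a, inflight_a = metrics[a]
    ready_b, inflight_b = metrics[b]
    return ready_a * ready_b + inflight_a * inflight_b


def best_pairing(metrics):
    """Return the grouping of four threads with the lowest total score."""
    threads = sorted(metrics)
    if len(threads) != 4:
        raise ValueError("this sketch assumes exactly four hardware contexts")
    return min(
        pairings(threads),
        key=lambda pairs: sum(conflict(a, b, metrics) for a, b in pairs),
    )


if __name__ == "__main__":
    # Made-up counter values: t0 and t1 are in resource-hungry phases.
    sample = {"t0": (10, 40), "t1": (9, 38), "t2": (1, 5), "t3": (2, 6)}
    print(best_pairing(sample))
```

A scheduler built along these lines would re-run the selection whenever a thread enters a new phase, which is why cheap counters matter more than an exact contention model.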

Published In

IPDPS'06: Proceedings of the 20th international conference on Parallel and distributed processing
April 2006
399 pages
ISBN: 1424400546

Sponsors

• IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing

Publisher

IEEE Computer Society, United States
