Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2150976.2151002acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Optimal task assignment in multithreaded processors: a statistical approach

Published: 03 March 2012 Publication History

Abstract

The introduction of massively multithreaded (MMT) processors, comprised of a large number of cores with many shared resources, has made task scheduling, in particular task to hardware thread assignment, one of the most promising ways to improve system performance. However, finding an optimal task assignment for a workload running on MMT processors is an NP-complete problem. Due to the fact that the performance of the best possible task assignment is unknown, the room for improvement of current task-assignment algorithms cannot be determined. This is a major problem for the industry because it could lead to: (1)~A waste of resources if excessive effort is devoted to improving a task assignment algorithm that already provides a performance that is close to the optimal one, or (2)~significant performance loss if insufficient effort is devoted to improving poorly-performing task assignment algorithms.
In this paper, we present a method based on Extreme Value Theory that allows the prediction of the performance of the optimal task assignment in MMT processors. We further show that executing a sample of several hundred or several thousand random task assignments is enough to obtain, with very high confidence, an assignment with a performance that is close to the optimal one. We validate our method with an industrial case study for a set of multithreaded network applications running on an UltraSPARC~T2 processor.

References

[1]
OpenSPARC™ T2 Core Microarchitecture Specification. Sun Microsystems, Inc, 2007.
[2]
OpenSPARC™ T2 System-On-Chip (SOC) Microarchitecture Specification. Sun Microsystems, Inc, 2007.
[3]
Netra Data Plane Software Suite 2.0 Update 2 Reference Manual. Sun Microsystems, Inc, 2008.
[4]
Netra Data Plane Software Suite 2.0 Update 2 User's Guide. Sun Microsystems, Inc, 2008.
[5]
Software. Hardware. Complete. http://www.oracle.com/ocom/groups/public/@ocom/documents /webcontent/044518.pdf, 2010.
[6]
C. Acosta, F. Cazorla, A. Ramirez, and M. Valero. Thread to Core Assignment in SMT On-Chip Multiprocessors. In SBAC-PAD '09: Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009.
[7]
A. V. Aho and M. J. Corasick. Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM, 18, 1975.
[8]
A. Azzalini. Statistical Inference Based on the Likelihood. Chapman and Hall, London, 1996.
[9]
A. A. Balkema and L. de Haan. Residual life time at great age. Annals of Probability, 2: 792--804, 1974.
[10]
J. Beirlant, Y. Goegebeur, J. Segers, and J. Teugels. Statistics of Extremes: Theory and Applications. John Wiley and Sons, Ltd, 2004.
[11]
E. Castillo. Extreme value theory in engineering. Academic Press, Inc., 1988.
[12]
E. Castillo and A. Hadi. Fitting the Generalized Pareto Distribution to data. Journal of the American Statistical Association, 92, 1997.
[13]
S. Kim, and Y. Solihin}contention_predictD. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, 2005.
[14]
S. J. Chapman. Essentials of MATLAB Programming. Cengage Learning, 2009.
[15]
H. Chernoff. On the distribution of the likelihood ratio. Annals of Mathematical Statistics, 25, 1954.
[16]
W. G. Cochran. Sampling techniques. Wiley, 1977.
[17]
K. J. Connolly. Law of Internet Security and Privacy. Aspen Publishers, 2003.
[18]
M. De Vuyst, R. Kumar, and D. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In IPDPS '06: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, 2006.
[19]
D. Eckhoff, T. Limmer, and F. Dressler. Hash Tables for Efficient Flow Monitoring: Vulnerabilities and Countermeasures. In LCN '09: Proceedings of the 34th Conference on Local Computer Networks, 2009.
[20]
A. El-Moursy, R. Garg, D. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a CMP of multi-threaded processors. In IPDPS '06: Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, 2006.
[21]
S. Eyerman and L. Eeckhout. Per-thread cycle accounting in SMT processors. In ASPLOS '09: Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, 2009.
[22]
S. Eyerman and L. Eeckhout. Probabilistic job symbiosis modeling for SMT processor scheduling. In ASPLOS '10: Proceeding of the 15th international conference on Architectural support for programming languages and operating systems, 2010.
[23]
A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In ATEC '05: Proceedings of the annual conference on USENIX Annual Technical Conference, 2005.
[24]
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.
[25]
M. Gilli and E. Kellezi. An application of extreme value theory for measuring financial risk. Computational Economics, 27, 2006.
[26]
R. Gioiosa, F. Petrini, K. Davis, and F. Lebaillif-Delamare. Analysis of system overhead on parallel computers. In ISSPIT '04: 4th IEEE International Symposium on Signal Processing and Information Technology, Rome, Italy, 2004.
[27]
M. Greenwood. The natural duration of cancer. Reports on Public Health and Medical Subjects, 33: 1--26, 1926.
[28]
S. Grimshaw. Computing the maximum likelihood estimates for the Generalized Pareto Distribution to data. Technometrics, 35, 1993.
[29]
F. Guo and T. Chiueh. Traffic Analysis: From Stateful Firewall to Network Intrusion Detection System. In RPE Report, 2004.
[30]
J. R. M. Hosking and J. R. Wallis. Parameter and quantile estimation for the generalised pareto distribution. Technometrics, 29, 1987.
[31]
Internet Engineering Task Force (IETF). Request for Comments (RFC), http://www.rfc-editor.org/.
[32]
R. Jain, C. J. Hughes, and S. V. Adve. Soft real-time scheduling on simultaneous multithreaded processors. In RTSS '02: Proceedings of the 23rd IEEE Real-Time Systems Symposium, 2002.
[33]
Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008.
[34]
E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 52 (282): 457--481, 1958.
[35]
E. Kellezi and M. Gilli. Extreme value theory for tail-related risk measures. Fame research paper series, International Center for Financial Asset Management and Engineering, 2000.
[36]
J. Kihm, A. Settle, A. Janiszewski, and D. A. Connors. Understanding the impact of inter-thread cache interference on ILP in modern SMT processors. The Journal of Instruction Level Parallelism, 7, 2005.
[37]
Kunze, Mudigonda, Jason, and Vin}Kokku04CaseR. Kokku, T. L. Riché, A. Kunze, J. Mudigonda, J. Jason, and H. M. Vin. A case for run-time adaptation in packet processing systems. SIGCOMM Comput. Commun. Rev., 34 (1), 2004.
[38]
R. Kumar, D. M. Tullsen, P. Ranganathan, N. P. Jouppi, and K. I. Farkas. Single-ISA heterogenous multi-core architectures for multithreaded workload performance. In ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004.
[39]
R. L. McGregor, C. D. Antonopoulos, and D. S. Nikolopoulos. Scheduling algorithms for effective thread pairing on hybrid multiprocessors. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
[40]
M. Norton. Optimizing Pattern Matching for Intrusion Detection, http://docs.idsresearch.org/OptimizingPatternMatchingForIDS.pdf, 2004.
[41]
Probe network monitor. http://www.ntop.org.
[42]
F. Petrini, D. Kerbyson, and S. Pakin. The case of the missing supercomputer performance: Achieving optimal performance on the 8,192 processors of ASCI Q. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, 2003.
[43]
J. I. Pickands. Statistical inference using extreme value order statistics. Annals of Statististics, 3: 119--131, 1975.
[44]
P. Radojković, V. Cakarević, J. Verdú, A. Pajuelo, F. J. Cazorla, M. Nemirovsky, and M. Valero. Thread to strand binding of parallel network applications in massive multi-threaded systems. In PPoPP-2010: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Bangalore, India, 2010.
[45]
A. Settle, J. Kihm, A. Janiszewski, and D. Connors. Architectural support for enhanced SMT job scheduling. In PACT '04: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, 2004.
[46]
D. Shelepov, J. C. S. Alcaide, S. Jeffery, A. Fedorova, N. Perez, Z. F. Huang, S. Blagodurov, and V. Kumar. Hass: A scheduler for heterogeneous multicore systems. In ACM SIGOPS Operating Systems Review, 2009.
[47]
T. Sherwood, G. Varghese, and B. Calder. A Pipelined Memory Architecture for High Throughput Network Processors. In ISCA '03: Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003.
[48]
E. Shmueli, G. Almasi, J. Brunheroto, J. Castanos, G. Dozsa, S. Kumar, and D. Lieber. Evaluating the effect of replacing CNK with Linux on the compute-nodes of Blue Gene/L. In ICS '08: Proceedings of the 22nd Annual International Conference on Supercomputing, 2008.
[49]
A. Snavely and D. M. Tullsen. Symbiotic job scheduling for a simultaneous multithreaded processor. In ASPLOS 2000: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000.
[50]
A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2002.
[51]
Snort network intrusion prevention and detection system. http://www.snort.org/.
[52]
N. Tajvidi. Design and implementation of statistical computations for Generalized Pareto Distributions. Technical Report, Chalmers University of Technology, 1996.
[53]
D. Tam, R. Azimi, and M. Stumm. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007.
[54]
TCPDUMP/LIBPCAP public repository. http://www.tcpdump.org/.
[55]
S. B. Vardeman. Statistics for Engineering Problem Solving. PWS publishing company, 1993.
[56]
V. Cakarević, P. Radojković, J. Verdú, A. Pajuelo, F. Cazorla, M. Nemirovsky, and M. Valero. Characterizing the resource-sharing levels in the UltraSPARC T2 processor. In MICRO-42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, NY, USA, 2009.
[57]
J. Verdú. Analysis and Architectural Support for Parallel Stateful Packet Processing, PhD Thesis. Universitat Politècnica de Catalunya, 2008.
[58]
Vermont (VERsatile MONitoring Toolkit). http://vermont.berlios.de/.
[59]
S. S. Wilks. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics, 9, 1938.
[60]
S. S. Wilks. Mathematical Statistics. Princeton University Press, 1943.
[61]
Wireshark network protocol analyzer. http://www.wireshark.org/.
[62]
T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In ANCS '05: Proceedings of ACM/IEEE Symposium on Architectures for Networking and Communication Systems, Princeton, NJ, 2005.

Cited By

View all
  • (2024)SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00068(705-715)Online publication date: 27-May-2024
  • (2023)SLITS: Sparsity-Lightened Intelligent Thread SchedulingProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794367:1(1-23)Online publication date: 2-Mar-2023
  • (2023)NUBA: Non-Uniform Bandwidth GPUsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575745(544-559)Online publication date: 27-Jan-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
March 2012
476 pages
ISBN:9781450307598
DOI:10.1145/2150976
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 40, Issue 1
    ASPLOS '12
    March 2012
    453 pages
    ISSN:0163-5964
    DOI:10.1145/2189750
    Issue’s Table of Contents
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 47, Issue 4
    ASPLOS '12
    April 2012
    453 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2248487
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 March 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. extreme value theory
  2. multithreading
  3. scheduling
  4. statistical estimation
  5. task assignment

Qualifiers

  • Research-article

Conference

ASPLOS'12

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)19
  • Downloads (Last 6 weeks)1
Reflects downloads up to 25 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS57955.2024.00068(705-715)Online publication date: 27-May-2024
  • (2023)SLITS: Sparsity-Lightened Intelligent Thread SchedulingProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794367:1(1-23)Online publication date: 2-Mar-2023
  • (2023)NUBA: Non-Uniform Bandwidth GPUsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575745(544-559)Online publication date: 27-Jan-2023
  • (2020)Modeling and optimizing NUMA effects and prefetching with machine learningProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392765(1-13)Online publication date: 29-Jun-2020
  • (2018)Improving Resource Utilization through Demand Aware Process SchedulingProceedings of the 47th International Conference on Parallel Processing10.1145/3225058.3225132(1-10)Online publication date: 13-Aug-2018
  • (2017)Performance-Monitoring-Based Traffic-Aware Virtual Machine Deployment on NUMA SystemsIEEE Systems Journal10.1109/JSYST.2015.246965211:2(973-982)Online publication date: Jun-2017
  • (2016)Warped-slicerACM SIGARCH Computer Architecture News10.1145/3007787.300116144:3(230-242)Online publication date: 18-Jun-2016
  • (2016)Thread Assignment in Multicore/Multithreaded Processors: A Statistical ApproachIEEE Transactions on Computers10.1109/TC.2015.241753365:1(256-269)Online publication date: 1-Jan-2016
  • (2016)Warped-slicerProceedings of the 43rd International Symposium on Computer Architecture10.1109/ISCA.2016.29(230-242)Online publication date: 18-Jun-2016
  • (2016)Utility Maximizing Thread Assignment and Resource Allocation2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2016.82(433-442)Online publication date: May-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media