DOI: 10.1145/3383669.3398282
research-article

Automatic Core Specialization for AVX-512 Applications

Published: 30 May 2020

Abstract

Advanced Vector Extensions (AVX) instructions operate on wide SIMD vectors. Because of the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. This frequency reduction also slows down non-AVX code in two situations: when that code runs in parallel on the sibling hyperthread of the same core, and when it directly follows the AVX2/AVX-512 code, because restoring the non-AVX frequency is delayed. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down, by 10% on average.
In this work, we describe a method to mitigate the frequency reduction slowdown in both situations for workloads involving AVX-512 instructions. Our approach employs core specialization: it partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions, so the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach reduces the frequency reduction overhead by 70% for an assortment of common benchmarks.
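
The migration policy described above can be illustrated with a small simulation. This is only a toy model under stated assumptions, not the paper's implementation: the core numbering, the names `Thread`, `on_tick`, and `simulate`, and the rule "migrate back after three ticks without AVX-512 use" are all hypothetical stand-ins, since the abstract does not specify the actual heuristic.

```python
from dataclasses import dataclass

# Hypothetical partition of an 8-core CPU (an assumption for illustration).
SCALAR_CORES = frozenset({0, 1, 2, 3, 4, 5})   # non-AVX-512 cores
AVX512_CORES = frozenset({6, 7})               # AVX-512 cores

# Hypothetical heuristic: return to the non-AVX-512 cores after this many
# consecutive ticks without any AVX-512 instruction.
MIGRATE_BACK_AFTER = 3


@dataclass
class Thread:
    name: str
    allowed: frozenset = SCALAR_CORES  # cores the thread may run on
    quiet_ticks: int = 0               # consecutive ticks without AVX-512


def on_tick(thread: Thread, used_avx512: bool) -> str:
    """Update the thread's placement after one tick and name the event."""
    on_avx_core = thread.allowed == AVX512_CORES
    if used_avx512:
        thread.quiet_ticks = 0
        if not on_avx_core:
            # Models the exception raised on a non-AVX-512 core: the
            # thread is moved to the AVX-512 partition before continuing.
            thread.allowed = AVX512_CORES
            return "trap_migrate"
        return "stay"
    if on_avx_core:
        thread.quiet_ticks += 1
        if thread.quiet_ticks >= MIGRATE_BACK_AFTER:
            thread.allowed = SCALAR_CORES
            thread.quiet_ticks = 0
            return "migrate_back"
    return "stay"


def simulate(trace):
    """Run one thread over a trace of booleans (True = tick used AVX-512)."""
    thread = Thread("worker")
    return [on_tick(thread, used) for used in trace], thread
```

Calling `simulate([False, True, True, False, False, False])` yields one `trap_migrate` event at the first AVX-512 tick and one `migrate_back` event after three quiet ticks. The point of the model is that any frequency reduction triggered by the thread is confined to cores 6 and 7 while it is placed there.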

Published In

SYSTOR '20: Proceedings of the 13th ACM International Systems and Storage Conference
May 2020
118 pages
ISBN:9781450375887
DOI:10.1145/3383669
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AVX-512
  2. core specialization
  3. dim silicon

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SYSTOR '20
Acceptance Rates

Overall Acceptance Rate 108 of 323 submissions, 33%

Cited By

  • (2024) Computation of X-ray and Neutron Scattering Patterns to Benchmark Atomistic Simulations against Experiments. International Journal of Molecular Sciences 25:3 (1547). DOI: 10.3390/ijms25031547. Online publication date: 26-Jan-2024
  • (2024) SIMDified Data Processing - Foundations, Abstraction, and Advanced Techniques. Companion of the 2024 International Conference on Management of Data (613-621). DOI: 10.1145/3626246.3654694. Online publication date: 9-Jun-2024
  • (2024) About Methods of Vector Addition over Finite Fields Using Extended Vector Registers. Large-Scale Scientific Computations (427-434). DOI: 10.1007/978-3-031-56208-2_44. Online publication date: 24-May-2024
  • (2023) Performance Portability Assessment: Non-negative Matrix Factorization as a Case Study. Euro-Par 2022: Parallel Processing Workshops (239-250). DOI: 10.1007/978-3-031-31209-0_18. Online publication date: 2-May-2023
  • (2022) To share or not to share vector registers? The VLDB Journal — The International Journal on Very Large Data Bases 31:6 (1215-1236). DOI: 10.1007/s00778-022-00744-2. Online publication date: 28-Apr-2022
  • (2021) SIMD-MIMD cocktail in a hybrid memory glass. Proceedings of the 14th ACM International Conference on Systems and Storage (1-12). DOI: 10.1145/3456727.3463782. Online publication date: 14-Jun-2021
  • (2020) AVX overhead profiling. Proceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems (59-66). DOI: 10.1145/3409963.3410488. Online publication date: 24-Aug-2020
