DOI: 10.1109/MICRO56248.2022.00063
research-article

AgileWatts: An Energy-Efficient CPU Core Idle-State Architecture for Latency-Sensitive Server Applications

Published: 18 December 2023

Abstract

    User-facing applications running in modern datacenters exhibit irregular request patterns and are implemented using a multitude of services with tight latency requirements (30–250 μs). These characteristics render existing energy-conserving techniques ineffective when processors are idle, because exiting a deep CPU core idle power state (C-state) takes a long time (on the order of 100 μs). While prior works propose management techniques to mitigate this inefficiency, we tackle it at its root with AgileWatts (AW): a new deep CPU core C-state architecture optimized for datacenter server processors targeting latency-sensitive applications.
    AW drastically reduces the transition latency of deep CPU core idle power states while retaining most of their power savings. It is based on three key ideas. First, AW eliminates the latency (several microseconds) of saving and restoring the core context when powering the core off and on in a deep idle state by i) implementing medium-grained power-gates, carefully distributed across the CPU core, and ii) retaining context in the power-ungated domain. Second, AW eliminates the latency (several tens of microseconds) of flushing the L1/L2 caches when entering a deep idle state by keeping the L1/L2 content power-ungated. A small amount of control logic also remains ungated to serve cache coherence traffic. AW implements cache sleep-mode and leakage reduction for the power-ungated domain by lowering the core's voltage to the minimum operational level. Third, using a state-of-the-art power-efficient all-digital phase-locked loop (ADPLL) clock generator, AW keeps the PLL active and locked during the idle state, cutting microseconds of wake-up latency at negligible power cost.
    Our evaluation with an accurate industrial-grade simulator calibrated against an Intel Skylake server shows that AW reduces the energy consumption of Memcached by up to 71% (35% on average) with <1% end-to-end performance degradation. We observe similar trends for the other evaluated services (MySQL and Kafka). AW's new deep C-states, C6A and C6AE, reduce transition time by up to 900× compared with the deepest existing idle state, C6, while consuming only 7% and 5% of the active state (C0) power, respectively.
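
The trade-off between transition latency and idle power described in the abstract can be illustrated with a toy energy model. The following is only a minimal sketch and not the paper's evaluation methodology. It takes the 7% C6A power figure and the "up to 900×" latency ratio from the abstract and reads "order of 100 μs" as exactly 100 μs. Everything else is an assumption made for illustration: the C6 residency power (0.01), the `IdleState` and `idle_energy` names, the rule that a core draws full C0 power while transitioning, and the `fallback_frac` value used when an idle interval is too short to enter the state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    power_frac: float     # residency power as a fraction of active (C0) power
    transition_us: float  # combined entry + exit latency, in microseconds


def idle_energy(idle_us: float, state: IdleState, fallback_frac: float = 1.0) -> float:
    """Energy spent over one idle interval, in units of (C0 power x 1 us).

    Simplifying assumptions (not from the paper):
      * the core draws full C0 power while transitioning;
      * if the interval cannot fit the transition, the state is skipped and
        the core draws fallback_frac of C0 power for the whole interval
        (1.0 is a pessimistic placeholder).
    """
    if idle_us <= state.transition_us:
        return idle_us * fallback_frac
    resident_us = idle_us - state.transition_us
    return state.transition_us + resident_us * state.power_frac


# C6: 100 us reads "order of 100 us" literally; power_frac=0.01 is hypothetical.
C6 = IdleState("C6", power_frac=0.01, transition_us=100.0)
# C6A: 7% of C0 power and an up-to-900x shorter transition, per the abstract.
C6A = IdleState("C6A", power_frac=0.07, transition_us=100.0 / 900.0)

if __name__ == "__main__":
    for idle_us in (50.0, 250.0, 1000.0):
        e_c6 = idle_energy(idle_us, C6)
        e_c6a = idle_energy(idle_us, C6A)
        print(f"idle={idle_us:7.1f} us  C6={e_c6:8.2f}  C6A={e_c6a:8.2f}")
```

Under these assumptions, a short idle interval cannot amortize the C6 transition at all, whereas a state with a sub-microsecond transition still saves energy despite its higher residency power. The numbers this model produces depend entirely on the placeholder values and should not be read as results from the paper.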

    Twitter. "Decomposing Twitter: Adventures in Service Oriented Architecture." online, accessed June 2022 https://www.slideshare.net/InfoQ/decomposing-twitter-adventures-in-serviceoriented-architecture
    [194]
    Rob Brigham. "DevOps at Amazon: A Look at Our Tools and Processes." online, accessed June 2022 https://www.slideshare.net/AmazonWebServices/devops-at-amazon-a-look-at-our-tools-and-processes
    [195]
    Jalili, Majid, et al. "Cost-efficient overclocking in immersion-cooled datacenters." ISCA 2021.
    [196]
    Global Petrol Prices, Electricity prices for households, September 2021 https://www.globalpetrolprices.com/electricity_prices/
    [197]
    Schöne, Robert, et al. "Energy efficiency aspects of the AMD Zen 2 architecture." CLUSTER 2021.
    [198]
    AMD EPYC 7313P Energy Consumption Test, https://metebalci.com/blog/epyc-energy-consumption-test/
    [199]
    Tuning UEFI Settings for Performance and Energy Efficiency on AMD Processor-Based ThinkSystem Servers https://lenovopress.lenovo.com/lp1267.pdf
    [200]
    Performance Tuning for Cisco UCS C225 M6 and C245 M6 Rack Servers with 3rd Gen AMD EPYC https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/performance-tuning-wp.html

    Cited By

    • (2023) Sleep Well: Pragmatic Analysis of the Idle States of Intel Processors. Proceedings of the IEEE/ACM 10th International Conference on Big Data Computing, Applications and Technologies, pp. 1-10. doi:10.1145/3632366.3632385. Online publication date: 4-Dec-2023.

            Published In

            MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture
            October 2022
            1498 pages
            ISBN:9781665462723

            Publisher

            IEEE Press

            Conference

            MICRO '22
            Acceptance Rates

            Overall Acceptance Rate 484 of 2,242 submissions, 22%
