An integrated GPU power and performance model

S Hong, H Kim - Proceedings of the 37th annual international symposium on Computer architecture, 2010 - dl.acm.org
GPU architectures are increasingly important in the multi-core era because of their large number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the large number of processors, the power consumption of many-core processors such as GPUs has increased significantly.
Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement.
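The intuition can be illustrated with a minimal sketch. This is not the IPP model. It assumes a simple bandwidth-bound timing model in which each active core demands a fixed memory bandwidth, the memory system caps aggregate bandwidth at a peak value, and compute and memory fully overlap, so the slower of the two sets the execution time. The names `exec_time` and `fewest_cores_for_best_time` and all parameter values are hypothetical.

```python
def exec_time(n_cores: int, work_flops: float, flops_per_core: float,
              bytes_moved: float, bw_per_core: float, peak_bw: float) -> float:
    """Hypothetical timing model (not IPP): compute and memory time, fully overlapped."""
    compute_time = work_flops / (n_cores * flops_per_core)
    # Each core demands bw_per_core, but aggregate bandwidth cannot exceed peak_bw.
    achieved_bw = min(n_cores * bw_per_core, peak_bw)
    memory_time = bytes_moved / achieved_bw
    # With full overlap, the slower component determines execution time.
    return max(compute_time, memory_time)


def fewest_cores_for_best_time(max_cores: int, **model: float) -> int:
    """Smallest core count whose predicted time matches the best achievable time."""
    times = {n: exec_time(n, **model) for n in range(1, max_cores + 1)}
    best = min(times.values())
    # A tiny relative tolerance guards against floating-point noise.
    return min(n for n, t in times.items() if t <= best * (1 + 1e-9))


if __name__ == "__main__":
    # Illustrative numbers only: a memory-bound kernel saturates bandwidth at 14 cores.
    model = dict(work_flops=1e9, flops_per_core=20e9, bytes_moved=1.4e9,
                 bw_per_core=10e9, peak_bw=140e9)
    print(fewest_cores_for_best_time(30, **model))  # → 14
```

Once `n_cores * bw_per_core` reaches `peak_bw`, the memory time stops shrinking, so any cores beyond that point add no speedup for a memory-bound kernel; a compute-bound kernel under the same model keeps improving up to `max_cores`.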
We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increase in power consumption that results from increases in temperature.
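The step from predicted time to an optimal core count can be sketched as an energy minimization: energy is power multiplied by execution time, evaluated for each candidate core count. The sketch below assumes a hypothetical linear power model (idle power, a per-core dynamic term, and a leakage term that grows linearly with temperature above a reference point); the paper's actual empirical model and coefficients are not reproduced here. The names `power_watts` and `energy_optimal_cores`, the temperature input, and all numbers are illustrative assumptions.

```python
def power_watts(n_cores: int, temp_c: float,
                idle_w: float = 80.0, dyn_w_per_core: float = 2.0,
                leak_w_per_degree: float = 0.5, ref_temp_c: float = 50.0) -> float:
    """Hypothetical linear GPU power model; coefficients are illustrative, not measured."""
    # Static power rises with temperature above a reference point (assumed linear leakage).
    static = idle_w + leak_w_per_degree * max(temp_c - ref_temp_c, 0.0)
    # Dynamic power scales with the number of active cores.
    return static + dyn_w_per_core * n_cores


def energy_optimal_cores(times: dict[int, float], temp_c: float) -> tuple[int, float]:
    """Return (energy-minimizing core count, percent energy saved vs. using all cores)."""
    energy = {n: power_watts(n, temp_c) * t for n, t in times.items()}
    best_n = min(energy, key=energy.get)
    all_n = max(times)
    saving = 100.0 * (energy[all_n] - energy[best_n]) / energy[all_n]
    return best_n, saving


if __name__ == "__main__":
    # Predicted times for a memory-bound kernel that saturates bandwidth at 14 cores.
    times = {n: max(0.14 / n, 0.01) for n in range(1, 31)}
    best_n, saving = energy_optimal_cores(times, temp_c=70.0)
    print(best_n, round(saving, 2))  # → 14 21.33
```

Under this model, below the saturation point extra cores shorten execution enough to offset their power, while beyond it they only add power, so the energy minimum lands at the bandwidth-saturation core count.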
With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption, and 10.99% on average, for the five memory-bandwidth-limited benchmarks.