Heterogeneity-Conscious Parallel Query Execution: Getting a better mileage while driving faster!

T Mühlbauer, W Rödiger, R Seilbeck, A Kemper, T Neumann
Proceedings of the Tenth International Workshop on Data Management on New …, 2014, dl.acm.org
Physical and thermal restrictions hinder commensurate performance gains from the ever-increasing transistor density. While multi-core scaling helped alleviate dimmed or dark silicon for some time, future processors will need to become more heterogeneous. To this end, single instruction set architecture (ISA) heterogeneous processors are a particularly interesting solution that combines multiple cores with the same ISA but asymmetric performance and power characteristics. These processors, however, are no free lunch for database systems. Mapping jobs to the core that fits best is notoriously hard for the operating system or a compiler. To achieve optimal performance and energy efficiency, heterogeneity needs to be exposed to the database system.
In this paper, we provide a thorough study of parallelized core database operators and TPC-H query processing on a heterogeneous single-ISA multi-core architecture. Using these insights, we design a heterogeneity-conscious job-to-core mapping approach for our high-performance main memory database system HyPer and show that it is indeed possible to get a better mileage while driving faster compared to static and operating-system-controlled mappings. Our approach improves the energy-delay product of a TPC-H power run by 31%, and by more than 60% for specific TPC-H queries.
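The reported gains are stated in terms of the energy-delay product (EDP), i.e. energy consumed multiplied by execution time, so lower is better and a configuration is rewarded for being both more frugal and faster ("better mileage while driving faster"). The Python sketch below only illustrates how EDP and a relative EDP improvement are computed and how one might pick the lowest-EDP configuration among measured alternatives. The class and function names (`RunMeasurement`, `edp_improvement`, `best_by_edp`) and all numbers are hypothetical; they are not measurements from the paper, and picking the minimum-EDP run is a simplification, not HyPer's actual job-to-core mapping approach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMeasurement:
    """Energy and runtime measured for one execution configuration (hypothetical)."""
    name: str
    energy_joules: float
    runtime_seconds: float

    @property
    def edp(self) -> float:
        """Energy-delay product: energy multiplied by runtime (J*s)."""
        return self.energy_joules * self.runtime_seconds


def edp_improvement(baseline: RunMeasurement, candidate: RunMeasurement) -> float:
    """Relative EDP reduction of candidate over baseline, as a fraction (0.31 means 31%)."""
    return 1.0 - candidate.edp / baseline.edp


def best_by_edp(runs: list[RunMeasurement]) -> RunMeasurement:
    """Pick the configuration with the lowest energy-delay product."""
    return min(runs, key=lambda r: r.edp)


if __name__ == "__main__":
    # Illustrative numbers only; they are NOT measurements from the paper.
    runs = [
        RunMeasurement("os-controlled", energy_joules=100.0, runtime_seconds=10.0),
        RunMeasurement("static-big-cores", energy_joules=120.0, runtime_seconds=8.0),
        RunMeasurement("heterogeneity-conscious", energy_joules=90.0, runtime_seconds=8.0),
    ]
    baseline = runs[0]
    best = best_by_edp(runs)
    # EDPs are 1000, 960 and 720 J*s, so the last run wins with a 28% reduction.
    print(best.name, f"{edp_improvement(baseline, best):.0%}")
```

In this toy example, the hypothetical heterogeneity-conscious run uses less energy and less time than the baseline, so both factors shrink and the EDP drops by more than either factor alone. A configuration that is only faster (such as running everything on fast cores) can still lose on EDP if it spends disproportionately more energy.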