Velox: Meta's Unified Execution Engine
Proceedings of the VLDB Endowment, 2022
The ad-hoc development of new specialized computation engines targeted at very specific data workloads has created a siloed data landscape. These engines commonly share little to nothing with each other, are hard to maintain, evolve, and optimize, and ultimately provide an inconsistent experience to data users. To address these issues, Meta has created Velox, a novel open-source C++ database acceleration library. Velox provides reusable, extensible, high-performance, and dialect-agnostic data processing components for building execution engines and enhancing data management systems. The library relies heavily on vectorization and adaptivity, and is designed from the ground up to support efficient computation over complex data types, given their ubiquity in modern workloads. Velox is currently integrated, or being integrated, with more than a dozen data systems at Meta, including analytical query engines such as Presto and Spark, stream processing platforms, message buses and data warehouse ingestion infrastructure, machine learning systems for feature engineering and data preprocessing (PyTorch), and more. It provides benefits in terms of (a) efficiency, by democratizing optimizations previously found only in individual engines, (b) increased consistency for data users, and (c) engineering efficiency, by promoting reusability.
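To make the idea of vectorization more concrete, the following is a minimal, generic C++ sketch of batch-at-a-time (columnar) evaluation: a filter followed by a projection, each processing a whole column in one tight loop rather than interpreting the query once per row. It is not Velox's API. `ColumnBatch`, `filterGreaterThan`, and `multiplySelected` are hypothetical names chosen for illustration, and a production engine would additionally have to handle nulls, column encodings, and complex types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical columnar batch (NOT Velox's API): each column is stored as a
// contiguous vector, and row i is made up of element i of every column.
struct ColumnBatch {
    std::vector<int64_t> price;
    std::vector<int64_t> quantity;
};

// Vectorized filter: evaluates `col[i] > threshold` over the whole column in a
// single loop and returns the indices of the qualifying rows.
std::vector<std::size_t> filterGreaterThan(const std::vector<int64_t>& col,
                                           int64_t threshold) {
    std::vector<std::size_t> selected;
    selected.reserve(col.size());
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (col[i] > threshold) {
            selected.push_back(i);
        }
    }
    return selected;
}

// Vectorized projection: computes price * quantity only for the selected rows,
// so rows rejected by the filter are never touched again.
std::vector<int64_t> multiplySelected(const ColumnBatch& batch,
                                      const std::vector<std::size_t>& rows) {
    // Both columns must describe the same number of rows.
    assert(batch.price.size() == batch.quantity.size());
    std::vector<int64_t> out;
    out.reserve(rows.size());
    for (std::size_t r : rows) {
        out.push_back(batch.price[r] * batch.quantity[r]);
    }
    return out;
}
```

For example, with `price = {10, 50, 5, 80}` and `quantity = {3, 2, 7, 1}`, filtering on `price > 8` selects rows 0, 1, and 3, and the projection yields `{30, 100, 80}`. Passing a list of selected row indices between operators, instead of copying the surviving rows, is one common way such batch-oriented designs avoid materializing intermediate data.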