Abstract
In an effort to keep up with increasing data sizes, applications in the datacenter are scaled out to a large number of machines. Even though this allows them to tackle complex problems, data movement bottlenecks of various types appear, limiting overall performance and scalability. Pushing computation closer to the data reduces these bottlenecks, and in this work we explore this idea in the context of distributed key-value stores.
Efficient compute is important in the datacenter, especially at scale. Driven by the stagnation of single-threaded CPU performance, specialized hardware and hybrid architectures are emerging and could hold the answer for more efficient compute and data management. In this work we use Field Programmable Gate Arrays (FPGAs) to break traditional trade-offs and limitations, and to explore design scenarios that were previously infeasible.
This dissertation focuses on distributed storage, a building block in scale-out applications. It explores how such a service can benefit from specialized hardware nodes. We focus in particular on providing complex near-data computation with the goal of reducing the data movement bottleneck between application layers. Furthermore, this work addresses a shortcoming in the design of most distributed storage nodes, namely the mismatch between computational power and network/storage media bandwidth.
The mismatch is present because, if regular server machines are used, there is plenty of processing power to implement various filtering and processing operations, but the overall architecture is over-provisioned compared to the network. In contrast, if specialized hardware nodes are used (e.g. network-attached flash), the internal and external bandwidths are better matched, but these nodes are not able to carry out complex processing near the data without slowing data access down. Our solution, Caribou, proposes a balanced design point: small-footprint hardware nodes that offer high throughput and low latency, yet are flexible enough to adapt to different workloads and processing types without being over-provisioned.
The work presented in this dissertation is not a one-off effort: it provides an extensible and modular architecture for storage nodes that can be used as a platform for implementing near-data processing ideas for various application domains. The lessons are applicable to different storage media and networking technologies as well.