Lambda Architecture

Lambda Architecture is a useful framework for thinking about the design of big data applications.
Nathan Marz designed this generic architecture to address common requirements for big data, based
on his experience working on distributed data processing systems at Twitter.

Key requirements in building this architecture:
- Fault-tolerance against hardware failures and human errors
- Support for a variety of use cases that include low latency querying as well as updates
- Linear scale-out capabilities, meaning that adding machines should increase capacity
roughly in proportion
- Extensibility so that the system is manageable and can accommodate newer features easily

Three major components:
1. Batch layer: This layer provides the following functionality:
   a. managing the master dataset, an immutable, append-only set of raw data
   b. pre-computing the results of arbitrary query functions; these results are called batch views
2. Serving layer: This layer indexes the batch views so that they can be queried ad hoc
with low latency.
3. Speed layer: This layer accommodates all requests that are subject to low latency
requirements. Using fast and incremental algorithms, the speed layer deals with recent
data only.

Each of these layers can be realized using various big data technologies. For instance, the batch
layer's datasets can be stored in a distributed filesystem, while MapReduce can be used to create
the batch views that are fed to the serving layer. The serving layer can be implemented using NoSQL
technologies such as HBase, while querying can be implemented by technologies such as Apache
Drill or Impala. Finally, the speed layer can be realized with data streaming technologies such as
Apache Storm or Spark Streaming.
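
The difference between the batch layer and the speed layer can be illustrated with a minimal sketch. It uses plain Python in place of the technologies named above, and the `PageView` record and both function names are hypothetical, chosen only for illustration. The batch function recomputes its view from scratch over the whole immutable dataset, while the speed function updates its view one event at a time.

```python
from collections import Counter
from typing import NamedTuple


class PageView(NamedTuple):
    """Hypothetical raw event; immutable, like every master dataset record."""
    url: str
    timestamp: int


def compute_batch_view(master_dataset: tuple[PageView, ...]) -> Counter:
    """Batch layer: recompute the view from scratch over all raw data."""
    return Counter(event.url for event in master_dataset)


def update_realtime_view(view: Counter, event: PageView) -> None:
    """Speed layer: apply one recent event incrementally."""
    view[event.url] += 1


# The master dataset is a tuple here to mimic immutability (an assumption
# of this sketch; a real system would append to a distributed filesystem).
master_dataset = (
    PageView("/home", 1),
    PageView("/about", 2),
    PageView("/home", 3),
)

batch_view = compute_batch_view(master_dataset)

realtime_view = Counter()
for event in master_dataset:
    update_realtime_view(realtime_view, event)
```

Both paths arrive at the same counts for the same events. Because the batch path depends only on the raw data, a view corrupted by a buggy job can be rebuilt by rerunning the computation, which is one way the architecture tolerates human error.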

The overall data flow is as follows:
1. All data entering the system is dispatched to both the batch layer and the speed layer for
processing.
2. The batch layer has two functions: (i) managing the master dataset (an immutable,
append-only set of raw data), and (ii) pre-computing the batch views.
3. The serving layer indexes the batch views so that they can be queried in a low-latency,
ad hoc way.
4. The speed layer compensates for the high latency of updates to the serving layer and deals
with recent data only.
5. Any incoming query can be answered by merging results from batch views and real-time
views.
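
The five steps above can be sketched end to end in a few lines of Python. This is a toy, single-process illustration, not a reference implementation: the `LambdaPipeline` class and its method names are hypothetical, the views are simple per-URL counters, and the batch run is assumed to be instantaneous so that the real-time view can be safely discarded once the batch view catches up.

```python
from collections import Counter


class LambdaPipeline:
    """Toy model of the data flow; all names here are illustrative."""

    def __init__(self) -> None:
        self.master_dataset: list[str] = []  # append-only raw events
        self.batch_view: Counter = Counter()     # exposed by the serving layer
        self.realtime_view: Counter = Counter()  # maintained by the speed layer

    def ingest(self, url: str) -> None:
        # Step 1: every event goes to both the batch and the speed layer.
        self.master_dataset.append(url)
        self.realtime_view[url] += 1

    def run_batch(self) -> None:
        # Steps 2-3: recompute the batch view from all raw data and
        # hand it to the serving layer.
        self.batch_view = Counter(self.master_dataset)
        # Step 4: the real-time view only has to cover data the batch
        # view has not absorbed yet (assumes an instantaneous batch run).
        self.realtime_view = Counter()

    def query(self, url: str) -> int:
        # Step 5: merge the batch and real-time results.
        return self.batch_view[url] + self.realtime_view[url]


pipeline = LambdaPipeline()
pipeline.ingest("/home")
pipeline.ingest("/home")
count_before_batch = pipeline.query("/home")  # served by the speed layer only

pipeline.run_batch()
pipeline.ingest("/home")
count_after_batch = pipeline.query("/home")   # batch view plus recent data
```

In a real deployment the batch run takes hours, so the speed layer would keep events that arrived during the run instead of clearing everything at once; the merge in `query` stays the same.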
