Many people promise fast data as the next step after big data. The idea of creating a complete end-to-end data pipeline that combines Spark, Akka, Cassandra, Kafka, and Apache Mesos came up two years ago, sometimes called the SMACK stack. The SMACK stack is an ideal environment for handling all sorts of data-processing needs which can be nightly batch-processing tasks, real-time ingestion of sensor data or business intelligence questions. The SMACK stack includes a lot of components which have to be deployed somewhere. Let’s see how we can create a distributed environment in the cloud with Terraform and how we can provision a Mesos-Cluster with Mesosphere Datacenter Operating System (DC/OS) to create a powerful fast data platform.
11. SWISS ARMY KNIFE FOR DATA PROCESSING
ETL Jobs
μ-Batching on Streams
SQL and Joins on non-RDBMS
Graph Operations on non-Graphs
Super Fast Map/Reduce
12. 4 . 2
4 . 3
How does it fit to a λ-Architectures?
Spark operations can be run unaltered in either batch or
stream mode
Serving layer uses a Resilient Distributed Dataset (RDD)
Speed layer can uses DStream
13. Mesos
DISTRIBUTED KERNEL FOR THE CLOUD
Links machines to one logical instance
Static deployment of Mesos
Dynamic deployment of the workload
Good integration with Hadoop, Kafka, Spark, and Akka
14. 4 . 4
FRAMEWORK FOR REACTIVE APPLICATIONS
Highly performant - 50 million messages per machine in
a second
Simple concurrency via asynchronous processing
Elastic, resilient and without single point of failure
Used for applications that can process or query data
15. 4 . 5
PERFORMANT AND ALWAYS-UP NOSQL DATABASE
Linear scaling - approx. 10'000 requests per machine
and second
No downtime
Comfort of a column index with append-only
performance Data-Safety over multiple data-centers
Strong in denormalized models
16. 4 . 6
Kafka
MESSAGING SYSTEM FOR BIG DATA APPLICATIONS
Fast - delivers hundreds of MegaBytes per second to
1000s of clients
Scales - partitions data to manageable volumes
Managing backpressure
Distributed - from the ground up
30. 5 . 6
Command Line Interface
$ dcos
Command line utility for the Mesosphere Datacenter Operating
System (DC/OS). The Mesosphere DC/OS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DC/OS installation.
Available DC/OS commands:
config Get and set DC/OS CLI configuration properties
help Display command line usage information
marathon Deploy and manage applications on the DC/OS
node Manage DC/OS nodes
package Install and manage DC/OS packages
service Manage DC/OS services
task Manage DC/OS tasks
Get detailed command description with 'dcos <command> --help'.
33. 5 . 9
Service Discovery
DNS-based Proxy-based Application-aware
easy to integrate no port conflicts developer fully in control
and full-feature
SRV records fast failover implementation effort
no health checks no UDP requires distributed
state management (ZK,
etcd or Consul)
TTL management of VIPs
(Minuteman) or service
ports (Marathon-lb)
34. 5 . 10
A Records
An A record associates a hostname to an IP address
bz@cc ~/$ dig app.marathon.mesos
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> app.marathon.mesos
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9336
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;app.marathon.mesos. IN A
;; ANSWER SECTION:
app.marathon.mesos. 60 IN A 10.0.3.201
app.marathon.mesos. 60 IN A 10.0.3.199
;; Query time: 2 msec
;; SERVER: 10.0.5.98#53(10.0.5.98)
35. 5 . 11
SRV Records
A SRV record associates a service name to a hostname
and an IP port
bz@cc ~/$ dig _app._tcp.marathon.mesos SRV
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> _app._tcp.marathon.mesos SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31708
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; QUESTION SECTION:
;_app._tcp.marathon.mesos. IN SRV
;; ANSWER SECTION:
_app._tcp.marathon.mesos. 60 IN SRV 0 0 10148 app-qtugm-s5.marathon.
_app._tcp.marathon.mesos. 60 IN SRV 0 0 13289 app-t49o6-s2.marathon.
;; ADDITIONAL SECTION:
36. 5 . 12
DNS Pattern
Service CT-IP
Avail
DI
Avail
Target Host Target
Port
A (Target
Resolution)
{task}.{proto}.framework.domain no no {task}.framework.slave.domain host-
port
slave-ip
yes no {task}.framework.slave.domain host-
port
slave-ip
no yes {task}.framework.domain di-
port
slave-ip
yes yes {task}.framework.domain di-
port
container-
ip
{task}.{proto}.framework.slave.domain n/a n/a {task}.framework.slave.domain host-
port
slave-ip