1
Building a fully-automated Fast Data
Platform
Bernd Zuther, codecentric AG
2 . 1
Outline
Fast Data
SMACK
DC/OS
Extend DC/OS cluster
3 . 1
In the beginning of Big Data there
was
HADOOP
3 . 2
BATCH
SEEMS TO BE GOOD
map and reduce were everywhere
3 . 3
But business does not wait.
It always demands more...
EVER FASTER
3 . 4
Updating machine learning models as new information
arrives
Detecting anomalies, faults, performance problems, etc.
and taking timely action
Aggregating and processing data on arrival for
downstream storage and analytics
3 . 5
λ-Architecture
[Diagram: new data flows into both the batch layer and the speed layer. The batch layer recomputes batch views from the master dataset; the speed layer maintains realtime views. The serving layer merges batch and realtime views to answer queries.]
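The query-time merge of batch and realtime views can be sketched in a few lines of plain Python. All names below are illustrative, not from the talk; a real batch layer would be a Spark job and the views would live in Cassandra:

```python
# Sketch of a lambda-architecture query: a batch view (recomputed
# periodically from the master dataset) is merged with a realtime
# view (incrementally updated from new data) at query time.

def batch_view(master_dataset):
    # Batch layer: full recomputation over the master dataset.
    counts = {}
    for event in master_dataset:
        counts[event] = counts.get(event, 0) + 1
    return counts

def update_realtime_view(view, event):
    # Speed layer: incremental update for data not yet batched.
    view[event] = view.get(event, 0) + 1

def query(batch, realtime, key):
    # Serving layer: merge both views to answer a query.
    return batch.get(key, 0) + realtime.get(key, 0)

batch = batch_view(["bus42", "bus42", "bus7"])
realtime = {}
update_realtime_view(realtime, "bus42")   # arrived after the last batch run
print(query(batch, realtime, "bus42"))    # -> 3
```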
3 . 6
Fast Data
Fast Data covers a range of new
systems and approaches, which
balance various tradeoffs to deliver
timely, cost-efficient data processing,
as well as higher developer
productivity.
3 . 7
Requirements for a
Fast Data Architecture
Reliable data ingestion
Flexible storage and query options
Sophisticated analytics tools
4 . 1
SMACK
Spark
Mesos
Akka
Cassandra
Kafka
Spark
SWISS ARMY KNIFE FOR DATA PROCESSING
ETL Jobs
μ-Batching on Streams
SQL and Joins on non-RDBMS
Graph Operations on non-Graphs
Super Fast Map/Reduce
4 . 2
4 . 3
How does it fit into a λ-Architecture?
Spark operations can be run unaltered in either batch or
stream mode
The serving layer uses Resilient Distributed Datasets (RDDs)
The speed layer can use DStreams
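That "unaltered in either mode" claim can be illustrated without Spark: write the transformation once, then apply it to the full dataset (batch) or to micro-batches as they arrive (stream). This is an analogy to how Spark code runs on RDDs and DStreams, not the Spark API itself:

```python
# One transformation, two execution modes: applied to the complete
# dataset (batch layer) or to small micro-batches as they arrive
# (speed layer).

def transform(records):
    # The business logic is written exactly once...
    return [r.upper() for r in records if r.startswith("bus")]

full_dataset = ["bus1", "tram2", "bus3", "bus4"]
stream = [["bus1"], ["tram2", "bus3"], ["bus4"]]   # micro-batches

batch_result = transform(full_dataset)                        # batch mode
stream_result = [x for mb in stream for x in transform(mb)]   # stream mode

print(batch_result == stream_result)  # -> True
```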
Mesos
DISTRIBUTED KERNEL FOR THE CLOUD
Links machines to one logical instance
Static deployment of Mesos
Dynamic deployment of the workload
Good integration with Hadoop, Kafka, Spark, and Akka
4 . 4
Akka
FRAMEWORK FOR REACTIVE APPLICATIONS
Highly performant - 50 million messages per machine per second
Simple concurrency via asynchronous processing
Elastic, resilient, and without a single point of failure
Used for applications that process or query data
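Akka itself is Scala/Java, but the actor idea behind those properties - isolated state, a mailbox, and one message processed at a time - can be sketched with a Python queue. This is an analogy, not the Akka API:

```python
import queue
import threading

# Minimal actor analogy: private state, a mailbox, and a loop that
# processes one message at a time, so the state needs no locks.

class CounterActor:
    def __init__(self):
        self._mailbox = queue.Queue()
        self.count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, msg):
        # Asynchronous send: the caller never blocks on processing.
        self._mailbox.put(msg)

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg == "stop":
                break
            self.count += 1  # state mutated only by this one thread

actor = CounterActor()
for _ in range(5):
    actor.tell("event")
actor.tell("stop")
actor._thread.join()
print(actor.count)  # -> 5
```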
4 . 5
Cassandra
PERFORMANT AND ALWAYS-UP NOSQL DATABASE
Linear scaling - approx. 10,000 requests per machine per second
No downtime
Comfort of a column index with append-only performance
Data safety across multiple data centers
Strong in denormalized models
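"Strong in denormalized models" means writing the same event into several query-shaped tables instead of joining at read time. A sketch of that fan-out, with dicts standing in for tables (all table and field names here are made up for illustration):

```python
# Denormalization sketch: Cassandra has no joins, so each query gets
# its own table and every write fans out to all of them.

events_by_vehicle = {}   # stands in for a table keyed by vehicle id
events_by_minute = {}    # stands in for a table keyed by minute

def write_event(vehicle_id, minute, payload):
    # One logical write becomes two physical writes - cheap with an
    # append-only storage engine.
    events_by_vehicle.setdefault(vehicle_id, []).append(payload)
    events_by_minute.setdefault(minute, []).append(payload)

write_event("bus42", "12:00", {"lat": 48.1, "lon": 11.6})
write_event("bus42", "12:01", {"lat": 48.2, "lon": 11.6})

# Both access patterns are now single-key lookups:
print(len(events_by_vehicle["bus42"]))  # -> 2
print(len(events_by_minute["12:00"]))   # -> 1
```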
4 . 6
Kafka
MESSAGING SYSTEM FOR BIG DATA APPLICATIONS
Fast - delivers hundreds of megabytes per second to thousands of clients
Scales - partitions data into manageable volumes
Manages backpressure
Distributed - from the ground up
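The "partitions data into manageable volumes" point works by hashing the message key onto a fixed number of partitions: load spreads across consumers while per-key ordering is preserved. A sketch of that keyed-partitioning idea (illustrative, not the Kafka client API; Kafka's own default partitioner uses murmur2, crc32 here just stands in as a stable hash):

```python
import zlib

# Keyed partitioning sketch: messages with the same key always land
# in the same partition, so order per key is preserved while load is
# spread across partitions.

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    # Stable hash modulo the partition count (Python's built-in
    # hash() is salted per process, so crc32 is used instead).
    return zlib.crc32(key) % NUM_PARTITIONS

partitions = [[] for _ in range(NUM_PARTITIONS)]
for key, value in [(b"bus42", "pos1"), (b"bus7", "pos1"), (b"bus42", "pos2")]:
    partitions[partition_for(key)].append((key, value))

# All messages for bus42 sit in one partition, in send order.
p = partition_for(b"bus42")
print([v for k, v in partitions[p] if k == b"bus42"])  # -> ['pos1', 'pos2']
```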
4 . 7
4 . 8
Big Ball of Mud
[Diagram: each source (Source 1, Source 2, log files) is wired directly to its own ingest process (Akka Ingest 1, Akka Ingest 2, Spark Ingest 1), yielding a tangle of point-to-point connections.]
4 . 9
Kafka as a Multiplexer-Demultiplexer
4 . 10
Emerging Architecture
4 . 11
Zeppelin
4 . 12
Benefits and downsides of Zeppelin
+ No JAR wars
+ Easy analytics
- New technology
4 . 13
Real World Example
4 . 14
Traditional Approach
4 . 15
DC/OS Approach
5 . 1
DC/OS
5 . 2
DC/OS Architecture
DCOS Master (1..3): Zookeeper, Mesos master process, Mesos DNS, Marathon, Admin Router
DCOS Private Agent (0..n): Mesos agent process, Mesos containerizer, Docker containerizer
DCOS Public Agent (0..n): Mesos agent process, Mesos containerizer, Docker containerizer
Users reach the cluster from the public internet via the public agents.
5 . 3
DC/OS Network Security
[Diagram: three zones - master nodes (admin zone), public agents (public zone), and private agents (private zone) - secured against the public internet by port number or IP address.]
5 . 4
DC/OS Installation
5 . 5
DC/OS Universe
5 . 6
Command Line Interface
$ dcos
Command line utility for the Mesosphere Datacenter Operating
System (DC/OS). The Mesosphere DC/OS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DC/OS installation.
Available DC/OS commands:
config Get and set DC/OS CLI configuration properties
help Display command line usage information
marathon Deploy and manage applications on the DC/OS
node Manage DC/OS nodes
package Install and manage DC/OS packages
service Manage DC/OS services
task Manage DC/OS tasks
Get detailed command description with 'dcos <command> --help'.
5 . 7
SMACK Installation - Databases/Tools
dcos package install --yes cassandra
dcos package install --yes kafka
dcos package install --yes spark
dcos kafka topic add METRO-Vehicles
5 . 8
SMACK Installation - Custom Application
cat > /opt/smack/conf/bus-demo-ingest.json << EOF
{
  "id": "/ingest",
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "codecentric/bus-demo-ingest",
      "network": "HOST",
      "privileged": false,
      "parameters": [],
      "forcePullImage": true
    }
  },
  "env": {
    "CASSANDRA_HOST": "$CASSANDRA_HOST",
    "CASSANDRA_PORT": "$CASSANDRA_PORT",
5 . 9
Service Discovery
DNS-based: + easy to integrate, + SRV records; - no health checks, - TTL
Proxy-based: + no port conflicts, + fast failover; - no UDP, - management of VIPs (Minuteman) or service ports (Marathon-lb)
Application-aware: + developer fully in control and full-featured; - implementation effort, - requires distributed state management (ZK, etcd, or Consul)
5 . 10
A Records
An A record associates a hostname to an IP address
bz@cc ~/$ dig app.marathon.mesos
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> app.marathon.mesos
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9336
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;app.marathon.mesos. IN A
;; ANSWER SECTION:
app.marathon.mesos. 60 IN A 10.0.3.201
app.marathon.mesos. 60 IN A 10.0.3.199
;; Query time: 2 msec
;; SERVER: 10.0.5.98#53(10.0.5.98)
5 . 11
SRV Records
An SRV record associates a service name with a hostname
and an IP port
bz@cc ~/$ dig _app._tcp.marathon.mesos SRV
; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> _app._tcp.marathon.mesos SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31708
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; QUESTION SECTION:
;_app._tcp.marathon.mesos. IN SRV
;; ANSWER SECTION:
_app._tcp.marathon.mesos. 60 IN SRV 0 0 10148 app-qtugm-s5.marathon.
_app._tcp.marathon.mesos. 60 IN SRV 0 0 13289 app-t49o6-s2.marathon.
;; ADDITIONAL SECTION:
5 . 12
DNS Pattern
Service | CT-IP Avail | DI Avail | Target Host | Target Port | A (Target Resolution)
{task}.{proto}.framework.domain | no | no | {task}.framework.slave.domain | host-port | slave-ip
{task}.{proto}.framework.domain | yes | no | {task}.framework.slave.domain | host-port | slave-ip
{task}.{proto}.framework.domain | no | yes | {task}.framework.domain | di-port | slave-ip
{task}.{proto}.framework.domain | yes | yes | {task}.framework.domain | di-port | container-ip
{task}.{proto}.framework.slave.domain | n/a | n/a | {task}.framework.slave.domain | host-port | slave-ip
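The A-record naming scheme behind these patterns can be captured in a tiny helper, matching the `dig` examples earlier (an illustrative helper following the {task}.{framework}.{domain} convention, not part of Mesos DNS):

```python
# Sketch of Mesos DNS A-record name construction: a task named "app"
# running under the "marathon" framework in domain "mesos" resolves
# as app.marathon.mesos.

def mesos_dns_name(task: str, framework: str = "marathon",
                   domain: str = "mesos", slave: bool = False) -> str:
    # slave=True yields the slave-address variant
    # {task}.{framework}.slave.{domain}.
    middle = f"{framework}.slave" if slave else framework
    return f"{task}.{middle}.{domain}"

print(mesos_dns_name("app"))              # -> app.marathon.mesos
print(mesos_dns_name("app", slave=True))  # -> app.marathon.slave.mesos
```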
Benefits and downsides of DC/OS
+ Layer that abstracts the hardware
+ Applications run in a sandbox, with or without Docker
+ Built-in service discovery
- Training effort for the technology
- Monitoring plays a bigger role
5 . 13
6 . 1
Extend our DC/OS cluster
6 . 2
Add new Network Security Zone
[Diagram: as before, master nodes, public agents, and private agents are shielded from the public internet; the admin now connects through a VPN client instead of directly to the master zone.]
6 . 3
Add ELK
[Diagram: a Filebeat instance on every agent node ships logs to Logstash, which feeds Elasticsearch; Kibana visualizes the data.]
6 . 4
Download Filebeat
- "content": |
[Unit]
Description=ELK: Download Filebeat
After=network-online.target
Wants=network-online.target
ConditionPathExists=!/opt/filebeat/filebeat
[Service]
Type=oneshot
StandardOutput=journal+console
StandardError=journal+console
ExecStartPre=/usr/bin/curl --fail --retry 20 --continue-at - --location
ExecStartPre=/usr/bin/mkdir -p /opt/filebeat /tmp/filebeat /etc/filebea
ExecStartPre=/usr/bin/tar -axf /tmp/filebeat.tar.xz -C /tmp/filebeat --
ExecStart=-/bin/mv /tmp/filebeat/filebeat /opt/filebeat/filebeat
ExecStartPost=-/usr/bin/rm -rf /tmp/filebeat.tar.xz /tmp/filebeat
"name": |-
filebeat-download.service
6 . 5
Start Filebeat
- "command": |-
start
"content": |
[Unit]
Description=ELK: Filebeat collects log files and sends them to Logstash
Requires=filebeat-download.service
After=filebeat-download.service
[Service]
Type=simple
StandardOutput=journal+console
StandardError=journal+console
ExecStart=/opt/filebeat/filebeat -e -c /etc/filebeat/filebeat.yml -
"enable": !!bool |-
true
"name": |-
filebeat.service
6 . 6
Working with CloudFormation
+ Easy integration into a build pipeline
- Hard to maintain
- Hard to extend
- Not cloud-agnostic (only supports AWS)
6 . 7
Terraform
BUILD, COMBINE, AND LAUNCH INFRASTRUCTURE
Infrastructure as code
Combine Multiple Providers (AWS, Azure, etc.)
Evolve your Infrastructure
6 . 8
Terraform
resource "aws_launch_configuration" "public_slave" {
  security_groups             = ["${aws_security_group.public_slave.id}"]
  image_id                    = "${lookup(var.coreos_amis, var.aws_region)}"
  instance_type               = "${var.public_slave_instance_type}"
  key_name                    = "${aws_key_pair.dcos.key_name}"
  user_data                   = "${template_file.public_slave_user_data.rendered}"
  associate_public_ip_address = true

  lifecycle {
    create_before_destroy = false
  }
}
6 . 9
Benefits of Terraform
+ Easy integration into a build pipeline
+ Easier to maintain
+ Easier to extend
+ Cloud-agnostic (AWS, Azure, etc.)
- New resources take some time to be adopted
6 . 10
Create infrastructure with Jenkins
6 . 11
Terraform - DC/OS Source & Real World
Example
https://github.com/ANierbeck/BusFloatingData
https://github.com/zutherb/terraform-dcos/
7 . 1
Summary
SMACK helps you to build a near-realtime Fast Data platform
Kafka & Akka can be used for reliable data ingestion
Cassandra provides flexible storage and query options
Mesos enables fault-tolerant and elastic distributed systems
Zeppelin is a sophisticated analytics tool
Terraform makes it easy to integrate our infrastructure with a build pipeline
7 . 2
Lessons Learned
Cassandra is good for known problems
When dealing with unknown problems it is better to
store raw data with Apache Parquet
Automate everything
Bleeding edge sometimes sucks (Zeppelin, S3a, Spark,
etc.)
7 . 3
Is your infrastructure a pet?
7 . 4
Treat your infrastructure like cattle
7 . 5
If you want to treat your infrastructure like cattle:
KEEP CALM
AND
AUTOMATE EVERYTHING
7 . 6
Feedback
@Bernd_Z
http://github.com/zutherb
http://zutherb.github.io/Building-a-full-automated-Fast-Data-Platform/slides/
7 . 7
The End
 
Copyright 2016

More Related Content

Building a fully-automated Fast Data Platform