Micro services vs hadoop

Microservices vs Hadoop ecosystem
Marton Elek
February 2017

© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Microservice definition
“An approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery.”
– https://martinfowler.com/articles/microservices.html

Hadoop cluster
 The definition is almost true for a Hadoop cluster as well

Dockerized Hadoop cluster
 How can we use the tools of microservice architectures in the Hadoop ecosystem?
 A possible approach to installing a cluster (Hadoop, Spark, Kafka, Hive) based on
– separate Docker containers
– smart configuration management (using well-known tooling from microservice architectures)
 Goal: rapid prototyping platform
 Easy switching between
– versions (official HDP, snapshot build, Apache build)
– configurations (HA, Kerberos, metrics, HTrace…)
 Developer/Ops tool
– Easy != easy for any user without knowledge of the tool
 Not a goal:
– replacing current management platforms (e.g. Ambari)

What are Microservices (Theory)
Collection of patterns/best practices
 II. Dependencies
– Explicitly declare and isolate dependencies
 III. Config
– Store config in the environment
 VI. Processes
– Execute the app as one or more stateless processes
 VIII. Concurrency
– Scale out via the process model
 XII. Admin processes
– Run admin/management tasks as one-off processes
The Twelve-Factor App (http://12factor.net)
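Factor III ("store config in the environment") can be sketched in plain Java as below. This is only an illustration under stated assumptions: the class name EnvConfig and the variable name SERVER_PORT are hypothetical and do not come from the slides.

```java
import java.util.Map;

/** Minimal sketch of factor III: read settings from the environment, with defaults. */
final class EnvConfig {
    private final Map<String, String> env;

    EnvConfig(Map<String, String> env) {
        this.env = env;
    }

    /** Returns the value of the variable, or the default when it is unset or blank. */
    String get(String name, String defaultValue) {
        String value = env.get(name);
        return (value == null || value.trim().isEmpty()) ? defaultValue : value;
    }

    /** Same as get(), parsed as an integer. */
    int getInt(String name, int defaultValue) {
        return Integer.parseInt(get(name, Integer.toString(defaultValue)));
    }

    public static void main(String[] args) {
        // SERVER_PORT is a hypothetical variable name used only for illustration.
        EnvConfig config = new EnvConfig(System.getenv());
        System.out.println("port = " + config.getInt("SERVER_PORT", 8080));
    }
}
```

Passing the environment in as a map keeps the class testable without touching the real process environment.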

What are Microservices (Practice)
 Spring started as a
– Dependency injection framework
 Spring Boot ecosystem
– Easy to use starter projects
– Lego bricks for various problems
• JDBC access
• Database access
• REST
• Health check
 Spring Cloud -- elements to build microservices (based on Netflix stack)
– API gateway
– Service registry
– Configuration server
– Distributed tracing
– Client side load balancing
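import java.util.Date;
import org.springframework.beans.factory.annotation.Autowired;

public class TimeStarter {
    // Spring injects the TimeService implementation at startup
    @Autowired
    TimeService timerService;

    public Date now() {
        return timerService.now();
    }
}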

Microservices with Spring Cloud

Monolith application
 Monolith but modular application example
[Diagram labels: auth service, timer service, upload service, report service; one REST call]

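import java.util.Date;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@EnableAutoConfiguration
@RestController
@ComponentScan
public class TimeStarter {
    // Injected by Spring; TimeService is the timer service's own interface
    @Autowired
    TimeService timerService;

    // Serves the current time at /now
    @RequestMapping("/now")
    public Date now() {
        return timerService.now();
    }

    public static void main(String[] args) {
        SpringApplication.run(TimeStarter.class, args);
    }
}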

Microservice version
 First problem: how can we find the right backend port from the frontend?
[Diagram labels: auth service, timer service, upload service, report service; four REST calls]

Solution: API Gateway
 First problem: how can we find the right backend port from the frontend?
[Diagram labels: API gateway in front of auth service, timer service, upload service, report service; one REST call]

API Gateway
 Goals: hide the available microservices behind a service facade
– Routing, authorization
– Deployment handling, canary testing, blue/green deployment
– Logging, SLA, auditing
 Implementation examples:
– Spring Cloud API gateway (based on Netflix Zuul)
– Twitter Finagle based implementation
– Amazon API Gateway
– Simple Nginx reverse proxy configuration
– Traefik, Kong
 Usage in the Hadoop ecosystem
– For prototyping: only needed if the scheduler/orchestrator starts the services on random hosts
– For security: Apache Knox
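The routing goal of a gateway can be illustrated with a small prefix table. This is a minimal sketch, not how Zuul or any of the listed products work internally; the class name GatewayRouter, the path prefixes and the backend ports are assumptions made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Minimal sketch of gateway routing: map a public path prefix to a backend base URL. */
final class GatewayRouter {
    // Insertion order is kept, so the first registered matching prefix wins.
    private final Map<String, String> routes = new LinkedHashMap<>();

    GatewayRouter route(String pathPrefix, String backendBaseUrl) {
        routes.put(pathPrefix, backendBaseUrl);
        return this;
    }

    /** Maps a public path to a backend URL, or empty if no prefix matches. */
    Optional<String> resolve(String path) {
        for (Map.Entry<String, String> e : routes.entrySet()) {
            String prefix = e.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return Optional.of(e.getValue() + path.substring(prefix.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical backends: the ports are chosen only for illustration.
        GatewayRouter gateway = new GatewayRouter()
                .route("/timer", "http://localhost:6767")
                .route("/auth", "http://localhost:6768");
        System.out.println(gateway.resolve("/timer/now"));
        System.out.println(gateway.resolve("/unknown"));
    }
}
```

The frontend then only needs to know the gateway address; the backend ports stay an internal detail.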

Service registry
 Problem: how can the API gateway be configured to route to all the services automatically?
[Diagram labels: REST call, API gateway, "?", auth service, timer service, upload service, report service]

Service registry
 Solution: use a service registry
– Components should be registered to the service registry automatically
[Diagram labels: REST call, API gateway, service registry, auth service, timer service, upload service, report service]

Service registry
 Goal: store the location and state of the available services
– Health check
– DNS interface
 Implementation examples:
– Spring Cloud: Eureka (Netflix Eureka based implementation)
– Consul.io
– etcd, ZooKeeper
– Simple workaround: DNS or hosts file
 Usage in the Hadoop ecosystem
– Most of the components need information about the location of the nameserver(s) and other master components
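The registry goal above (location plus state of each service) can be sketched as an in-memory map. This is an illustration only and not the API of Eureka, Consul or etcd; the class name ServiceRegistry and the port number are assumptions, while the service name hdfs-datanode and the 10.8.0.x addresses are taken from other slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Minimal in-memory sketch of a service registry with a health flag per instance. */
final class ServiceRegistry {
    /** One registered instance: where it runs and whether its last health check passed. */
    static final class Instance {
        final String host;
        final int port;
        volatile boolean healthy = true;

        Instance(String host, int port) {
            this.host = host;
            this.port = port;
        }

        String address() {
            return host + ":" + port;
        }
    }

    private final Map<String, List<Instance>> services = new ConcurrentHashMap<>();

    /** Adds an instance under the given service name and returns it. */
    Instance register(String serviceName, String host, int port) {
        Instance instance = new Instance(host, port);
        services.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>()).add(instance);
        return instance;
    }

    /** Returns the addresses of all instances that are currently marked healthy. */
    List<String> lookup(String serviceName) {
        List<String> result = new ArrayList<>();
        for (Instance i : services.getOrDefault(serviceName, Collections.<Instance>emptyList())) {
            if (i.healthy) {
                result.add(i.address());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // The port is illustrative only.
        Instance first = registry.register("hdfs-datanode", "10.8.0.5", 50010);
        registry.register("hdfs-datanode", "10.8.0.6", 50010);
        first.healthy = false; // simulate a failed health check
        System.out.println(registry.lookup("hdfs-datanode"));
    }
}
```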

Configuration server
 Problem: how can we configure multiple components?
– “Store config in the environment” (12factor)
[Diagram labels: REST call, API gateway, service registry, auth service, timer service, upload service, report service; four "Config ?" marks]

Configuration server
 Problem: how can we configure multiple components?
[Diagram labels: REST call, API gateway, service registry, auth service, timer service, upload service, report service, config server, configuration]

Config server
 Goals: one common place for all of the configuration
– Versioning
– Auditing
– Multiple environment support: use (almost) the same configuration from the DEV to the PROD environment
– Solution for sensitive data
 Solution examples:
– Spring Cloud Config server
– ZooKeeper
– Most service registries include a key-value store (Consul, etcd)
– Any persistent datastore (but versioning is an open question)
 For the Hadoop ecosystem:
– Most painful point: the same configuration elements (e.g. core-site.xml) are needed at multiple locations
– Ambari and other management tools try to solve the problem (but without a focus on rapid prototyping)

Config server – configuration management
 Config server structure: [branch]/name-profile.extension
 Merge properties for name=timer and profile(environment)=dev
 URL from the config server
– http://config:8888/timer-dev.properties
• server.port=6767
• aws.secret.key=zzz
• exit.code=-23
 Local file system structure (master branch)
– timer.properties
• server.port=6767
– dev.properties
• aws.secret.key=xxx
– application.properties
• exit.code=-23
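The merge of the three files into one view for name=timer and profile=dev can be sketched as follows. The override order (application, then profile, then name) is an assumption, and the sketch does not model whatever turns the stored xxx into the served zzz on the slide; the class name ConfigMerge is hypothetical.

```java
import java.util.Properties;

/** Minimal sketch of merging layered property sources into one configuration. */
final class ConfigMerge {
    /** Merges the sources in order; later sources override earlier ones. */
    static Properties merge(Properties... sources) {
        Properties merged = new Properties();
        for (Properties source : sources) {
            merged.putAll(source);
        }
        return merged;
    }

    /** Convenience helper that builds a single-entry Properties object. */
    static Properties of(String key, String value) {
        Properties p = new Properties();
        p.setProperty(key, value);
        return p;
    }

    public static void main(String[] args) {
        Properties application = of("exit.code", "-23");   // application.properties
        Properties dev = of("aws.secret.key", "xxx");      // dev.properties
        Properties timer = of("server.port", "6767");      // timer.properties
        Properties merged = merge(application, dev, timer);
        merged.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + "=" + merged.getProperty(k)));
    }
}
```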

Summary
 Tools used in microservice architecture
 Key components:
– Config server
– Service registry
– API gateway
 Configuration server
– Versioning
– One common place to distribute configuration
– Configuration preprocessing!!!
• transformation
• the content of the configuration should be defined, it could be format
independent
• But the final configuration should be visible
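The transformation step can be sketched as rendering format-independent key/value pairs into the XML layout that Hadoop site files use. This is a minimal illustration, not the preprocessing code of the images described in this talk; the class name HadoopXmlRenderer and the example value are assumptions.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Minimal sketch: turn plain key/value settings into a Hadoop-style *-site.xml document. */
final class HadoopXmlRenderer {
    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders the settings sorted by key so the final file is stable and easy to diff. */
    static String render(Map<String, String> settings) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            xml.append("  <property><name>").append(escape(e.getKey()))
               .append("</name><value>").append(escape(e.getValue()))
               .append("</value></property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // The host name and port are illustrative only.
        System.out.print(render(Collections.singletonMap("fs.defaultFS", "hdfs://namenode:9000")));
    }
}
```

Printing the rendered result is one way to keep the final configuration visible.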

Docker based Hadoop cluster

How to do it with Hadoop?
Microservice architecture elements
1. Configuration server
2. Service registry
3. API gateway
apache-hadoop-X.X.tar.gz
 bin
– hdfs
– yarn
– mapred
 etc/hadoop
– core-site.xml
– mapred-site.xml
– hdfs-site.xml
 include
 lib
 libexec
 sbin
 share

Do it with Hadoop
Microservice architecture elements
2. Service registry
3. API gateway
4. +1 Packaging
[Same apache-hadoop-X.X.tar.gz layout as above: bin, etc/hadoop, include, lib, libexec, sbin, share]

Packaging: Docker
 Packaging: Docker
– Docker Engine: a portable, lightweight runtime and packaging tool
– Docker Hub: a cloud service for sharing applications
– Docker Compose: predefined recipes (environment variables, network, …)
 My Docker containers: http://hub.docker.com/elek/

Docker decisions
 One application per container
– More flexible
– Simpler (configuration preprocessing + start)
– One deployable unit
 Microservice-like: prefer many similar units over fewer but bigger ones
 Using the host network for clusters
[Diagram: bridge network vs. host network. With a bridge network the containers get their own addresses (172.13.0.1 to 172.13.0.5, 172.13.0.9) behind the hosts 10.8.0.5 and 10.8.0.6; with the host network every container uses the address of its host (10.8.0.5 or 10.8.0.6).]

Repositories
 elek/bigdata-docker:
– example configuration
– docker-compose files
– ansible scripts
– getting started
entrypoint
 elek/docker-bigdata-base (base image for all the containers)
– Contains all the configuration loading logic (and some documentation)
– Use the CONFIG_TYPE environment variable to select the configuration method
• CONFIG_TYPE=simple (configuration from environment variables – for local env)
• CONFIG_TYPE=consul (configuration from consul – for distributed environment)
 elek/docker-…. (hadoop/spark/hive/...)
– Docker images for the components
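The CONFIG_TYPE switch can be pictured as a small dispatch on an environment variable. The sketch below is written in Java only for illustration and is not the entrypoint code of the base image; the class name ConfigTypeSelector and the fallback to the simple method when the variable is unset are assumptions.

```java
import java.util.Locale;
import java.util.Map;

/** Minimal sketch of selecting a configuration method from CONFIG_TYPE. */
final class ConfigTypeSelector {
    enum ConfigType { SIMPLE, CONSUL }

    /** Reads CONFIG_TYPE from the given environment; unknown values raise an exception. */
    static ConfigType fromEnv(Map<String, String> env) {
        String raw = env.get("CONFIG_TYPE");
        if (raw == null || raw.trim().isEmpty()) {
            return ConfigType.SIMPLE; // assumption: fall back to the local method
        }
        return ConfigType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    /** Describes the selected method using the wording of the slide. */
    static String describe(ConfigType type) {
        switch (type) {
            case SIMPLE:
                return "configuration from environment variables (local env)";
            case CONSUL:
                return "configuration from consul (distributed environment)";
            default:
                throw new IllegalStateException("unhandled type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(fromEnv(System.getenv())));
    }
}
```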

Local demo
 Local run, using host network
– More configuration is needed
– Auto scaling is supported
– https://github.com/elek/bigdata-docker/tree/master/compose
[Diagram: bridge network with container addresses 172.13.0.1, 172.13.0.5, 172.13.0.2]

Do it with Hadoop
Components
1. Packaging
3. Service registry
4. API gateway
[Same Hadoop distribution layout: bin (hdfs, yarn, mapred), etc/hadoop (core-site.xml, mapred-site.xml, hdfs-site.xml), include, lib, libexec, sbin, share]

Service registry/configuration server
 Service registry
– Health check support
– DNS support
 Key-value store
– Binary data is supported
 Based on agents and servers
 Easy to use REST API
 Raft-based consensus protocol

Service registry/configuration server
 Git2Consul
– Mirrors git repositories to consul
 Consul template
– Advanced template engine
– Renders a template (configuration file) based on the information from consul
– Runs/restarts a process on change
 Registrator
– Listens on the docker event stream
– Registers new components to consul
[Diagram labels: Configuration (git), git2consul, Consul, consul-template, Registrator, docker event stream, hdfs-namenode, hdfs-datanode, three datanodes]
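The "render a template, restart on change" idea can be sketched in a few lines. This is not consul-template itself: the ${key} placeholder syntax and the class name TemplateWatcher are made up for the example, and the key/value map stands in for data that would be read from consul.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch: render a config template and report when the output changes. */
final class TemplateWatcher {
    // Hypothetical placeholder syntax: ${some.key}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");
    private String lastRendered;

    /** Replaces every ${key} with its value; missing keys render as an empty string. */
    static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Renders the template and returns true when the result differs from the previous call. */
    boolean update(String template, Map<String, String> values) {
        String rendered = render(template, values);
        boolean changed = !rendered.equals(lastRendered);
        lastRendered = rendered;
        return changed;
    }

    public static void main(String[] args) {
        TemplateWatcher watcher = new TemplateWatcher();
        String template = "namenode=${namenode.host}";
        // A caller would restart the managed process whenever update() returns true.
        System.out.println(watcher.update(template, Collections.singletonMap("namenode.host", "10.8.0.5")));
        System.out.println(watcher.update(template, Collections.singletonMap("namenode.host", "10.8.0.5")));
        System.out.println(watcher.update(template, Collections.singletonMap("namenode.host", "10.8.0.6")));
    }
}
```

Comparing the rendered output rather than the raw inputs avoids restarts when an unrelated key changes.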

Weave scope
 Agents to monitor
– network connections between components
– cpu
– memory
 Supports Docker, Swarm, Weave network, …
 Easy install
 Transparent
 Pluggable
 Only problems:
– Temporary docker containers

Distributed demo
 Distributed run with host network
– https://github.com/elek/bigdata-docker/tree/master/consul
– Configuration is hosted in a consul instance
– Dynamic update
[Diagram: containers sharing the host address 10.8.0.5]

TODO
 More profiles and configuration sets
– Ready-to-use Kerberos/HA environments
– On-the-fly keytab/keystore generation (?)
 Scripting/tool improvement
– Autorestart in case of service registration change
 Configuration for more orchestration/scheduling
– Nomad?
– Docker Swarm?
 Easy image creation for specific builds
 Improve docker images
– Predefined volume/port definition
– Consolidate default values

Thank You
