Spark day 2017 - Spark on Kubernetes
• Senior Software Engineer at SK Telecom
• Commercial Products
• Big Data Discovery Solution (~’17)
• Hadoop DW (~’15)
• PaaS (Cloud Foundry) (~’13)
• IaaS (OpenStack) (~’13)
• Mail to: jerryjung@apache.org
Spark deployment using Kubernetes
Spark on Kubernetes
Kubernetes - an open-source framework for managing containerized applications

Kubernetes provides a common API and a self-healing framework that automatically handles machine failures, application deployments, logging, and monitoring.

Clusters - a set of compute, storage, and network resources
Pods - a colocated group of application containers that share volumes and a networking stack
Replication Controllers - ensure that a specified number of pod replicas is running; manage pods and report status updates
Services - cluster-wide service discovery
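Self-healing comes from controllers that keep comparing the desired state with what is actually running. The shell sketch below is only a toy illustration of the comparison a replication controller makes; `reconcile` is a hypothetical function, not part of kubectl or the Kubernetes API, and a real controller also handles pod templates, scheduling, and retries.

```shell
#!/bin/sh
# Hypothetical helper: toy version of a replication controller's reconcile
# step. It compares the desired replica count with the pods currently running
# and reports what has to change.
reconcile() {
  desired=$1   # replicas requested in the controller spec
  running=$2   # pods currently observed as running
  if [ "$running" -lt "$desired" ]; then
    echo "create $((desired - running))"
  elif [ "$running" -gt "$desired" ]; then
    echo "delete $((running - desired))"
  else
    echo "in sync"
  fi
}

# A machine failure took down 2 of 5 pods, so 2 replacements are needed.
reconcile 5 3   # prints "create 2"
reconcile 5 5   # prints "in sync"
```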
[Diagram: Pods #1, #2 and #8 scheduled across Node #1 and Node #5, each exposing port 8080]
Support for Event Stream Processing
Fast Data Queries in Real Time
Improved Programmer Productivity
Fast Batch Processing of Large Data Sets

Driver - process that contains the SparkContext
Executor - process that executes one or more Spark tasks
Master - process that manages applications across the cluster
Worker - process that manages executors on a particular node
[Diagram: Driver Program, Cluster Manager and three Worker Nodes, shown in cluster mode and in client mode]

StatefulSet pod naming: $(statefulset name)-$(ordinal)
[Diagram: spark submit and Node #1 … #n]
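A StatefulSet with N replicas names its pods with ordinals 0 through N-1, so pod names are predictable and stay stable when a pod is rescheduled. The sketch below just expands that $(statefulset name)-$(ordinal) pattern; `statefulset_pod_names` and the StatefulSet name `spark-worker` are hypothetical, chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the pod names a StatefulSet with the given name
# and replica count produces, i.e. $(statefulset name)-$(ordinal) for 0..N-1.
statefulset_pod_names() {
  name=$1
  replicas=$2
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${name}-${i}"
    i=$((i + 1))
  done
}

# "spark-worker" is a hypothetical StatefulSet name.
statefulset_pod_names spark-worker 3
# spark-worker-0
# spark-worker-1
# spark-worker-2
```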

[Diagram: Node Manager #1 … N, each running the shuffle plugin]


External Shuffle Service (required for dynamic allocation), useful for:
• Long-running ETL jobs
• Interactive applications or servers
• Any application with large shuffles
1. Add the shuffle plugin jar to the NodeManager classpath
2. Register the shuffle plugin in yarn-site.xml
3. Edit spark-defaults.conf:
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 50
spark.dynamicAllocation.maxExecutors 100
spark.dynamicAllocation.initialExecutors 50
spark.dynamicAllocation.schedulerBacklogTimeout 5s
spark.dynamicAllocation.executorIdleTimeout 60s
cf. Mesos: coarse-grained mode
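With the settings above, Spark starts at 50 executors (initialExecutors), stays between 50 (minExecutors) and 100 (maxExecutors), asks for more executors once tasks have been pending for 5s (schedulerBacklogTimeout, repeated every sustainedSchedulerBacklogTimeout, which defaults to the same value), and releases executors idle for 60 seconds (executorIdleTimeout). The Spark documentation describes the per-round request as growing exponentially: 1 executor, then 2, 4, 8, and so on. The sketch below models only that ramp-up; `ramp_up` is a hypothetical function, and it ignores the cap Spark also applies based on the number of pending tasks.

```shell
#!/bin/sh
# Simplified model of dynamic allocation ramp-up while tasks stay backlogged:
# each round requests twice as many new executors as the previous round
# (1, 2, 4, ...), capped at maxExecutors. Pending-task limits are ignored.
ramp_up() {
  current=$1   # starting executors, e.g. spark.dynamicAllocation.initialExecutors
  max=$2       # spark.dynamicAllocation.maxExecutors
  rounds=$3    # number of backlog-timeout rounds to simulate
  to_add=1
  r=1
  while [ "$r" -le "$rounds" ] && [ "$current" -lt "$max" ]; do
    current=$((current + to_add))
    if [ "$current" -gt "$max" ]; then
      current=$max
    fi
    echo "round $r: $current executors"
    to_add=$((to_add * 2))
    r=$((r + 1))
  done
}

ramp_up 50 100 7
# round 1: 51 executors
# round 2: 53 executors
# round 3: 57 executors
# round 4: 65 executors
# round 5: 81 executors
# round 6: 100 executors
```

The loop stops early once maxExecutors is reached, which is why seven requested rounds print only six lines.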

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://https://{k8s address} \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar
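The --master value is the Kubernetes API server address prefixed with k8s://. When the address carries no scheme, Spark assumes https, so k8s://{k8s address} and k8s://https://{k8s address} point at the same HTTPS API server; the address itself can be read from `kubectl cluster-info`. The sketch below builds the flag value under that rule; `k8s_master` is a hypothetical helper and the IP addresses are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: turn an API server address into a spark-submit --master
# value. A bare host:port gets https://, matching Spark's default for k8s://.
k8s_master() {
  case "$1" in
    http://*|https://*) echo "k8s://$1" ;;
    *)                  echo "k8s://https://$1" ;;
  esac
}

k8s_master https://192.168.99.100:8443   # k8s://https://192.168.99.100:8443
k8s_master 192.168.99.100:8443           # k8s://https://192.168.99.100:8443
```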

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: spark-shuffle-service
    spark-version: 2.1.0
  name: shuffle
spec:
  template:
    metadata:
      labels:
        app: spark-shuffle-service
        spark-version: 2.1.0
    spec:
      volumes:
        - name: temp-volume
          hostPath:
            path: '/var/tmp' # change this path according to your cluster configuration.
      containers:
        - name: shuffle
          image: kubespark/spark-shuffle:v2.1.0-kubernetes-0.2.0
# GroupByTest arguments: numMappers=10, numKVPairs=40000, valSize=2
bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.GroupByTest \
  --master k8s://https://{k8s address} \
  --kubernetes-namespace default \
  --conf spark.app.name=group-by-test \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.kubernetes.shuffle.namespace=default \
  --conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.2.0-SNAPSHOT.jar 10 40000 2
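spark.kubernetes.shuffle.labels is how executors find the shuffle-service pod on their node, so its comma-separated key=value pairs must match the pod labels in the shuffle DaemonSet (app=spark-shuffle-service, spark-version=2.1.0). The sketch below builds that value and checks it against a pod's labels; `shuffle_labels` and `labels_match` are hypothetical helpers, not Spark or kubectl commands.

```shell
#!/bin/sh
# Join key=value pairs into the comma-separated form that
# spark.kubernetes.shuffle.labels expects.
shuffle_labels() {
  out=""
  for kv in "$@"; do
    if [ -z "$out" ]; then out=$kv; else out="$out,$kv"; fi
  done
  echo "$out"
}

# Succeed only if every pair in the comma-separated selector ($1) also appears
# among the pod labels passed as the remaining arguments. The body runs in a
# subshell so the IFS change does not leak to the caller.
labels_match() (
  selector=$1
  shift
  IFS=,
  for want in $selector; do
    found=no
    for have in "$@"; do
      if [ "$want" = "$have" ]; then found=yes; fi
    done
    if [ "$found" = no ]; then exit 1; fi
  done
  exit 0
)

labels=$(shuffle_labels app=spark-shuffle-service spark-version=2.1.0)
echo "$labels"   # app=spark-shuffle-service,spark-version=2.1.0
if labels_match "$labels" app=spark-shuffle-service spark-version=2.1.0; then
  echo "selector matches the shuffle pod labels"
fi
```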
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: spark-resource-staging-server
spec:
  replicas: 1
  # … (pod template not shown)
---
apiVersion: v1
kind: Service
metadata:
  name: spark-resource-staging-service
spec:
  type: NodePort
  selector:
    resource-staging-server-instance: default
  ports:
    - protocol: TCP
      port: 10000
      targetPort: 10000
      nodePort: 31000
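A NodePort Service opens the same port on every node's IP and forwards traffic to the selected pods, here from nodePort 31000 to port 10000 on the staging server. The sketch below builds the URI that spark-submit later receives as spark.kubernetes.resourceStagingServer.uri; `staging_uri` is a hypothetical helper, 10.0.0.5 is a placeholder node IP, and the 30000-32767 check assumes the API server's default --service-node-port-range.

```shell
#!/bin/sh
# Hypothetical helper: build the resource staging server URI from a node IP
# and the Service's nodePort, rejecting ports outside the default NodePort
# range (30000-32767).
staging_uri() {
  node_ip=$1
  node_port=$2
  if [ "$node_port" -lt 30000 ] || [ "$node_port" -gt 32767 ]; then
    echo "nodePort $node_port is outside the default range 30000-32767" >&2
    return 1
  fi
  echo "http://${node_ip}:${node_port}"
}

staging_uri 10.0.0.5 31000   # http://10.0.0.5:31000
```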

bin/spark-submit \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://{k8s address} \
  --kubernetes-namespace default \
  --conf spark.executor.instances=5 \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.initcontainer.docker.image=kubespark/spark-init:v2.1.0-kubernetes-0.2.0 \
  --conf spark.kubernetes.resourceStagingServer.uri=http://{node ip}:31000 \
  {path to application jar on the submitting machine}
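The resource staging server is only needed for dependencies that live on the submitting machine: local:// URIs already point inside the Docker image, and remote URIs (for example HDFS or HTTP) can be fetched directly. The sketch below encodes that rule as assumed here; `dependency_source` is a hypothetical helper and the paths are placeholders, not files from the Spark distribution.

```shell
#!/bin/sh
# Hypothetical helper: classify a dependency URI by where it is obtained from.
#   local://...              already inside the driver/executor image
#   file://... or bare path  on the submitting machine -> staging server upload
#   any other scheme         fetched from the remote location
dependency_source() {
  case "$1" in
    local://*) echo "in-image" ;;
    file://*)  echo "staging-server" ;;
    *://*)     echo "remote" ;;
    *)         echo "staging-server" ;;
  esac
}

dependency_source local:///opt/spark/jars/app.jar    # in-image
dependency_source /home/user/app.jar                 # staging-server
dependency_source hdfs://namenode:8020/jars/app.jar  # remote
```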
