© Cloudera, Inc. All rights reserved.
Docker on Hadoop
Scott Shaw
Sr. Solutions Engineer
February 6, 2019
What is Docker?
1. Docker is synonymous with containers
2. A container is a standard unit of software
3. Docker provides a standard container format
4. Docker images can be stored in and shared through an image repository
Benefits of Containerization
• Flexible: Even the most complex applications can be containerized
• Lightweight: Containers leverage and share the host kernel
• Interchangeable: You can deploy updates and upgrades on the fly
• Portable: You can build locally, deploy to the cloud, and run anywhere
• Scalable: You can increase and automatically distribute container replicas
• Stackable: You can stack services vertically and on the fly
Source: https://docs.docker.com/get-started/
How are Containers Different from Virtual Machines?
Docker Stack
Docker Swarm
Docker on Hadoop - Questions
1. Security
2. YARN or Kubernetes? Or both?
3. Ease of use
4. Cloud and on-prem
5. CDF?
6. Persistent Storage
Kubernetes
• Automates the distribution and scheduling of application containers across a
cluster
• Master controls scheduling, maintaining, and upgrading application containers
• Kubelet: Agent for managing node communication with Master
• Self-Healing: Can withstand node failures
• Auto-scaling
• Pods: A group of containers with shared storage and network. A pod is the
smallest unit that Kubernetes manages
Source: https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/cluster-intro/
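
The pod and self-healing bullets above can be illustrated with a small toy model. This is a minimal sketch, not the Kubernetes API: the `Pod` and `Cluster` classes, the node names, and the "fewest pods first" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical pod model: containers that share storage and one network identity."""
    name: str
    containers: list
    shared_volumes: list = field(default_factory=list)


@dataclass
class Cluster:
    """Toy stand-in for the master: tracks node health and pod placement."""
    nodes: dict                                    # node name -> healthy?
    placement: dict = field(default_factory=dict)  # pod name -> node name

    def schedule(self, pod: Pod) -> str:
        # Assumed rule: place the pod on the healthy node running the fewest pods.
        healthy = [n for n, ok in self.nodes.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        counts = {n: 0 for n in healthy}
        for node in self.placement.values():
            if node in counts:
                counts[node] += 1
        target = min(healthy, key=lambda n: (counts[n], n))
        self.placement[pod.name] = target
        return target

    def fail_node(self, node: str, pods: dict) -> list:
        # Self-healing: mark the node unhealthy and reschedule its pods elsewhere.
        self.nodes[node] = False
        moved = [p for p, n in self.placement.items() if n == node]
        for pod_name in moved:
            del self.placement[pod_name]
            self.schedule(pods[pod_name])
        return moved


if __name__ == "__main__":
    cluster = Cluster(nodes={"node-a": True, "node-b": True})
    pods = {
        "web": Pod("web", ["nginx", "log-shipper"], ["logs"]),
        "api": Pod("api", ["app"]),
    }
    for pod in pods.values():
        cluster.schedule(pod)
    cluster.fail_node("node-a", pods)
    print(cluster.placement)
```

The pod, not the individual container, is what gets placed and moved, which is why the two containers in `web` always land on the same node.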
Kubernetes and YARN
• Kubernetes Master ↔ YARN Resource Manager
• YARN Application Master ↔ Container
• Kubelet ↔ NodeManager
• Container Orchestrator vs. Capacity Scheduler
Kubernetes and YARN
1. Kubernetes Scheduler is notified of all pod creations
and deletions.
2. Information is passed to YARN
3. YARN controls all cluster resources being used by
workloads
4. YARN determines where to run the container
5. YARN notifies Kubernetes where the pod should run
Source: https://hortonworks.com/blog/docker-kubernetes-apache-hadoop-yarn/
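
The five steps above can be sketched as a handoff between two objects. This is a minimal sketch under stated assumptions, not the actual integration: the class names, the `place`/`release` methods, the memory-only resource model, and the "most free memory" rule are hypothetical and stand in for whatever policy YARN applies.

```python
from dataclasses import dataclass, field


@dataclass
class YarnResourceManager:
    """Hypothetical stand-in for YARN: owns the view of free memory per node."""
    free_mb: dict  # node name -> free memory in MB

    def place(self, pod_name: str, request_mb: int) -> str:
        # Steps 3-4: YARN accounts for all cluster resources and picks the node.
        candidates = [n for n, free in self.free_mb.items() if free >= request_mb]
        if not candidates:
            raise RuntimeError(f"no node can fit {pod_name} ({request_mb} MB)")
        # Assumed policy: choose the node with the most free memory.
        node = max(candidates, key=lambda n: (self.free_mb[n], n))
        self.free_mb[node] -= request_mb
        return node

    def release(self, node: str, request_mb: int) -> None:
        # Return the resources when the pod goes away.
        self.free_mb[node] += request_mb


@dataclass
class KubernetesScheduler:
    """Hypothetical scheduler that defers every placement decision to YARN."""
    yarn: YarnResourceManager
    bindings: dict = field(default_factory=dict)  # pod name -> (node, request_mb)

    def on_pod_created(self, pod_name: str, request_mb: int) -> str:
        # Step 1: the scheduler is notified of the new pod.
        # Step 2: it passes the request to YARN.
        node = self.yarn.place(pod_name, request_mb)
        # Step 5: YARN's answer tells Kubernetes where the pod should run.
        self.bindings[pod_name] = (node, request_mb)
        return node

    def on_pod_deleted(self, pod_name: str) -> None:
        # Deletions are reported too, so YARN's resource view stays accurate.
        node, request_mb = self.bindings.pop(pod_name)
        self.yarn.release(node, request_mb)


if __name__ == "__main__":
    yarn = YarnResourceManager(free_mb={"node-a": 4096, "node-b": 8192})
    scheduler = KubernetesScheduler(yarn)
    print(scheduler.on_pod_created("etl", 6000))
    print(scheduler.on_pod_created("web", 2048))
    scheduler.on_pod_deleted("etl")
    print(yarn.free_mb)
```

Keeping the resource ledger in one place (YARN) is the point of the design: Kubernetes pods and native YARN workloads draw from the same accounting, so neither scheduler can oversubscribe a node the other is using.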
Which Distro Components Currently Use Docker?
1. Cloudbreak
2. Data Science Workbench
3. Data Plane Services
The goal is to provide all distro components as containerized services
for both public and private clouds
Open Hybrid Architecture Initiative
1. Apache Ozone
2. Kubernetes + YARN + Docker
3. Distro as a Service
4. Workloads as a Service
5. Any Cloud Platform
6. Shared Data Experience
7. Container Storage Interface (CSI)
Container Storage Interface
Source: https://hortonworks.com/blog/open-hybrid-architecture-running-stateful-containers-on-yarn/
Project Cumulus
THANK YOU
