techupskills.com | techskillstransformations.com
© 2021 Brent C. Laster &
@techupskills
1
© 2022 Brent C. Laster &
Kubernetes Problem-Solving
Learning to identify, understand, and fix the most
common issues in the cluster
Provided by Tech Skills Transformations
About me
• Founder, Tech Skills Transformations LLC
• R&D DevOps Director
• Global trainer – training (Git, Jenkins,
Gradle, CI/CD, pipelines, Kubernetes, Helm,
ArgoCD, operators)
• Author -
• OpenSource.com
• Professional Git book
• Jenkins 2 – Up and Running book
• Continuous Integration vs. Continuous
Delivery vs. Continuous Deployment
mini-book on Safari
• https://www.linkedin.com/in/brentlaster
• @BrentCLaster
• GitHub: brentlaster
Professional
Git
Book
• Extensive Git reference, explanations, and examples
• First part for non-technical readers
• Beginner and advanced reference
• Hands-on labs
Jenkins 2
Book
• Jenkins 2 – Up and Running
• “It’s an ideal book for those
who are new to CI/CD, as well
as those who have been using
Jenkins for many years. This
book will help you discover
and rediscover Jenkins.” By
Kohsuke Kawaguchi, Creator
of Jenkins
O’Reilly Training
High-level agenda
• Kubernetes Architecture Items
• Useful commands for working with Kubernetes
• Understanding and debugging common problems
with system resources
• Understanding and debugging issues with node
selections and extended methods for debugging
Pod issues
• Debugging failed and crashed containers within
Pods
• Working with Probes
• Troubleshooting Services (DNS, network traffic,
etc.)
Kubernetes is a Desired-State System
• User supplies desired state via declaring it in manifests
• Kubernetes works to balance the current state and the
desired state
• Desired state – what you want your production
environment to be
• Current (observed) state – current status of your
production environment
[diagram: reconciliation loop – Kubernetes continuously reconciles the current (observed) state with the desired state]
Cluster Overview
[diagram: a master node running etcd, the API server, the Controller Manager, and the Scheduler, driven by kubectl and config; worker nodes each running a kubelet, kube-proxy, and Pods; a Container Network Interface (CNI) such as Calico, Flannel, or Weave connecting the nodes; container images pulled from an image registry]
• Pods have a defined lifecycle/status
• Status is surfaced via PodStatus’ field “phase”
• Lifecycle/status/phase values include:
• Pending – Pod accepted by the cluster, but one or more containers not set up or ready to run
• Includes times such as downloading images or waiting to be scheduled
• Running – Pod bound to a node, all containers created, and at least one container is
running, or starting/restarting
• Succeeded – all containers have terminated as successful and will not be restarted
• Failed – all containers terminated with at least one in a failed condition
• failed condition = non-zero RC or terminated by K8S
• Unknown – Pod state could not be determined
• Might occur for instance if node can’t be reached
• While in “Running”, kubelet can restart containers for some cases
• Pods track overall "states" for containers
• Container states include
• Waiting – still doing ops to complete startup, such as pulling images or loading secrets
• Running - executing w/o issues
• Terminated
• Has run to completion
• Or failed and killed
Pod Lifecycle
[diagram: a cluster node hosting a Pod that moves through Pending, Running, and Succeeded while its container moves from Waiting to Running to Terminated]
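The phase and container states above can be read straight from PodStatus; a quick sketch (pod and namespace names here are placeholders, not from the deck):

```shell
# Show the phase of a single pod
kubectl get pod mypod -o jsonpath='{.status.phase}'

# List name and phase for every pod in a namespace
kubectl get pods -n kube-system \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
```

kubectl describe pod surfaces the same data under Status and Containers.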
Pod Conditions
• Also provided with PodStatus
• PodScheduled – Pod has been scheduled to a node
• ContainersReady – all containers in the Pod are ready
• Initialized – all init containers completed successfully
• Ready – Pod able to serve requests – can be added to service pools
$ k describe -n kube-system pod kube-proxy-ncrgr | grep Conditions -A5
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Pod Conditions (seeing details)
kubectl get
• Basic functionality is "list" but lots of useful customization
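A few of the customizations (flags shown are standard kubectl get options; resource names are examples):

```shell
kubectl get pods -o wide                                 # adds node and IP columns
kubectl get pods -A                                      # all namespaces
kubectl get pods --sort-by=.metadata.creationTimestamp   # oldest first
kubectl get pods --field-selector=status.phase!=Running  # only problem pods
kubectl get deploy mysql -o yaml                         # full manifest of one object
```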
kubectl get with jsonpath
• example: see all images running in all namespaces
• example: get list of containers in a pod
• reference: https://kubernetes.io/docs/reference/kubectl/jsonpath/
k get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq -c
1 gcr.io/k8s-minikube/storage-provisioner:v5
1 k8s.gcr.io/coredns/coredns:v1.8.0
1 k8s.gcr.io/etcd:3.4.13-0
1 k8s.gcr.io/kube-apiserver:v1.21.1
1 k8s.gcr.io/kube-controller-manager:v1.21.1
1 k8s.gcr.io/kube-proxy:v1.21.1
1 k8s.gcr.io/kube-scheduler:v1.21.1
1 kubernetesui/dashboard:v2.1.0@sha256:7f80b5ba141bead69c4fee8661464857af300d7d7ed0274cf7beecedc00322e6
1 kubernetesui/metrics-scraper:v1.0.4@sha256:555981a24f184420f3be0c79d4efb6c948a85cfce84034f85a563f4151a81cbf
kubectl get pods POD_NAME_HERE -o jsonpath='{.spec.containers[*].name}'
Resource types
• seeing all resource types: kubectl api-resources
• "explaining" a resource (describing purpose and format): kubectl explain <resource name>
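explain can also drill into nested fields; for example (these are standard built-in field paths):

```shell
kubectl explain pod.spec.containers.resources
kubectl explain deployment.spec.strategy
kubectl explain pod.spec --recursive   # dump the whole subtree of fields
```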
kubectl describe
# Describe a node
kubectl describe nodes kubernetes-node-emt8.c.myproject.internal
# Describe a pod
kubectl describe pods/nginx
# Describe a pod identified by type and name in "pod.json"
kubectl describe -f pod.json
# Describe all pods
kubectl describe pods
# Describe pods by label name=myLabel
kubectl describe po -l name=myLabel
# Describe all pods managed by the 'frontend' replication controller
# (rc-created pods get the name of the rc as a prefix in the pod name)
kubectl describe pods frontend
$ k describe pod mysql-7bf6b7fc5-2tdkw
Name: mysql-7bf6b7fc5-2tdkw
Namespace: ts
Priority: 0
Node: <none>
Labels: app=mysql
pod-template-hash=7bf6b7fc5
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/mysql-7bf6b7fc5
Containers:
roar-db:
Image: quay.io/techupskills/roar-db:1.0.1
Port: 3306/TCP
Host Port: 0/TCP
Limits:
cpu: 1
memory: 10Gi
Requests:
cpu: 1
memory: 10Gi
Readiness: exec [mysql] delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
MYSQL_DATABASE: registry
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j2lcx (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-j2lcx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 15s (x23 over 42m) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
kubectl logs
Print the logs for a container in a pod or specified resource.
Examples:
# Return snapshot logs from pod nginx with only one container
kubectl logs nginx
# Return snapshot logs from all containers in pods defined by label app=nginx
kubectl logs -lapp=nginx --all-containers=true
# Return snapshot of previous terminated ruby container logs from pod web-1
kubectl logs -p -c ruby web-1
# Begin streaming the logs of the ruby container in pod web-1
kubectl logs -f -c ruby web-1
# Display only the most recent 20 lines of output in pod nginx
kubectl logs --tail=20 nginx
# Show all logs from pod nginx written in the last hour
kubectl logs --since=1h nginx
# Show logs from a kubelet with an expired serving certificate
kubectl logs --insecure-skip-tls-verify-backend nginx
# Return snapshot logs from first container of a job named hello
kubectl logs job/hello
# Return snapshot logs from container nginx-1 of a deployment named nginx
kubectl logs deployment/nginx -c nginx-1
$ k logs mysql-5c97d489dd-z5pp2
2021-06-22 23:43:46+00:00 [Note] [Entrypoint]:
Entrypoint script for MySQL Server 5.7.32-
1debian10 started.
2021-06-22 23:43:46+00:00 [Note] [Entrypoint]:
Initializing database files
2021-06-22T23:43:46.950512Z 0 [Warning] TIMESTAMP
with implicit DEFAULT value is deprecated. Please
use --explicit_defaults_for_timestamp server
option (see documentation for more details).
2021-06-22T23:43:47.138897Z 0 [Warning] InnoDB:
New log files created, LSN=45790
2021-06-22T23:43:47.211492Z 0 [Warning] InnoDB:
Creating foreign key constraint system tables.
2021-06-22T23:43:47.270535Z 0 [Warning] No
existing UUID has been found, so we assume that
this is the first time that this server has been
started. Generating a new UUID: b02442ec-d3b3-
11eb-9519-0242ac110006.
2021-06-22T23:43:47.274194Z 0 [Warning] Gtid
table is not ready to be used. Table
'mysql.gtid_executed' cannot be opened.
2021-06-22T23:43:47.625895Z 0 [Warning] CA
certificate ca.pem is self signed.
2021-06-22T23:43:47.848450Z 1 [Warning]
root@localhost is created with an empty password
! Please consider switching off the --initialize-
insecure option.
2021-06-22 23:43:52+00:00 [Note] [Entrypoint]:
Database files initialized
2021-06-22 23:43:52+00:00 [Note] [Entrypoint]:
Starting temporary server
kubectl edit
• Allows editing of resources in place with a defined editor
$ k edit deploy/mysql
deployment.apps/mysql edited
Examples:
# Edit the service named 'docker-registry':
kubectl edit svc/docker-registry
# Use an alternative editor
KUBE_EDITOR="nano" kubectl edit svc/docker-registry
# Edit the job 'myjob' in JSON using the v1 API format:
kubectl edit job.v1.batch/myjob -o json
# Edit the deployment 'mydeployment' in YAML and save the modified config in its annotation:
kubectl edit deployment/mydeployment -o yaml --save-config
kubectl replace
• Replace a resource by filename or stdin (JSON or YAML)
• Requires the complete resource spec (including status)
• Obtain via kubectl get TYPE NAME -o yaml
• Status must not have changed since the edit
$ kubectl get -n roar2 pod roar-web-74bb47bdb8-56l4n -o yaml
! Note: Pods have both a spec and a status when running
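A typical replace round-trip might look like this (the deployment and file names are examples):

```shell
kubectl get deploy mysql -o yaml > mysql-deploy.yaml
# ... edit mysql-deploy.yaml ...
kubectl replace -f mysql-deploy.yaml

# If the live object changed in the meantime, replace fails;
# --force deletes and recreates the resource instead
kubectl replace --force -f mysql-deploy.yaml
```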
Resource Requests and Limits (min & max)
• Requests and limits are specs in Pods to control resources like memory and CPU.
• Requests are what the container is guaranteed to have.
• K8s will only schedule the container on a node that can meet the
request.
• Limits define upper bounds.
• Container is only allowed to go up to the limit - if it hits the
limit, it is dealt with.
• Specifying Requests and Limits is a best practice
• Allows the Kubernetes scheduler to make better decisions
about where to put pods
• Memory limits
• Measured in bytes
• Can be expressed as bytes, fixed-point integers with suffixes (128M), or power-of-two equivalents (128Mi) *
• CPU limits
• Measured in units, usually a CPU core or a portion of one
• K8s introduces "millicores"
• 1 millicore = 1/1000 core, so 1000m = 1 core
• Can be fractional (a Pod with a CPU limit of 0.5 is allowed up to half a core)
* Mebibyte is a multiple of the byte: mebi = 2^20, so 1 MiB = 1,048,576 bytes = 1024 kibibytes
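Requests and limits are set per container in the pod spec; a minimal sketch in the heredoc style used elsewhere in this deck (pod name and values are illustrative):

```shell
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: busybox
    image: busybox
    args: ["sleep", "10000"]
    resources:
      requests:
        cpu: 250m        # guaranteed a quarter of a core
        memory: 64Mi
      limits:
        cpu: 500m        # throttled above half a core
        memory: 128Mi    # may be terminated if exceeded
EOF
```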
How pods with resource requests are scheduled
• When you create a pod, the K8s Scheduler selects a node for it to run on
• Each node has a maximum capacity for resources (memory and cpu)
• Scheduler ensures that, for each resource type, sum of the resource
requests for the containers being scheduled is less than the capacity of the
node
• Scheduler sticks to the requests and won’t schedule if requested is more
than node has (even if actual resource usage is low)
• Effective request (used for allocation) is the higher of:
• Sum of the requests of all containers in the pod
• The largest request of any init container
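A worked example with hypothetical numbers:

```shell
# Two app containers requesting 200m and 300m CPU; one init container requesting 400m:
#   sum of app container requests  = 200m + 300m = 500m
#   largest init container request = 400m
#   effective request = max(500m, 400m) = 500m
# What the scheduler sees for a live pod:
kubectl get pod mypod -o jsonpath='{.spec.containers[*].resources.requests}'
```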
What happens when we exceed resource limits?
• If a container exceeds its memory limit, it may be terminated
• If that container is restartable, k8s will restart it
• If a container exceeds its memory request, the pod will likely be evicted when the node runs out of memory (the badly-behaved pod is punished)
• Container may/may not be allowed to exceed its cpu usage
for extended periods of time
• Containers are not killed for excessive cpu usage
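A container killed for exceeding its memory limit typically shows a reason of OOMKilled and exit code 137; one way to spot this (the pod name is a placeholder):

```shell
kubectl describe pod mypod | grep -A3 "Last State"
# Look for: Reason: OOMKilled, Exit Code: 137

# Evictions show up as events
kubectl get events --field-selector=reason=Evicted
```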
kubectl top
$ k top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
training1 148m 7% 1685Mi 17%
$ k top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-558bd4d5db-lmbqt 2m 10Mi
etcd-training1 9m 41Mi
kube-apiserver-training1 47m 261Mi
kube-controller-manager-training1 16m 41Mi
kube-proxy-rvzmv 1m 11Mi
kube-scheduler-training1 2m 14Mi
metrics-server-77c99ccb96-lrvqm 3m 13Mi
storage-provisioner 1m 8Mi
Display Resource (CPU/Memory) usage.
The top command allows you to see the resource consumption for nodes or pods.
This command requires Metrics Server to be correctly configured and working on the server.
Available Commands:
node Display Resource (CPU/Memory) usage of nodes
pod Display Resource (CPU/Memory) usage of pods
Usage:
kubectl top [flags] [options]
Helm
• Traditional deployment in Kubernetes is done with
kubectl across files into separately managed items
• Helm deploys units called charts as managed releases
[diagram: Traditional Kubernetes – kubectl applies deployment.yaml, service.yaml, and ingress.yaml individually, producing a separately managed Deployment, Service, and Ingress. Helm – helm install renders templated deployment.yaml, service.yaml, and ingress.yaml with Values.yaml from a chart and creates the same objects as a single managed release]
Lab 1 – Ways to identify and remediate
issues with system resources when trying to get pods
scheduled on nodes
Turning Deployments “on & off”
• $ kubectl scale deploy <name> --replicas=0
• Brings running pods down to zero
• Keeps configuration intact
• Scale back to non-zero to turn back “on”
kubectl cluster-info
• Prints info on control-plane and services such as DNS
• Use dump subcommand to get massive additional
info (json schemas for objects, logs, etc.)
$ k cluster-info
Kubernetes control plane is running at https://10.0.2.15:8443
CoreDNS is running at https://10.0.2.15:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Labels
• Labels are the main mechanism used in Kubernetes to select/organize/associate objects
• Label is a key-value pair without any pre-defined meaning
• Kubectl get has option “--show-labels” to show labels from objects
• Can add label to object with “kubectl label” command
• We can filter based on a label
• --selector is long form of option
• -l is short form
• Most k8s objects support set-based selectors (choosing items based on membership in a set)
• Labels can apply to other objects, such as nodes and services and be used in other
operations
• Types: "Equality-based" (=, !=) and "Set-based" (in, notin, exists)
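Putting those pieces together (the pod name and label are examples):

```shell
kubectl label pod pod1 tier=frontend               # add a label
kubectl get pods --show-labels
kubectl get pods -l tier=frontend                  # equality-based
kubectl get pods -l 'tier!=frontend'
kubectl get pods -l 'tier in (frontend,backend)'   # set-based
kubectl label pod pod1 tier-                       # remove the label
```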
Using labels to find info vs pod names
• Can use selector to specify pod in commands
diyuser3@training1:~/k8s-ps$ k get pods --show-labels -n kube-system
NAME READY STATUS RESTARTS AGE LABELS
coredns-558bd4d5db-vzhnz 1/1 Running 1 293d k8s-app=kube-dns,pod-template-hash=558bd4d5db
etcd-training1 1/1 Running 8 293d component=etcd,tier=control-plane
kube-apiserver-training1 1/1 Running 13 293d component=kube-apiserver,tier=control-plane
kube-controller-manager-training1 1/1 Running 7 293d component=kube-controller-manager,tier=control-plane
kube-proxy-w8r7d 1/1 Running 1 293d controller-revision-hash=6bc6858f58,k8s-app=kube-proxy,pod-template-generation=1
kube-scheduler-training1 1/1 Running 3 293d component=kube-scheduler,tier=control-plane
storage-provisioner 1/1 Running 1 293d addonmanager.kubernetes.io/mode=Reconcile,integration-test=storage-provisioner
diyuser3@training1:~/k8s-ps$ k get pods -n kube-system -l component='kube-apiserver' -o jsonpath='{.items[*].spec.containers[*].name}'
kube-apiserver
Selecting Nodes
• By default, K8s scheduler will automatically select a node to run on
§ Checks node’s capacity for CPU/memory and compares to pod’s
requests
§ Sum of all resource requests is less than capacity of node
• Use cases exist when pod may need to end up on a specific node
§ Pod needs a particular resource that exists only on that node (e.g. SSD, larger memory)
§ Pods need to be co-located on same node due to same
availability zone OR spread across availability zones
§ Pods need to be co-located due to tight software dependencies
(web-server and cache)
• Multiple ways to handle
§ nodeSelector
§ Node Affinity
§ Inter-Pod Affinity
Node Selector
• Attracts a pod to certain node(s)
• Apply a label onto one or more nodes
• Specify nodeSelector label in manifest
• Scheduler schedules the pod only onto nodes that have the label
$ kubectl label node node2 diskfmt=ssd
node "node2" labeled
$ k get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node2 Ready etcd,worker 14d v1.17.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,diskfmt=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=true
$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
name: busybox-exp
spec:
containers:
- name: busybox
image: busybox
args:
- sleep
- "10000"
nodeSelector:
diskfmt: ssd
EOF
[diagram: busybox-exp is scheduled onto node2, which carries the diskfmt=ssd label]
Node Affinity (hard)
• Attracts a pod to certain node(s)
• Apply a label onto one or more nodes
• hard = exact rule that must be matched to schedule
§ requiredDuringSchedulingIgnoredDuringExecution
» IgnoredDuringExecution – if labels on a node changed after matching Pod already running,
let it continue
apiVersion: v1
kind: Pod
metadata:
name: busybox-exp
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: az
operator: In
values:
- east1
- east2
containers:
- name: busybox
image: busybox
args:
- sleep
- "10000"
[diagram: busybox-exp can be scheduled onto node1 (az=east1) or node2 (az=east2), but not node3 (az=west1)]
Node Affinity (soft)
• Attracts a pod to certain node(s)
• Apply a label onto one or more nodes
• soft = rule that prefers match to schedule
§ if not possible, allow running anyway
§ preferredDuringSchedulingIgnoredDuringExecution
§ IgnoredDuringExecution – if labels on a node changed after matching Pod already running, let it
continue
apiVersion: v1
kind: Pod
metadata:
name: busybox-exp
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: az
operator: In
values:
- west1
containers:
- name: busybox
image: busybox
args:
- sleep
- "10000"
[diagram: busybox-exp prefers node3 (az=west1) over node1 (az=east1) and node2 (az=east2), but may be scheduled elsewhere if needed]
• weight (from 1-100)
• for a node that meets all reqs (resource
reqs, affinity expressions, etc.) scheduler
computes a sum and adds “weight” value if
matchExpressions match
• highest total score is most preferred
Advantages of Affinity over Selector
• Affinity language has more options (more logical operators to express
relationships - “In, NotIn, Exists, DoesNotExist, Gt, Lt”)
• Affinity provides “soft” scheduling rules – preference but not required
• Can use ”NotIn” & “DoesNotExist” to achieve node anti-affinity (or use taints)
• Affinity supports Pod co-location.
§ Can constrain Pod to run on a node based on label of another
Pod
§ Called "inter-Pod affinity/anti-affinity"
• Two types of affinity
§ Node affinity – attracts Pod to a node
§ Pod affinity – attracts a Pod to a Pod
• Pod anti-affinity
§ Repels a Pod from other Pods
Pod Affinity (and Anti-affinity)
• Allow for specifying rules about how pods should
be scheduled (or not) relative to other pods
• Pod spec specifies an affinity or anti-affinity for
other pods
• Affinity – scheduler locates new pod on same
node as other pods if new pod matches label on
current pod
• Can have required or preferred rules – like node
affinity
• Uses “labelSelector” instead of
“nodeSelectorTerms”
• Set of operators is [ In, NotIn, Exists,
DoesNotExist ]
• Requires “topologyKey” – prepopulated
Kubernetes label that system uses to denote a
domain
• Anti-affinity – prevent scheduler from locating
new pod on same node as pods with same labels
if label selector matches
• Use cases:
§ Affinity - spread or pack pods together
§ Anti-affinity – prevent pods from being on
the same nodes that might interfere
apiVersion: v1
kind: Pod
metadata:
name: busybox-exp
labels:
status: busy
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: status
operator: In
values:
- busy
topologyKey: kubernetes.io/hostname
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: name
operator: In
values:
- lazybox
topologyKey: kubernetes.io/hostname
containers:
- name: busybox
image: busybox
args:
- sleep
- "10000"
Taints and Tolerations
• Taints – apply to nodes
• Tolerations – apply to pods
• Taints – repel pods (opposite of affinity)
• Use cases
§ Nodes with special hardware (GPUs)
» Taint nodes that have special hardware
» Add tolerations for Pods that must use it
§ Evictions
» Per-Pod eviction on nodes with problems
§ Dedicated nodes – combo of affinity and taints
» Use affinity to subset nodes
» Add tolerations to schedule pods on
those nodes
Scenario: two nodes; my-pod.yaml defines a basic pod with no tolerations:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: busybox
    image: busybox
    args:
    - sleep
    - "10000"

$ kubectl taint nodes node1 app=bb:NoSchedule
$ kubectl apply -f my-pod.yaml
  Pod scheduled on node2 because it does not have a toleration for node1's taint
$ kubectl taint nodes node2 app=db:NoExecute
  Pod evicted (terminated) on node2 because it does not have a toleration for node2's taint
$ kubectl apply -f my-pod.yaml
$ kubectl describe pod my-pod
  "0/2 nodes available no nodes support taint"
<edit my-pod.yaml to add a toleration>
tolerations:
- key: "app"
  operator: "Equal"
  value: "bb"
  effect: "NoSchedule"
$ kubectl apply -f my-pod.yaml
  Pod scheduled on node1 because it has a toleration that matches the taint
<change the toleration to use operator Exists>
tolerations:
- key: "app"
  operator: "Exists"
  effect: "NoExecute"
$ kubectl apply -f my-pod.yaml
  Pod scheduled on node2 because it has a toleration that matches the taint
Taint Special Cases
• Kubernetes has the concept of node conditions.
• ”conditions” field describes status of all Running nodes from [Ready,
DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable]
• Kubernetes node controller automatically taints nodes based on
conditions.
§ Example: node is out of disk; the node lifecycle controller adds the node.kubernetes.io/out-of-disk taint to prevent new Pods from being scheduled on it
• Users can interact with conditions to change Pod scheduling.
§ Known as TaintNodesByCondition
§ Pods can choose to ignore Node problems by adding tolerations for the corresponding taints
• Other taints include node.kubernetes.io/unschedulable &
node.cloudprovider.kubernetes.io/uninitialized
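To see which condition-based taints (if any) are on your nodes, something like (node name is an example):

```shell
kubectl describe node training1 | grep -A3 Taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
```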
Lab 2 – Identify and remediate issues with getting scheduled
on nodes; debug and fix pod startup issues
Ephemeral Containers
• Useful for interactive debugging if kubectl exec isn’t enough
• Examples
• container has crashed
• container doesn’t have debugging utilities in it (minimal base
image)
• Different than normal containers
• no guarantees for resources or execution
• not automatically restarted
• not intended for actual applications
• can use Container spec, but no ports, livenessProbe, readinessProbe
• process namespace sharing - supporting technology that can be enabled
in a pod
• processes in a container are visible to all other containers in the pod
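A sketch of injecting an ephemeral container (pod, image, and target container names are examples; --target enables process namespace sharing with that container):

```shell
kubectl debug -it mypod --image=busybox:1.28 --target=mycontainer
# The ephemeral container shows up in the pod's description:
kubectl describe pod mypod
```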
Debugging with a pod copy (same container with new command)
• Tools like "kubectl exec" may not be usable on a container
§ Container may not have a shell
§ Application may crash at startup or not behave as expected
• Can use “kubectl debug” to create copy of pod and change the
command to execute
• Can provide interactive shell to aid debugging
$ k debug roar-web-5cb658866f-b8pl6 -it --copy-to=roar-debug --container=roar-web -- sh
If you don't see a command prompt, try pressing enter.
#
[diagram: kubectl debug <Pod> --copy-to=<Debug-Pod> --container=<Container> -- <new-command> creates Debug-Pod running the same container with the new command in place of the original one]
Debugging with a pod copy (adding a new container)
• Tools like "kubectl exec" may not be usable on a container
§ Container may not have a shell
§ Application may crash at startup or not behave as expected
• Can use “kubectl debug” to create copy of pod and add a new
container
• Can share processes to let new container see processes in other
container(s)
$ k debug roar-web-5cb658866f-b8pl6 -it --copy-to=roar-debug2 --image=ubuntu --share-processes
Defaulting debug container name to debugger-6v4xm.
If you don't see a command prompt, try pressing enter.
root@roar-debug2:/#
[diagram: kubectl debug <Pod> --copy-to=<Debug-Pod> --image=ubuntu --share-processes creates Debug-Pod containing the original container plus an ubuntu container whose shell can see the processes of the other container]
Debugging with a pod copy (new image for container)
• Tools like "kubectl exec" may not be usable on a container
§ Container may not have a shell
§ Application may crash at startup or not behave as expected
• Can use “kubectl debug” to create copy of pod and change the
image the container is based on
$ k debug <web pod name> --copy-to=web-test --set-image=roar-web=quay.io/techupskills/roar-web:1.0.2
[diagram: kubectl debug <Pod> --copy-to=<Debug-Pod> --set-image=<container>=<new-image> creates Debug-Pod whose container is based on the new image]
kubectl set
$ k set
Configure application resources
These commands help you make changes to existing application resources.
Available Commands:
env Update environment variables on a pod template
image Update image of a pod template
resources Update resource requests/limits on objects with pod templates
selector Set the selector on a resource
serviceaccount Update ServiceAccount of a resource
subject Update User, Group or ServiceAccount in a RoleBinding/ClusterRoleBinding
Usage:
kubectl set SUBCOMMAND [options]
$ k set image deploy/roar-web roar-web=quay.io/techupskills/roar-web:1.0.2
deployment.apps/roar-web image updated
[diagram: kubectl set image deployment/<name> <container>=<new-image> rolls the Deployment's Pods to a new version whose container is based on the new image]
Lab 3 – Troubleshoot failed containers within
pods and how to spin up pods to debug them
kubectl run vs exec vs attach
• run - create and run an image in a new pod
• exec - run a command inside an existing container
• attach - attach to a process already running inside an existing container
• primary distinction between attach and exec
• exec can interact with any process you want to create
• attach connects with the current one running (no choice)
• both attach and exec allow for sending stdin from terminal to process
• debug command also has attach option (defaults to false usually) - waits for container to start and then acts like attach
$ k run -it --rm pod1 --image=busybox:1.28 --restart=Never -n ts -- yes this is a repeating message
this is a repeating message
this is a repeating message
this is a repeating message
this is a repeating message
$ k exec -it pod1 -- sh
/ # ps
PID USER TIME COMMAND
1 root 0:10 yes this is a repeating message
12 root 0:00 sh
17 root 0:00 ps
/ #
$ k attach -it -n ts pod1
If you don't see a command prompt, try pressing enter.
repeating message
this is a repeating message
this is a repeating message
this is a repeating message
this is a repeating message
this is a repeating message
Container Probes
• Probe – diagnostic performed periodically on a container by the kubelet
• Works by having “handlers” implemented in the container (approaches for checks)
(Diagram: the kubelet runs the startupProbe, livenessProbe, and readinessProbe against a container in a Pod on a node; each probe uses one of the three handlers – ExecAction (command returns RC=0 → Success), TCPSocketAction (TCP to IP:port, port open → Success), or HTTPGetAction (GET IP:port/path, response code >= 200 and < 400 → Success).)
• 3 Types of Handlers
• ExecAction – executes a command in the container
• success = exit with 0 return code
• TCPSocketAction – does a TCP check on the pod’s IP and a port
• success = port is open
• HTTPGetAction – performs a GET request on the pod’s IP at a specified port and path
• success = response code >= 200 and < 400
• 3 Kinds of Probes
• livenessProbe – is the container running?
• fail – container killed/restarted
• readinessProbe – is the container ready to respond to requests?
• fail – endpoints controller removes the Pod’s IP address from the endpoints of all Services matching the Pod
• startupProbe – has the application in the container started?
• if provided, other probes are disabled until this succeeds
• fail – container killed/restarted
• Default state for all probes (when not provided) = Success
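The three probe kinds and three handler types can be combined in a single pod spec. A minimal sketch, assuming a hypothetical nginx-based pod (the name, image, paths, and thresholds are illustrative, not from the slides):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.21
    ports:
    - containerPort: 80
    startupProbe:             # other probes are disabled until this succeeds
      httpGet:                # HTTPGetAction handler
        path: /
        port: 80
      failureThreshold: 30
      periodSeconds: 2
    livenessProbe:            # fail => container killed and restarted
      tcpSocket:              # TCPSocketAction handler
        port: 80
      periodSeconds: 10
    readinessProbe:           # fail => pod IP removed from service endpoints
      exec:                   # ExecAction handler; success = exit code 0
        command: ["cat", "/usr/share/nginx/html/index.html"]
      initialDelaySeconds: 5
      periodSeconds: 10
```

Apply with kubectl apply -f and then watch kubectl describe pod probe-demo – probe failures show up as events there.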
Probe Use Guidelines
• liveness probe
• identifies conditions when the container should be killed and restarted
• may not be needed if the container crashes on its own for issues or when it becomes unhealthy
• readiness probe
• keeps traffic from going to a Pod until it is ready
• could be the same check as liveness, but allows the container to start and wait before receiving traffic
• use for long startup times
• even without a readiness probe, a Pod goes to “unready” when deleted while waiting for its containers to stop
• startup probe
• useful for long service startup times
• use instead of a long liveness interval
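The last guideline can be sketched directly: rather than giving the liveness probe a very long initial delay, gate it behind a startup probe. A hedged fragment (the /healthz path, port, and thresholds are assumptions for illustration):

```yaml
# Instead of a livenessProbe with initialDelaySeconds: 300,
# let a startupProbe absorb the slow start:
startupProbe:
  httpGet:
    path: /healthz            # hypothetical health endpoint
    port: 8080
  failureThreshold: 60        # allow up to 60 * 5s = 300s to start
  periodSeconds: 5
livenessProbe:                # only begins running once the startup probe succeeds
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
```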
Lab 4 – Debug issues when trying to use probes to do health, liveness, or readiness checks
Debugging with a shell on a node
• Can create a privileged pod on a particular node
• privileged
• processes in the container are essentially equal to root on the host
• the container is given access to all devices on the host
• by default, containers are not privileged
• Allows a privileged way to check items on the same node as a problematic pod
• Container runs in the host IPC, Network, and PID namespaces
• Root filesystem of the node is mounted at /host
$ k debug node/training1 -it --image=busybox:1.28
Creating debugging pod node-debugger-training1-rkk2m with container debugger on node training1.
If you don't see a command prompt, try pressing enter.
/ # ls /host
bin etc lib64 proc srv var
boot home lost+found root swapfile vmlinuz
cdrom initrd.img media run sys vmlinuz.old
data initrd.img.old mnt sbin tmp
dev lib opt snap usr
/ # ls /host/home
diyuser3 git linuxbrew
/ # whoami
root
Endpoints
(Diagram: a NodePort Service (node port 31789, service port 8089) with selector name=roar-web fronts a Deployment’s ReplicaSet of pods listening on port 8080; the Endpoints object lists the pod IPs 172.17.0.21, 172.17.0.22, and 172.17.0.23.)
§ Service provides a virtual address for us to connect to for the frontend
§ Uses labels to select the “pool” of backend pods to map to
§ Endpoints for a service are the list of available IP addresses for usable pods
§ Example service here is NodePort
§ If one pod goes down or becomes unavailable, the service can connect to another pod
§ Spec is shown below
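A NodePort service wired up this way could be expressed as a manifest like the following – a sketch reconstructed from the port numbers and label in the diagram; the exact spec used in the labs may differ:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: roar-web
spec:
  type: NodePort
  selector:
    name: roar-web          # pods carrying this label become the endpoints
  ports:
  - port: 8089              # cluster-internal service port
    targetPort: 8080        # container port on the selected pods
    nodePort: 31789         # port exposed on each node
```

After applying it, kubectl get endpoints roar-web lists the IPs of the matching pods (172.17.0.21–.23 in the diagram).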
kubectl patch
• Allows updating parts of existing resources
• syntax: kubectl patch (-f FILENAME | TYPE NAME) [-p PATCH|--patch-file FILE] [options]
• Three forms of patches
§ strategic merge patch (default)
» YAML file created with the part of the spec to be patched
» patch may either replace or add to the existing spec portion
» depends on the “patch strategy” of the object
§ JSON merge patch
» to update part of a spec, you have to specify the entire spec section
» new spec section completely replaces the existing section
§ JSON patch
» set of atomic operations on a JSON doc (“add”, “remove”, “replace”, etc.)
» uses the “op” key to denote the type of operation
» other keys are the arguments to the operation
» the “path” argument is a JSON pointer to the part of the doc targeted by the operation
# Update a container's image; spec.containers[*].name is required because it’s a merge key.
kubectl patch pod valid-pod -p '{"spec":{"containers":[{"name":"kubernetes-serve-hostname","image":"new image"}]}}'
# Update a container's image using a json patch with positional arrays.
kubectl patch pod valid-pod --type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]'
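The strategic merge form can also live in a small file passed via --patch-file: the file contains only the fields to change. A sketch (the deployment name and replica count are illustrative):

```yaml
# patch-replicas.yaml — strategic merge patch: only the fields being changed
spec:
  replicas: 3
```

Applied with, for example: kubectl patch deploy/roar-web --patch-file patch-replicas.yaml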
jq
• Similar to sed for JSON data
• Use like awk, grep, sed for JSON
• stedolan.github.io/jq/
• Example: k get svc mysql -o json | jq -j '.spec.selector'
{
"name": "roar-db"
}
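The same filter can be tried without a cluster by feeding jq a static JSON sample shaped like the kubectl output above (the sample JSON is illustrative; assumes jq is installed):

```shell
# Static sample standing in for `k get svc mysql -o json`
json='{"metadata":{"name":"mysql"},"spec":{"selector":{"name":"roar-db"}}}'

# -j prints raw output with no trailing newline
echo "$json" | jq -j '.spec.selector.name'
```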
Debugging Networking for Services
1. Check DNS via # nslookup kubernetes.default – kubernetes.default is a special service that provides a way for internal applications in the cluster to talk to the API server. OK?
2. Check if we can see our service: # nslookup our-service OK?
3. Check if we can get to services within the cluster: # wget -qO- <CLUSTER-IP>:<PORT> for the service OK?
4. Check if we can get the endpoints for the service: $ kubectl get ep
Lab 5 – Troubleshoot and determine the problem(s) when your service isn’t accessible
That’s all - thanks!
techskillstransformations.com
getskillsnow.com
More Related Content

Kubernetes Problem-Solving

  • 1. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 1 © 2022 Brent C. Laster & Kubernetes Problem-Solving Learning to identify, understand, and fix the most common issues in the cluster Provided by Tech Skills Transformations
  • 2. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 2 © 2022 Brent C. Laster & About me • Founder, Tech Skills Transformations LLC • R&D DevOps Director • Global trainer – training (Git, Jenkins, Gradle, CI/CD, pipelines, Kubernetes, Helm, ArgoCD, operators) • Author - • OpenSource.com • Professional Git book • Jenkins 2 – Up and Running book • Continuous Integration vs. Continuous Delivery vs. Continuous Deployment mini-book on Safari • https://www.linkedin.com/in/brentlaster • @BrentCLaster • GitHub: brentlaster © 2021 Brent C. Laster &
  • 3. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 3 © 2022 Brent C. Laster & Professional Git Book • Extensive Git reference, explanations, • and examples • First part for non-technical • Beginner and advanced reference • Hands-on labs © 2021 Brent C. Laster &
  • 4. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 4 © 2022 Brent C. Laster & © 2021 Brent C. Laster & Jenkins 2 Book • Jenkins 2 – Up and Running • “It’s an ideal book for those who are new to CI/CD, as well as those who have been using Jenkins for many years. This book will help you discover and rediscover Jenkins.” By Kohsuke Kawaguchi, Creator of Jenkins
  • 5. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 5 © 2022 Brent C. Laster & O’Reilly Training
  • 6. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 6 © 2022 Brent C. Laster & High-level agenda © 2021 Brent C. Laster & • Kubernetes Architecture Items • Useful commands for working with Kubernetes • Understanding and debugging common problems with system resources • Understanding and debugging issues with node selections and extended methods for debugging Pod issues • Debugging failed and crashed containers within Pods • Working with Probes • Troubleshooting Services (DNS, network traffic, etc.)
  • 7. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 7 © 2022 Brent C. Laster & Kubernetes is a Desired-State System • User supplies desired state via declaring it in manifests • Kubernetes works to balance the current state and the desired state • Desired state – what you want your production environment to be • Current (observed) state – current status of your production environment Desired state Current (Observed) state Kubernetes Reconciliation loop
  • 8. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 8 © 2022 Brent C. Laster & Config Worker Cluster Overview etcd API server Controller Manager kubectl Pod Pod Pod Pod Pod Kube-proxy Kubelet Kubelet Kube-proxy Master Worker Scheduler Container Network Interface (CNI) Calico FlannelWeave Image Registry
  • 9. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 9 © 2022 Brent C. Laster & • Pods have a defined lifecycle/status • Status is surfaced via PodStatus’ field “phase” • Lifecycle/status/phase values include: • Pending – Pod ok for cluster, but one or more containers not setup or ready to run • Includes times such as downloading images or waiting to be scheduled • Running – Pod bound to a node, all containers created, and at least one container is running, or starting/restarting • Succeeded – all containers have terminated as successful and will not be restarted • Failed – all containers terminated with at least one in a failed condition • failed condition = non-zero RC or terminated by K8S • Unknown – Pod state could not be determined • Might occur for instance if node can’t be reached • While in “Running”, kubelet can restart containers for some cases • Pods track overall ”states” for containers • Container states include • Waiting – still doing ops to complete startup, such as pulling images or loading secrets • Running - executing w/o issues • Terminated • Has run to completion • Or failed and killed Pod Lifecycle Cluster Node Pod (Pending) Container (Waiting) Running Terminated (Running) (Succeeded)
  • 10. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 10 © 2022 Brent C. Laster & Pod Conditions • Also provided with PodStatus • PodScheduled – schedule to node • ContainersReady – all ready • Initialized – init containers started ok • Ready – Pod able to serve requests – can add to svc pools $ k describe -n kube-system pod kube-proxy-ncrgr | grep Conditions -A5 Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True
  • 11. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 11 © 2022 Brent C. Laster & Pod Conditions (seeing details)
  • 12. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 12 © 2022 Brent C. Laster & kubectl get • Basic functionality is “list” but lots of useful customizaton
  • 13. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 13 © 2022 Brent C. Laster & kubectl get with jsonpath • example: see all images running in all namespace • example: get list of containers in a pod • reference: https://kubernetes.io/docs/reference/kubectl/jsonpath/ k get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' 'n' | sort | uniq -c 1 gcr.io/k8s-minikube/storage-provisioner:v5 1 k8s.gcr.io/coredns/coredns:v1.8.0 1 k8s.gcr.io/etcd:3.4.13-0 1 k8s.gcr.io/kube-apiserver:v1.21.1 1 k8s.gcr.io/kube-controller-manager:v1.21.1 1 k8s.gcr.io/kube-proxy:v1.21.1 1 k8s.gcr.io/kube-scheduler:v1.21.1 1 kubernetesui/dashboard:v2.1.0@sha256:7f80b5ba141bead69c4fee8661464857af300d7d7ed0274cf7 beecedc00322e6 1 kubernetesui/metrics- scraper:v1.0.4@sha256:555981a24f184420f3be0c79d4efb6c948a85cfce84034f85a563f4151a81cbf kubectl get pods POD_NAME_HERE -o jsonpath='{.spec.containers[*].name}'
  • 14. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 14 © 2022 Brent C. Laster & Resource types • seeing all resource types • kubectl api-resources • "explaining" a resource (describing purpose and format) • kubectl explain <resource name>
  • 15. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 15 © 2022 Brent C. Laster & kubectl describe # Describe a node kubectl describe nodes kubernetes-node- emt8.c.myproject.internal # Describe a pod kubectl describe pods/nginx # Describe a pod identified by type and name in "pod.json" kubectl describe -f pod.json # Describe all pods kubectl describe pods # Describe pods by label name=myLabel kubectl describe po -l name=myLabel # Describe all pods managed by the 'frontend' replication controller (rc-created pods # get the name of the rc as a prefix in the pod the name). kubectl describe pods frontend $ k describe pod mysql-7bf6b7fc5-2tdkw Name: mysql-7bf6b7fc5-2tdkw Namespace: ts Priority: 0 Node: <none> Labels: app=mysql pod-template-hash=7bf6b7fc5 Annotations: <none> Status: Pending IP: IPs: <none> Controlled By: ReplicaSet/mysql-7bf6b7fc5 Containers: roar-db: Image: quay.io/techupskills/roar-db:1.0.1 Port: 3306/TCP Host Port: 0/TCP Limits: cpu: 1 memory: 10Gi Requests: cpu: 1 memory: 10Gi Readiness: exec [mysql] delay=5s timeout=1s period=10s #success=1 #failure=3 Environment: MYSQL_DATABASE: registry Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j2lcx (ro) Conditions: Type Status PodScheduled False Volumes: kube-api-access-j2lcx: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true QoS Class: Guaranteed Node-Selectors: <none> Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 15s (x23 over 42m) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
  • 16. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 16 © 2022 Brent C. Laster & kubectl logs Print the logs for a container in a pod or specified resource. Examples: # Return snapshot logs from pod nginx with only one container kubectl logs nginx # Return snapshot logs from all containers in pods defined by label app=nginx kubectl logs -lapp=nginx --all-containers=true # Return snapshot of previous terminated ruby container logs from pod web-1 kubectl logs -p -c ruby web-1 # Begin streaming the logs of the ruby container in pod web-1 kubectl logs -f -c ruby web-1 # Display only the most recent 20 lines of output in pod nginx kubectl logs --tail=20 nginx # Show all logs from pod nginx written in the last hour kubectl logs --since=1h nginx # Show logs from a kubelet with an expired serving certificate kubectl logs --insecure-skip-tls-verify-backend nginx # Return snapshot logs from first container of a job named hello kubectl logs job/hello # Return snapshot logs from container nginx-1 of a deployment named nginx kubectl logs deployment/nginx -c nginx-1 $ k logs mysql-5c97d489dd-z5pp2 2021-06-22 23:43:46+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.32- 1debian10 started. 2021-06-22 23:43:46+00:00 [Note] [Entrypoint]: Initializing database files 2021-06-22T23:43:46.950512Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2021-06-22T23:43:47.138897Z 0 [Warning] InnoDB: New log files created, LSN=45790 2021-06-22T23:43:47.211492Z 0 [Warning] InnoDB: Creating foreign key constraint system tables. 2021-06-22T23:43:47.270535Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: b02442ec-d3b3- 11eb-9519-0242ac110006. 2021-06-22T23:43:47.274194Z 0 [Warning] Gtid table is not ready to be used. 
Table 'mysql.gtid_executed' cannot be opened. 2021-06-22T23:43:47.625895Z 0 [Warning] CA certificate ca.pem is self signed. 2021-06-22T23:43:47.848450Z 1 [Warning] root@localhost is created with an empty password ! Please consider switching off the --initialize- insecure option. 2021-06-22 23:43:52+00:00 [Note] [Entrypoint]: Database files initialized 2021-06-22 23:43:52+00:00 [Note] [Entrypoint]: Starting temporary server
  • 17. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 17 © 2022 Brent C. Laster & kubectl edit • Allows editing of resources in place with defined editor $ k edit deploy/mysql deployment.apps/mysql edited Examples: # Edit the service named 'docker-registry': kubectl edit svc/docker-registry # Use an alternative editor KUBE_EDITOR="nano" kubectl edit svc/docker-registry # Edit the job 'myjob' in JSON using the v1 API format: kubectl edit job.v1.batch/myjob -o json # Edit the deployment 'mydeployment' in YAML and save the modified config in its annotation: kubectl edit deployment/mydeployment -o yaml --save-config
  • 18. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 18 © 2022 Brent C. Laster & kubectl replace • Replace a resource by filename or stdin • JSON or YAML • Requires complete resource spec (including status) • Obtain by kubectl get TYPE NAME -o yaml • Status must not have changed since edit $ kubectl get -n roar2 pod roar-web-74bb47bdb8-56l4n -o yaml ! Note: Pods have both a spec and a status when running
  • 19. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 19 © 2022 Brent C. Laster & Resource Requests and Limits (min & max) • Requests and limit are specs in Pods to control resources like memory and CPU. • Requests are what the container is guaranteed to have. • K8s will only schedule the container on a node that can meet the request. • Limits define upper bounds. • Container is only allowed to go up to the limit - if it hits the limit, it is dealt with. • Specifying Requests and Limits is a best practice • Allows the Kubernetes scheduler to make better decisions about where to put pods • Memory limits • Measured in bytes • Can be expressed as • Bytes, fixed-point integers (128M), or Power-of-two equivalents (123Mi) * • CPU limits • Measured as units, usually as a CPU core or portion of • K8s introduces ”millicores” • 1 millicore = 1/1000 core so 1000m = 1 core • Can be fractional (as in a Pod with a CPU limit of .5 is allowed * Mebibyte is multiple of byte - mebi = 2^20 – 1 MiB = 1, 048, 576 bytes = 1024 kibibytes
  • 20. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 20 © 2022 Brent C. Laster & How pods with resource requests are scheduled • When you create a pod, the K8s Scheduler selects a node for it to run on • Each node has a maximum capacity for resources (memory and cpu) • Scheduler ensures that, for each resource type, sum of the resource requests for the containers being scheduled is less than the capacity of the node • Scheduler sticks to the requests and won’t schedule if requested is more than node has (even if actual resource usage is low) • Effective request (used for allocation) is higher of: • Sum of requests of containers in pod • The request of any init container
  • 21. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 21 © 2022 Brent C. Laster & What happens when we pass resource limits? • If a container exceeds its memory limit, may be terminated • If that container is restartable, k8s will restart it • If a container exceeds its memory requested, the pod will likely be evicted when the node runs out of memory (badly- behaved pod is punished) • Container may/may not be allowed to exceed its cpu usage for extended periods of time • Containers are not killed for excessive cpu usage
  • 22. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 22 © 2022 Brent C. Laster & kubectl top $ k top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% training1 148m 7% 1685Mi 17% $ k top pod -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-558bd4d5db-lmbqt 2m 10Mi etcd-training1 9m 41Mi kube-apiserver-training1 47m 261Mi kube-controller-manager-training1 16m 41Mi kube-proxy-rvzmv 1m 11Mi kube-scheduler-training1 2m 14Mi metrics-server-77c99ccb96-lrvqm 3m 13Mi storage-provisioner 1m 8Mi Display Resource (CPU/Memory) usage. The top command allows you to see the resource consumption for nodes or pods. This command requires Metrics Server to be correctly configured and working on the server. Available Commands: node Display Resource (CPU/Memory) usage of nodes pod Display Resource (CPU/Memory) usage of pods Usage: kubectl top [flags] [options]
  • 23. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 23 © 2022 Brent C. Laster & Helm • Traditional deployment in Kubernetes is done with kubectl across files into separately managed items • Helm deploys units called charts as managed releases Traditional Kubernetes Kubernetes Cluster Deployment .yaml service .yaml ingress .yaml apply apply apply kubectl Deployment Service Ingress Helm Kubernetes Cluster Deployment .yaml (template) service .yaml (template) ingress .yaml (template) helm Deployment Service Ingress Release Deployment .yaml service .yaml ingress .yaml Values.yaml Chart install
  • 24. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 24 © 2022 Brent C. Laster & | Lab 1 – Ways to identify and remediate issues with system resources when trying to get pods scheduled on nodes
  • 25. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 25 © 2022 Brent C. Laster & Turning Deployments “on & off” • $ kubectl scale deploy <name> --replicas=0 • Brings running pods down to zero • Keeps configuration intact • Scale back to non-zero to turn back “on”
  • 26. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 26 © 2022 Brent C. Laster & kubectl cluster-info • Prints info on control-plane and services such as DNS • Use dump subcommand to get massive additional info (json schemas for objects, logs, etc.) $ k cluster-info Kubernetes control plane is running at https://10.0.2.15:8443 CoreDNS is running at https://10.0.2.15:8443/api/v1/namespaces/kube- system/services/kube-dns:dns/proxy To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
  • 27. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 27 © 2022 Brent C. Laster & Labels • Labels are the main mechanisms used in Kubernetes to select/organize/associate obects • Label is a key-value pair without any pre-defined meaning • Kubectl get has option “--show-labels” to show labels from objects • Can add label to object with “kubectl label” command • We can filter based on a label • --selector is long form of option • -l is short form • Most k8s objects support set-based selectors (choosing which items based on a set to select from). If we have pods from upper right, then we can do: • Labels can apply to other objects, such as nodes and services and be used in other operations • Types: “Equality-based” (=, !=) “Set-based” ( in ())
  • 28. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 28 © 2022 Brent C. Laster & Using labels to find info vs pod names • Can use selector to specify pod in commands diyuser3@training1:~/k8s-ps$ k get pods --show-labels -n kube-system NAME READY STATUS RESTARTS AGE LABELS coredns-558bd4d5db-vzhnz 1/1 Running 1 293d k8s-app=kube-dns,pod-template-hash=558bd4d5db etcd-training1 1/1 Running 8 293d component=etcd,tier=control-plane kube-apiserver-training1 1/1 Running 13 293d component=kube-apiserver,tier=control-plane kube-controller-manager-training1 1/1 Running 7 293d component=kube-controller-manager,tier=control-plane kube-proxy-w8r7d 1/1 Running 1 293d controller-revision-hash=6bc6858f58,k8s-app=kube-proxy,pod-template- generation=1 kube-scheduler-training1 1/1 Running 3 293d component=kube-scheduler,tier=control-plane storage-provisioner 1/1 Running 1 293d addonmanager.kubernetes.io/mode=Reconcile,integration-test=storage- provisioner diyuser3@training1:~/k8s-ps$ k get pods -n kube-system -l component='kube-apiserver' -o jsonpath='{.items[*].spec.containers[*].name}' kube-apiserver
  • 29. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 29 © 2022 Brent C. Laster & Selecting Nodes • By default, K8s scheduler will automatically select a node to run on § Checks node’s capacity for CPU/memory and compares to pod’s requests § Sum of all resource requests is less than capacity of node • Use cases exist when pod may need to end up on a specific node § Pod needs a particular resource that exists only on that node - (i.e. SSD, larger memory) § Pods need to be co-located on same node due to same availability zone OR spread across availability zones § Pods need to be co-located due to tight software dependencies (web-server and cache) • Multiple ways to handle § nodeSelector § Node Affinity § Inter-Pod Affinity
  • 30. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 30 © 2022 Brent C. Laster & Node Selector • Attracts a pod to certain node(s) • Apply a label onto one or more nodes • Specify nodeSelector label in manifest • Scheduler schedules nodes only on pods that have label node1 node2 node3 $ kubectl label node node2 diskfmt=ssd node ”node2” labeled $ k get nodes --show-labels NAME STATUS ROLES AGE VERSION LABELS node2 Ready etcd,worker 14d v1.17.1 beta.kubernetes.io/arch=amd64,beta.kubernetes. io/os=linux,diskfmt=ssd,kubernetes.io/arch=amd 64,kubernetes.io/hostname=node2,kubernetes.io/ os=linux,,node-role.kubernetes.io/worker=true $ cat <<EOF | kubectl create -f - apiVersion: v1 kind: Pod metadata: name: busybox-exp spec: containers: - name: busybox image: busybox args: - sleep - “10000” nodeSelector: diskfmt: ssd EOF diskfmt= ssd label busybox-exp
  • 31. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 31 © 2022 Brent C. Laster & Node Affinity (hard) • Attracts a pod to certain node(s) • Apply a label onto one or more nodes • hard = exact rule that must be matched to schedule § requiredDuringSchedulingIgnoredDuringExecution » IgnoredDuringExecution – if labels on a node changed after matching Pod already running, let it continue node1 node2 node3 apiVersion: v1 kind: Pod metadata: name: busybox-exp spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: az operator: In values: - east1 - east2 containers: - name: busybox image: busybox args: - sleep - “10000” busybox-exp az= east1 label az= east2 label az= west1 label
  • 32. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 32 © 2022 Brent C. Laster & Node Affinity (soft) • Attracts a pod to certain node(s) • Apply a label onto one or more nodes • soft = rule that prefers match to schedule § if not possible, allow running anyway § preferredDuringSchedulingIgnoredDuringExecution § IgnoredDuringExecution – if labels on a node changed after matching Pod already running, let it continue node1 node2 node3 apiVersion: v1 kind: Pod metadata: name: busybox-exp spec: affinity: nodeAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 1 preference: matchExpressions: - key: az operator: In values: - west1 containers: - name: busybox image: busybox args: - sleep - “10000” busybox-exp az= east1 label az= east2 label az= west1 label • weight (from 1-100) • for a node that meets all reqs (resource reqs, affinity expressions, etc.) scheduler computes a sum and adds “weight” value if matchExpressions match • highest total score is most preferred
  • 33. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 33 © 2022 Brent C. Laster & Advantages of Affinity over Selector • Affinity language has more options (more logical operators to express relationships - “In, NotIn, Exists, DoesNotExist, Gt, Lt”) • Affinity provides “soft” scheduling rules – preference but not required • Can use ”NotIn” & “DoesNotExist” to achieve node anti-affinity (or use taints) • Affinity supports Pod co-location. § Can constrain Pod to run on a node based on label of another Pod § Called “inter-Pod affinity/anti-affinity • Two types of affinity § Node affinity – attracts Pod to a node § Pod affinity – attracts a Pod to a Pod • Pod anti-affinity § Repels a Pod from other Pods
  • 34. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 34 © 2022 Brent C. Laster & Pod Affinity (and Anti-affinity) • Allow for specifying rules about how pods should be scheduled (or not) relative to other pods • Pod spec specifies an affinity or anti-affinity for other pods • Affinity – scheduler locates new pod on same node as other pods if new pod matches label on current pod • Can have required or preferred rules – like node affinity • Uses “labelSelector” instead of “nodeSelectorTerms” • Set of operators is [ In, NotIn, Exists, DoesNotExist ] • Requires “topologyKey” – prepopulated Kubernetes label that system uses to denote a domain • Anti-affinity – prevent scheduler from locating new pod on same node as pods with same labels if label selector matches • Use cases: § Affinity - spread or pack pods together § Anti-affinity – prevent pods from being on the same nodes that might interfere apiVersion: v1 kind: Pod metadata: name: busybox-exp labels: status: busy spec: affinity: podAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: status operator: In values: - busy topologyKey: kubernetes.io/hostname podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: name operator: In values: - lazybox topologyKey: kubernetes.io/hostname containers: - name: busybox image: busybox args: - sleep - “10000”
  • 35. techupskills.com | techskillstransformations.com © 2021 Brent C. Laster & @techupskills 35 © 2022 Brent C. Laster & Taints and Tolerations • Taints – apply to nodes • Tolerations – apply to pods • Taints – repel pods (opposite of affinity) • Use cases § Nodes with special hardware (GPUs) » Taint nodes that have special hardware » Add tolerations for Pods that must use it § Evictions » Per-Pod eviction on nodes with problems § Dedicated nodes – combo of affinity and taints » Use affinity to subset nodes » Add tolerations to schedule pods on those nodes node1 node2 $ kubectl taint nodes node1 app=bb:NoSchedule $ kubectl apply –f my-pod.yaml $ kubectl taint nodes node2 app=db:NoExecute $ kubectl apply –f my-pod.yaml $ kubectl describe my-pod “0/2 nodes available no nodes support taint” <edit to add toleration> $ kubectl apply –f my-pod.yaml <change toleration to exists> $ kubectl apply –f my-pod.yaml app=bb NoSchedule taint apiVersion: v1 kind: Pod metadata: name: busybox-exp spec: containers: - name: busybox image: busybox args: - sleep - “10000” my-pod.yaml my-pod Pod scheduled on node2 because it does not have a toleration for node1’s taint tolerations: - key: “app” operator: “Equal” value: “bb” effect: “NoSchedule” Pod evicted (terminated) on node2 because it does not have a toleration for node2’s taint app=db NoExecute taint Pod scheduled on node1 because it has a toleration that matches the taint my-pod tolerations: - key: “app” operator: “Exists” effect: “NoExecute” my-pod Pod scheduled on node2 because it has a toleration that matches the taint X
  • 36. Taint Special Cases
    • Kubernetes has the concept of node conditions.
    • The "conditions" field describes the status of all Running nodes, drawn from [Ready, DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable]
    • The Kubernetes node controller automatically taints nodes based on conditions.
      § Example: a node is out of disk; the node lifecycle controller adds a node.kubernetes.io/out-of-disk taint to prevent new Pods from being scheduled on it
    • Users can interact with conditions to change Pod scheduling.
      § Known as TaintNodesByCondition
      § Users can ignore Node problems by adding tolerations for the corresponding taints
    • Other taints include node.kubernetes.io/unschedulable and node.cloudprovider.kubernetes.io/uninitialized
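  Tolerating a condition taint is just a normal toleration whose key matches the taint the node controller adds. A minimal sketch of a Pod-spec fragment (the memory-pressure key follows the standard node.kubernetes.io naming; verify the exact keys and effects your cluster version applies):

```yaml
# Pod spec fragment: tolerate the memory-pressure condition taint so this
# Pod can still be scheduled onto (and stay on) an affected node.
tolerations:
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"     # match the taint regardless of its value
  effect: "NoSchedule"
```

  The same pattern works for the other condition taints listed above; using operator "Exists" with no value tolerates the taint whatever value the controller set.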
  • 37. Lab 2 – Identify and remediate issues with getting scheduled on nodes; debug and fix pod startup issues
  • 38. Ephemeral Containers
    • Useful for interactive debugging when kubectl exec isn't enough
    • Examples
      • container has crashed
      • container doesn't have debugging utilities in it (minimal base image)
    • Different from normal containers
      • no guarantees for resources or execution
      • not automatically restarted
      • not intended for actual applications
      • can use the Container spec, but no ports, livenessProbe, or readinessProbe
    • Process namespace sharing – supporting technology that can be enabled in a pod
      • processes in a container are visible to all other containers in the pod
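  Process namespace sharing is a one-line addition to the Pod spec. A minimal sketch (the pod and container names are illustrative, not from the labs):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-debug-demo        # illustrative name
spec:
  shareProcessNamespace: true # processes are visible across all containers in the pod
  containers:
  - name: app
    image: busybox
    args: ["sleep", "10000"]
```

  With this set, an ephemeral container added via `kubectl debug -it app-debug-demo --image=busybox --target=app` can see the app container's processes with `ps`, even though it runs from a different image.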
  • 39. Debugging with a pod copy (same container with new command)
    • Tools like "kubectl exec" may not be usable on a container
      § Container may not have a shell
      § Application may crash at startup or not behave as expected
    • Can use "kubectl debug" to create a copy of the pod and change the command to execute
    • Can provide an interactive shell to aid debugging

      General form:
      $ kubectl debug <pod> -it --copy-to=<debug-pod> --container=<container> -- <new-command>

      Example:
      $ kubectl debug roar-web-5cb658866f-b8pl6 -it --copy-to=roar-debug --container=roar-web -- sh
      If you don't see a command prompt, try pressing enter.
      #
  • 40. Debugging with a pod copy (adding a new container)
    • Tools like "kubectl exec" may not be usable on a container
      § Container may not have a shell
      § Application may crash at startup or not behave as expected
    • Can use "kubectl debug" to create a copy of the pod and add a new container
    • Can share the process namespace to let the new container see processes in the other container(s)

      General form:
      $ kubectl debug <pod> -it --copy-to=<debug-pod> --image=<image> --share-processes

      Example:
      $ kubectl debug roar-web-5cb658866f-b8pl6 -it --copy-to=roar-debug2 --image=ubuntu --share-processes
      Defaulting debug container name to debugger-6v4xm.
      If you don't see a command prompt, try pressing enter.
      root@roar-debug2:/#
  • 41. Debugging with a pod copy (new image for container)
    • Tools like "kubectl exec" may not be usable on a container
      § Container may not have a shell
      § Application may crash at startup or not behave as expected
    • Can use "kubectl debug" to create a copy of the pod and change the image the container is based on

      General form:
      $ kubectl debug <pod> --copy-to=<debug-pod> --set-image=<container>=<new-image>

      Example:
      $ kubectl debug <web pod name> --copy-to=web-test --set-image=roar-web=quay.io/techupskills/roar-web:1.0.2
  • 42. Deployment – kubectl set

      $ kubectl set
      Configure application resources

      These commands help you make changes to existing application resources.

      Available Commands:
        env             Update environment variables on a pod template
        image           Update image of a pod template
        resources       Update resource requests/limits on objects with pod templates
        selector        Set the selector on a resource
        serviceaccount  Update ServiceAccount of a resource
        subject         Update User, Group or ServiceAccount in a RoleBinding/ClusterRoleBinding

      Usage:
        kubectl set SUBCOMMAND [options]

      General form (rolls out a new version of the pods based on the new image):
      $ kubectl set image deployment/<deployment> <container>=<new-image>

      Example:
      $ kubectl set image deploy/roar-web roar-web=quay.io/techupskills/roar-web:1.0.2
      deployment.apps/roar-web image updated
  • 43. Lab 3 – Troubleshoot failed containers within pods and how to spin up pods to debug them
  • 44. kubectl run vs exec vs attach
    • run – create and run an image in a new pod
    • exec – run a command inside an existing container
    • attach – attach to a process already running inside an existing container
    • Primary distinction between attach and exec
      • exec can run any new process you want to create in the container
      • attach connects to the process already running (no choice)
      • both attach and exec allow sending stdin from the terminal to the process
    • The debug command also has an attach option (usually defaults to false) – waits for the container to start and then acts like attach

      $ k run -it --rm pod1 --image=busybox:1.28 --restart=Never -n ts -- yes this is a repeating message
      this is a repeating message
      this is a repeating message
      this is a repeating message
      ...

      $ k exec -it pod1 -- sh
      / # ps
      PID   USER   TIME  COMMAND
        1   root   0:10  yes this is a repeating message
       12   root   0:00  sh
       17   root   0:00  ps
      / #

      $ k attach -it pod1
      If you don't see a command prompt, try pressing enter.
      this is a repeating message
      this is a repeating message
      ...
  • 45. Container Probes
    • Probe – diagnostic performed periodically for a container by the kubelet
    • Works via "handlers" implemented for the container (approaches for the checks)
    • 3 types of handlers
      • ExecAction – executes a command in the container
        • success = command exits with return code 0
      • TCPSocketAction – does a TCP check on the pod's IP and a port
        • success = port is open
      • HTTPGetAction – performs a GET request against the pod's IP on a specified port and path
        • success = response code >= 200 and < 400
    • 3 kinds of probes
      • livenessProbe – is the container running?
        • fail – container is killed/restarted
      • readinessProbe – is the container ready to respond to requests?
        • fail – endpoints controller removes the Pod's IP address from the endpoints of all services matching the Pod
      • startupProbe – has the application in the container started?
        • if provided, the other probes are disabled until it succeeds
        • fail – container is killed/restarted
    • Default state for any probe that is not provided = Success
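  The three probe kinds map directly onto fields in a container spec. A minimal sketch combining one of each handler type (the image, paths, and timings are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # illustrative name
spec:
  containers:
  - name: web
    image: nginx              # any container serving HTTP on port 80
    startupProbe:             # other probes are disabled until this succeeds
      httpGet:                # HTTPGetAction: success = 200 <= code < 400
        path: /
        port: 80
      failureThreshold: 30    # allow up to 30 * 2s = 60s for startup
      periodSeconds: 2
    livenessProbe:            # fail => container killed/restarted
      tcpSocket:              # TCPSocketAction: success = port open
        port: 80
      periodSeconds: 10
    readinessProbe:           # fail => Pod removed from service endpoints
      exec:                   # ExecAction: success = exit code 0
        command: ["cat", "/usr/share/nginx/html/index.html"]
      periodSeconds: 5
```

  Misconfigured probes are a common source of restart loops: a liveness probe pointing at the wrong port or path will kill an otherwise healthy container every probe cycle.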
  • 46. Probe Use Guidelines
    • liveness probe
      • for identifying conditions under which the container should be killed and restarted
      • may not be needed if the container crashes on issues itself or exits when it becomes unhealthy
    • readiness probe
      • keeps traffic from going to a Pod until it is ready
      • could use the same check as the liveness probe, but allows the app to start and then wait before taking traffic
      • use for long startup times
      • even without a readiness probe, a Pod goes "unready" when deleted, while waiting for its containers to stop
    • startup probe
      • useful for services with long startup times
      • use instead of a long liveness interval
  • 47. Lab 4 – Debug issues when trying to use probes to do health, liveness, or readiness checks
  • 48. Debugging with a shell on a node
    • Can create a privileged pod on a particular node
    • privileged
      • processes in the container are essentially equal to root on the host
      • container is given access to all devices on the host
      • exception – the default is not to be privileged
    • Allows for a privileged way to check items on the same node as a problematic pod
    • Container runs in the host IPC, Network, and PID namespaces
    • Root filesystem of the Node is mounted at /host

      $ kubectl debug node/training1 -it --image=busybox:1.28
      Creating debugging pod node-debugger-training1-rkk2m with container debugger on node training1.
      If you don't see a command prompt, try pressing enter.
      / # ls /host
      bin    etc             lib64       proc  srv       var
      boot   home            lost+found  root  swapfile  vmlinuz
      cdrom  initrd.img      media       run   sys       vmlinuz.old
      data   initrd.img.old  mnt         sbin  tmp
      dev    lib             opt         snap  usr
      / # ls /host/home
      diyuser3  git  linuxbrew
      / # whoami
      root
  • 49. Endpoints
    § Service provides a virtual address for us to connect to on the frontend
    § Uses labels to select the "pool" of backend pods to map to
    § Endpoints for a service are the list of available IP addresses for usable pods
    § Example service here is a NodePort
    § If one pod goes down or becomes unavailable, the service can connect to another pod
    § Spec is shown below
    (Slide diagram: a NodePort service exposed on node port 31789 fronting a Deployment/ReplicaSet of three pods labeled name=roar-web, each listening on 8080; the Endpoints object with selector name=roar-web lists the pod IPs 172.17.0.21, 172.17.0.22, and 172.17.0.23)
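  The service spec itself did not survive the slide extraction; a minimal NodePort sketch consistent with the values in the diagram (the service port 8089 is an assumption read from the figure):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: roar-web
spec:
  type: NodePort
  selector:
    name: roar-web        # label that selects the pool of backend pods
  ports:
  - port: 8089            # service (cluster-internal) port; assumed from the diagram
    targetPort: 8080      # container port on each pod
    nodePort: 31789       # external port opened on every node
```

  A selector/label mismatch here is the classic cause of a service with no endpoints: `kubectl get ep roar-web` returning an empty list means no running pod carries the label name=roar-web.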
  • 50. kubectl patch
    • Allows for updating parts of existing resources
    • Syntax: kubectl patch (-f FILENAME | TYPE NAME) [-p PATCH|--patch-file FILE] [options]
    • Three forms of patches
      § strategic merge patch (default)
        » YAML file created with the part of the spec to be patched
        » patch may either replace or be added to the existing spec portion
        » depends on the "patch strategy" of the object
      § JSON merge patch
        » to update part of a spec, you have to specify the entire spec section
        » new spec section completely replaces the existing section
      § JSON patch
        » set of atomic operations on a JSON doc ("add", "remove", "replace", etc.)
        » uses the "op" key to denote the type of operation
        » other keys are arguments to the operation
        » the path argument is a JSON pointer to the part of the doc targeted by the operation

      # Update a container's image; spec.containers[*].name is required because it's a merge key.
      kubectl patch pod valid-pod -p '{"spec":{"containers":[{"name":"kubernetes-serve-hostname","image":"new image"}]}}'

      # Update a container's image using a JSON patch with positional arrays.
      kubectl patch pod valid-pod --type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]'
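  The same strategic merge patch can live in a file and be applied with --patch-file. A sketch (file name and deployment are illustrative, reusing the roar-web image from the earlier examples):

```yaml
# patch.yaml – strategic merge patch: list only the fields being changed.
# The container is matched by its merge key (name), not replaced wholesale.
spec:
  template:
    spec:
      containers:
      - name: roar-web
        image: quay.io/techupskills/roar-web:1.0.2
```

  Applied with `kubectl patch deployment roar-web --patch-file patch.yaml`; only the named container's image changes, and all other fields of the deployment are left untouched.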
  • 51. jq
    • Similar to sed for JSON data
    • Use like awk, grep, and sed for JSON
    • stedolan.github.io/jq/
    • Example:
      $ k get svc mysql -o json | jq -j '.spec.selector'
      {
        "name": "roar-db"
      }
  • 52. Debugging Networking for Services
    Work through these checks in order, from inside a pod in the cluster:
    • Check DNS:
      # nslookup kubernetes.default
      (kubernetes.default is a special service that provides a way for internal applications in the cluster to talk to the API server.) OK?
    • Check if we can resolve our service:
      # nslookup our-service
      OK?
    • Check if we can reach services within the cluster:
      # wget -qO- <CLUSTER-IP>:<PORT> for the service
      OK?
    • Check if we can get endpoints for the service:
      $ kubectl get ep
  • 53. Lab 5 – Troubleshoot and determine the problem(s) when your service isn't accessible
  • 54. That's all – thanks! techskillstransformations.com getskillsnow.com