Cilium is open source software that provides networking and security for Kubernetes. It implements Kubernetes networking, security policies, load balancing, and service mesh capabilities using eBPF. Cilium provides multi-cluster networking by coupling multiple Kubernetes clusters into a cluster mesh with a dedicated control plane for cross-cluster information sharing. It also offers a sidecar-less service mesh that uses eBPF and Envoy for L4 and L7 traffic management instead of injecting proxies into each pod. The demos showed Cilium's multi-cluster load balancing and policies as well as its service mesh capabilities.
2. Agenda: What we will discuss today
● Cilium Overview
● Cilium Cluster Mesh
● Cilium Service Mesh
● Some aspects of other Cilium features (eBPF data path, load balancing optimizations, policy)
● Relation to other K8s community/RH projects
● Demos
● Summary/Takeaways
6. Cilium E-W and N-S LB w/o kube-proxy (Diagram: Borkmann, Isovalent)
- Handles external traffic (N-S) for svc IP:port
- Backends can be local or remote
- Performs DNAT and DSR/SNAT/Hybrid when remote
- Same code compilable for XDP and tc/BPF
- Hairpin to remote on XDP layer, local backends handled via tc ingress
[Diagram: client → Node A (redis pod behind lxc0) and Node B (nginx pod behind lxc0); XDP/BPF and tc/BPF hooks on each node's eth0, sock/BPF hooks at the pod sockets]
- Handles internal traffic (E-W) for svc IP:port
- Backends can be local or remote
- No packet-based NAT needed due to connect(), sendmsg(), recvmsg() hooks
- No intermediate hops as in kube-proxy
- Exposes services to all local addresses and loopback 127.0.0.1/::1
- Blocks other applications from port reuse in the post-bind() hook
Main principle: operate as close as possible to the socket for E-W and as close as possible to the driver for N-S.
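An illustrative sketch of enabling this kube-proxy-free datapath at install time, assuming a Helm-based install of the upstream cilium/cilium chart; the value names (kubeProxyReplacement, k8sServiceHost/k8sServicePort) follow the upstream chart, accepted values vary by Cilium version, and API_SERVER_IP is a placeholder:

# Sketch: install Cilium with the eBPF kube-proxy replacement (socket-LB for E-W, tc/XDP for N-S)
# kube-proxy is not running, so point Cilium directly at the API server (placeholder values)
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=API_SERVER_IP \
  --set k8sServicePort=6443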
8. Cilium Cluster Mesh
● Multi-cluster networking analogous to a "Submariner mesh" or the "Kubernetes Multi-Cluster Services API", but with significant differences
● Needs Pod IP and service IP uniqueness and direct routability (no NAT) across the mesh (see the install sketch after this list)
● This is not Kubernetes Federation: clusters are still separately provisioned, but with coupled networking; up to 256 clusters (possibly more in future) in a cluster mesh
● Separate control plane/etcd for cross-cluster information sharing (e.g. pod IPs)
● Multi-cluster policy and identity at this layer, plus multi-cluster load balancing (N-S, E-W)
● Use this for multi-cluster with or without Cilium Service Mesh
● Encryption options: IPsec and WireGuard, with differences (per-node tunnel vs per-worker)
● Relation to the K8s MCS API, Submariner, and other community projects; compare the MCS API's two resources (ClusterSetIPs/ClusterIPs) with Cilium's single service + global annotation
● Note: the recently announced Cilium Mesh builds on this further
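An illustrative sketch of how the uniqueness/no-NAT requirement and the mesh coupling can look with the cilium CLI; the cluster names, IDs, contexts, and Pod CIDRs are hypothetical, and the Helm values (cluster.name, cluster.id, ipam.operator.clusterPoolIPv4PodCIDRList) and CLI flags should be verified against the cilium-cli/chart version in use:

# Sketch: two clusters with unique cluster IDs/names and non-overlapping Pod CIDRs (direct routability, no NAT)
cilium install --context c1 --set cluster.name=c1 --set cluster.id=1 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.1.0.0/16
cilium install --context c2 --set cluster.name=c2 --set cluster.id=2 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.2.0.0/16

# Couple the clusters; this deploys the separate clustermesh control plane/etcd in each cluster
cilium clustermesh enable --context c1
cilium clustermesh enable --context c2
cilium clustermesh connect --context c1 --destination-context c2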
9. Cilium Multi-Cluster Mesh - Control plane
● 2 or more (up to 256) independently provisioned k8s clusters, all running the Cilium CNI, coupled in a "cluster mesh" (a sort of "Submariner mesh")
● MCM control plane: a separate control plane with its own etcd datastore for the multi-cluster mesh itself, running as pods within the k8s clusters' data plane
● The Cilium operator mirrors global k8s services, their associated endpoints, and related network policy info into the MCM etcd
● A k8s Service is marked "Global" explicitly via Cilium annotations
○ Example: service.cilium.io/global: "true"
Diagram: cilium.io
11. Multi-Cluster Services & Network Policies
○ Relevant annotations (see the example Service manifest after this slide's bullets):
■ service.cilium.io/global: "true" (/"false") marks this local service as "Global" (or not)
■ service.cilium.io/shared: "true" (/"false") marks this local service as "Shared" (or not) within the Global service
■ service.cilium.io/affinity: "local|remote|none" sets the global service endpoint load-balancing affinity/preference
○ Note: Global services also have to adhere to namespace sameness rules
○ Multi-Cluster Network Policies
■ Exactly the same API and implementation as single-cluster network policies (both K8s NetworkPolicy and Cilium's own L4 and L7 network policies)
■ Network policy labels/selectors are reflected in the multi-cluster mesh control plane, so they can have global significance (plus optional additional per-cluster qualification within policy selectors)
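A minimal example Service carrying the annotations above; the name, namespace, selector, and port are hypothetical, and the same-named Service would be created in the same namespace in every cluster of the mesh (namespace sameness):

# Hypothetical global service; create an identically named Service in each cluster of the mesh
apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  namespace: default
  annotations:
    service.cilium.io/global: "true"     # expose this service's endpoints mesh-wide
    service.cilium.io/shared: "true"     # share this cluster's endpoints into the global service
    service.cilium.io/affinity: "local"  # prefer local endpoints, fall back to remote
spec:
  selector:
    name: rebel-base
  ports:
    - port: 80
      targetPort: 80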
12. Multi-Cluster Network Policy Example
# Sample Cilium multi-cluster network policy augmented with cluster selectors
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-cross-cluster"
spec:
  description: "Allow x-wing in cluster1 to contact rebel-base in cluster2"
  endpointSelector:
    matchLabels:
      name: x-wing
      io.cilium.k8s.policy.cluster: cluster1
  egress:
    - toEndpoints:
        - matchLabels:
            name: rebel-base
            io.cilium.k8s.policy.cluster: cluster2
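A usage sketch for the policy above; the kubectl context name (c1) and file name are hypothetical, and the policy is applied in cluster1 here because egress is enforced at the selected (x-wing) source endpoints:

# Apply in cluster1 (the cluster whose x-wing endpoints the policy selects)
kubectl --context c1 apply -f allow-cross-cluster.yaml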
14. Demo Topology
[Diagram: two clusters (cluster-id 1 and cluster-id 2) joined in a clustermesh; global services S1 and S2 exist in each cluster]
Setup commands:
cilium install --cluster-id 1 ...
cilium clustermesh enable
cilium clustermesh connect --context c1 --destination-context c2
Global service annotations:
io.cilium/global-service="true"
io.cilium/shared-service="false"
io.cilium/service-affinity=local
Example demo application: S1 and S2 are each a global (multi-cluster) service with 2 backend pods in each cluster of the clustermesh
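An illustrative verification sketch for this demo setup; context and service names (c1, c2, s1) are hypothetical and the cilium-cli flags may differ by version:

# Confirm the clustermesh control plane is up and both clusters are connected
cilium clustermesh status --context c1 --wait

# Mark the demo service as global with local affinity, per the annotations on the slide
kubectl --context c1 annotate service s1 io.cilium/global-service="true"
kubectl --context c1 annotate service s1 io.cilium/service-affinity=local

# Optional: run the built-in connectivity test across both clusters
cilium connectivity test --context c1 --multi-cluster c2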
15. Demos
● Demo 1:
○ Cilium ClusterMesh intro and setup
○ Link to demo recording
● Demo 2:
○ Multi-cluster E-W Services & Load balancing
○ Multi-cluster network policy
○ Link to demo recording
● Demo 3:
○ N-S Load balancing, gateway API
○ Single and multi-cluster
○ Link to demo recording
16. Demo Topology
[Diagram: two clusters (cluster-id 1 and cluster-id 2) in a clustermesh, global services S1 and S2 in each, with a gateway (GTWY) in front]
● N-S load balancing using Cilium Ingress or the Cilium Gateway API
● Multi-cluster ingress when combined with Cilium ClusterMesh
● Multiple modes are possible (the demo topology shows just one mode)
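A minimal Gateway API sketch for this N-S path, assuming Cilium's Gateway API support is enabled (e.g. the gatewayAPI.enabled Helm value); the gateway/route names, port, and backend service are hypothetical, gatewayClassName: cilium follows the upstream docs, and the apiVersion (v1 vs v1beta1) depends on the installed Gateway API CRDs:

# Hypothetical Gateway + HTTPRoute fronting the demo service S1
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: demo-gateway
spec:
  gatewayClassName: cilium     # Cilium's Gateway API implementation
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: s1-route
spec:
  parentRefs:
    - name: demo-gateway
  rules:
    - backendRefs:
        - name: s1             # global (multi-cluster) service from the demo topology
          port: 80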
17. Background: Multi-Cluster Ingress LB modes/scenarios
[Diagram: three example topologies, each with Svc 1/Svc 2 in one cluster and Svc 1/Svc 2/Svc 3 in another]
● Single gateway (K8s GW API), on-cluster LB, multi-network (clusters connected via E-W gateways)
● Multi-gateway (K8s GW API), on-cluster LB, single-network
● Single gateway, off-cluster (e.g. public cloud) LB, single-network (external GLB class)
Multi-cluster services can be combined with BGP, DNS, and public cloud anycast to yield a variety of multi-cluster L4 and L7 ingress solutions for various use cases, including the RH Hybrid Cloud Gateway.
Related refs: RH-ET blog post on this topic
18. Cilium E-W and N-S LB w/o kube-proxy (Diagram: Borkmann, Isovalent)
- Handles external traffic (N-S) for svc IP:port
- Backends can be local or remote
- Performs DNAT and DSR/SNAT/Hybrid when remote
- Same code compilable for XDP and tc/BPF
- Hairpin to remote on XDP layer, local backends handled via tc ingress
[Diagram: client → Node A (redis pod behind lxc0) and Node B (nginx pod behind lxc0); XDP/BPF and tc/BPF hooks on each node's eth0, sock/BPF hooks at the pod sockets]
- Handles internal traffic (E-W) for svc IP:port
- Backends can be local or remote
- No packet-based NAT needed due to connect(), sendmsg(), recvmsg() hooks
- No intermediate hops as in kube-proxy
- Exposes services to all local addresses and loopback 127.0.0.1/::1
- Blocks other applications from port reuse in the post-bind() hook
Main principle: operate as close as possible to the socket for E-W and as close as possible to the driver for N-S.
Additional refs: Cilium kube-proxy replacement and related enhancements
19. eBPF intercepts for NodePort svc
[Diagram: receive path of a NodePort packet, showing where the eBPF hooks (XDP, tc ingress, socket lookup) sit relative to the netfilter chains (raw/conntrack/mangle/nat Prerouting, Input, Forward, mangle/nat Postrouting), the routing decision, and delivery either to a host socket rx buffer/app or via the veth into the container namespace]
20. Cilium N-S enhancements
[Diagram: NodePort service 172.18.0.2:31000 => backends (10.1.2.2:80, 10.1.2.4:80); two nodes (eth0 172.18.0.2 and 172.18.0.3), each hosting one backend pod and exposing port 31000, with external clients a.a.a.a/pa and b.b.b.b/pb]
- Direct Server Return and Hybrid modes (in addition to SNAT mode)
- Source IP preservation
- XDP acceleration
- 4to6 NAT, Maglev hashing
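An illustrative sketch of enabling these enhancements via Helm values; loadBalancer.mode, loadBalancer.acceleration, and loadBalancer.algorithm exist in the upstream chart, but supported combinations depend on NIC/XDP support and the Cilium version:

# Sketch: DSR return path, native XDP acceleration, and Maglev backend selection
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set loadBalancer.mode=dsr \
  --set loadBalancer.acceleration=native \
  --set loadBalancer.algorithm=maglev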
22. Cilium Service Mesh (CSM)
● Option 1: Use Cilium only for L3/L4 networking; use Istio control and data planes for the L7 service mesh
● Option 2: Preferred long-term direction but not fully ready yet (see the config sketch at the end of this slide's bullets)
○ Use Cilium as a single solution for all Kubernetes networking, including
■ CNI plugin
■ Multi-cluster networking
■ L4 and L7 service mesh
■ All networking functions incl. load balancing (N-S and E-W), network policy, ingress and egress, Gateway API implementation
■ 1) Data plane: Cilium. 2) Control plane: K8s native (Gateway API) + Envoy config CRD + Cilium APIs
● CSM uses a "sidecar-less model" in contrast with the Istio/Linkerd per-pod sidecar model
● The Istio community is developing an "Ambient mode" of Istio in response to CSM's sidecar-less mode
● Side note: the state of multi-cluster in upstream native K8s APIs (independent of specific service meshes) is incomplete; there are early draft proposals aimed at initiating standards
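An illustrative sketch of what the Option 2 "single solution" direction can look like as install-time configuration; the Helm values (kubeProxyReplacement, ingressController.enabled, gatewayAPI.enabled) exist in the upstream chart, but which combinations are supported depends on the Cilium version:

# Sketch: one Cilium install covering CNI, LB (kube-proxy replacement), ingress, and Gateway API
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set ingressController.enabled=true \
  --set gatewayAPI.enabled=true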
23. Cilium's Design Philosophy for Service Mesh
● A single networking plugin can serve all networking needs (basic CNI, service load balancing, network policy, ingress, multi-cluster networking & service mesh functions at both L4 and L7). This results in a better integrated architecture, improved user experience, and lower resource consumption and control plane complexity than using multiple separate projects as CNI plugin, service mesh plugin, ingress, Gateway API plugin, multi-cluster networking plugin, etc.
○ Cilium already has both L4 and L7 network policy and load balancing even for its CNI plugin; just reuse it for the service mesh and augment where needed, rather than creating separate functions
○ Cilium already has L4 traffic encryption & zero-trust networking functions; reuse them for the service mesh
○ Extensions to Kubernetes APIs like the Gateway API and others are already beginning to address service mesh functions without the need for special APIs like those of Istio & Linkerd
○ Service meshes and gateways are moving into the kernel and infra layers in any case (e.g. Ambient)
24. Issues with Sidecar Proxies (Diagram: Liz Rice, Learning eBPF)
● Conventional sidecar-based model => poor latency due to suboptimal packet processing with multiple user-space <-> kernel-space context switches
● Lowered reliability due to the disconnect between sidecar proxy readiness and app readiness, and server-side initiated connections
● High resource consumption (a proxy per pod adds up)
25. L4 + L7 data paths (Diagram: cilium.io)
[Diagram: L4 data path vs L7 data path]
27. [Diagram: services S1/S2 behind a gateway (GTWY); a Cilium agent with an embedded Envoy on each node; L7 mesh traffic between the Envoys over an IPSec tunnel]
● One Cilium-agent Envoy per node, used for the L7 proxy (N-S and E-W service load balancing & policy)
● Cilium kernel eBPF used for L4 pod and service load balancing, and policy
● A single Envoy instance wrapped inside the Cilium agent for all L7 functions
● Special mTLS + optional IPSec & WireGuard "on the wire"
28. Cilium's Alternate Architecture for Service Mesh mutual-TLS (Diagram: cilium.io)
● Conventional session-based TLS: usable only by applications running on TCP and HTTP; fine-grained sessions but with a performance impact
● Network-based encrypted tunnels (IPSec, WireGuard, etc.): usable by any app protocol; coarse-grained, higher performance
● Cilium mTLS: combine the two (Cilium proprietary solution)
● Not full GA yet (Cilium 1.14?)
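An illustrative sketch of enabling the network-layer encryption that this mTLS approach builds on; encryption.enabled and encryption.type (wireguard or ipsec) exist in the upstream chart, IPsec additionally requires a pre-created key secret, and the combined mTLS datapath itself is version-gated as noted above:

# Sketch: transparent WireGuard encryption between nodes (switch type to ipsec for IPsec tunnels)
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard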