OVS uses AF_XDP to provide a fast userspace datapath. AF_XDP is a Linux socket type that receives frames with low overhead via XDP. OVS implements an AF_XDP netdev that passes packets to the OVS userspace datapath with minimal processing in the kernel. Optimizations such as batching and pre-allocation reduce cache misses and system calls. Prototype tests show L2 forwarding at 14Mpps and PVP at 3.3Mpps, approaching DPDK performance but still slower. Further optimization of both AF_XDP and OVS is needed to reach wire speed without dedicated hardware.
3. Linux AF_XDP
• A new socket type that sends and receives raw frames at high speed
• Uses an XDP (eXpress Data Path) program to trigger receive
• The userspace program manages the Rx/Tx rings and the Fill/Completion rings (see the setup sketch below)
• Zero-copy from the DMA buffer to userspace memory, with driver support
• Ingress/egress performance > 20Mpps [1]
From “DPDK PMD for AF_XDP”, Zhang Qi
[1] "The Path to DPDK Speeds for AF_XDP", Linux Plumbers Conference 2018
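The slides name four rings (Rx, Tx, Fill, Completion) without showing code; the sketch below makes them concrete using libbpf's xsk helper API from <bpf/xsk.h> (xsk_umem__create, xsk_socket__create). Only those libbpf calls are real; the xsk_state struct, sizes, and trimmed error handling are invented for illustration.

/* Minimal AF_XDP socket bring-up with libbpf's xsk helpers.
 * A sketch to make the four rings concrete, not OVS's actual code. */
#include <stdlib.h>
#include <unistd.h>
#include <bpf/xsk.h>

#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE   /* 2KB per chunk */

struct xsk_state {
    struct xsk_ring_prod fill, tx;   /* userspace is the producer */
    struct xsk_ring_cons comp, rx;   /* userspace is the consumer */
    struct xsk_umem *umem;
    struct xsk_socket *xsk;
    void *bufs;
};

int xsk_bring_up(struct xsk_state *s, const char *ifname, __u32 queue_id)
{
    int ret;

    /* Packet buffers live in ordinary user memory (the "umem"). */
    ret = posix_memalign(&s->bufs, getpagesize(),
                         (size_t)NUM_FRAMES * FRAME_SIZE);
    if (ret)
        return -ret;

    /* Registering the umem creates the Fill and Completion rings. */
    ret = xsk_umem__create(&s->umem, s->bufs,
                           (__u64)NUM_FRAMES * FRAME_SIZE,
                           &s->fill, &s->comp, NULL);
    if (ret)
        return ret;

    /* Binding to one device queue creates the Rx and Tx rings; with a
     * NULL config, libbpf also loads a default XDP redirect program.
     * A real receiver would next post chunk addresses to the Fill ring. */
    return xsk_socket__create(&s->xsk, ifname, queue_id, s->umem,
                              &s->rx, &s->tx, NULL);
}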
4. OVS-AF_XDP Netdev
Goal
• Use the AF_XDP socket as a fast channel to the userspace OVS datapath, dpif-netdev
• Flow processing happens in userspace
[Diagram: packets flow from hardware through the driver's XDP hook in the kernel, over an AF_XDP socket (the high-speed channel) to the userspace datapath in ovs-vswitchd, bypassing the kernel network stack]
5. OVS-AF_XDP Architecture
Existing
• netdev: abstraction layer for network devices
• dpif: datapath interface
• dpif-netdev: userspace implementation of the OVS datapath
New
• Kernel: XDP program and eBPF map (sketched below)
• AF_XDP netdev: implementation of the afxdp device
ovs/Documentation/topics/porting.rst
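For the new kernel piece, a generic XDP program that redirects frames into AF_XDP sockets through an eBPF map looks roughly like the following. This is the common XSKMAP-keyed-by-queue pattern, not OVS's actual program; the map and function names are illustrative.

// SPDX-License-Identifier: GPL-2.0
/* Sketch of an XDP program plus eBPF map of the kind slide 5 names:
 * redirect each frame to the AF_XDP socket bound to its rx queue. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_XSKMAP);   /* holds AF_XDP socket fds */
    __uint(max_entries, 64);
    __type(key, __u32);
    __type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_redirect_xsk(struct xdp_md *ctx)
{
    __u32 qid = ctx->rx_queue_index;

    /* XDP_REDIRECT to the socket bound to this queue;
     * XDP_ABORTED if no socket is registered for it. */
    return bpf_redirect_map(&xsks_map, qid, 0);
}

char _license[] SEC("license") = "GPL";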
6. OVS AF_XDP Configuration
# ./configure
# make && make install
# make check-afxdp
# ovs-vsctl add-br br0 -- set Bridge br0 datapath_type=netdev
# ovs-vsctl add-port br0 enp2s0 -- set int enp2s0 type="afxdp"
Based on v3 patch: [ovs-dev] [PATCHv3 RFC 0/3] AF_XDP netdev support for OVS
7. Prototype Evaluation
• Sender sends 64-byte packets at 20Mpps to one port; measure the receive rate at the other port
• Measure single-flow, single-core performance with Linux kernel 4.19-rc3 and OVS master
• Enable AF_XDP zero-copy mode
• Performance goal: 20Mpps rxdrop
[Testbed diagram: a DPDK packet generator with an Intel XL710 40GbE NIC sends 20Mpps to a 16-core Intel Xeon E5-2620 v3 2.4GHz host with 32GB memory and a Netronome NFP-4000 NIC running AF_XDP; packets ingress on enp2s0, traverse br0 in the userspace datapath, and egress on the other port]
8. Budget Your Packet Like You Budget Your Money
Time Budget
To achieve 20Mpps
• Budget per packet: 50ns
• 2.4GHz CPU: 120 cycles per packet
Facts [1]
• Cache miss: 32ns; x86 LOCK prefix: 8.25ns
• System call with/without SELinux auditing: 75ns / 42ns
Batch of 32 packets
• Budget per batch: 50ns x 32 = 1.6us (worked through in the sketch below)
[1] Jesper Brouer, "Improving Linux networking performance", LWN, https://lwn.net/Articles/629155/
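The budget arithmetic written out as a tiny standalone program, using only the numbers above (an illustration, not measured data):

#include <stdio.h>

int main(void)
{
    const double pps = 20e6;                 /* target: 20Mpps        */
    const double hz = 2.4e9;                 /* 2.4GHz CPU clock      */
    const double ns_per_pkt = 1e9 / pps;     /* 50ns per packet       */
    const double cycles_per_pkt = hz / pps;  /* 120 cycles per packet */
    const double us_per_batch =
        ns_per_pkt * 32 / 1000;              /* 1.6us per 32-pkt batch */

    printf("%.0fns, %.0f cycles per packet; %.1fus per batch\n",
           ns_per_pkt, cycles_per_pkt, us_per_batch);
    /* One cache miss (32ns) burns 64% of the per-packet budget and one
     * bare system call (42ns) nearly all of it, which motivates the
     * batching and pre-allocation optimizations that follow. */
    return 0;
}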
9. Optimization 1/5
• OVS PMD (poll-mode driver) netdev for rx/tx
• Before: call the poll() syscall and wait for new I/O
• After: a dedicated thread busy-polls the Rx ring (see the loop sketch after the patch excerpt)
• Effect: avoids system call overhead
+const struct netdev_class netdev_afxdp_class = {
+    NETDEV_LINUX_CLASS_COMMON,
+    .type = "afxdp",
+    .is_pmd = true,
     .construct = netdev_linux_construct,
     .get_stats = netdev_internal_get_stats,
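What the dedicated busy-polling thread can look like with the libbpf xsk helpers, reusing the hypothetical xsk_state struct from the setup sketch above (a sketch, not OVS's actual receive path):

#include <bpf/xsk.h>

#define BATCH_SIZE 32

void pmd_rx_loop(struct xsk_state *s, volatile int *exiting)
{
    __u32 idx;

    while (!*exiting) {
        /* Non-blocking peek: returns 0 at once if the ring is empty,
         * so the thread spins instead of sleeping in poll(). */
        size_t rcvd = xsk_ring_cons__peek(&s->rx, BATCH_SIZE, &idx);
        if (!rcvd)
            continue;

        for (size_t i = 0; i < rcvd; i++) {
            const struct xdp_desc *desc =
                xsk_ring_cons__rx_desc(&s->rx, idx + i);
            void *pkt = xsk_umem__get_data(s->bufs, desc->addr);

            /* Hand (pkt, desc->len) to the userspace datapath here. */
            (void)pkt;
        }
        xsk_ring_cons__release(&s->rx, rcvd);
    }
}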
10. Optimization 2/5
• Packet metadata pre-allocation
• Before: allocate metadata when packets are received
• After: pre-allocate metadata and initialize it once
• Effect:
  • Reduces the number of per-packet operations
  • Reduces cache misses
Memory layout (made concrete in the sketch below):
• Multiple 2KB umem chunk memory regions store packet data
• Packet metadata (struct dp_packet) lives in a contiguous memory region
• Metadata entries map one-to-one to AF_XDP umem chunks
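One way to realize that layout (a sketch; md_packet is an illustrative stand-in for OVS's struct dp_packet, not its real definition): metadata sits in one contiguous array, chunk i of the umem maps to entry i, and the index is recovered from a descriptor address with a shift only.

#include <stdint.h>

#define FRAME_SHIFT 11                    /* 2KB umem chunks */
#define FRAME_SIZE  (1u << FRAME_SHIFT)
#define NUM_FRAMES  4096

struct md_packet {                        /* stand-in for dp_packet */
    void    *data;
    uint32_t len;
    uint16_t port;                        /* pre-initialized once   */
};

struct md_pool {
    void            *umem;                /* NUM_FRAMES * FRAME_SIZE bytes */
    struct md_packet md[NUM_FRAMES];      /* one entry per umem chunk      */
};

/* Pre-allocate and pre-initialize everything at device setup time, so
 * the per-packet receive path only patches the data pointer and len. */
void md_pool_init(struct md_pool *p, uint16_t port)
{
    for (uint32_t i = 0; i < NUM_FRAMES; i++) {
        p->md[i].data = (char *)p->umem + ((uint64_t)i << FRAME_SHIFT);
        p->md[i].len = 0;
        p->md[i].port = port;
    }
}

/* On rx, the umem offset from the descriptor picks the metadata slot. */
static inline struct md_packet *md_from_addr(struct md_pool *p, uint64_t addr)
{
    return &p->md[addr >> FRAME_SHIFT];
}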
11. Optimizations 3-5
• Packet data memory pool for AF_XDP
  • Fast data structure to GET and PUT free memory chunks
  • Effect: reduces cache misses
• Dedicated packet data pool per device queue
  • Effect: consumes more memory but avoids mutex locking
• Batching the sendmsg system call
  • Effect: reduces the system call rate (all three are sketched below)
Reference: "Bringing the Power of eBPF to Open vSwitch", Linux Plumbers Conference 2018
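A sketch of how the three optimizations can fit together (illustrative names, not OVS's code): a per-queue LIFO pool of free umem chunk addresses, LIFO so a just-freed, cache-warm chunk is reused first, lock-free because each device queue owns its own pool, plus one sendmsg() kick per Tx batch instead of per packet.

#include <stdint.h>
#include <sys/socket.h>

#define NUM_FRAMES 4096

struct umem_pool {
    uint64_t free_addrs[NUM_FRAMES];  /* free chunk offsets into umem */
    uint32_t count;                   /* stack depth                  */
};

/* GET: pop the most recently freed (cache-warm) chunk. */
static inline int umem_get(struct umem_pool *p, uint64_t *addr)
{
    if (p->count == 0)
        return -1;
    *addr = p->free_addrs[--p->count];
    return 0;
}

/* PUT: push a chunk back; no atomics, the queue owns its pool. */
static inline void umem_put(struct umem_pool *p, uint64_t addr)
{
    p->free_addrs[p->count++] = addr;
}

/* Batched tx kick: after filling the Tx ring with a 32-packet batch,
 * one empty sendmsg() on the AF_XDP socket amortizes the syscall cost
 * across the whole batch. */
static inline void xsk_tx_kick(int xsk_fd)
{
    struct msghdr msg = { 0 };

    sendmsg(xsk_fd, &msg, MSG_DONTWAIT);
}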
23. Performance Result

OVS AF_XDP
Test        PPS      CPU
RX Drop     19Mpps   200%
L2fwd [2]   14Mpps   200%
PVP [3]     3.3Mpps  300%

OVS DPDK [1]
Test        PPS      CPU
RX Drop     NA       NA
l3fwd       13Mpps   100%
PVP         7.4Mpps  200%

[1] Intel® Open Network Platform Release 2.1 Performance Test Report
[2] Demo rxdrop/l2fwd: https://www.youtube.com/watch?v=VGMmCZ6vA0s
[3] Demo PVP: https://www.youtube.com/watch?v=WevLbHf32UY
24. Conclusion 1/2
• AF_XDP is a high-speed Linux socket type
• We add a new netdev type based on AF_XDP
• Re-use the userspace datapath used by OVS-DPDK
Performance
• Pre-allocate and pre-initialize as much as possible
• Batching does not reduce the number of per-packet operations
• Batching plus cache-aware data structures amortizes the cache misses
25. Conclusion 2/2
• Need a high packet rate but can't deploy DPDK? Use AF_XDP!
• Still slower than OVS-DPDK [1], but more optimizations are coming [2]
Comparison with OVS-DPDK
• Better integration with the Linux kernel and its management tools
• Selectively uses kernel features; no packet re-injection needed
• Does not require a dedicated device or CPU
[1] "The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel"
[2] "The Path to DPDK Speeds for AF_XDP", Linux Plumbers Conference 2018