Fast Userspace OVS with
AF_XDP
OVS Conference 2018
William Tu, VMware Inc
Outline
• AF_XDP Introduction
• OVS AF_XDP netdev
• Performance Optimizations
Linux AF_XDP
• A new socket type that receives/sends raw frames at high speed
• Uses an XDP (eXpress Data Path) program to trigger receive
• The userspace program manages the Rx/Tx rings and the Fill/Completion rings (see the setup sketch below)
• Zero copy from the DMA buffer to userspace memory, with driver support
• Ingress/egress performance > 20 Mpps [1]
(Figure from “DPDK PMD for AF_XDP”, Zhang Qi)
[1] The Path to DPDK Speeds for AF XDP, Linux Plumber 2018
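To make the ring setup concrete, here is a minimal sketch of creating a umem and binding an AF_XDP socket with the libbpf xsk helper API (xsk.h). NUM_FRAMES, the frame size, and the interface/queue arguments are illustrative assumptions, not code from the OVS patches.

#include <bpf/xsk.h>     /* libbpf AF_XDP (xsk) helpers */
#include <stdlib.h>
#include <unistd.h>

#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE  /* one packet buffer per chunk */

static struct xsk_ring_prod fill, tx;  /* userspace -> kernel rings */
static struct xsk_ring_cons comp, rx;  /* kernel -> userspace rings */
static struct xsk_umem *umem;
static struct xsk_socket *xsk;
static void *bufs;

static int setup_xsk(const char *ifname, unsigned int queue_id)
{
    int err;

    /* Packet buffer area shared with the kernel (the "umem"). */
    err = posix_memalign(&bufs, getpagesize(), NUM_FRAMES * FRAME_SIZE);
    if (err) {
        return err;
    }

    /* Register the umem; this also creates the fill and completion rings. */
    err = xsk_umem__create(&umem, bufs, NUM_FRAMES * FRAME_SIZE,
                           &fill, &comp, NULL);
    if (err) {
        return err;
    }

    /* Bind a socket to one device queue; this creates the rx/tx rings and,
     * by default, attaches an XDP program that redirects to the socket. */
    return xsk_socket__create(&xsk, ifname, queue_id, umem, &rx, &tx, NULL);
}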
OVS-AF_XDP Netdev
Goal
• Use the AF_XDP socket as a fast channel to the userspace OVS datapath, dpif-netdev
• Flow processing happens in userspace
[Diagram: hardware -> driver + XDP in the kernel -> AF_XDP socket (the high-speed channel) -> userspace datapath inside ovs-vswitchd, bypassing the kernel network stack]
OVS-AF_XDP Architecture
Existing
• netdev: abstraction layer for network devices
• dpif: datapath interface
• dpif-netdev: userspace implementation of the OVS
datapath
New
• Kernel: XDP program and eBPF map (see the sketch below)
• AF_XDP netdev: implementation of the afxdp device type
ovs/Documentation/topics/porting.rst
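The "Kernel: XDP program and eBPF map" piece is, in essence, an XDP program that redirects received frames into AF_XDP sockets through a BPF_MAP_TYPE_XSKMAP keyed by RX queue. Below is a minimal sketch in the style of the kernel's xdpsock sample; the map name and size are assumptions, not necessarily what the OVS patches load.

/* Sketch of the kernel-side XDP program: steer frames to AF_XDP sockets. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct bpf_map_def SEC("maps") xsks_map = {
    .type        = BPF_MAP_TYPE_XSKMAP,
    .key_size    = sizeof(int),
    .value_size  = sizeof(int),
    .max_entries = 64,              /* one slot per device queue */
};

SEC("xdp")
int xdp_sock_prog(struct xdp_md *ctx)
{
    int index = ctx->rx_queue_index;

    /* If an AF_XDP socket is bound to this queue, hand the frame to it;
     * otherwise let the frame continue up the regular kernel stack. */
    if (bpf_map_lookup_elem(&xsks_map, &index)) {
        return bpf_redirect_map(&xsks_map, index, 0);
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";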
OVS AF_XDP Configuration
# ./configure
# make && make install
# make check-afxdp
# ovs-vsctl add-br br0 -- \
    set Bridge br0 datapath_type=netdev
# ovs-vsctl add-port br0 enp2s0 -- \
    set interface enp2s0 type="afxdp"
Based on v3 patch: [ovs-dev] [PATCHv3 RFC 0/3] AF_XDP netdev support for OVS
Prototype Evaluation
• The sender sends 64-byte packets at 20 Mpps to one port; measure the
receiving packet rate at the other port
• Measure single-flow, single-core performance with Linux
kernel 4.19-rc3 and OVS master
• AF_XDP zero-copy mode enabled
• Performance goal: 20 Mpps rxdrop
[Test setup: a DPDK packet generator (Intel XL710 40GbE) sends 20 Mpps to the device under test (16-core Intel Xeon E5-2620 v3 2.4 GHz, 32 GB memory, Netronome NFP-4000 with AF_XDP); traffic enters enp2s0, crosses the userspace datapath via br0, and egresses]
Budget your packet like you budget your money
Time Budget
To achieve 20Mpps
• Budget per packet: 50ns
• 2.4GHz CPU: 120 cycles per packet
Facts [1]
• Cache miss: 32 ns; x86 LOCK prefix: 8.25 ns
• System call with/without SELinux auditing: 75 ns / 42 ns
Batch of 32 packets
• Budget per batch: 50 ns × 32 = 1.6 µs (see the arithmetic below)
[1] Improving Linux networking performance, LWN, https://lwn.net/Articles/629155/, Jesper Brouer
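Spelled out, the per-packet and per-batch numbers follow directly from the target rate and the CPU clock:

\[
t_{\text{pkt}} = \frac{1}{20 \times 10^{6}\,\text{pps}} = 50\,\text{ns},
\qquad
50\,\text{ns} \times 2.4\,\text{GHz} = 120\ \text{cycles/packet},
\qquad
32 \times 50\,\text{ns} = 1.6\,\mu\text{s/batch}.
\]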
Optimization 1/5
• OVS pmd (poll-mode driver) netdev for rx/tx
• Before: call the poll() syscall and wait for new I/O
• After: a dedicated thread busy-polls the Rx ring (see the sketch after the diff below)
• Effect: avoids system call overhead
+const struct netdev_class netdev_afxdp_class = {
+    NETDEV_LINUX_CLASS_COMMON,
+    .type = "afxdp",
+    .is_pmd = true,
     .construct = netdev_linux_construct,
     .get_stats = netdev_internal_get_stats,
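As a rough illustration of the "after" case, a PMD thread can spin on the AF_XDP rx ring with the libbpf xsk helpers instead of sleeping in poll(); this rxdrop-style loop, with its batch size of 32, is a sketch under those assumptions rather than the exact OVS code.

/* Busy-poll receive sketch (rxdrop style): drain up to 32 descriptors per
 * pass and immediately recycle the frames through the fill ring. */
static void pmd_rx_loop(struct xsk_ring_cons *rxq, struct xsk_ring_prod *fq)
{
    for (;;) {                          /* dedicated PMD thread, no poll() */
        __u32 idx_rx = 0, idx_fq = 0;
        size_t rcvd = xsk_ring_cons__peek(rxq, 32, &idx_rx);

        if (!rcvd) {
            continue;                   /* nothing yet, keep spinning */
        }
        while (xsk_ring_prod__reserve(fq, rcvd, &idx_fq) != rcvd) {
            ;                           /* wait for room in the fill ring */
        }
        for (size_t i = 0; i < rcvd; i++) {
            const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rxq, idx_rx + i);

            /* A real datapath would wrap desc->addr/desc->len in a dp_packet
             * here; for rxdrop the frame is simply handed straight back. */
            *xsk_ring_prod__fill_addr(fq, idx_fq + i) = desc->addr;
        }
        xsk_ring_prod__submit(fq, rcvd);
        xsk_ring_cons__release(rxq, rcvd);
    }
}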
Optimization 2/5
• Packet metadata pre-allocation
• Before: allocate metadata when a packet is received
• After: pre-allocate and pre-initialize metadata (sketched below)
• Effect:
• Reduces the number of per-packet operations
• Reduces cache misses
[Diagram: packet data is stored in multiple 2 KB umem chunks; packet metadata (struct dp_packet) lives in a contiguous memory region that maps one-to-one to the AF_XDP umem chunks]
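A sketch of the idea with simplified, stand-in types (the real struct dp_packet is richer): metadata is one contiguous, pre-initialized array indexed by umem chunk, so receiving a frame is a lookup rather than an allocation.

#include <stdint.h>

#define CHUNK_SIZE 2048                 /* one umem chunk per packet */
#define NUM_CHUNKS 4096

struct pkt_md {                         /* stand-in for struct dp_packet */
    void *data;                         /* points into the umem chunk */
    uint32_t len;
    /* ... flow metadata, offsets, ... */
};

static struct pkt_md md[NUM_CHUNKS];    /* contiguous, pre-allocated */

static void md_preinit(void *umem_area)
{
    /* One-to-one mapping: umem chunk i <-> metadata slot i, set up once. */
    for (int i = 0; i < NUM_CHUNKS; i++) {
        md[i].data = (char *) umem_area + (uint64_t) i * CHUNK_SIZE;
        md[i].len = 0;
    }
}

static inline struct pkt_md *md_from_addr(uint64_t umem_addr)
{
    /* At receive time the descriptor address maps straight to its
     * pre-initialized metadata -- no per-packet allocation. */
    return &md[umem_addr / CHUNK_SIZE];
}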
Optimizations 3-5
• Packet data memory pool for AF_XDP (see the sketch after this slide)
• Fast data structure to GET and PUT free memory chunks
• Effect: reduces cache misses
• Dedicated packet data pool per device queue
• Effect: consumes more memory but avoids a mutex lock
• Batched sendmsg system calls
• Effect: reduces the system call rate
Reference: Bringing the Power of eBPF to Open vSwitch, Linux Plumber 2018
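The perf profiles later in the deck show umem_elem_push/umem_elem_pop; a plausible minimal shape for the per-queue free-chunk pool is a LIFO stack of umem addresses, sketched below under the assumption of a single PMD thread per queue (so no locking is needed). The real helpers in the patches may differ.

#include <stdint.h>

/* Sketch of a per-queue free-chunk pool: GET/PUT are a plain LIFO of umem
 * addresses, so recently freed (cache-warm) chunks are reused first. */
struct umem_pool {
    uint32_t n_free;              /* number of free chunks on the stack */
    uint64_t addrs[4096];         /* free umem chunk addresses */
};

static inline void umem_elem_push(struct umem_pool *p, uint64_t addr)
{
    p->addrs[p->n_free++] = addr;          /* PUT a chunk back */
}

static inline int umem_elem_pop(struct umem_pool *p, uint64_t *addr)
{
    if (p->n_free == 0) {
        return -1;                         /* pool exhausted */
    }
    *addr = p->addrs[--p->n_free];         /* GET the most recently freed chunk */
    return 0;
}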
Performance Evaluation
OVS AF_XDP RX drop
# ovs-ofctl add-flow br0 \
    "in_port=enp2s0, actions=drop"
# ovs-appctl pmd-stats-show
[Diagram: packets enter enp2s0 on br0 and are dropped by the OVS AF_XDP datapath]
pmd-stats-show (rxdrop)
pmd thread numa_id 0 core_id 11:
packets received: 2069687732
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 2069687636
smc hits: 0
megaflow hits: 95
avg. subtable lookups per megaflow hit: 1.00
miss with success upcall: 1
miss with failed upcall: 0
avg. packets per output batch: 0.00
idle cycles: 4196235931 (1.60%)
processing cycles: 258609877383 (98.40%)
avg cycles per packet: 126.98 (262806113314/2069687732)
avg processing cycles per packet: 124.95 (258609877383/2069687732)
(120-cycle per-packet budget for 20 Mpps)
perf record -p `pidof ovs-vswitchd` sleep 10
26.91% pmd7 ovs-vswitchd [.] netdev_linux_rxq_xsk
26.38% pmd7 ovs-vswitchd [.] dp_netdev_input__
24.65% pmd7 ovs-vswitchd [.] miniflow_extract
6.87% pmd7 libc-2.23.so [.]__memcmp_sse4_1
3.27% pmd7 ovs-vswitchd [.] umem_elem_push
3.06% pmd7 ovs-vswitchd [.] odp_execute_actions
2.03% pmd7 ovs-vswitchd [.] umem_elem_pop
top
  PID USER   PR NI   VIRT   RES  SHR S  %CPU %MEM    TIME+ COMMAND
   16 root   20  0      0     0    0 R 100.0  0.0 75:16.85 ksoftirqd/1
21088 root   20  0 451400 52656 4968 S 100.0  0.2  6:58.70 ovs-vswitchd
Mempool overhead: umem_elem_push / umem_elem_pop (~5% of cycles)
OVS AF_XDP l2fwd
# ovs-ofctl add-flow br0 "in_port=enp2s0, actions=
    set_field:14->in_port,
    set_field:a0:36:9f:33:b1:40->dl_src, enp2s0"
[Diagram: packets enter enp2s0, are rewritten, and are sent back out enp2s0 through br0]
pmd-stats-show (l2fwd)
pmd thread numa_id 0 core_id 11:
packets received: 868900288
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 868900164
smc hits: 0
megaflow hits: 122
avg. subtable lookups per megaflow hit: 1.00
miss with success upcall: 2
miss with failed upcall: 0
avg. packets per output batch: 30.57
idle cycles: 3344425951 (2.09%)
processing cycles: 157004675952 (97.91%)
avg cycles per packet: 184.54 (160349101903/868900288)
avg processing cycles per packet: 180.69 (157004675952/868900288)
Extra ~55 cycles per packet for send (vs. rxdrop)
perf record -p `pidof ovs-vswitchd` sleep 10
25.92% pmd7 ovs-vswitchd [.]netdev_linux_rxq_xsk
17.75% pmd7 ovs-vswitchd [.]dp_netdev_input__
16.55% pmd7 ovs-vswitchd [.]netdev_linux_send
16.10% pmd7 ovs-vswitchd [.]miniflow_extract
4.78% pmd7 libc-2.23.so [.]__memcmp_sse4_1
3.67% pmd7 ovs-vswitchd [.]dp_execute_cb
2.86% pmd7 ovs-vswitchd [.]__umem_elem_push
2.46% pmd7 ovs-vswitchd [.]__umem_elem_pop
1.96% pmd7 ovs-vswitchd [.]non_atomic_ullong_add
1.69% pmd7 ovs-vswitchd [.]dp_netdev_pmd_flush_output_on_port
top output is similar to rxdrop
Mempool overhead: __umem_elem_push / __umem_elem_pop (~5% of cycles)
# ./configure --with-dpdk=
# ovs-ofctl add-flow br0 \
    "in_port=enp2s0, actions=output:vhost-user-1"
# ovs-ofctl add-flow br0 \
    "in_port=vhost-user-1, actions=output:enp2s0"
AF_XDP PVP Performance
• QEMU 3.0.0
• VM Ubuntu 18.04
• DPDK stable 17.11.4
• OVS-DPDK vhostuserclient port
• options:dq-zero-copy=true
• options:n_txq_desc=128
[Diagram: PVP path — packets arrive on enp2s0 via XDP redirect into the OVS AF_XDP bridge br0, reach the VM over QEMU vhost-user/virtio, and return the same way]
PVP CPU utilization
  PID USER   PR NI    VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
   16 root   20  0       0     0     0 R 100.0  0.0 88:26.26 ksoftirqd/1
21510 root   20  0 9807168 53724  5668 S 100.0  0.2  5:58.38 ovs-vswitchd
21662 root   20  0 4894752 30576 12252 S 100.0  0.1  5:21.78 qemu-system-x86
21878 root   20  0   41940  3832  3096 R   6.2  0.0  0:00.01 top
pmd-stats-show (PVP)
pmd thread numa_id 0 core_id 11:
packets received: 205680121
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 205680121
smc hits: 0
megaflow hits: 0
avg. subtable lookups per megaflow hit: 0.00
miss with success upcall: 0
miss with failed upcall: 0
avg. packets per output batch: 31.01
idle cycles: 0 (0.00%)
processing cycles: 74238999024 (100.00%)
avg cycles per packet: 360.94 (74238999024/205680121)
avg processing cycles per packet: 360.94 (74238999024/205680121)
AF_XDP PVP Performance Evaluation
• ./perf record -p `pidof ovs-vswitchd` sleep 10
15.88% pmd28 ovs-vswitchd [.]rte_vhost_dequeue_burst
14.51% pmd28 ovs-vswitchd [.]rte_vhost_enqueue_burst
10.41% pmd28 ovs-vswitchd [.]dp_netdev_input__
8.31% pmd28 ovs-vswitchd [.]miniflow_extract
7.65% pmd28 ovs-vswitchd [.]netdev_linux_rxq_xsk
5.59% pmd28 ovs-vswitchd [.]netdev_linux_send
4.20% pmd28 ovs-vswitchd [.]dpdk_do_tx_copy
3.96% pmd28 libc-2.23.so [.]__memcmp_sse4_1
3.94% pmd28 libc-2.23.so [.]__memcpy_avx_unaligned
2.45% pmd28 ovs-vswitchd [.]free_dpdk_buf
2.43% pmd28 ovs-vswitchd [.]__netdev_dpdk_vhost_send
2.14% pmd28 ovs-vswitchd [.]miniflow_hash_5tuple
1.89% pmd28 ovs-vswitchd [.]dp_execute_cb
1.82% pmd28 ovs-vswitchd [.]netdev_dpdk_vhost_rxq_recv
Performance Result

OVS AF_XDP        PPS        CPU
  RX Drop         19 Mpps    200%
  L2fwd [2]       14 Mpps    200%
  PVP [3]         3.3 Mpps   300%

OVS DPDK [1]      PPS        CPU
  RX Drop         NA         NA
  l3fwd           13 Mpps    100%
  PVP             7.4 Mpps   200%

[1] Intel® Open Network Platform Release 2.1 Performance Test Report
[2] Demo rxdrop/l2fwd: https://www.youtube.com/watch?v=VGMmCZ6vA0s
[3] Demo PVP: https://www.youtube.com/watch?v=WevLbHf32UY
Conclusion 1/2
• AF_XDP is a high-speed Linux socket type
• We add a new netdev type based on AF_XDP
• Re-use the userspace datapath used by OVS-DPDK
Performance
• Pre-allocate and pre-init as much as possible
• Batching does not reduce # of per-packet operations
• Batching + cache-aware data structure amortizes the cache misses
Conclusion 2/2
• Need high packet rate but can’t deploy DPDK? Use AF_XDP!
• Still slower than OVS-DPDK [1], more optimizations are coming [2]
Comparison with OVS-DPDK
• Better integration with the Linux kernel and management tools
• Selectively use kernel features; no packet re-injection needed
• Does not require a dedicated device or CPU
[1] The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel
[2] The Path to DPDK Speeds for AF XDP, Linux Plumber 2018
Thank you
./perf kvm stat record -p 21662 sleep 10
Analyze events for all VMs, all VCPUs:
VM-EXIT  Samples  Samples%  Time%  Min Time  Max Time  Avg time
HLT 298071 95.56% 99.91% 0.43us 511955.09us 32.95us (+- 19.18% )
EPT_MISCONFIG 10366 3.32% 0.05% 0.39us 12.35us 0.47us ( +- 0.71% )
EXTERNAL_INTERRUPT 2462 0.79% 0.01% 0.33us 21.20us 0.50us ( +- 3.21% )
MSR_WRITE 761 0.24% 0.01% 0.40us 12.74us 1.19us ( +- 3.51% )
IO_INSTRUCTION 185 0.06% 0.02% 1.98us 35.96us 8.30us ( +- 4.97% )
PREEMPTION_TIMER 62 0.02% 0.00% 0.52us 2.77us 1.04us ( +- 4.34% )
MSR_READ 19 0.01% 0.00% 0.79us 2.49us 1.37us ( +- 8.71% )
EXCEPTION_NMI 1 0.00% 0.00% 0.58us 0.58us 0.58us ( +- 0.00% )
Total Samples: 311927, Total events handled time: 9831483.62us.
root@ovs-afxdp:~/ovs# ovs-vsctl show
2ade349f-2bce-4118-b633-dce5ac51d994
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "vhost-user-1"
            Interface "vhost-user-1"
                type: dpdkvhostuser
        Port "enp2s0"
            Interface "enp2s0"
                type: afxdp
QEMU
qemu-system-x86_64 -hda ubuntu1810.qcow \
    -m 4096 \
    -cpu host,+x2apic -enable-kvm \
    -chardev socket,id=char1,path=/tmp/vhost,server \
    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mq=on,vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
    -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem -mem-prealloc -smp 2
Editor's Notes
  1. The previous approach introduced a BPF_ACTION. TC is a kernel packet queuing subsystem that provides QoS. ovs-vswitchd creates the maps, loads the eBPF programs, etc.
  2. Compare Linux kernel 4.9-rc3
  3. ovs-ofctl add-flow br0 "in_port=enp2s0\ actions=set_field:14->in_port,set_field:a0:36:9f:33:b1:40->dl_src,enp2s0"
  4. 10455 2018-12-04T17:34:15.952Z|00146|dpdk|INFO|VHOST_CONFIG: dequeue zero copy is enabled
  5. 16 root 20 0 0 0 0 R 100.0 0.0 10:17.12 ksoftirqd/1 19525 root 20 0 9807164 54104 5800 S 106.7 0.2 2:07.59 ovs-vswitchd 19627 root 20 0 4886528 30336 12260 S 106.7 0.1 0:59.59 qemu-system-x86