The document discusses optimizing the performance of UDP and TCP network traffic on a Linux server. It describes setting up affinity between CPU cores and network interrupts to balance load, disabling an interrupt balancer service, and using advanced network card filtering features to block unwanted traffic. These changes improved CPU utilization from imbalanced to evenly distributed across cores, reducing load and further optimizing the server to handle over 14.88 million packets per second of UDP traffic.
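The 14.88 million packets per second figure matches the theoretical line rate of 10 Gigabit Ethernet with minimum-size frames. This assumes a 10 Gbit/s link (the 82599 discussed below is a 10GbE controller). Each 64-byte frame also occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire:

```latex
\text{pps}_{\max}
  = \frac{10 \times 10^{9}\ \text{bit/s}}{(64 + 8 + 12)\ \text{B} \times 8\ \text{bit/B}}
  = \frac{10^{10}}{672}
  \approx 14.88 \times 10^{6}\ \text{packets/s}
```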
13. Meet Flow Director
The flow director filters identify specific flows or
sets of flows and routes them to specific queues.
The flow director filters are programmed by
FDIRCTRL and all other FDIR registers. The 82599
shares the Rx packet buffer for the storage of
these filters.
14. What Flow Director can do
• Perfect match filters — The hardware checks a
match between the masked fields of the received
packets and the programmed filters. Masked
fields should be programmed as zeros in the filter
context. The 82599 supports up to 8 K - 2 perfect
match filters.
• Signature filters — The hardware checks a match
between a hash-based signature of the masked
fields of the received packet. The 82599 supports
up to 32 K - 2 signature filters.
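The difference between the two filter types can be sketched as a small software model. It is a hypothetical illustration, not the 82599's register layout: the 5-tuple encoding, the class names, and the CRC32-based `signature()` are all assumptions standing in for the NIC's own formats and hash. A perfect filter compares the masked fields exactly, while a signature filter stores only a hash and therefore trades exactness for capacity.

```python
import zlib
from dataclasses import dataclass

# (proto, src_ip, dst_ip, src_port, dst_port) as plain integers (hypothetical encoding)
Fields = tuple[int, int, int, int, int]


def apply_mask(fields: Fields, mask: Fields) -> Fields:
    """Keep only the bits selected by the mask; masked-out bits become 0."""
    return tuple(f & m for f, m in zip(fields, mask))


def signature(fields: Fields) -> int:
    """Stand-in 16-bit hash of the masked fields (CRC32, NOT the 82599's hash)."""
    raw = b"".join(f.to_bytes(4, "big") for f in fields)
    return zlib.crc32(raw) & 0xFFFF


@dataclass
class PerfectFilter:
    value: Fields  # programmed with masked-out fields set to zero
    queue: int

    def matches(self, pkt: Fields, mask: Fields) -> bool:
        # Exact comparison of every masked field: no false positives.
        return apply_mask(pkt, mask) == self.value


@dataclass
class SignatureFilter:
    sig: int  # only the hash is stored, not the fields themselves
    queue: int

    def matches(self, pkt: Fields, mask: Fields) -> bool:
        # Hash comparison: compact, but two different flows may collide.
        return signature(apply_mask(pkt, mask)) == self.sig


mask = (0xFF, 0, 0, 0, 0xFFFF)                 # match protocol and destination port only
pkt = (6, 0x0A000001, 0x0A000002, 40000, 80)  # TCP to port 80
print(PerfectFilter((6, 0, 0, 0, 80), queue=2).matches(pkt, mask))  # True
```

Storing only a hash is what lets the signature table hold roughly four times as many entries as the perfect table in the same buffer space, at the cost of occasional collisions.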
15. What Perfect Filters can match
(instantaneously)
• VLAN
• proto
• src_ip/mask
• src_port
• dst_ip/mask
• dst_port
• Flexible 2-byte tuple anywhere in the first 64
bytes of the packet (FRAME!)
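The "(FRAME!)" remark means the flexible 2-byte offset is counted from the start of the Ethernet frame, not from the IP or TCP header. The sketch below is a hypothetical helper: it assumes an untagged Ethernet II frame and an IPv4 header without options. It shows that the TCP "data offset + flags" word then sits at frame offset 46, inside the 64-byte window.

```python
import struct

FLEX_WINDOW = 64  # the flexible word must lie within the first 64 bytes of the frame


def flex_word(frame: bytes, offset: int) -> int:
    """Return the big-endian 2-byte word at `offset`, counted from the
    start of the Ethernet frame (not the IP or TCP header)."""
    if offset < 0 or offset + 2 > FLEX_WINDOW:
        raise ValueError("flex word must lie within the first 64 bytes")
    (word,) = struct.unpack_from("!H", frame, offset)
    return word


# Untagged Ethernet header (14) + IPv4 header without options (20)
# + offset of the data-offset/flags word inside the TCP header (12).
TCP_FLAGS_OFFSET = 14 + 20 + 12  # = 46

# TCP header: ports, seq, ack, data offset (5 words, no options), flags (SYN),
# window, checksum, urgent pointer.
syn = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
frame = bytes(14) + bytes(20) + syn  # zeroed Ethernet/IPv4 headers, for illustration only
print(hex(flex_word(frame, TCP_FLAGS_OFFSET)))  # 0x5002: no TCP options, SYN flag only
```

A VLAN tag or IP options shift every offset, which is one reason a fixed frame offset is fragile.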
16. Not so perfect
(FlowDirector's misbegotten offspring)
• Consume RX packet buffer memory (256/512)
• No IF-THEN logic
• Masks are GLOBAL for signature filters
• 64 bytes is frustratingly little
• Supported by ethtool (perfect filters, buggy) and
PF_RING (signature filters only)
Even so, Intel, SPASIBO (thank you)!
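The "masks are GLOBAL" and "no IF-THEN" limitations can be sketched with a hypothetical table model (the `SignatureTable` class is illustrative, not a real driver API). Every filter must use the one shared mask, and a lookup is a single flat hit-or-miss with no conditional chaining. As a result, "steer by destination port" and "steer by source IP" cannot coexist.

```python
from typing import Dict, Optional, Tuple


class SignatureTable:
    """Hypothetical model: every signature filter shares ONE global mask."""

    def __init__(self) -> None:
        self.mask: Optional[Tuple[int, ...]] = None
        self.queues: Dict[int, int] = {}  # signature -> RX queue

    def add(self, mask: Tuple[int, ...], sig: int, queue: int) -> None:
        if self.mask is None:
            self.mask = mask  # the first filter fixes the mask for all others
        elif mask != self.mask:
            raise ValueError("mask is global: every filter must use the same mask")
        self.queues[sig] = queue

    def lookup(self, sig: int) -> Optional[int]:
        # One flat table: a hit steers to a queue; there is no IF-THEN chaining.
        return self.queues.get(sig)


table = SignatureTable()
table.add((0xFF, 0, 0, 0, 0xFFFF), sig=0x1234, queue=1)  # by protocol + dst_port
try:
    table.add((0, 0xFFFFFFFF, 0, 0, 0), sig=0x5678, queue=2)  # by src_ip: rejected
except ValueError as e:
    print(e)  # mask is global: every filter must use the same mask
```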
17. Flex Filters
(Misbegotten offspring of the RSS implementation)
• First 128 bytes of the packet (FRAME!)
• 6 filters
• Briefly disabled on every access (read or
write)
• No publicly available userland
configuration tool.
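A flex filter is essentially a byte pattern with a per-byte "care / don't care" mask over the first 128 bytes of the frame. The sketch below is a hypothetical software model of that idea; the `FlexFilter` class and its dictionary-based pattern are assumptions, not the 82599's register format. It makes the offset limit explicit.

```python
FLEX_LEN = 128  # flex filters inspect only the first 128 bytes of the frame


class FlexFilter:
    """Hypothetical model of a byte-masked flex filter."""

    def __init__(self, pattern: dict[int, int]):
        # pattern maps frame offset -> expected byte; unlisted bytes are "don't care"
        for off, val in pattern.items():
            if not 0 <= off < FLEX_LEN:
                raise ValueError(f"offset {off} outside the first {FLEX_LEN} bytes")
            if not 0 <= val <= 0xFF:
                raise ValueError(f"value {val} is not a byte")
        self.pattern = dict(pattern)

    def matches(self, frame: bytes) -> bool:
        # Every cared-about byte must be present and equal; a short frame fails.
        return all(off < len(frame) and frame[off] == val
                   for off, val in self.pattern.items())


# Example: untagged IPv4/TCP without IP options, TCP data offset 5 and SYN only
# (frame bytes 46 and 47).
bare_syn = FlexFilter({46: 0x50, 47: 0x02})
```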
18. What about TCP SYN?
• SYN without a sequence number
• SYN without MSS
• … and other blunders that let you derive a
signature within the first 128 bytes
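The traits listed above are all visible in the first 128 bytes of the frame. The helper below is a hypothetical software check (not a NIC feature) that shows this. It assumes an untagged Ethernet II frame carrying IPv4/TCP, and it reads "without a sequence number" as a zero sequence-number field. It reports whether a SYN has a zero sequence number and whether it lacks an MSS option.

```python
import struct

TCP_OPT_EOL, TCP_OPT_NOP, TCP_OPT_MSS = 0, 1, 2
WINDOW = 128  # only the first 128 bytes of the frame are inspected


def syn_anomalies(frame: bytes) -> set[str]:
    """Return suspicious traits of a TCP SYN, looking only at frame[:128].
    Assumes an untagged Ethernet II frame carrying IPv4/TCP."""
    f = frame[:WINDOW]
    ihl = (f[14] & 0x0F) * 4  # IPv4 header length in bytes
    tcp = 14 + ihl            # TCP header start, counted from frame start
    _, _, seq, _, off_byte, flags = struct.unpack_from("!HHIIBB", f, tcp)
    if not flags & 0x02:      # not a SYN: nothing to report
        return set()
    found = set()
    if seq == 0:
        found.add("zero-seq")
    opts = f[tcp + 20: tcp + (off_byte >> 4) * 4]
    has_mss, i = False, 0
    while i < len(opts):      # walk the TCP options (kind, length, data)
        kind = opts[i]
        if kind == TCP_OPT_EOL:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if kind == TCP_OPT_MSS:
            has_mss = True
        if i + 1 >= len(opts) or opts[i + 1] < 2:
            break             # truncated or malformed option
        i += opts[i + 1]
    if not has_mss:
        found.add("no-mss")
    return found


# Bare SYN: IPv4 header with IHL 5, TCP data offset 5 (no options), seq 0.
syn = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
frame = bytes(14) + bytes([0x45]) + bytes(19) + syn
print(sorted(syn_anomalies(frame)))  # ['no-mss', 'zero-seq']
```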
19. What about a perfect TCP SYN?
A painful death at 400 kPPS…
29. Thank you, Eric!
commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046
Author: Eric Dumazet <edumazet@google.com>
Date: Thu Jul 19 07:34:03 2012 +0000
ipv4: tcp: remove per net tcp_sock
tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.
This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.
To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)
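The idea in the commit, one reply socket per CPU instead of one per network namespace, can be sketched as a user-space analogy. This is Python, not kernel code. `ReplySocket`, `PerCpuReplySockets`, and the explicit `cpu` argument are hypothetical names; in the kernel the socket is per-cpu state selected implicitly by the running CPU. The point is structural: each CPU takes only its own lock and touches only its own fields, so there is neither lock contention nor false sharing between CPUs.

```python
import threading


class ReplySocket:
    """Stand-in for the control socket used to send RST/ACK replies."""

    def __init__(self, cpu: int):
        self.cpu = cpu
        self.lock = threading.Lock()
        self.sent = 0  # touched only by its owning CPU, under its own lock

    def send_reply(self) -> None:
        with self.lock:
            self.sent += 1


class PerCpuReplySockets:
    """One ReplySocket per CPU instead of one shared socket per namespace."""

    def __init__(self, ncpus: int):
        self.socks = [ReplySocket(cpu) for cpu in range(ncpus)]

    def send_reply(self, cpu: int) -> None:
        # Hypothetical: the caller passes the CPU it runs on.
        self.socks[cpu].send_reply()


def worker(pool: PerCpuReplySockets, cpu: int, n: int) -> None:
    for _ in range(n):
        pool.send_reply(cpu)


pool = PerCpuReplySockets(4)
threads = [threading.Thread(target=worker, args=(pool, c, 1000)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(s.sent for s in pool.socks))  # 4000
```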