Spy hard, or regexping through packets
Challenges of implementing 100G deep packet inspection (DPI) on the x86 platform
Paweł Małachowski, 2017.03.07
^Why?$
Deep packet inspection (DPI)
no DPI
• packet header lookup
• route based on destination (unless policy-based routing, PBR, is used)
• classify with static rules or state data
• cheap
DPI
• packet header and payload lookup
• may route based on content (e.g. separate uplinks for priority and "bulky" traffic)
• classify with static rules, state data, multiple patterns and custom logic
• expensive?
100+ Gbit DPI – why?
• end customers typically have uplinks below 10G
– L7 filtering (WAF, IPS etc.) is requested by enterprises
– many IDS, IPS, NGFW, UTM and WAF products are on the market
– can be handled with open source tools
• 100G+ speeds: ISPs, telcos, large DCs
– they do not want to interfere with traffic
• unless hit by a huge DDoS attack
• or kindly asked by the local régime
Mirai botnet attacks – examples
• attack_tcp_stomp
– establish a legitimate TCP connection, then flood it
– not to be confused with the STOMP protocol
• attack_udp_dns
– DNS "water torture": queries for FQDNs with a random host label
• attack_app_http
– HTTP request flood
• attack_app_cfnull
– HTTP POST junk
source: https://github.com/rosgos/Mirai-Source-Code
DPI may help
easy :)
Large DDoS attacks in 2016 – examples
1. 150M pps (650Gbps) of TCP SYN packets (mixed size), spoofed IPs
2. 1.75M rps peak of HTTP requests (~121 B per request) from ~52k source IPs
3. 220k rps (360Gbps) of large HTTP requests from ~128k source IPs
4. ~1Tbps of recursive "water torture" DNS queries
sources:
• https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/
• https://www.incapsula.com/blog/650gbps-ddos-attack-leet-botnet.html
• http://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/
DPI may help
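The attack figures can be turned into rough per-packet and per-request sizes, which shows why packet and request rates matter more to a DPI box than raw bandwidth. A small Python sketch of that arithmetic, assuming decimal units (1 Gbps = 1e9 bit/s), that the quoted peak rates coincide, and ignoring framing overhead:

```python
# Rough averages derived from the 2016 attack examples.
# Assumptions: decimal units, quoted peaks are simultaneous, no framing overhead.

def avg_bytes_per_unit(bits_per_second: float, units_per_second: float) -> float:
    """Average size in bytes of one packet/request at the given rates."""
    return bits_per_second / 8 / units_per_second

def bandwidth_gbps(units_per_second: float, bytes_per_unit: float) -> float:
    """Bandwidth in Gbit/s generated by units of a given size."""
    return units_per_second * bytes_per_unit * 8 / 1e9

syn_flood = avg_bytes_per_unit(650e9, 150e6)    # example 1: ~542 B per SYN packet
http_small = bandwidth_gbps(1.75e6, 121)         # example 2: ~1.69 Gbit/s in total
http_large = avg_bytes_per_unit(360e9, 220e3)    # example 3: ~205 kB per request

if __name__ == "__main__":
    print(f"SYN flood: {syn_flood:.0f} B/packet")
    print(f"small HTTP flood: {http_small:.2f} Gbit/s")
    print(f"large HTTP flood: {http_large / 1e3:.0f} kB/request")
```

Example 2 stays below 2 Gbit/s yet still demands 1.75M request inspections per second, so the lookup rate, not the bit rate, is the harder limit there.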
100Gbit/s sizing
• ~148.8 Mpps in small frames, but no payload to scan
• ~8.127 Mpps in 1514B frames
• ~12.19 GB/s of IP payload
• given a 16-core machine, the per-core target is:
– ~0.5M–2M lookups/s
– up to ~762 MB/s of payload
– note: not all packets, and not entire payloads, have to be scanned
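The sizing numbers follow from Ethernet framing overhead. A minimal Python sketch reproducing them, assuming 24 B of per-frame overhead on the wire (4 B FCS, 8 B preamble/SFD, 12 B inter-frame gap) on top of frame lengths quoted without FCS; note that the slide's 12.19 GB/s counts the whole 1500 B IP packet carried by a 1514 B frame:

```python
# Back-of-the-envelope Ethernet line-rate sizing.
WIRE_OVERHEAD = 4 + 8 + 12   # FCS + preamble/SFD + inter-frame gap, in bytes
ETH_HEADER = 14              # destination MAC, source MAC, EtherType

def line_rate_pps(link_bps: float, frame_len: int) -> float:
    """Packets per second at line rate for frames of frame_len bytes (no FCS)."""
    return link_bps / ((frame_len + WIRE_OVERHEAD) * 8)

def ip_bytes_per_second(link_bps: float, frame_len: int) -> float:
    """Bytes of IP packet (header + payload) carried per second at line rate."""
    return line_rate_pps(link_bps, frame_len) * (frame_len - ETH_HEADER)

LINK = 100e9
CORES = 16  # taken from the slide: a 16-core machine

small_pps = line_rate_pps(LINK, 60)          # ~148.8 Mpps (64 B frames with FCS)
large_pps = line_rate_pps(LINK, 1514)        # ~8.127 Mpps
ip_rate = ip_bytes_per_second(LINK, 1514)    # ~12.19 GB/s
per_core_bytes = ip_rate / CORES             # ~762 MB/s per core
per_core_lookups = large_pps / CORES         # ~0.51M lookups/s per core

if __name__ == "__main__":
    print(f"{small_pps / 1e6:.1f} Mpps, {large_pps / 1e6:.3f} Mpps")
    print(f"{ip_rate / 1e9:.2f} GB/s, {per_core_bytes / 1e6:.0f} MB/s per core")
```

The same helper also reproduces the generator rates (gen_tx) of the 10G benchmarks later in the deck: `line_rate_pps(10e9, 60)` gives ~14.881 Mpps, and `line_rate_pps(10e9, 112)` gives ~9.191 Mpps for a 112 B frame (14 B Ethernet + 20 B IPv4 without options + 8 B UDP + 70 B payload), assuming that is the test frame's layout.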
Payload lookup – position
• fixed
– e.g. NTP
• network protocol aware
– e.g. DNS
• application aware
– e.g. HTTP
• anywhere in the packet
– bad idea
$ strings /usr/bin/* | grep -c sex
93
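Scanning "anywhere in the packet" produces false positives, as the `strings | grep` count shows; a protocol-aware lookup first computes where the interesting field is. A minimal Python sketch for DNS, assuming a plain query over UDP: it walks the length-prefixed labels (RFC 1035) of the first question name, so a rule aimed at random "water torture" host labels can match the name itself rather than arbitrary payload bytes. `first_qname` is a hypothetical helper, and compression pointers are simply rejected:

```python
import struct
from typing import Optional

DNS_HEADER_LEN = 12  # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (2 B each)

def first_qname(udp_payload: bytes) -> Optional[str]:
    """Return the first question name of a DNS message, or None if malformed.

    Only uncompressed labels are handled: a label length above 63 (i.e. a
    compression pointer) is treated as malformed in this sketch.
    """
    if len(udp_payload) < DNS_HEADER_LEN:
        return None
    (qdcount,) = struct.unpack_from("!H", udp_payload, 4)
    if qdcount == 0:
        return None
    labels = []
    pos = DNS_HEADER_LEN
    while True:
        if pos >= len(udp_payload):
            return None                 # ran off the end of the packet
        length = udp_payload[pos]
        pos += 1
        if length == 0:
            break                       # the root label terminates the name
        if length > 63 or pos + length > len(udp_payload):
            return None                 # pointer or truncated label
        labels.append(udp_payload[pos:pos + length].decode("ascii", "replace"))
        pos += length
    return ".".join(labels)
```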
Protocol design rant
"string: variable-length byte field, encoded in UTF-8, terminated by 0x00"
source: https://developer.valvesoftware.com/wiki/Server_queries
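The problem with 0x00-terminated strings is that every field after the first string has a data-dependent offset, so no fixed-position rule can reach it. A short Python sketch, assuming a hypothetical layout of consecutive strings (e.g. a server name followed by a map name, not the full Valve query format):

```python
def read_cstring(buf: bytes, pos: int) -> tuple:
    """Read a 0x00-terminated UTF-8 string at pos; return (text, next_pos)."""
    end = buf.index(b"\x00", pos)       # raises ValueError if unterminated
    return buf[pos:end].decode("utf-8", "replace"), end + 1

def read_strings(buf: bytes, pos: int, count: int) -> list:
    """Read count consecutive C strings, returning (offset, text) for each."""
    fields = []
    for _ in range(count):
        text, next_pos = read_cstring(buf, pos)
        fields.append((pos, text))
        pos = next_pos
    return fields
```

With `b"srv\x00de_dust2\x00"` the second field starts at offset 4, with `b"my server\x00de_dust2\x00"` at offset 10: the matcher has to scan for every terminator before it can even locate the field it cares about.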
Software payload lookup – approaches
• fixed-position literal matching (sequence): <you name it>
• fixed-position literal matching (trie): DPDK ACL
• computed-position literal matching: tc u32
• application-aware classifier: nDPI, netfilter l7-filter
• application level gateway (ALG): netfilter nf_conntrack_*
• programmable data path: netfilter xt_bpf, nftables, XDP+eBPF
• embedded scripting language: NPFLua, pflua
• hybrid with state machines: Hyperscan, Tempesta FW
• regexp engine: Bro, Snort, Suricata
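The difference between fixed and computed positions can be sketched in a few lines of Python. Instead of assuming the UDP payload starts at a constant offset, the offset is derived from the IPv4 IHL field, so packets carrying IP options are still matched correctly. The helpers are hypothetical and only illustrate the idea behind the computed-position row; they are not how tc u32 is implemented:

```python
from typing import Optional

IPPROTO_UDP = 17
UDP_HEADER_LEN = 8

def udp_payload_offset(ip_packet: bytes) -> Optional[int]:
    """Offset of the UDP payload inside an IPv4 packet, or None if not UDP/IPv4."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4:
        return None
    ihl = (ip_packet[0] & 0x0F) * 4      # header length grows with IP options
    if ihl < 20 or ip_packet[9] != IPPROTO_UDP:
        return None
    offset = ihl + UDP_HEADER_LEN
    return offset if offset <= len(ip_packet) else None

def match_literal_at(ip_packet: bytes, literal: bytes, rel_offset: int = 0) -> bool:
    """Match literal rel_offset bytes into the UDP payload (computed position)."""
    base = udp_payload_offset(ip_packet)
    if base is None:
        return False
    start = base + rel_offset
    return ip_packet[start:start + len(literal)] == literal
```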
^[Mm]atching\s+regexps$
Basic regexp
(\w+ )+PLNOG1[68]$
tool: https://www.debuggex.com/
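The example pattern can be tried directly with Python's `re` module. A minimal sketch; `mentions_event` is a hypothetical wrapper name:

```python
import re

# One or more words, each followed by exactly one space,
# then "PLNOG16" or "PLNOG18" at the very end of the input.
PATTERN = re.compile(r"(\w+ )+PLNOG1[68]$")

def mentions_event(text: str) -> bool:
    return PATTERN.search(text) is not None
```

`mentions_event("Spy hard at PLNOG18")` is true; `"PLNOG18"` alone is rejected because at least one preceding word is required, and `"at PLNOG18 again"` is rejected by the `$` anchor.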
Finite–state machine
• abstract machine
• has states and transitions
• some states are "accept states"
• input updates machine state
• accepts or rejects an input sequence of symbols
sources:
• https://en.wikipedia.org/wiki/State_diagram
• https://en.wikipedia.org/wiki/Deterministic_finite_automaton
example (diagram): accepts binary strings with an even number of zeroes
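The example machine has two states, and its transition table fits in a dictionary. A minimal Python sketch of that DFA:

```python
# "even" is both the start state and the only accept state;
# reading a 0 flips the state, reading a 1 keeps it.
TRANSITIONS = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}
START = "even"
ACCEPT = {"even"}

def accepts(word: str) -> bool:
    state = START
    for symbol in word:
        # Exactly one transition per (state, symbol): that is what makes it deterministic.
        state = TRANSITIONS[(state, symbol)]  # KeyError for non-binary input
    return state in ACCEPT
```

Each input symbol costs one table lookup regardless of the input, which is the property that makes DFAs attractive for packet scanning.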
DFA vs. NFA
• Deterministic finite automaton (DFA)
– each of its transitions is uniquely determined by its source state and input symbol
– reading an input symbol is required for each state transition
• Nondeterministic finite automaton (NFA): otherwise (e.g. several transitions on the same symbol, or ε-transitions that consume no input)
• an NFA can be converted to a DFA
– a DFA is efficient to execute, but may grow (exponentially, in the worst case)
– an NFA is easier to construct, but may be slower to execute
tools:
• http://hackingoff.com/compilers/regular-expression-to-nfa-dfa
• http://ivanzuzak.info/noam/webapps/fsm_simulator/
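The NFA-to-DFA conversion is the classic subset construction: each DFA state is a set of NFA states. A Python sketch using the textbook blow-up example, an NFA (without ε-transitions) for "the third symbol from the end is 1". It has 4 states, while the DFA needs 2³ = 8:

```python
# NFA over {0, 1}: state 0 loops on everything and "guesses" when the
# marked 1 arrives; states 1-3 count the remaining symbols.
NFA = {
    (0, "0"): {0}, (0, "1"): {0, 1},
    (1, "0"): {2}, (1, "1"): {2},
    (2, "0"): {3}, (2, "1"): {3},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, {3}, "01"

def subset_construction():
    """Convert the NFA to a DFA whose states are frozensets of NFA states."""
    start = frozenset({NFA_START})
    dfa, todo = {}, [start]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue                      # already expanded
        dfa[current] = {}
        for symbol in ALPHABET:
            nxt = frozenset(s for q in current for s in NFA.get((q, symbol), ()))
            dfa[current][symbol] = nxt
            todo.append(nxt)
    accept = {state for state in dfa if state & NFA_ACCEPT}
    return start, dfa, accept

def dfa_accepts(word, start, dfa, accept):
    state = start
    for symbol in word:
        state = dfa[state][symbol]        # one lookup per input symbol
    return state in accept
```

Generalized to "k-th symbol from the end", the NFA needs k+1 states while the DFA needs 2^k, which is the "may grow" caveat in concrete form.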
PCRE vs. DFA and NFA
• the PCRE (Perl Compatible Regular Expressions) engine is powerful
• a typical PCRE engine is an NFA with backtracking
• a DFA recognizes only (pure) regular languages, so it can implement only a subset of PCRE
• fewer features, faster engines!
– Hyperscan, https://01.org/hyperscan
– Perl Incompatible Regular Expressions (PIRE), https://github.com/yandex/pire
Features considered harmful
• backtracking (trial and error)
• back references: \1
• lookarounds (lookahead, lookbehind): (?<!a)b
• conditional regexps: (?(?=regex)then|else)
see also: http://www.regular-expressions.info
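Python's `re` module supports two of these features, which makes it easy to see why they fall outside what a DFA can do. A minimal sketch; the helper names are hypothetical. Hyperscan's documentation lists backreferences, arbitrary lookaround and conditional patterns among its unsupported constructs:

```python
import re

# (a+)b\1 accepts exactly a^n b a^n: the text after "b" must repeat whatever
# the group captured. That language is not regular, so no DFA recognizes it;
# the engine has to remember the capture and, in general, backtrack.
SAME_RUNS = re.compile(r"(a+)b\1")

# (?<!a)b: a "b" that is not preceded by an "a" (negative lookbehind).
NOT_AFTER_A = re.compile(r"(?<!a)b")

def same_runs(s: str) -> bool:
    return SAME_RUNS.fullmatch(s) is not None

def lone_b_positions(s: str) -> list:
    return [m.start() for m in NOT_AFTER_A.finditer(s)]
```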
Case: catastrophic backtracking
• 34 min Stack Overflow outage in 2016
• regexp: \s+$
• "malformed post contained roughly 20,000 consecutive characters of whitespace on a comment line"
• O(n²) for this pattern
• for other patterns it may be O(2ⁿ)
sources:
• http://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
• http://www.regular-expressions.info/catastrophic.html
>>> sum(range(0,20001))
200010000
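The quadratic cost comes from retrying the match at every start position inside the whitespace run. A simplified Python model, counting only the characters consumed by `\s+` (a real engine also spends steps giving characters back while backtracking), next to a linear-time check that answers the same question:

```python
def naive_steps(s: str) -> int:
    """Characters consumed by a naive engine trying \\s+$ at every start position."""
    steps = 0
    for start in range(len(s)):
        pos = start
        while pos < len(s) and s[pos].isspace():
            pos += 1
            steps += 1
        if pos > start and pos == len(s):
            return steps          # whitespace reaches the end of input: match
        # otherwise $ fails here, so retry from the next start position
    return steps

def ends_with_whitespace(s: str) -> bool:
    """Linear-time alternative: inspect only the last character."""
    return s[-1:].isspace()
```

For n whitespace characters followed by a non-whitespace character the model performs n + (n−1) + … + 1 = n(n+1)/2 steps; with n = 20,000 that is the 200,010,000 computed above, while the alternative looks at a single character.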
Sources
1. "Finite State Machine Parsing for Internet Protocols: Faster Than You Think", http://www.cs.dartmouth.edu/~pete/pubs/LangSec-2014-fsm-parsers.pdf
2. "100G Intrusion Detection", http://go.lbl.gov/100g
3. "DotStar: Breaking the Scalability and Performance Barriers in Regular Expression Set Matching", http://domino.watson.ibm.com/library/cyberdig.nsf/papers/F38C0227DBF5C7E78525758C005BD05C/$File/rc24645.pdf
4. "Fast Regular Expression Matching Using Dual Glushkov NFA", https://www-alg.ist.hokudai.ac.jp/~thomas/TCSTR/tcstr_14_73/tcstr_14_73.pdf
5. PIRE discussion: https://news.ycombinator.com/item?id=10209775
^Hyperscan$
What is Hyperscan?
• "high-performance multiple regex matching library"
• C (run-time, API) and C++ (compiler), BSD licensed
• runs on x86 (Intel architecture) CPUs only, requires SSSE3; uses:
– SIMD (Single Instruction, Multiple Data)
– BMI (Bit Manipulation Instruction Sets)
• "typically used in a DPI library stack"
Hyperscan history
• developed by Sensory Networks
• 2003-2008 hardware prototypes (GPGPU, FPGA), NodalCore C-series accelerators
• 2009 software-based Hyperscan created (note: the hardware approach was a dead end)
• 2009–2015 commercial evolution
• 2015 acquired by Intel, released under the BSD license
• 2017 v4.4 release
sources:
• https://01.org/hyperscan
• https://lists.01.org/pipermail/hyperscan/2017-January/000078.html
• "Hyperscan In SURICATA: STATE OF THE UNION"
Hyperscan usage examples (2016 EoY)
• unknown commercial IDS/IPS and NGFW products
• Snort integration (IDS/IPS signatures)
• Suricata integration (IDS/IPS signatures)
• Rspamd integration (e-mail scanning)
• redGuardian integration (DDoS patterns)
How it works – regexp database
id  pattern             flags         extended parameters (min offset / max offset / min length)
0   ^foo
1   bar$
2   \w+baz\s{2}         singlematch
3   \d+                 leftmost      5
4   lorem\nipsum        dotall        10
n   ^(all|your|base)    caseless      15
A database is a group of regexps and their settings; thousands of regexps are possible.
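Each database row pairs a pattern with flags and optional extended parameters. A hypothetical Python model of such a row, using the meaning Hyperscan's documentation gives to the `min_offset`, `max_offset` and `min_length` fields of `hs_expr_ext_t` (minimum and maximum end offset of a match, minimum match length). Hyperscan applies these inside the compiled database (`hs_compile_ext_multi`); this sketch merely post-filters candidate matches, and the example rows and values are made up:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class PatternEntry:
    """One row of a pattern database (hypothetical model, not a real API)."""
    pattern_id: int
    pattern: str
    flags: FrozenSet[str] = frozenset()
    min_offset: Optional[int] = None   # smallest allowed end offset of a match
    max_offset: Optional[int] = None   # largest allowed end offset of a match
    min_length: Optional[int] = None   # smallest allowed match length (end - start)

def accept_match(entry: PatternEntry, start: int, end: int) -> bool:
    """Apply the per-pattern offset/length constraints to one candidate match."""
    if entry.min_offset is not None and end < entry.min_offset:
        return False
    if entry.max_offset is not None and end > entry.max_offset:
        return False
    if entry.min_length is not None and end - start < entry.min_length:
        return False
    return True

# Hypothetical rows; a real database would hold thousands of them.
DATABASE = [
    PatternEntry(0, r"^foo"),
    PatternEntry(1, r"\d+", frozenset({"leftmost"}), min_offset=4),
    PatternEntry(2, r"^(all|your|base)", frozenset({"caseless"}), min_length=3),
]
```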
How it works – independent scanning contexts
(diagram) the regex database is compiled earlier and shared by all cores; core 0 to core n each scan their own input with a matcher and their own local data (scratch)
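The split between a shared, read-only database and per-core scratch space can be sketched in Python with one scratch object per thread. The `Database`, `Scratch` and `scan` names are hypothetical; in Hyperscan's C API the corresponding pieces are a compiled `hs_database_t` shared by all threads and one scratch per thread obtained via `hs_alloc_scratch` or `hs_clone_scratch`. Python threads show the structure here, not parallel speed:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Database:
    """Compiled, read-only pattern set shared by all workers (hypothetical)."""

    def __init__(self, literals):
        self.literals = tuple(bytes(lit) for lit in literals)

class Scratch:
    """Per-worker mutable matcher state; never shared between threads."""

    def __init__(self, db):
        self.db = db          # a scratch area belongs to one database
        self.matches = []

_local = threading.local()

def scan(db, packet):
    """Scan one packet using the calling thread's own scratch.

    Returns (pattern id, end offset) pairs, grouped by pattern.
    """
    scratch = getattr(_local, "scratch", None)
    if scratch is None or scratch.db is not db:
        scratch = _local.scratch = Scratch(db)   # allocated once per thread
    scratch.matches.clear()
    for pattern_id, lit in enumerate(db.literals):
        pos = packet.find(lit)
        while pos != -1:
            scratch.matches.append((pattern_id, pos + len(lit)))  # end offset
            pos = packet.find(lit, pos + 1)
    return list(scratch.matches)
```

Usage: `with ThreadPoolExecutor(max_workers=4) as pool: results = list(pool.map(lambda p: scan(db, p), packets))`.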
How it works
• may return multiple matches
• by default, reports only the end offset of a match
• not greedy: every match is reported (e.g. a+ against "aaa" yields matches ending at offsets 1, 2 and 3)
• each regexp is parsed and split into:
– literals (fixed strings)
– DFA engines
– NFA engines
– custom engines (prefix, suffix, infix, outfix)
– not Aho-Corasick
• scanning modes: block, streaming, vectored
x86 instructions:
PCMPEQB (compare packed bytes in xmm2/m128 and xmm1 for equality)
POPCNT (return the count of bits set to 1)
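Three properties from the list (all matches reported, end offsets only, streaming across buffers) fall out naturally from bit-parallel matching. A Python sketch of the classic shift-and (bitap) algorithm extended to several literals packed into one integer. It is related in spirit to the SIMD bit-parallel literal matchers Hyperscan uses, but it is not Hyperscan's code, and `ShiftAndMatcher` is a hypothetical name:

```python
class ShiftAndMatcher:
    """Bit-parallel (shift-and) matcher for several literals at once.

    Each literal owns a run of bits in one integer. After reading a byte, bit j
    of a literal's run is set iff the last j+1 input bytes equal the literal's
    first j+1 bytes, so a set last bit means a match ends here.
    """

    def __init__(self, literals):
        self.masks = {}        # byte value -> bit positions holding that byte
        self.init = 0          # first bit of every literal's run
        self.accept = {}       # last bit of a run -> literal id
        self.accept_mask = 0
        bit = 0
        for lit_id, lit in enumerate(literals):
            if not lit:
                raise ValueError("empty literals are not supported")
            self.init |= 1 << bit
            for byte in lit:
                self.masks[byte] = self.masks.get(byte, 0) | (1 << bit)
                bit += 1
            last = 1 << (bit - 1)
            self.accept[last] = lit_id
            self.accept_mask |= last

    def scan(self, data, state=0, base=0):
        """Return (matches, state); matches are (literal id, end offset) pairs.

        Passing the returned state, plus the stream offset of data[0] as base,
        into the next call continues the scan across buffer boundaries.
        """
        matches = []
        for i, byte in enumerate(data):
            # Extend every partial match by one byte and start new ones. A bit
            # shifted from one run into the next run's first bit is harmless:
            # that bit is set by self.init anyway.
            state = ((state << 1) | self.init) & self.masks.get(byte, 0)
            hits = state & self.accept_mask
            while hits:
                low = hits & -hits              # lowest set bit
                matches.append((self.accept[low], base + i + 1))
                hits ^= low
        return matches, state
```

Scanning `b"foobar"` for `foo`, `bar` and `oba` reports `(0, 3), (2, 5), (1, 6)`, overlapping matches included; splitting the input into `b"fo"` and `b"obar"` and carrying `state` between the calls gives the same result, which is the essence of streaming mode.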
DPDK ACL vs. Hyperscan regexp
DPDK ACL
• compiled to an "ACL"
• fixed-position patterns
• looks up all fields in the packet
• looks up multiple packets at once in one ACL (up to 16 categories)
• predictable speed
• returns one match (highest priority) per category
regexp as ACL¹
• compiled to a "DB"
• dynamic-position patterns
• skips irrelevant fields
• looks up one packet in the DB (multiple regexps at once)
• speed depends on input
• may return multiple matches
¹ speculation, v4.5 is not released yet
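The two result models differ in what a lookup returns. A hypothetical Python sketch: `acl_classify` returns, per category, the userdata of the highest-priority matching rule (0 when nothing matches), mirroring the DPDK ACL semantics described above, while `all_matches` reports every matching rule the way a regexp database does. `Rule` is a simplified fixed-offset rule, not DPDK's rule layout, and ties between equal priorities simply keep the first rule:

```python
from dataclasses import dataclass

MAX_CATEGORIES = 16   # DPDK ACL supports up to 16 result categories

@dataclass(frozen=True)
class Rule:
    """Hypothetical fixed-offset rule, loosely modeled on an ACL rule."""
    offset: int
    value: bytes
    priority: int        # higher value wins
    category_mask: int   # bit c set -> the rule takes part in category c
    userdata: int        # non-zero result reported on match

def matches(rule: Rule, packet: bytes) -> bool:
    return packet[rule.offset:rule.offset + len(rule.value)] == rule.value

def acl_classify(rules, packet: bytes) -> list:
    """ACL style: one result per category, the best-priority match's userdata."""
    results = [0] * MAX_CATEGORIES          # 0 means "no match"
    best = [None] * MAX_CATEGORIES
    for rule in rules:
        if not matches(rule, packet):
            continue
        for c in range(MAX_CATEGORIES):
            if (rule.category_mask >> c) & 1 and (best[c] is None or rule.priority > best[c]):
                best[c] = rule.priority
                results[c] = rule.userdata
    return results

def all_matches(rules, packet: bytes) -> list:
    """Regexp-database style: report every matching rule."""
    return [rule.userdata for rule in rules if matches(rule, packet)]
```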
Sources (Hyperscan)
1. http://01org.github.io/hyperscan/
2. http://www.slideshare.net/harryvanhaaren/hyperscan-mohammad-abdul-awal
3. "HYPERSCAN PERFORMANCE BENCHMARK ON INTEL XEON PROCESSORS, Delivering 160 Gbps DPI Throughput on the Intel Xeon Processor E5-2600 Series", https://networkbuilders.intel.com/docs/1645-Hyperscan-Performance-Benchmark-on-Intel-Xeon-Processors.pdf
4. "HOW WE MATCH REGULAR EXPRESSIONS", https://01.org/node/3777
5. "Hyperscan Glossary, a few philosophical points", https://lists.01.org/pipermail/hyperscan/2016-September/000035.html
6. "Software-based Acceleration of Deep Packet Inspection on Intel Architecture", https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf
7. "Hyperscan In SURICATA: STATE OF THE UNION", http://suricon.net/wp-content/uploads/2016/11/SuriCon2016_GeoffLangdale.pdf
8. "Hyperscan in Rspamd", http://www.slideshare.net/VsevolodStakhov/rspamdhyperscan
9. https://www.reddit.com/r/cpp/comments/3picdx/hyperscan_highperformance_multiple_regex_matching/
redGuardian packet pipeline (simplified)
(diagram) DPDK RX, then: customer lookup ("customer?"), policing, regexp, pre-filtering (state tables, protocol prefilters), DPDK ACL1 … ACLn, then DPDK TX
Basic benchmark
• Xeon E3-1231 v3 @ 3.40GHz, turbo mode disabled, 10G ixgbe port, 1 core
• two cache lines prefetched
• results in Mpps
ACL rule:
network net.1 acl
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
  pass
end
regexp rule:
regex baz "^foobar"
network net.1 acl
  regex drop baz pass udp
  pass
end
results:
plnog_udp_acl rx_median 12.912; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
plnog_udp_regexp rx_median 9.832; tx_median 0.000; gen_rx 0.000; gen_tx 14.881
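The ACL constant is just ASCII "foobar" padded to 64 bits, and the mask keeps its first six bytes, so the rule and `^foobar` test the same prefix. A hypothetical Python model of `data u64 VALUE/MASK at OFFSET`, assuming the 8 bytes are read big-endian (network order) and that a read running past the end of the payload never matches (the real engine's behavior there is not stated):

```python
import re

VALUE = 0x666F6F6261720000   # b"foobar\x00\x00"
MASK = 0xFFFFFFFFFFFF0000    # compare only the first 6 of the 8 bytes

def u64_match(payload: bytes, offset: int, value: int = VALUE, mask: int = MASK) -> bool:
    """Compare 8 payload bytes at offset, read big-endian, under the mask."""
    chunk = payload[offset:offset + 8]
    if len(chunk) < 8:
        return False                     # assumption: short reads never match
    return (int.from_bytes(chunk, "big") & mask) == (value & mask)

REGEX = re.compile(rb"^foobar")          # the regexp rule's pattern
```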
Basic benchmark
test packet: ETH() / IP() / UDP() / ('x'*64 + 'foobar')
regexp rule:
regex baz "^(.{8}){0,8}foobar"
network net.1 acl
  regex drop baz pass udp
  pass
end
matching packets:
plnog_udp_acl_many rx_median 5.846; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
plnog_udp_regexp_many rx_median 2.921; tx_median 0.000; gen_rx 0.000; gen_tx 9.191
non-matching packets:
plnog_udp_acl_many rx_median 4.518; tx_median 4.518; gen_rx 4.517; gen_tx 9.124
plnog_udp_regexp_many rx_median 5.352; tx_median 5.352; gen_rx 5.353; gen_tx 9.124
ACL rules (one per 8-byte offset):
network net.1 acl
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 0
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 8
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 16
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 24
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 32
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 40
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 48
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 56
  drop udp data u64 0x666f6f6261720000/0xffffffffffff0000 at 64
  pass
end
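The single regexp and the nine ACL rules accept the same payloads: "foobar" starting at any multiple of 8 from 0 to 64. A Python sketch checking that equivalence, under two assumptions: bytes read past the end of the payload are modeled as zeros (the last rule reads 2 bytes beyond the 70-byte test payload, which the mask ignores anyway), and `.` is compiled with `re.DOTALL` so it matches any byte; without it, a `\n` among the skipped bytes would defeat the regexp:

```python
import re

FOOBAR = 0x666F6F6261720000
MASK = 0xFFFFFFFFFFFF0000
OFFSETS = range(0, 72, 8)                # 0, 8, ..., 64: the nine ACL rules

REGEX = re.compile(rb"^(.{8}){0,8}foobar", re.DOTALL)

def acl_many(payload: bytes) -> bool:
    """Any of the nine masked u64 rules matches (bytes past the end read as 0)."""
    padded = payload + bytes(8)
    return any(
        (int.from_bytes(padded[off:off + 8], "big") & MASK) == FOOBAR
        for off in OFFSETS
    )

def regex_many(payload: bytes) -> bool:
    return REGEX.match(payload) is not None
```

In the measurements above, the single regexp rule was slower than the nine ACL rules on matching packets (2.921 vs 5.846 Mpps) but faster on non-matching ones (5.352 vs 4.518 Mpps).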
Summary
• header and payload are the same: both are just bytes to match
• regexp engines can be fast
• careful benchmarking is required
• the x86 platform can compete with "hardware appliances"
^Backup\s+slides$
Hardware: CPU + FPGA hybrid?
• CPU + FPGA hybrid
– Atom + Altera FPGA (2010)
– Intel bought Altera (2015)
– Intel Stratix® 10 FPGA has a built-in ARM Cortex-A53
– Xeon Broadwell-EP + FPGA rumours (2016)
• Xeon v5 with AVX-512
• Knights Landing Xeon Phi™
– AVX-512
– 256 threads
sources:
• https://www.nextplatform.com/2016/03/14/intel-marrying-fpga-beefy-broadwell-open-compute-future/
• https://newsroom.intel.com/wp-content/uploads/sites/11/2016/01/ProductBrief-IntelAtomProcessor_E600C_series.pdf
• https://www.nextplatform.com/2016/11/15/intel-sets-skylake-xeon-hpc-knights-mill-xeon-phi-ai/
Hardware: 100+ G NICs
• Mellanox ConnectX®-6 (not available yet): ports 2 × 200G; bus 2 × 16 lanes, PCIe 3 or 4 (can use 2 slots); chipset ConnectX-6; host CPU bypass: ASAP2; driver: mlx6?
• Silicom PE3100G2DQIRL: ports 2 × 100G; bus 2 × 8 lanes; chipset Intel® FM10420; host CPU bypass: FlexPipe™; driver: fm10k
• QLogic FastLinQ QL45000: ports 1 × 100G; bus 16 lanes; chipset cLOM8514; driver: qede
• Netronome Agilio LX: ports 1 × 100G; bus 2 × 8 lanes; chipset NFP-6480; host CPU bypass: programmable data path offload (C, P4); driver: nfp
sources:
• http://www.mellanox.com/page/products_dyn?product_family=266&mtag=connectx_6_en_card
• http://www.silicom-usa.com/pr/server-adapters/networking-adapters/100-gigabit-ethernet-networking-server-adapters/pe3100g2dqirl-server-adapter/
• http://www.qlogic.com/Resources/Documents/DataSheets/Adapters/DataSheet_QL45611HLCU_IEA.pdf
• https://www.netronome.com/media/redactor_files/PB_Agilio_Lx_1x100GbE.pdf
^Q&A.*
https://twitter.com/redguardianeu

PLNOG 18 - Paweł Małachowski - Spy hard, or regexping through packets