100G Networking Technology Overview - Slides - Toronto (August 2016)
Why 100G now?
100G Networking Technologies
• 10 x 10G links: the old CFP standard. Expensive. Lots of cabling. Has been in use for a while for specialized uses.
• New 4 x 28G link standard, "QSFP28". Brings the price down into the range of SFP and QSFP.
Compact, and designed to replace 10G and 40G networking.
• Infiniband (EDR)
o Standard pushed by Mellanox.
o Can transition to lower Infiniband speeds through the switches.
o Most mature technology to date. Switches and NICs are available.
• Ethernet
o Early deployment in 2015.
o But the most widely used switch chipset was recalled to be respun.
o NICs are under development; the most mature is the Mellanox EDR adapter, which can run in 100G
Ethernet mode.
o May be ready mid-2016.
• Omnipath (Intel)
o Redesigned serialization. No legacy issues from Infiniband. Supports more nodes. Designed for the Exascale
vision.
o Immature: the vendor claims production readiness, but what is available has the character of
an alpha release with limited functionality. Estimate that this will be more mature by the
end of 2016.
CFP vs QSFP28: 100G Connectors
Splitting 100G Ethernet to 25G and 50G
• 50G (2 x 25G) and 25G (1 x 25G) speeds are available, which doubles or
quadruples the port density of switches.
• Some switches can handle 32 links of 100G, 64 of 50G and 128 of 25G.
100G Switches
Technology                Ports   Status             Name
Mellanox EDR Infiniband   36      Released. Stable.  7700 Series
Overwhelmed by data
No time to process what you get?
• NICs have the problem of how to get the data to the application.
• Flow Steering in the kernel allows the distribution of packets to multiple processors so that
the processing scales. But there are not enough processing cores for 100G. (See the sketch after
this list.)
• NICs have extensive logic to offload operations and distribute the load.
• One NIC supports multiple servers of diverse architectures simultaneously.
• Support for virtualization: SR-IOV etc.
• Switch-like logic on the chip.
For scale:
o 1 us = 1 microsecond = 1/1,000,000 seconds
o 1 ns = 1 nanosecond = 1/1000 us
o Network send or receive syscall: 10-20 us
o Main memory access: ~100 ns
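As an illustration of the "spread the processing over many cores" idea above, here is a minimal sketch (assuming Linux and UDP traffic; it is not the NIC's own flow-steering logic): each worker thread opens its own socket with SO_REUSEPORT, and the kernel steers incoming flows across those sockets, i.e. across cores.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 7777      /* illustrative port */
#define NWORKERS 4     /* illustrative worker count */

/* Each worker owns its own UDP socket bound to the same port via
 * SO_REUSEPORT; the kernel hashes incoming flows across the sockets,
 * so receive processing is spread over several cores. */
static void *worker(void *arg)
{
    long id = (long)arg;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(PORT);
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    char buf[2048];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n < 0)
            break;
        printf("worker %ld got %zd bytes\n", id, n);
    }
    close(fd);
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```

Even with this kind of distribution, the slide's point stands: at 100G line rate there are simply not enough cores to keep up with per-packet work in software, which is why the NIC offloads matter.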
Available 100G NICs
• Mellanox ConnectX-4 Adapter
o 100G Ethernet
o EDR Infiniband
o Sophisticated offloads
o Multi-Host
o Evolution of the ConnectX-3
Application Interfaces and 100G
1. Socket API (POSIX)
Run existing apps. Large code base. Large set of developers who know how to use the programming interface.
2. Block-level file I/O
Another POSIX API. Remote filesystems like NFS may use NFSoRDMA etc.
3. RDMA API (a minimal verbs sketch follows this list)
1. One-sided transfers
2. Receive/SendQ in user space
3. Talk directly to the hardware.
4. OFI
Fabric API designed for application interaction not with the network but with the "Fabric".
5. DPDK
Low-level access to the NIC from user space.
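To make item 3 concrete, below is a minimal sketch (not from the slides) of posting a one-sided RDMA write with the verbs API. The function name post_rdma_write and all variables are illustrative; it assumes the queue pair, memory region, and the peer's address/rkey were already set up and exchanged out of band.

```c
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Post a one-sided RDMA WRITE: the local buffer registered in 'mr' is
 * placed directly into the peer's memory at remote_addr/remote_rkey.
 * QP setup and the out-of-band exchange of remote_addr/rkey are
 * omitted here. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *buf, uint32_t len,
                           uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* ask for a completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    /* Hands the work request straight to the NIC's send queue. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

Because the data lands directly in the peer's registered memory, the remote CPU is not involved in the transfer, which is what keeps the RDMA latencies in the later charts low.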
Using the Socket APIs with 100G
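As a rough illustration of this topic (not taken from the slides; address, port, and buffer size are arbitrary examples), the sketch below enlarges the per-socket buffers, which is typically the first knob touched when driving a plain TCP socket toward very high rates.

```c
#include <arpa/inet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Large socket buffers so the TCP window can cover the
     * bandwidth-delay product of a fast link (value is illustrative;
     * the kernel caps it at net.core.wmem_max / rmem_max). */
    int buf = 16 * 1024 * 1024;
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &buf, sizeof(buf));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buf, sizeof(buf));

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5001);
    inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr); /* example address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
        char msg[] = "hello over the existing Socket API";
        send(fd, msg, sizeof(msg), 0);
    }
    close(fd);
    return 0;
}
```

The appeal of this path is that existing applications keep working unchanged; the cost, as the latency charts later show, is the extra kernel and syscall overhead on every transfer.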
OFI (aka libfabric)
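As a quick orientation to libfabric (an illustrative sketch, not taken from the slides): applications typically start with fi_getinfo(), which enumerates the providers and endpoint types the installed fabrics can offer before any endpoints are opened.

```c
#include <stdio.h>
#include <rdma/fabric.h>

int main(void)
{
    struct fi_info *info = NULL, *cur;

    /* Ask libfabric for every provider/endpoint it can offer on this
     * node; real applications pass 'hints' to narrow the selection. */
    int ret = fi_getinfo(FI_VERSION(1, 1), NULL, NULL, 0, NULL, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        return 1;
    }

    for (cur = info; cur; cur = cur->next)
        printf("provider: %-10s fabric: %s\n",
               cur->fabric_attr->prov_name, cur->fabric_attr->name);

    fi_freeinfo(info);
    return 0;
}
```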
Software Support for 100G technology
Test Hardware
• Servers
o Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
• Adapters
o Intel Omni-Path Host Fabric Interface Adapter
• Driver Version: 0.11-162
• OPA Version: 10.1.1.0.9
o Mellanox ConnectX-4 VPI Adapter
• Driver Version: Stock RHEL 7.2
• Firmware Version: 12.16.1020
• Switches
o Intel 100 OPA Edge 48p
• Firmware Version: 10.1.0.0.133
o Mellanox SB7700
• Firmware Version: 3.4.3050
Latency Tests via RDMA APIs(ib_send_lat)
[Chart: typical latency (usec) vs. message size (2 to 4096 bytes) for EDR, Omnipath, 100GbE, 10GbE, and 1GbE.]
Bandwidth Tests using RDMA APIs (ib_send_bw)
[Chart: average bandwidth (MB/sec) vs. message size (2 to 4096 bytes) for EDR, Omnipath, 100GbE, 10GbE, and 1GbE.]
[Chart: latency (us) by interconnect: EDR, Omnipath, 100GbE, 10GbE, 1GbE.]
RDMA vs. Posix Sockets (30 byte payload)
[Chart: latency (us) of Socket vs. RDMA transfers on EDR, Omnipath, 100GbE, 10GbE, and 1GbE.]
RDMA vs. Posix Sockets (1000 byte Payload)
[Chart: latency (us) of Sockets vs. RDMA transfers on EDR, Omnipath, 100GbE, 10GbE, and 1GbE.]
Further Reading Material
http://presentations.interop.com/events/las-vegas/2015/open-to-all---keynote-presentations/download/2709
https://en.wikipedia.org/wiki/100_Gigabit_Ethernet
http://www.ieee802.org/3/index.html
Memory Performance issues with 100G
Looking Ahead
• 100G is maturing.
• 200G available in 2017/2018.
• Terabit links by 2022.
• Software needs to mature, especially the OS network stack, to handle these speeds.
• Issues
o Memory throughput
o Proper APIs
o Deeper integration of CPU/memory/I/O
Q&A
• Issues
• Getting involved
• How to scale the OS and software
• What impact will this speed have on software
• Contact information
cl@linux.com