OmniXtend is an open source cache coherence protocol that runs over Ethernet. It allows for a unified memory fabric that scales beyond what is possible with traditional CPU-centric architectures. OmniXtend implements the TileLink cache coherence protocol over Ethernet frames, eliminating the need to rewrite software and enabling new data-centric architectures by decoupling compute from memory. The CHIPS Alliance is developing OmniXtend as an open standard with the goal of driving more collaboration in the hardware development community.
4. What Is CHIPS Alliance?
▸ Organization which develops and hosts:
– Open source hardware code (IP cores) -> think open source CPUs
– Open source software design tools -> fastest growing component
– Interconnect IP (phy and logical protocols) -> attracts most interest
▸ A barrier free environment for collaboration:
– Standards organization framework for collaboration and development
– Legal framework – Apache v2 license
▸ Shared resources ($ and time) which lower the cost of hardware development:
– For IP and tools
8. Why OmniXtend?
▸ Processor acts as a control point within the datacenter, limiting customer flexibility
▸ Memory blocked behind processor
– Limited number of DIMMs per socket
– Limited CPU memory address space
– Limited access to fast memory bus
▸ Analytics and machine learning driven by accelerators, limited access to the coherency bus
– Access to future fast-I/O storage attach points may also be constrained
[Figure: CPU with L1 cache and main memory; GPU, FPGA, ML accelerator, and other devices attached over I/O]
9. OmniXtend vs. Other Memory-centric Concepts
Memory fabric may mean different things to different people
Other approaches:
▸ Context switch cost comparable to memory access latency
▸ Require software/kernel support and/or rewriting of applications
‣ Page fault trap leading to RDMA request (incurs context switch and SW overhead)
‣ Global address translation management in SW, leading to LD/ST across global memory fabric
[Figure: two alternative designs. In the first, CPU, cache, and DRAM reach the fabric through a NIC using DMA/RDMA, managed by SW. In the second, CPU and cache issue LD/ST to the fabric through SW-managed tables.]
This is OmniXtend:
▸ No rewriting of software, scalable like the algorithm
‣ Coherence protocol scaled out, global page management and no context switching
‣ No impact on application system call interface (changes to boot needed)
[Figure: OmniXtend. CPU and cache connect to the fabric through the cache coherence protocol; PT]
10. OmniXtend — Open Unified Memory Fabric
Memory is the center of the architecture
[Figure: a central memory fabric connected to CPU, RISC-V, GPU, FPGA, AI accelerator, network fabric, and other devices]
11. OmniXtend Details
▸ OmniXtend is based on TileLink
– TileLink is an open, coherent bus used to connect cores with memory
▸ OmniXtend encapsulates TileLink and serializes it over Ethernet
13. Benefits of OmniXtend
▸ Already implemented in FPGAs
▸ Enables new data centric architectures and decouples compute from memory
▸ Completely unleashed memory from the CPU
▸ No need to rewrite application software
▸ The only completely open cache coherent fabric standard
▸ Based on low-cost Ethernet
14. OmniXtend System Block Diagram Example
Only Ethernet based fabric that supports cache coherency and is open
15. OmniXtend Architecture Overview
[Figure: a RISC-V node sends data packets through a programmable switch pipeline: Parser, Match-Action Units 0 to 11, Deparser. Each Match-Action Unit (MAU) takes a PHV in, looks up a key in its match table, applies actions with parameters, and passes a PHV out.]
TLoE Frame Structure
▸ Ethernet Preamble/SFD (8 bytes)
▸ Ethernet MAC Header (14 bytes)
▸ TLoE Frame Header (8 bytes)
▸ TileLink message 1
▸ TileLink message 2
▸ …
▸ TileLink message m
▸ Padding (P×8 bytes)
▸ TLoE Frame Mask (8 bytes)
▸ Ethernet FCS (4 bytes)
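The field sizes on this slide are enough to estimate how large a TLoE frame is for a given set of TileLink messages. The sketch below is illustrative only: the function and class names are hypothetical, and two rules are assumptions rather than statements from the slide, namely that each TileLink message is a multiple of 8 bytes and that padding is added only to reach the standard 64-byte Ethernet minimum frame size.

```python
from dataclasses import dataclass

# Field sizes in bytes, as listed in the TLoE frame structure on the slide.
ETH_MAC_HEADER = 14
TLOE_FRAME_HEADER = 8
TLOE_FRAME_MASK = 8
ETH_FCS = 4
PAD_UNIT = 8  # padding is expressed as P x 8 bytes

# Assumption for this sketch: pad only up to the standard Ethernet minimum
# frame size (MAC header through FCS, preamble/SFD not counted).
MIN_ETH_FRAME = 64


@dataclass
class FrameLayout:
    payload: int  # bytes of TileLink messages
    padding: int  # bytes of padding (a multiple of 8)
    total: int    # MAC header through FCS


def tloe_frame_layout(message_sizes):
    """Return the sizes of a frame carrying the given TileLink messages."""
    # Assumption: every message is a positive multiple of 8 bytes.
    if any(size <= 0 or size % PAD_UNIT for size in message_sizes):
        raise ValueError("message sizes must be positive multiples of 8")
    payload = sum(message_sizes)
    fixed = ETH_MAC_HEADER + TLOE_FRAME_HEADER + TLOE_FRAME_MASK + ETH_FCS
    shortfall = max(0, MIN_ETH_FRAME - fixed - payload)
    pad_units = -(-shortfall // PAD_UNIT)  # ceiling division gives P
    padding = pad_units * PAD_UNIT
    return FrameLayout(payload, padding, fixed + payload + padding)
```

For example, `tloe_frame_layout([16, 24])` needs no padding, while a single 8-byte message is padded with P = 3 units. The 8-byte preamble/SFD is transmitted on the wire in addition to the total computed here.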
16. FPGA “Real World” Measurements
[Figure: RISC-V SoC with OmniXtend running in FPGA, connected to a Tofino switch programmed with P4 code to support OmniXtend]
17. FPGA Time Measurements
▸ 16 CPU (FPGA) system & Tofino switch
– CPU runs at 100 MHz in FPGA
[Chart: OmniXtend latency (100 MHz FPGA)]
18. Software Support
▸ Kernel level memory management changes
▸ Option 1: Single kernel instance
– All nodes controlled under a single
kernel instance
▸ NUMA SMP like system
– Small scale systems
– Expose nodes memory as NUMA nodes
▸ Option 2: Independent kernel instances
– Independent kernel instance per node
– Large scale systems
– Applications can share memory through
an FS-like interface
▸ Memory mapped files
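Option 2 has applications on independent kernels share memory through memory mapped files. The sketch below shows that programming model with an ordinary file, which stands in for a region backed by the fabric; the path, the region size, and the helper names are hypothetical, and how OmniXtend itself would expose such a region is not specified in these slides.

```python
import mmap
import os
import struct
import tempfile


def open_shared_region(path, size):
    """Map `size` bytes of the file at `path` as shared, read/write memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)      # make sure the file covers the region
        return mmap.mmap(fd, size)  # default mapping is shared and writable
    finally:
        os.close(fd)                # the mapping stays valid after close


def demo():
    """Write through one mapping and read the value back through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared_region")  # hypothetical path
        writer = open_shared_region(path, 4096)
        reader = open_shared_region(path, 4096)
        struct.pack_into("<Q", writer, 0, 0xC0FFEE)     # store a 64-bit value
        value = struct.unpack_from("<Q", reader, 0)[0]  # load it elsewhere
        writer.close()
        reader.close()
        return value
```

Here both mappings live in one process for simplicity. In the independent-kernel model the writer and reader would be applications on different nodes, and the coherence protocol, not the local page cache, would keep their views consistent.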
[Figure, Option 1: physical nodes 0 and 1, each with CPUs 0 to 3 and local memory (Memory 0, Memory 1), are joined by the OmniXtend fabric into one logical system with CPUs 0 to 7, NUMA node 0, and NUMA node 1]
[Figure, Option 2: physical nodes 0 and 1, each with its own memory and CPUs 0 to 3, reach shared memory over the OmniXtend fabric]
19. Software Support
▸ Single kernel image prototype implemented
– OpenSBI modifications mainly, very few kernel
changes needed
▸ Device tree description of memory
– Boots on FPGA prototype hardware, up to 4 nodes
▸ Independent kernels implementation under study
– Interface and implementation of shared memory
setup and access control
▸ QEMU based OmniXtend emulation under
development
– Facilitate software development and verification
– Allows access to physical compute nodes too
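[Figure: QEMU-based OmniXtend emulation. Inside the QEMU process, guest user space and the guest kernel issue memory ops through a callback to an address checker, which separates local from remote accesses. Reads/writes to cacheable memory go to the memory backend; reads/writes to non-cacheable remote memory go to an MMIO device; remote cacheable memory requests go through an LRU cache to the OmniXtend device, which implements the OmniXtend protocol. Virtio net-device and Virtio disk are also shown. On the host, the layers are user space, kernel, and hardware Ethernet PHY.]
Access Ethernet packets using a raw socket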
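The emulation figure names an address checker and an LRU cache in front of the OmniXtend device. The sketch below is one minimal way to model those two pieces; it is not the QEMU implementation. The class names, the region layout, the 64-byte line size, and the `fetch` callback that stands in for a request sent over the fabric are all assumptions made for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64  # assumed cache-line size in bytes


class AddressChecker:
    """Classifies a guest physical address by the region it falls in."""

    def __init__(self, local, remote_cacheable, remote_uncached):
        # Each argument is a range of addresses; the layout is hypothetical.
        self.regions = (
            ("local", local),
            ("remote-cacheable", remote_cacheable),
            ("remote-non-cacheable", remote_uncached),
        )

    def classify(self, addr):
        for name, region in self.regions:
            if addr in region:
                return name
        raise ValueError(f"unmapped address {addr:#x}")


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def get(self, line_addr):
        if line_addr not in self.lines:
            return None
        self.lines.move_to_end(line_addr)  # mark as most recently used
        return self.lines[line_addr]

    def put(self, line_addr, data):
        self.lines[line_addr] = data
        self.lines.move_to_end(line_addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # drop the least recently used


def read_remote_line(cache, addr, fetch):
    """Return the line holding `addr`, calling `fetch` only on a miss."""
    line_addr = addr - addr % LINE_SIZE
    data = cache.get(line_addr)
    if data is None:
        data = fetch(line_addr)  # stand-in for a request over the fabric
        cache.put(line_addr, data)
    return data
```

A real emulator would also have to handle writes, permissions, and coherence messages from other nodes, which this sketch leaves out.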
23. Summary
▸ OmniXtend is an open, unified memory fabric
▸ Joint workgroups with RISC-V International to standardize on TileLink 2.0 and OmniXtend for multicore systems
▸ Adopt the technology in your next SoC
See more:
www.chipsalliance.org
27. Visibility
CHIPS Alliance – organizational structure
▸ CHIPS Alliance Board of Directors
– Zvonimir Bandic (Chairman)
– Richard Ho (Vice-chairman)
– Xiaoning Qi
– Dave Ditzel
– Yunsup Lee
– David Kehlet
– Prof. Borivoje Nikolic
▸ Ted Marena: Interim Director
▸ Henry Cook: Technical Committee
▸ Michael Gielda: Outreach Committee
▸ Brian Warner: Operations
▸ Other boxes in the chart: Workgroup Chairs; Project maintainers 1, 2, 3; Community Manager; SW Engineer 2; Verif. Engineer 1; Linux Foundation (Events, Finance / Operations, Legal)
▸ Chart areas: Growth + Operations; Advocacy + Outreach; Technology
▸ Legend: Elected, Staff, Agency, Future
28. Single Kernel Model
[Figure: physical nodes 0 and 1, each with CPUs 0 to 3 and local memory (Memory 0, Memory 1), are joined by the OmniXtend fabric into one logical system. In the logical system, CPUs 0 to 3 with Memory 0 form NUMA node 0 and CPUs 4 to 7 with Memory 1 form NUMA node 1.]
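In the single kernel model, each physical node's CPUs and memory appear to the one kernel as a NUMA node. The sketch below shows one numbering that matches the figure (two nodes of four CPUs becoming logical CPUs 0 to 7). The contiguous numbering scheme and the function names are assumptions for illustration, not something the slides specify.

```python
CPUS_PER_NODE = 4  # matches the four CPUs per physical node in the figure


def logical_cpu(physical_node, local_cpu, cpus_per_node=CPUS_PER_NODE):
    """Map (physical node, CPU within that node) to a logical CPU id."""
    if not 0 <= local_cpu < cpus_per_node:
        raise ValueError("local CPU index out of range")
    return physical_node * cpus_per_node + local_cpu


def numa_node_of(logical_cpu_id, cpus_per_node=CPUS_PER_NODE):
    """Return the NUMA node (one per physical node) owning a logical CPU."""
    return logical_cpu_id // cpus_per_node


def build_topology(num_nodes, cpus_per_node=CPUS_PER_NODE):
    """Return {NUMA node: [logical CPU ids]} for the whole logical system."""
    return {
        node: [logical_cpu(node, cpu, cpus_per_node) for cpu in range(cpus_per_node)]
        for node in range(num_nodes)
    }
```

With two physical nodes, `build_topology(2)` groups logical CPUs 0 to 3 under NUMA node 0 and CPUs 4 to 7 under NUMA node 1, so a NUMA-aware scheduler can keep a task close to the memory it uses.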
30. OmniXtend System Block Diagram Example
[Figure: two SoC nodes, each with L1 caches, an internal cache coherence switch, an L2 cache, a coherence manager, a cache coherence serializer, DRAM, and an 802.3 PHY, connected by Ethernet with cache coherency through a programmable (P4) Tofino™ switch. An ML accelerator and NVM (main memory) devices attach through their own 802.3 PHYs.]
Only Ethernet based fabric that supports cache coherency and is open
31. OmniXtend vs Other Protocols
Interface | Physical I/O | Connection | Standard | Coherence? | Reference Design
CCIX | PCIe | Point to Point | Yes | No | No
Gen Z | Custom | P2P, Switched, Fabric | Yes | No | Yes
CXL | PCIe | Point to Point | Yes | Partial | No
Open CAPI | Custom | Point to Point | Open | Partial | Yes
OmniXtend | Ethernet | Fabric, P2P | Open | Yes | Yes
Editor's Notes
> - Tooling to support use of Linux in Safety Critical Systems
>
> Complete and consistent set of techniques mapped to a set of open source
> tools provided by the project.
>
> - kernel and software stack lifecycle in these systems ( extending LTS
> models )
> - Provisions for sustaining Linux throughout the product lifetime.
> - Support for certification activities.
>
> Incident and Hazard Monitoring
>
> - Monitoring critical components in member system specific contexts and
> reporting impact for updates
> - Establish best practices for member response teams.
>
> Reference Documentation and Use Cases
>
> - Safety Concepts and building blocks
> - Kernel selection and how to configure (how/why certain flags).
> - Reference sample system to guide use of measures and techniques
>
> Education and Evangelism
>
> - Workshops and opportunities for knowledge sharing
> - Course on Safety Engineering Best Practices
Preexisting element selection and integration
Safety concept/safety case based on pre-existing elements
> - Use of analysis tools and gaps that require manual intervention
Continuous feedback to FLOSS community
- Process and traceability improvements
- Automation of bug scanning and quality metrics
- Building awareness on safety and its relation to reliability and availability
Interaction with the safety community
- Present and critically discuss FLOSS in safety
- Present methods and tools for peer review
- Establish acceptance for FLOSS in the safety community
- Submit amendments and/or full standards to relevant committees
Main memory can be shared equally with CPUs, GPUs, ML accelerators, FPGAs, etc.
So how do we make amazing data centric architectures happen? We believe it is possible via open standards. RISC-V affords us a unique opportunity to create an open standard memory coherency bus for heterogeneous computing architectures. Today, the existing CPUs all have coherency buses that are closed. We can lead the way with an open coherency bus. The CHIPS Alliance organization along with RISC-V International has been discussing this. We are asking you to join us in creating a standard for a RISC-V coherency bus. We think OmniXtend is a great foundation to build upon. OmniXtend has a reference design board and has been shown to perform well in initial system build outs. Contribute to our efforts in creating an open standard which will enable data centric solutions to thrive.
OmniXtend is an open, data centric memory fabric