

# **OPERA RHBD Multi-core**

Michael Malone, Draper Laboratory

**MAPLD – 2009** 

31 August 2009



## **OPERA RHBD Multi-core Agenda**

- OPERA Program History
- Multi-core Motivation
  - Why Multi-core For Space

### The OPERA Hardware and Software

- The Maestro chip
- Opera Software Architecture

### Radiation Hardening By Design Overview

- Single tile radiation performance
- Roadmaps and Summary

# **OPERA** Definitions



- OPERA Program
  - The On-board Processing Expandable Reconfigurable Architecture (OPERA) Program
- RHBD Program
  - DTRA / DARPA's Radiation Hardened By Design (RHBD) II Program
- OPERA Single Core (or Tile) Product Demonstration Vehicle (PDV) #1
  - Single RHBD core with test wrapper
  - RHBD Program Product Demonstration Vehicle #1 (PDV1)
- Integrated Test Chip (ITC)
  - First pass Maestro device testing scheduled to be complete March, 2010
    - The ITC becomes Maestro if no errors are found
- Maestro: the RHBD 49-core processor
  - Maestro is the radiation hardened by design 49-core general purpose processor based on the Tilera TILE64<sup>TM</sup>
  - TRL 6 devices scheduled to be available December, 2010
- MDE: Tilera's Multi-core Development Environment (MDE) for the TILE64<sup>TM</sup>
- OSA: OPERA Software Architecture
- ODE: OPERA Development Environment
- Core or Tile: single processor within the Maestro device
- SHIM: interfaces between the core array and chip I/Os
- CHREC
  - National Science Foundation's Center for High-Performance Reconfigurable Computing (CHREC)
    - University centers include Florida, George Washington, BYU, and Virginia Tech

## **OPERA Program History**







- Leverage existing microprocessor and software advances
  - Program origins: DARPA Polymorphic Computer Architecture (PCA) Program
  - Purchase the Tilera Corporation's Intellectual Property (IP) for Government space-based applications
  - Utilize DARPA / DTRA Radiation Hardened By Design (RHBD) program libraries
    - Hardened 90 nm IBM 9SF CMOS design libraries
  - License existing 3<sup>rd</sup> party silicon designs
    - Floating point unit, serializer/deserializer (SERDES), phase locked loop, memory interfaces
  - Participate in the NSF's Center for High-Performance Reconfigurable Computing (CHREC) software consortium
- Provide OPERA hardware IP and software free to contractors involved with US space-based applications

## Multi-core's Advantage




- Improved performance per watt for most applications
- Low latency connections between processors
- Parallel processing capability
- Commercial industry support





Multi-Core Architecture

Multi-core architecture has inherently faster processor communication



On-chip interface: <10 cycle latency between processors

# Why Multi-core for Space?



### OPERA's Goal - Revolutionary Improvement in Processor Capabilities for Space Applications

- Space processing challenges
  - Advancing mission requirements
  - Shrinking decision timelines
  - Providing a common high-performance hardware and software technology foundation
- OPERA is the Government's near-term low cost multi-core processor solution
  - US Government owns the OPERA multi-core intellectual property
  - The OPERA program's Maestro chip provides processing leap-ahead capability for space applications
    - Breaks the paradigm of space electronics being a decade behind the commercial sector
    - Produces a radiation hardened state of the art general purpose processor
    - 100x more capable than current space qualified general purpose processors

# **OPERA** Components



### Hardware – Maestro Chip

- 49 core general purpose multi-core processor
- 45 GOPS at 310 MHz
  - 22 GFLOPS (theoretical) at 310 MHz
  - Clock speed limited by memory
- Four 10 Gbps SERDES XAUI interfaces
- Radiation Hard By Design (RHBD)
- Developed by Boeing SSED
  - Uses Tilera Corporation IP
  - Additional third party IP
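
The 45 GOPS figure is consistent with 49 cores each issuing one 3-way VLIW bundle per cycle at 310 MHz. The Python sketch below reproduces that arithmetic under that assumption (the function name is illustrative). The derivation of the 22 GFLOPS figure is not stated, so the sketch only reports the FLOPs per core per cycle that the figure implies.

```python
CORES = 49          # 7 x 7 tile array
CLOCK_HZ = 310e6    # 310 MHz
VLIW_WIDTH = 3      # assumption: every core issues a full 3-way bundle each cycle


def peak_gops(cores, clock_hz, ops_per_cycle):
    """Peak operations per second, in units of 1e9 (GOPS)."""
    return cores * clock_hz * ops_per_cycle / 1e9


gops = peak_gops(CORES, CLOCK_HZ, VLIW_WIDTH)

# Work backwards from the quoted 22 GFLOPS to a per-core, per-cycle rate.
implied_flops_per_core_cycle = 22e9 / (CORES * CLOCK_HZ)

print(f"peak integer throughput: {gops:.2f} GOPS")
print(f"implied FLOPs per core per cycle: {implied_flops_per_core_cycle:.2f}")
```

Both numbers are theoretical peaks; sustained throughput depends on how well the compiler fills the VLIW slots and on memory behavior.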

### Software

- Basic compiler tools
  - Complements Tilera's toolset
- Benchmark code
- Performance and productivity tools
  - Parallel libraries, analyzer, debugger, run time monitor, OS ports



## **The Maestro Chip**



### RHBD version of the Tilera TLR26480

- 7 x 7 core array
- IBM 9SF 90 nm CMOS process
- < 28 Watts Peak (selectable), 20 Watts typical (using 49 cores)
  - Can "nap" cores and reduce power
  - ~ 270 mW per core
- Floating point unit in each processing core
  - IEEE 754 compliant, single and double precision
  - Aurora FPU IP
- 500 Krad TID
- Demonstrate NASA TRL-6 by December 2010
- Software compatible with the Tilera TLR26480
  - Reduced number of cores, slower clock speed, added FPU
- Tilera TLR26480 information can be found at <a href="http://www.tilera.com">www.tilera.com</a>



# Maestro / Tilera Performance Features



#### Tiled Architecture

- 2-D mesh of processors, connected by low-latency, high-bandwidth register-mapped networks
- Intra-tile (i.e., intra-core) VLIW performance
- Multi-tile ILP compilation
- Inter-module communication acceleration at compile time

#### Processors

- Main processor: 3 way VLIW CPU, 64-bit instruction bundles, 32-bit integer operations
- Static switch processor: 16-bit instructions

#### Memory

- L1 cache: 2 cycle latency
- L2 cache: 7 cycle latency
- Caches not automatically coherent across cores/tiles
- Cores/tiles can access other cores' L2 caches ("L3")
- Off-chip main memory, ~ 88 cycle latency
- 32-bit virtual address space per core
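
To get a feel for what these latencies mean for a workload, the sketch below computes an average access time from the three quoted figures. It assumes each figure is the total latency of an access satisfied at that level and ignores the remote-L2 ("L3") path. The hit rates are hypothetical inputs, not measured Maestro data.

```python
L1_CYCLES = 2     # from the slide
L2_CYCLES = 7     # from the slide
DRAM_CYCLES = 88  # from the slide (approximate)


def average_access_cycles(l1_hit_rate, l2_hit_rate):
    """Expected cycles per access for a simple three-level hierarchy.

    l2_hit_rate is the fraction of L1 misses that hit in the local L2.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return (l1_hit_rate * L1_CYCLES
            + l1_miss * l2_hit_rate * L2_CYCLES
            + l1_miss * l2_miss * DRAM_CYCLES)


# Hypothetical workload: 95% L1 hits, 80% of the misses caught by L2.
print(f"{average_access_cycles(0.95, 0.80):.2f} cycles per access")
```

Even a small fraction of off-chip accesses dominates the average, which is one reason data placement across tiles matters on this architecture.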

#### I/O Interfaces

- Four integrated XAUI MACs
- Two 10/100/1000 Ethernet MACs



Maestro Tile

# Maestro / Tilera Tile Block Diagram



### Core/Tile Processor

- 3 way VLIW processor
- 8 KB L1 Instruction cache
- Instruction Translation Lookaside Buffer (TLB)

### Cache System

- 8 KB L1 Data Cache
- 64 KB L2 I/D Cache
- Data TLB
- DMA Engine

### Core/Tile Switch

- Switch processor
  - 2 KB switch instruction cache
  - Switch TLB
- Static network (STN)
- Dynamic networks
  - MDN, TDN, UDN and IODN



#### **MAESTRO** Tile

© 2008 Tilera Corporation all rights reserved. May be published with permission by MAPLD 2009



- Maestro design modifications include:
  - IEEE 754 compliant FPU added to every core
  - Artisan radiation tolerant commercial memories
    - 500 Krad TID
  - EDAC and scrubbing added to meet SEU rate requirements
  - DDR1/DDR2 memory interface pads
  - Cache parity
  - RHBD design techniques for radiation mitigation
    - Balanced drive strength, DICE latches, temporal filtering, guard rings, power rail sizing
  - Removed PCI Express interfaces
  - Added additional XAUI interfaces
- With Tilera intellectual property and RHBD libraries, the OPERA program was able to design an optimized space multi-core processor for its customer

## **OPERA Software Architecture**





**Future Activities** 

## **Maestro Software Stack**





### Complete Integrated Development Environment

- Eclipse based tool suite

### User Libraries

- MPI, pthreads, TMC
- Shared memory, message passing, channels, threads and processes

### Tilera / Maestro operating systems

- Linux SMP 2.6
- VxWorks SMP
  - Workbench 3.1 and VxWorks 6.7

### Hypervisor

- Network interfaces
- Tile protection

MPI = Message Passing Interface; TMC = Tilera Multi-core Components; SMP = Symmetric Multiprocessing



### Benefits to releasing OPERA IP to contractors:

- Contractors can obtain the free OPERA IP if it is used in US space-based applications and they provide related IRAD / program results to the government
- Allows industry to rapidly target US Space customers for missions with tailored multi-core solutions
- Leverages existing and competitive space computer board markets
  - Space industry has succeeded in productizing PowerPC 750 and 603
- Encourages growth of competitive multi-core architectures
  - Provides space industry access to a new TRL 6 RHBD multi-core processor
- Creates innovation and sources of new space processing solutions
  - Quickens commercial/Government acceptance of multi-core architectures
  - Prompts new flight computer board development for space programs
  - Instigates *multiple* multi-core chip board solutions for space use

3<sup>rd</sup> party IP license fees apply only if the design is modified



## Radiation Hardening By Design (RHBD) Overview



# **OPERA Radiation Requirement**

### Radiation operating limits and specifications

- 500 Krad, no SEL
  - TID limited by commercial Artisan memory radiation performance
- 1 hard reset per mission (10 year)

### Radiation testing performed in accordance with the radiation test plan

- TID Testing
- Dose Rate
- Single Event Effects

| Parameter                                               | Requirement |
|---------------------------------------------------------|-------------|
| Total Ionizing Dose (rd(SiO<sub>2</sub>))<sup>3</sup>   | 500 K       |
| Single Event Effects (Latchup) <sup>1</sup>             |             |
| Single Event Latchup (LET in MeV-cm <sup>2</sup> /mg)   | ≥ 100       |
| Single Event Effects (Functional) <sup>1</sup>          |             |
| Destructive (errors/device-day)                         | 0           |
| Unrecoverable (errors/device-day) <sup>4</sup>          | 2.80E-04    |
| Recoverable (errors/device-day) <sup>5</sup>            | 2.80E-03    |
| Data Integrity (errors/device-day)                      | 1           |
| Dose Rate <sup>2</sup>                                  |             |
| Dose Rate Upset (rd(Si)/sec)                            | ≥ 1E9       |
| Dose Rate Survivability (rd(Si)/sec)                    | ≥ 1E12      |

- (1) Adams 10% Worst Case environment under worst-case operating conditions for voltage and temperature
- (2) Dose rate testing shall be accomplished using a 20 to 50 nsec FWHM pulse, under worst-case voltage and nominal temperature operating conditions, for static and dynamic operation. The operation of the device under test shall be monitored for memory cell upset, I/O upset defined as a voltage excursion > Vdd/3
- (3) Testing shall be done IAW MIL-T1019.7 using a Cobalt-60 source at a 50 to 300 rd(Si)/s dose-rate
- (4) 1-error/device-mission (mission = 10 years); would require intervention of hard reset or power cycling
- (5) 1-error/device-year; taken care of through an autonomous circuit such as a watch-dog-timer
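
Notes (4) and (5) tie the tabulated rates to mission-level budgets. The sketch below converts "one error per N years" into errors per device-day, assuming 365.25 days per year. The results come out close to the tabulated 2.80E-04 and 2.80E-03; the small difference presumably reflects rounding in the requirement.

```python
DAYS_PER_YEAR = 365.25  # assumption; the slide does not state the year length used


def errors_per_device_day(errors, years):
    """Convert an error budget over a period of years into a per-day rate."""
    return errors / (years * DAYS_PER_YEAR)


unrecoverable = errors_per_device_day(1, 10)  # note (4): 1 error per 10-year mission
recoverable = errors_per_device_day(1, 1)     # note (5): 1 error per year

print(f"unrecoverable: {unrecoverable:.2e} errors/device-day")
print(f"recoverable:   {recoverable:.2e} errors/device-day")
```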

# **RHBD** Program Vision



### Enable Rad-Hard ASICs on advanced commercial fabrication processes

- High performance, low power devices
- Leverage existing foundry capabilities
  - IBM 9SF 90 nm CMOS
- Reduces the space electronics lag



| Hardness Goals       |                                                       | Acceptable RHBD Penalties |        |
|----------------------|-------------------------------------------------------|---------------------------|--------|
| Total Ionizing Dose  | > 2 Mrad(Si) (OPERA > 500 Krad(Si))                   | Area                      | ≤ 2X   |
| Single Event Upset   | < 1E-10 errors/bit-day (Adams), LET<sub>Th</sub> > 20 | Speed                     | ≤ 1.5X |
| Single Event Latchup | LET<sub>Th</sub> > 120 MeV-cm<sup>2</sup>/mg          | Power                     | ≤ 2X   |
| Dose-Rate Upset      | > 1E10 rad(SiO<sub>2</sub>)/sec                       |                           |        |

# **RHBD Design Enablement**



#### **Standard Cell Libraries**

- 1014 Cells
- Low Power & High Speed Variants

#### I/O Libraries

- 500MHz LVDS
- C4
- Wirebond
- 2.5V, 3.3V tolerant
- Corner

#### SRAM

- 1Mrad
- 500Krad
- 6 Types
- Generator in work

#### **ASIC Design Flow**

| Development Step       | Data included                                                                                                                                      |
|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| Synthesis              | <ul> <li>Liberty Format Files (.lib)</li> <li>Synopsys Data Base Files (.db)</li> </ul>                                                            |
| Simulation             | <ul> <li>Verilog simulation models</li> <li>VHDL VITAL simulation models</li> <li>Cadence schematics</li> </ul>                                    |
| Placement &<br>Routing | <ul> <li>Cell physical geometry</li> <li>Cell frame views</li> <li>Cell timing views</li> <li>Cell power views</li> <li>Technology file</li> </ul> |
| Verification           | <ul> <li>Cell SPICE netlist</li> <li>Verification decks version</li> </ul>                                                                         |
| Support data           | <ul> <li>Cell datasheets</li> <li>Models &amp; Design rules version</li> </ul>                                                                     |

#### DDR 1 & 2 Interface

- High bandwidth external storage interface

#### **SERDES**

- 10Gbit/sec Ethernet
- Supports XAUI & PCI Express

#### **PLL**

- Clock
- SERDES
- DDR2

## **Maestro Radiation Mitigation**

| Radiation<br>Effect | Affects             | Where used                   | Mitigation                                                                                                                                   |
|---------------------|---------------------|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| TID                 | Thick oxide devices | I/O circuitry                | <ul> <li>Annular devices</li> </ul>                                                                                                          |
|                     | Thin oxide devices  | All logic cells              | <ul> <li>Balanced drive strength</li> </ul>                                                                                                  |
| SEL                 | All devices         | All blocks                   | <ul> <li>Extensive use of substrate and well contacts</li> </ul>                                                                             |
| SEU                 | SRAM cells          | Memory caches                | <ul> <li>Bit-level, word-level, and instance-level interleaving</li> <li>Error Detection and Correction (EDAC)</li> <li>Scrubbing</li> </ul> |
|                     | Flip-flops          | Logic blocks                 | <ul> <li>Dual Interlocked Storage Cell (DICE) flip-flops</li> </ul>                                                                          |
| SET                 | Logic cells         | Control and data paths       | <ul> <li>Balanced drive strength</li> <li>Pulse width filtering: temporal filters in the data/scan inputs</li> </ul>                         |
|                     |                     | Clock gating cells           | <ul> <li>Balanced drive strength</li> <li>Pulse width filtering: temporal filters in the data/scan and clock inputs</li> </ul>               |
| SET                 | Digital logic cells | Clock and reset distribution | <ul> <li>Single-stage high drive strength buffers</li> <li>Balanced drive strength</li> </ul>                                                |
|                     |                     | Macros: PLL, SerDes          | <ul> <li>High drive strength devices</li> <li>Balanced drive strength</li> </ul>                                                             |
|                     | Analog logic cells  | Macros: PLL, SerDes          | <ul> <li>RC filtering in analog nodes</li> <li>Guard rings</li> <li>Increased bias current</li> </ul>                                        |
| Dose Rate Effects   | All cells           | All blocks                   | <ul> <li>Power rail sizing</li> <li>Balanced drive strength</li> </ul>                                                                       |
# **RHBD Risk Mitigation Approach**





### **RHBD Device Radiation Testing**





Nominal Transistor Design



**RHBD** Transistor Design

By incorporating RHBD techniques as necessary, minimal device leakage can be ensured up to 2 Mrad(Si)

# PDV1 (OPERA Tile) Radiation Test Results



- At-speed test categorized SEU and SET effects
- Functional Testing
  - Passed the original Tilera tests
  - Passed all Memory Built-In-Self-Test (MBIST), scan, and Boeing at-speed tests
- Radiation Testing
  - Tile hard ≥ 2 Mrad TID with RHBD II memories
  - No Single Event Latchup (SEL) at an LET of 123 MeV-cm<sup>2</sup>/mg
  - Successfully sorted SEE errors into categories:
    - No Destructive Errors
    - No Unrecoverable Errors
    - Recoverable Errors
    - Data Integrity Errors

Survived dose rate to 5E11



Standby Current versus Total Ionizing Dose (TID)





- Maestro has built-in single-bit-correct, double-bit-detect (SEC-DED) EDAC
- Memory scrubbing required to clear multiple bit errors
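
To illustrate what single bit correct, double bit detect means in practice, here is a toy extended Hamming (8,4) code in Python. The word width, bit layout, and function names are illustrative assumptions; the slides do not describe the code Maestro actually uses.

```python
from functools import reduce
from operator import xor

DATA_POSITIONS = (3, 5, 6, 7)  # positions 1, 2, 4 hold parity; 0 holds overall parity


def encode(data_bits):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for parity_pos in (1, 2, 4):
        covered = (p for p in range(1, 8) if p & parity_pos and p != parity_pos)
        code[parity_pos] = reduce(xor, (code[p] for p in covered), 0)
    code[0] = reduce(xor, code[1:], 0)  # overall parity enables double-error detection
    return code


def decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = reduce(xor, (p for p in range(1, 8) if code[p]), 0)
    overall = reduce(xor, code, 0)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself flipped
        status = "corrected"
    else:
        status = "uncorrectable"  # two bits flipped: detected but not correctable
    return [code[p] for p in DATA_POSITIONS], status


word = [1, 0, 1, 1]
stored = encode(word)

single = list(stored)
single[5] ^= 1
print(decode(single))

double = list(stored)
double[2] ^= 1
double[6] ^= 1
print(decode(double)[1])
```

The second case shows why scrubbing matters: once a second upset lands in the same word before the first is corrected, the code can only flag the word, not repair it.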



Scrubbing to achieve 2E-11 Errors/bit-day due to independent single errors has acceptable throughput impact in all environments
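
The link between scrub interval and residual error rate can be sketched with a simple model. It is an assumption made here for illustration, not the analysis behind the slide. Upsets arrive independently at a fixed rate per bit, each word is SEC-DED protected, and a word becomes uncorrectable if it collects two upsets between scrubs. For a small expected number of upsets per word per interval, the residual rate per bit-day is approximately n·λ²·T/2. The raw upset rate and word width below are hypothetical; only the 2E-11 target comes from the slide.

```python
def residual_rate(upset_rate, word_bits, scrub_interval_days):
    """Approximate uncorrectable errors per bit-day (valid when n*rate*T << 1)."""
    return word_bits * upset_rate ** 2 * scrub_interval_days / 2.0


def max_scrub_interval(target_rate, upset_rate, word_bits):
    """Longest scrub interval (days) that keeps the residual rate at the target."""
    return 2.0 * target_rate / (word_bits * upset_rate ** 2)


RAW_UPSET_RATE = 1e-7  # hypothetical raw SEU rate, errors/bit-day
WORD_BITS = 72         # hypothetical: 64 data bits plus 8 check bits
TARGET = 2e-11         # errors/bit-day, the figure quoted on the slide

interval = max_scrub_interval(TARGET, RAW_UPSET_RATE, WORD_BITS)
print(f"scrub every {interval:.1f} days or sooner")
```

Because the residual rate scales linearly with the interval and quadratically with the raw upset rate, harsher environments shorten the allowable interval quickly.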

# **OPERA Program Roadmap**







- Multi-core processors offer improved performance per watt for most applications
- OPERA Integrated Test Chip on track for March, 2010 completion
  - PDV1 radiation testing complete and meets requirements
  - ITC tapeout is currently scheduled for Q4, 2009
    - Tapeout date has slipped mainly due to memory timing model and design for test implementation issues
- Maestro on track for December, 2010 completion (TRL 6)
  - 2<sup>nd</sup> pass device if required
- RHBD is a viable alternative to radiation hardened by process devices for select designs
- OPERA IP available for interested contractors / agencies working on US space-based applications