ELC Multiprocessor FPGA Linux
Agenda
Introduction
Problem: How to Integrate Multi-Processor Subsystems
Why
Why would you do this?
Why use FPGAs?
Lab 1: Getting Started - Booting Linux and Boot-strapping NIOS
Lab 2: Inter-Processor Communication and Shared Peripherals
Lab 3: Locking and Tetris
Building Hardware: FPGA Hardware Tools & Build Flow
Building/Debugging Software: Software Tools & Build Flow
References
Q&A throughout
[Figure: Processor Subsystem 1 and Processor Subsystem 2, each with Periph 1, Periph 2 and Periph 3; the connection between the two subsystems is marked "???"]
Experimentation
Many processor subsystems can be implemented
Allows you to experiment with changing microprocessor subsystem hardware designs
Altera FPGA under-the-hood
However: generic Linux interfaces are used, so the techniques can be applied in any Linux system.
More than 50% of FPGA designs include an embedded processor, and the share is growing.
Many embedded designs use Linux
Open-source re-use.
Altera Linux Development Team actively contributes to the Linux kernel
[Figure: SoC FPGA block diagram.
HPS side: two Cortex-A9 cores, each with I$ and D$, shared L2, EMIF, DMA, ROM, UART, RAM, SD/MMC.
AXI bridges: HPS2FPGA (32/64/128-bit), LWHPS2FPGA (32-bit), FPGA2HPS (32/64/128-bit). ARM-to-FPGA bridge data width is configurable.
FPGA fabric (soft logic): NIOS, SYS ID, RAM, GPIO (32-bit).
FPGA: 42K Logic Macros, using no more than 14%.
Lab focus: UART, DDR3, LEDs, Buttons.]
[Figure: system block diagram. Subsystem 1 (ARM Subsystem): Cortex-A9, SD/MMC, EMIF, UART. Subsystem 2: NIOS 0, RAM, GPIO. Peripherals are marked as Shared Peripherals or Dedicated Peripherals.]
NIOS                              ARM Cortex-A9
Address Base   Peripheral         Address Base   Peripheral
0xFFC0_2000    ARM UART           0xFFC0_2000    UART
0x0003_0000    GPIO (LEDs)        0xC003_0000    GPIO (LEDs)
0x0002_0000    System ID          0xC002_0000    System ID
0x0000_0000    On-chip RAM        0xC000_0000    On-chip RAM
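In the address map above, every FPGA-side peripheral appears to the ARM at its NIOS address plus 0xC000_0000, while the UART at 0xFFC0_2000 is the same for both. A minimal C sketch of that translation follows; the function name and the span bound are illustrative assumptions, not part of the lab material.

```c
#include <assert.h>
#include <stdint.h>

/* Offset read off the table: ARM address minus NIOS address for FPGA peripherals. */
#define ARM_FPGA_WINDOW_BASE 0xC0000000u
/* Hypothetical bound: just large enough to cover the FPGA-side rows in the table. */
#define FPGA_WINDOW_SPAN     0x00040000u

/* Translate an FPGA-side (NIOS view) address to the ARM Cortex-A9 view. */
static uint32_t nios_to_arm_addr(uint32_t nios_addr)
{
    assert(nios_addr < FPGA_WINDOW_SPAN);
    return ARM_FPGA_WINDOW_BASE + nios_addr;
}
```

For example, nios_to_arm_addr(0x00030000u) gives the ARM address of GPIO (LEDs) listed in the table.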
Peripheral   Address Offset   Access   Bit Definitions
Sys ID       0x0              RO
GPIO         0x0              R/W
UART         0x14             RO
UART         0x30             R/W
UART         0x34             R/W

NIOS resets are connected to GPIO.
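Because the NIOS resets hang off a read/write GPIO register, releasing a NIOS comes down to a read-modify-write of that register. The sketch below assumes one reset bit per NIOS, active while set; the slide text does not give the bit positions or polarity, so nios_reset_mask() is purely hypothetical, as are the helper names.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO_DATA_OFFSET 0x0u   /* from the table: GPIO, offset 0x0, R/W */

/* Hypothetical bit assignment: bit n holds NIOS n in reset while it is set. */
static uint32_t nios_reset_mask(unsigned nios_index)
{
    assert(nios_index < 32u);
    return UINT32_C(1) << nios_index;
}

/* Read the 32-bit register at byte offset `offset` from `base`. */
static uint32_t reg_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4u];
}

/* Write the 32-bit register at byte offset `offset` from `base`. */
static void reg_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4u] = value;
}

/* Clear one reset bit and leave every other GPIO bit unchanged. */
static void nios_release_reset(volatile uint32_t *gpio_base, unsigned nios_index)
{
    uint32_t value = reg_read(gpio_base, GPIO_DATA_OFFSET);
    reg_write(gpio_base, GPIO_DATA_OFFSET, value & ~nios_reset_mask(nios_index));
}
```

On the target, gpio_base would typically come from mapping the GPIO block's physical address (for instance with mmap() on /dev/mem); off target, an ordinary uint32_t array can stand in for the register window.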
Lab 2: Mailboxes
NIOS/ARM Communication
Topics Covered:
Altera Mailbox Hardware IP
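[Figure: system block diagram. Subsystem 1 (ARM Subsystem): Cortex-A9, SD/MMC, EMIF, UART. Subsystem 2: NIOS 0 with MBox, RAM, GPIO. Subsystem 3: NIOS 1 with MBox, RAM, GPIO. One further GPIO block is shown. Peripherals are marked as Shared Peripherals or Dedicated Peripherals.]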
NIOS 0 & 1                        ARM Cortex-A9
Address Base   Peripheral         Address Base   Peripheral
0xFFC0_2000    ARM UART           0xFFC0_2000    UART
0x0007_8000                       0x0007_8000
0x0007_0000                       0x0007_0000
0x0005_0000                       0x0006_8000
0x0003_0000                       0x0006_0000
0x0002_0000    System ID          0xC003_0000    GPIO (LEDs)
0x0000_0000    On-chip RAM        0xC002_0000    System ID
                                  0xC001_0000    NIOS 1 RAM
                                  0xC000_0000    NIOS 0 RAM
Peripheral   Address Offset   Access   Bit Definitions
Mailbox      0x0              R/W
Mailbox      0x8              R/W
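The two mailbox registers are enough for a polled send/receive pair. The sketch below assumes that offset 0x0 carries the message word and offset 0x8 is a status word with a "pending" bit and a "full" bit; the table gives only offsets and access types, so these roles, the bit masks and the function names are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets from the table; roles and bit masks are assumptions. */
#define MBOX_MSG_OFFSET   0x0u  /* assumed: message word */
#define MBOX_STS_OFFSET   0x8u  /* assumed: status word */
#define MBOX_STS_PENDING  0x1u  /* assumed: a message is waiting to be read */
#define MBOX_STS_FULL     0x2u  /* assumed: previous message not yet consumed */

/* Try to post one word; returns 0 on success, -1 if the mailbox is full. */
static int mbox_try_send(volatile uint32_t *mbox, uint32_t msg)
{
    assert(mbox != NULL);
    if (mbox[MBOX_STS_OFFSET / 4u] & MBOX_STS_FULL)
        return -1;
    mbox[MBOX_MSG_OFFSET / 4u] = msg;
    return 0;
}

/* Try to fetch one word; returns 0 and fills *msg, or -1 if nothing is pending. */
static int mbox_try_recv(volatile uint32_t *mbox, uint32_t *msg)
{
    assert(mbox != NULL && msg != NULL);
    if (!(mbox[MBOX_STS_OFFSET / 4u] & MBOX_STS_PENDING))
        return -1;
    *msg = mbox[MBOX_MSG_OFFSET / 4u];
    return 0;
}
```

Both calls return immediately, so a caller can poll them in a loop or move to interrupts later without changing the register access itself.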
Design Decisions:
Short Length: A single 32-bit word
Human Readable
Message transactions are closed-
Format:
Message Length: Four Bytes
First Byte is ASCII character
Layout: Byte 0, Byte 1, Byte 2, Byte 3 (\0)
Message Types:
G00: Give Access to UART (Push)
A1A: ACK
N1A: NACK
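A four-byte, human-readable message fits the single 32-bit mailbox word directly. The following sketch packs a three-character message such as "G00" plus its terminating \0 into one word and unpacks it again. It assumes Byte 0 sits in the least significant eight bits, which the slides do not specify; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_TEXT_LEN 3u  /* three visible characters, e.g. "G00"; the fourth byte is '\0' */

/* Pack a 3-character message into one 32-bit word.
 * Assumption: Byte 0 travels in bits 0..7, Byte 1 in bits 8..15, and so on. */
static uint32_t msg_pack(const char *text)
{
    assert(text != NULL && strlen(text) == MSG_TEXT_LEN);
    return  (uint32_t)(unsigned char)text[0]
         | ((uint32_t)(unsigned char)text[1] << 8)
         | ((uint32_t)(unsigned char)text[2] << 16);  /* bits 24..31 stay 0: the '\0' */
}

/* Unpack a word into a NUL-terminated 4-byte buffer. */
static void msg_unpack(uint32_t word, char out[4])
{
    out[0] = (char)(word & 0xFFu);
    out[1] = (char)((word >> 8) & 0xFFu);
    out[2] = (char)((word >> 16) & 0xFFu);
    out[3] = '\0';
}
```

With this layout the unpacked buffer can be printed or compared with strcmp() as an ordinary C string, which keeps the protocol readable in logs.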
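[Figure: system block diagram. Subsystem 1 (ARM Subsystem): Cortex-A9, SD/MMC, EMIF, UART. Subsystem 2: NIOS 0 with MBox, RAM, GPIO. Subsystem 3: NIOS 1 with MBox, RAM, GPIO. One further GPIO block is shown. Peripherals are marked as Shared Peripherals or Dedicated Peripherals.]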
Available in Linux
[Figure: lock message sequence. NIOS 0: B00, A0A, L00, A0A. NIOS 1: B10, A1A, L10, A1A. Cortex-A9: pthread_mutex_lock(), pthread_mutex_unlock().]
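In the sequence above, requests from NIOS 0 (B00, L00) are answered with A0A and requests from NIOS 1 (B10, L10) with A1A, which suggests that the middle byte names the sender and is echoed back. A small sketch of a reply builder under that assumption follows; the pattern is inferred from the message names only, and build_reply() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the 3-character reply to a request such as "B10".
 * Assumption: byte 1 of the request is the sending NIOS index and is echoed back. */
static void build_reply(const char *request, int accepted, char reply[4])
{
    assert(request != NULL && strlen(request) == 3u);
    reply[0] = accepted ? 'A' : 'N';  /* ACK or NACK */
    reply[1] = request[1];            /* echo the NIOS index */
    reply[2] = 'A';
    reply[3] = '\0';
}
```

A request of "L10" with accepted set to 1 yields "A1A"; the same request with accepted set to 0 yields "N1A".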
/* Create 2 Threads */
i = 0;
while (i < 2)
{
    err = pthread_create(&(tid[i]), NULL,
                         &nios_buttons_get, &(nios_num[i]));
    i++;
}
<snip Critical Section>
pthread_mutex_lock(&lock);
/* Critical Section */
pthread_mutex_unlock(&lock);
<snip Stop/Destroy>
/* Wait for threads to complete */
pthread_join(tid[0], NULL);
pthread_join(tid[1], NULL);
/* Destroy/remove lock */
pthread_mutex_destroy(&lock);
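The excerpt above leaves out the lock initialization and the thread body. As a self-contained sketch of the same create / lock / join / destroy pattern, the code below runs two threads that increment a shared counter under a mutex. The worker() function is a stand-in for nios_buttons_get(), and all names here are illustrative rather than taken from the lab source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 2

struct worker_arg {
    pthread_mutex_t *lock;
    long *counter;
    long iterations;
};

/* Stand-in for nios_buttons_get(): bumps a shared counter under the lock. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(arg->lock);
        (*arg->counter)++;               /* critical section */
        pthread_mutex_unlock(arg->lock);
    }
    return NULL;
}

/* Run two workers to completion; returns the final counter, or -1 on error. */
static long run_two_workers(long iterations)
{
    pthread_t tid[NUM_WORKERS];
    pthread_mutex_t lock;
    long counter = 0;
    struct worker_arg arg = { &lock, &counter, iterations };
    int started = 0;

    assert(iterations >= 0);
    if (pthread_mutex_init(&lock, NULL) != 0)
        return -1;
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&lock);
    return started == NUM_WORKERS ? counter : -1;
}
```

Joining only the threads that were created avoids calling pthread_join() on an uninitialized thread ID when pthread_create() fails part-way through.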
References
Altera References
System Design Tutorials:
Designing with AXI for Altera SoC ARM Devices Workshop Lab: http://www.alterawiki.com/wiki/Designing_with_AXI_for_Altera_SoC_ARM_Devices_Workshop_Lab_-_Creating_Your_AXI3_Component
Simple HPS to FPGA Communication for Altera SoC ARM Devices Workshop: http://www.alterawiki.com/wiki/Simple_HPS_to_FPGA_Comunication_for_Altera_SoC_ARM_Devices_Workshop_-_LAB2
http://www.altera.com/literature/tt/tt_nios2_multiprocessor_tutorial.pdf
Quartus Handbook:
https://www.altera.com/en_US/pdfs/literature/hb/qts/quartusii_handbook.pdf
Qsys:
Qsys Tutorial: Step-by-step procedures and design example files to create and verify a system in Qsys
https://www.altera.com/en_US/pdfs/literature/ug/ug_soc_eds.pdf
Related Articles
Performance Analysis of Inter-Processor Communication Methods
http://www.design-reuse.com/articles/24254/inter-processor-communication-multi-core-processors-reconfigurable-device.html
OpenMCAPI:
https://bitbucket.org/hollisb/openmcapi/wiki/Home
Mutex Examples:
http://www.thegeekstuff.com/2012/05/c-mutex-examples/
Thank You
http://rocketboards.org/foswiki/Projects/BuildingMultiProcessorSystems
Includes:
Source code
Hardware source
Hardware Quartus Projects
Software Eclipse Projects
BACKUP SLIDES
Building Hardware:
Qsys (Hardware System Design Tool) User Interface
[Figure: Qsys user interface. Callouts: interfaces exported in/out of system; connections between cores.]
[Figure: build flow. Labels: Quartus & Qsys, Eclipse, DS-5 & Debug Tools, Device Tree, RBF]
Inputs:
Hardware Design (Qsys or RTL or Both)
SDCARD Layout
Partition 1: FAT
U-Boot scripts
FPGA HW Designs (RBF)
Device Tree Blobs
zImage
Lab material
Updating SD Cards
File                          Update Procedure
zImage
soc_system.rbf
soc_system.dtb
u-boot.scr
preloader-mkpimage.bin        $ sudo dd if=preloader-mkpimage.bin of=/dev/sdx3 bs=64k seek=0
u-boot-socfpga_cyclone5.img
root filesystem
Partitioning peripherals
Declare dedicated peripherals: connected to and controlled by only one processor
Declare shared peripherals: connected to and controlled by multiple processors
Decide upon a locking mechanism (covered in Lab 3)
Pre-loader/U-Boot Generator
Device Tree Generator
Bare-metal Libraries
Compilers
GCC (for ARM and NIOS)
ARMCC (for ARM with license)
Linux Specific
Kernel Sources
Yocto & Angstrom recipes: http://rocketboards.org/foswiki/Documentation/AngstromOnSoCFPGA_1
Buildroot: http://rocketboards.org/foswiki/Documentation/BuildrootForSoCFPGA
[Figure: development flow: Design, Simulate, Debug, Release]
Eclipse
GNU toolchain
OS/BSP: Linux, VxWorks
Hardware Libraries
Design Examples
Flash Programmer
Hardware Design:
Simple custom logic design in FPGA
All source code and Quartus II / Qsys design files for reference
Software Design:
Includes Linux kernel and application source code
Includes all compiled binaries
Software design
Message Protocols
Linux tools/mechanism available today
[Figure: Quartus II design flow. Window callouts: Project Navigator, Tool View window, Tasks window, Messages window. Flow stages: project creation; functional verification (verify design behavior); synthesis (mapping) to logic, memory and I/O; design compilation; timing analysis (verify design will work in target technology).]
[Figure: design flow stages. Project definition; project creation; design creation; functional verification; mapping to memory, logic and I/O; design compilation; functional verification; in-system debug.]
Design entry
Quartus II Software
New project wizard
HDL editor
Schematic editor
State machine editor
MegaWizard Plug-In Manager
Customization and generation of IP
Qsys system integration tool
Assignment editor
Pin planner
Synopsys Design Constraint (SDC) editor
Synthesis
Netlist viewers
Advisors
Power analysis
Quartus II Software
Chip Planner
Functional verification
ModelSim-Altera edition
Third-party EDA simulation tools
Assembler
Device supported
Software features:
Incremental compilation
and team-based design
SSN Analyzer
Transceiver Toolkit
Web Edition
MAX series devices: All (excluding MAX 7000/3000)
Cyclone V FPGAs: All (excluding 5CEA9, 5CGXC9, and 5CGTD9)
Cyclone III/IV FPGAs: All
Arria II GX FPGA: EP2AGX45
Cyclone V SoCs: All
Yes
No
Yes
Multi-processor support
Yes
Yes
Windows 32/64-bit
Linux 32/64-bit
Windows 32/64-bit
Linux 32/64-bit
Perpetual (continues to work after expiration)
Free
Hierarchy
Based on Network-on-a-Chip (NoC) Architecture
Design
System
Add to
Library
Industry-Standard Interfaces
Avalon Interfaces
Package as IP
Toolbar
Interfaces
Exported
for Hierarchy
Improved Validation
Display
Qsys Benefits
Raises the level of design abstraction
System-level design and system visualization
Network-on-Chip Architecture
Transaction Layer (master side): converts transactions to command packets and response packets to responses (Avalon-MM, AXI-MM)
Transport Layer: transfers packets to destination (Avalon-ST)
Transaction Layer (slave side): converts command packets to transactions and responses to response packets (Avalon-MM, AXI-MM)
[Figure: command path: Master Interface -> Master Network Interface -> Avalon-ST Network (Command) -> Slave Network Interface -> Slave Interface. Response path: the same interfaces in reverse over the Avalon-ST Network (Response).]
Scalability
Segment network into sub-networks using
Bridges
Clock crossing logic
Industry-Standard Interfaces
Developer
Data plane
Streaming
Data switching (muxing, demuxing), aggregation, bridges
Embedded processor IP
E.g. Hardened ARM processor system, Nios II processor
Verification IP
E.g. Avalon-MM/-ST, AXI4, APB
Connect IP and
Systems
Interface protocols
Memory
DSP
Embedded
Bridges
PLL
Custom systems
Accelerate
Development
IP 1
Custom 1
IP 2
IP 3
Custom 2
HDL
HDL
Simplify
Integration
Additional Resources
Watch online demos (3-5 min)
www.altera.com/qsys
In-system Verification
Debug Challenges
Accessing and viewing internal signals
Not enough pins to use as test points
Capabilities in creating trigger conditions that correctly capture data
Verification of standard or proprietary protocol interfaces
Overall design process bottleneck
On-chip Debug
Access and view internal signals
Store captured data in FPGA embedded memory
Use JTAG interface as debug ports
Incrementally add internal signals to view
Reduce debug cycles by using on-chip debug tools
[Figure: Download cable -> JTAG TAP controller -> SLD hub -> debug nodes (Node 2 ... Node N-1, Node N) inside the user's design (core logic)]
Enable memory content editor
Hardware Libraries
GNU-based bare-metal (EABI) compiler tools
U-Boot
Root file system to jump start software development
Pre-built Linux kernel
http://www.rocketboards.org for source trees and community access
[Figure: parallel hardware development and software development flows, each running Design, Simulate, Debug, Release, linked by the hardware-to-software handoff. The software side covers firmware development and Linux application development. Tools called out: Flash Programmer, FPGA-adaptive debugging.]
FPGA-adaptive debugging
ARM DS-5 Altera Edition Toolkit
Design examples
Hardware-to-Software Handoff
Hardware -> Software:
system.iswinfo -> Preloader Generator -> .c & .h source files
system.sopcinfo -> Device Tree Generator -> Linux Device Tree
GNU-based bare-metal (EABI) compiler tools
[Figure: software stack. Labels: Application, Operating System, BSP, Hardware Libraries (BMAL, HAL, PAL), Baremetal App, SoC FPGA]
1
JTAG-Based Debugging
Board Bring-up
System Integration
Kernel Debug
System Debug
Application Debugging
3
FPGA-Adaptive Debugging
JTAG
DSTREAM
Debugging Barrier
No single tool/cable to visualize and control both CPU and FPGA domains
No way for CPU and FPGA to cross trigger and correlate hardware and software events
No fixed debugger can address the needs of changing FPGA hardware
[Figure: DSTREAM on a dedicated JTAG connection to visualize & control the CPU subsystem; a second dedicated JTAG connection to visualize & control the FPGA]
Altera USB-Blaster II Connection
Peripheral register descriptions
Cross-Domain Debug 1
Trigger from software world to FPGA world
SOFTWARE TRIGGER
HARDWARE TRIGGER!
Cross-Domain Debug 2
Trigger from FPGA world to software world
HARDWARE TRIGGER
EXECUTION STOP
OR
HW TRACE TRIGGER
EXECUTION STOP
OR
SW TRACE TRIGGER
SignalTap II Logic Analyzer or DS-5 debugger
Timestamp correlated
System-Level Performance Analysis
Performance bottlenecks in SoCs often come from the CPU interaction with the rest of the SoC
Streamline visualizes software activity with performance counters from the SoC and FPGA to enable full system-level analysis
Streamline only requires a TCP/IP connection to the SoC
Processor Counters, Aggregated or Per Core
Power Consumption
FPGA Block Counters
Application Events
Yes
Yes
OS Porting
Yes
Baremetal Programming
Yes
Yes
Subscription
Edition
Yes
Yes
Yes
System Debugging
Yes
Web
Edition
Subscription
Edition
30-Day
Evaluation
Eclipse IDE
Key Feature
ARM Compiler*
Debugging over Ethernet (Linux)
Hardware Cross-triggering
x
x
*ARM Compiler is available in DS-5 Professional Edition, available directly from ARM
Altera.com
Quartus II
Programmer
SignalTap II
Altera.com
RocketBoards.org
Pre-built Binaries
Kernel
U-Boot
Yocto
Minimal RFS
Tool chains
Handoff tools
HW Libraries
Examples
Documentation
Frequent Updates
Kernel source
U-Boot source
Yocto source
RFS source
Toolchain source
Public git
Wiki
Mailman
Partners
BSPs
Middleware
3rd Party Tools