Cortex-M4 Part1
Architecture
Module Syllabus
ARM Architectures and Processors
What is ARM Architecture
ARM Processor Families
ARM Cortex-M Series
Cortex-M4 Processor
ARM Processor vs. ARM Architectures
ARM Holdings
The company designs ARM-based processors;
Does not manufacture, but licenses designs to semiconductor partners, who add their own
Intellectual Property (IP) on top of ARM's IP, then fabricate and sell the chips to customers;
Also offers other IP apart from processors, such as physical IPs, interconnect IPs, graphics
High reliability
ARM processor families (as of Dec 2013):
Cortex-A: Cortex-A57, Cortex-A53, Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A7, Cortex-A5
Cortex-R: Cortex-R7, Cortex-R5, Cortex-R4
Cortex-M: Cortex-M4, Cortex-M3, Cortex-M1, Cortex-M0+, Cortex-M0
SecurCore series: SC000, SC100, SC300
Classic: ARM11, ARM9, ARM7
Cortex-M processors target low-power embedded computing applications. The 32-bit Cortex-M
processor family is the key to transforming all sorts of embedded systems into smart and connected systems. Such
systems are often provided as a black box with pre-loaded applications; they have limited capability to expand
hardware functionality and in most cases no screen.
Typical application areas:
Merchant MCUs
Automotive control systems
White goods controllers
Smart meters
Sensors
IR fire detectors
Intelligent toys
Utility meters
Exercise machines
Tele-parking
Intelligent vending
[Figure: cumulative ARM-based chip shipments since 1993, with milestones at 10 billion and at 50 billion in 2013. Source: www.50billionchips.com]
Market breakdown:
58% | Mobile: devices including smartphones, mobile phones, tablets, e-readers and wearables
20% | Embedded
16% | Enterprise
6% | Home: consumer devices such as smart TVs, game consoles and home networking gateways
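Licensable IPs
Processors: Cortex-R5, Cortex-M4, ARM7, ARM9, ARM11
Memory controllers: DRAM ctrl, FLASH ctrl, SRAM ctrl
Buses: AXI bus, AHB bus, APB bus
Peripherals: GPIO, I/O blocks, Timer
[Figure: the licensable IPs feed SoC design, followed by chip manufacture. The resulting ARM-based SoC contains an ARM processor, ROM and RAM on a system bus, plus peripherals and an external interface.]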
Smaller code
Lower silicon costs
Ease of use
Faster software development and reuse
Embedded applications
Smart metering, human interface devices, automotive and industrial control
memory map
Documented in the Architecture Reference Manual
ARM processor
Developed using one of the ARM architectures
More implementation details, such as timing information
Documented in the processor's Technical Reference Manual
ARM architecture versions and example processors (as of Dec 2013):
ARMv4/v4T Architecture: e.g. ARM7TDMI
ARMv5/v5E Architecture: e.g. ARM926EJ-S
ARMv6 Architecture: e.g. ARM1136
  ARMv6-M: e.g. Cortex-M0, M1
ARMv7 Architecture:
  ARMv7-A: e.g. Cortex-A9
  ARMv7-R: e.g. Cortex-R4
  ARMv7-M: e.g. Cortex-M4
ARMv8 Architecture:
  ARMv8-A: e.g. Cortex-A53, Cortex-A57
  ARMv8-R
Processor | ARM Architecture | Core Architecture | Thumb | Thumb-2 | Hardware Multiply | Hardware Divide | Saturated Math | DSP Extensions | Floating Point
Cortex-M0 | ARMv6-M | Von Neumann | Most | Subset | 1 or 32 cycle | No | No | Software | No
Cortex-M0+ | ARMv6-M | Von Neumann | Most | Subset | 1 or 32 cycle | No | No | Software | No
Cortex-M1 | ARMv6-M | Von Neumann | Most | Subset | 3 or 33 cycle | No | No | Software | No
Cortex-M3 | ARMv7-M | Harvard | Entire | Entire | 1 cycle | Yes | Yes | Software | No
Cortex-M4 | ARMv7E-M | Harvard | Entire | Entire | 1 cycle | Yes | Yes | Hardware | Optional
load/store architecture
Fixed instruction length
Fewer/simpler instructions than CISC CPU
Limited addressing modes, operand types
Simple design easier to speed up, pipeline &
scale
label   LDR r0,[r8]     ; a comment
        ADD r4,r0,r1    ; r4=r0+r1
In ADD r4,r0,r1: r4 is the destination, r0 the source/left operand, and r1 the source/right operand.
A word is 32 bits long.
A word can be divided into four 8-bit bytes.
ARM addresses can be 32 bits long.
An address refers to a byte.
Address 4 starts at byte 4.
Can be configured at power-up in either little- or big-endian mode.
Endianness
Relationship between bit and byte/word ordering defines endianness:
little-endian (default): bit 31 | byte 3 | byte 2 | byte 1 | byte 0 | bit 0
big-endian: bit 31 | byte 0 | byte 1 | byte 2 | byte 3 | bit 0
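The byte ordering can be checked from software. The following is a minimal sketch in portable C, not taken from the slides: is_little_endian and swap_bytes are hypothetical helper names. The first inspects which byte of a known word sits at the lowest address, the second converts a word between the two layouts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the least significant byte of a word is stored at the
   lowest address (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    const uint32_t word = 0x11223344u;
    uint8_t bytes[4];
    memcpy(bytes, &word, sizeof word);   /* view the word byte by byte */
    return bytes[0] == 0x44;
}

/* Reverses the byte order of a 32-bit word, converting between the
   little-endian and big-endian layouts. */
uint32_t swap_bytes(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}
```

For example, swap_bytes(0x11223344) gives 0x44332211. On a Cortex-M device the same reversal is normally done with the REV instruction rather than with shifts.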
Enhanced Determinism
The critical tasks and interrupt routines can be served quickly in a known number of cycles
Instruction set
Includes the entire Thumb-1 (16-bit) and Thumb-2 (16/32-bit) instruction sets
Supported Interrupts
Non-maskable Interrupt (NMI) + 1 to 240 physical interrupts
8 to 256 interrupt priority levels
Integrated WFI (Wait For Interrupt) and WFE (Wait For Event) Instructions and Sleep On Exit capability (to be
covered in more detail later)
Enhanced Instructions
Debug
40 nm G process: 8 µW/MHz
Process | Dynamic Power | Floorplanned Area
180ULL (7-track, typical 1.8 V, 25 °C) | 157 µW/MHz | 0.56 mm²
90LP (7-track, typical 1.2 V, 25 °C) | 33 µW/MHz | 0.17 mm²
40G (9-track, typical 0.9 V, 25 °C) | 8 µW/MHz | 0.04 mm²
Cortex-M4 processor block diagram components:
Processor core
Nested Vectored Interrupt Controller (NVIC)
Optional Debug Access Port
Optional Embedded Trace Macrocell
Optional Memory protection unit
Optional Serial Wire Viewer
Optional Flash patch
Optional Data watchpoints
Bus matrix
Code interface
SRAM and peripheral interface
Pipeline (time runs left to right):
Instruction 1: Fetch   Decode  Execute
Instruction 2:         Fetch   Decode  Execute
Instruction 3:                 Fetch   Decode  Execute
Instruction 4:                         Fetch   Decode  Execute
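The overlap of fetch, decode and execute can be put into numbers. The sketch below is an idealised model only: it assumes no stalls, branches or multi-cycle instructions, and ideal_pipeline_cycles is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles needed to complete a run of instructions on an ideal pipeline:
   the first instruction takes one cycle per stage, and each following
   instruction completes one cycle after its predecessor. */
uint32_t ideal_pipeline_cycles(uint32_t instructions, uint32_t stages)
{
    if (instructions == 0u) {
        return 0u;
    }
    return stages + instructions - 1u;
}
```

With four instructions and three stages the model gives 6 cycles, compared with 12 cycles if each instruction had to finish before the next one was fetched.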
(bit-band)
May include bus bridges (e.g. AHB-to-APB bus bridge) to connect different buses
Debug subsystem
Handles debug control, program breakpoints, and data watchpoints
When a debug event occurs, it can put the processor core in a halted state, where developers can analyse the
status of the processor at that point, such as register values and flags
STM32F4 block diagram:
Core: CORTEX-M4 CPU + FPU + MPU, 168 MHz; JTAG/SW Debug; ETM; Nested vectored interrupt controller; 1 x SysTick Timer
Buses: D-bus, I-bus, S-bus; DMA, 16 channels; AHB1 (max 168 MHz); Bridge
Memory: 512 kB - 1 MB Flash Memory; Flash I/F; 128 KB SRAM; External Memory Interface
Clocks: Clock Control; Int. RC oscillators 32 kHz + 16 MHz; PLL; RTC / AWU
I/O: 51/82/114/140 I/Os; up to 16 Ext. ITs
Timers: 2 x 6 x 16-bit PWM Synchronized AC Timer; 3 x 16-bit Timer; 5 x 16-bit Timer; 2 x 32-bit Timer; 2 x Watchdog (independent & window)
Communication: 1 x SPI; 2 x USART/LIN; 2 x SPI / I2S; 4 x USART/LIN; 3 x I2C; 2 x CAN 2.0B; 1 x SDIO; USB 2.0 OTG FS; USB 2.0 OTG FS/HS; Camera Interface
Analog: 3 x 12-bit ADC, 24 channels / 2 Msps; Temp Sensor
Encryption**
Outstanding results:
210 DMIPS at 168 MHz.
Execution from Flash equivalent to 0-wait-state performance up to 168 MHz thanks to the ST
ART Accelerator
Advanced peripherals
STM32F4 portfolio
Evaluation boards:
STM3240G-EVAL: $349
STM32F4DISCOVERY: $14.90
Large choice of development IDE solutions from the STM32 and ARM ecosystem
Commercial ones:
IAR: eval 32 kB / 30 days for test [RK-System]
Keil (ARM): eval 32 kB for test [WG Electronics]
Based on GCC, commercial:
Atollic Lite (no hex/bin, limited debug) [Kamami]
Raisonance: debug limited to 32 kB
Rowley Crossworks: 30 days for test
Free:
STVP: FLASH programming
STLink utility: FLASH programming (+ command line)
ST FlashLoader: FLASH programming
Libraries (free):
Standard peripherals library with CMSIS
USB device library
Feature | Cortex-M0 | Cortex-M3 | Cortex-M4
Architecture version | v6M | v7M | v7ME
Instruction set architecture | Thumb + Thumb-2 System Instructions | Thumb, Thumb-2 | Thumb, Thumb-2
DMIPS/MHz | 0.9 | 1.25 | 1.25
Integrated NVIC | Yes | Yes | Yes
Number interrupts | 1-32 + NMI | 1-240 + NMI | 1-240 + NMI
Interrupt priorities | 4 | 8-256 | 8-256
Breakpoints, Watchpoints | 4/2/0, 2/1/0 | 8/4/0, 2/1/0 | 8/4/0, 2/1/0
Hardware Divide | No | Yes | Yes
Other features compared: bus interfaces (AHB Lite, APB), WIC support, and optional debug and trace components.
Cortex-M4 Registers
Processor registers
The internal registers are used to store and process temporary data within the
processor core
All registers are inside the processor core, hence they can be accessed quickly
Load-store architecture
To process data held in memory, it first has to be loaded from memory into registers,
processed inside the processor core using register data only, and then written back to
memory if needed
Cortex-M4 registers
Register bank
Sixteen 32-bit registers (thirteen are used for general-purpose);
Special registers
Cortex-M4 Registers
Register bank
R0-R7: low registers (general purpose)
R8-R12: high registers (general purpose)
R13 (banked): MSP, PSP
R14
R15
Special registers
xPSR: APSR (Application PSR), EPSR (Execution PSR), IPSR (Interrupt PSR)
PRIMASK, FAULTMASK, BASEPRI
CONTROL (stack definition)
Cortex-M4 Registers
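R0-R12: general purpose registers
R13: Stack Pointer (SP), banked as the Main SP, used where privileged access is required, e.g. OS kernel and exception handlers, and the Process SP, used in base-level application code (when not running an exception handler)
[Figure: memory map running from low to high addresses with code, heap and stack regions. PUSH writes data to the stack and POP reads it back; SP points to the top of the stack and PC into the code region.]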
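The PUSH and POP behaviour can be modelled in a few lines of C. This is a host-side sketch rather than device code: it assumes the full-descending stack used by Cortex-M (SP moves down before a store and up after a load), and stack_model, stack_push and stack_pop are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

/* Hypothetical model: a small stack region plus an index standing in for SP. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* index of the last pushed word; STACK_WORDS when empty */
} stack_model;

void stack_init(stack_model *s)
{
    s->sp = STACK_WORDS;   /* SP starts at the high end of the region */
}

/* PUSH: move SP down first, then store the value. */
void stack_push(stack_model *s, uint32_t value)
{
    assert(s->sp > 0u);            /* guard against overflow of the region */
    s->sp--;
    s->mem[s->sp] = value;
}

/* POP: load the value at SP, then move SP back up. */
uint32_t stack_pop(stack_model *s)
{
    assert(s->sp < STACK_WORDS);   /* guard against popping an empty stack */
    uint32_t value = s->mem[s->sp];
    s->sp++;
    return value;
}
```

Values come back in the reverse order of the pushes, and SP returns to its starting value once every push has been matched by a pop.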
Cortex-M4 Registers
R14: Link Register (LR)
The LR is used to store the return address of a subroutine or a function call
The program counter (PC) will load the value from LR after a function is finished
Call a subroutine
1. Save current PC to LR
2. Load PC with the starting address of the subroutine
[Figure: code region containing the main program code and the subroutine, showing the PC and LR values before and after the call.]
Cortex-M4 Registers
xPSR, combined Program Status Register
Provides information about program execution and ALU flags
Application PSR (APSR)
Interrupt PSR (IPSR)
Execution PSR (EPSR)
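Bit layout (bit31 down to bit0, with markers at bit24, bit16 and bit8):
APSR: N, Z, C, V, Q in the top bits; remaining bits reserved
IPSR: ISR number in the lowest bits; remaining bits reserved
EPSR: ICI/IT fields; remaining bits reserved
xPSR: N, Z, C, V, Q | ICI/IT | ICI/IT | ISR number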
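The combined register can be taken apart with shifts and masks. The sketch below assumes the ARMv7-M bit positions (N, Z, C, V, Q in bits 31 to 27, T in bit 24, exception number in bits 8 to 0); xpsr_fields and decode_xpsr are hypothetical names, and the ICI/IT fields are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned n, z, c, v, q;   /* APSR flags */
    unsigned t;               /* EPSR Thumb state bit */
    unsigned isr_number;      /* IPSR exception number */
} xpsr_fields;

/* Splits a raw xPSR value into its APSR, EPSR and IPSR fields. */
xpsr_fields decode_xpsr(uint32_t xpsr)
{
    xpsr_fields f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.t = (xpsr >> 24) & 1u;
    f.isr_number = xpsr & 0x1FFu;
    return f;
}
```

For instance, the value 0x61000000 decodes to Z = 1, C = 1 and T = 1 with an ISR number of 0, which corresponds to code running outside any exception handler.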
Cortex-M4 Registers
APSR
N: negative flag, set to one if the result from the ALU is negative
Z: zero flag, set to one if the result from the ALU is zero
C: carry flag, set to one if an unsigned overflow occurs
V: overflow flag, set to one if a signed overflow occurs
Q: sticky saturation flag, set to one if saturation has occurred in saturating arithmetic instructions
IPSR
ISR number: number of the currently executing interrupt service routine
EPSR
T: Thumb state, always one since the Cortex-M4 only supports the Thumb state
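The N, Z, C and V definitions above can be illustrated for a 32-bit addition. This is a host-side sketch of how the flags follow from the operands and the result, not a description of the hardware implementation; alu_flags and add_with_flags are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned n, z, c, v;
} alu_flags;

/* Adds two 32-bit values and derives the flags an ADDS would produce. */
alu_flags add_with_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;          /* wraps modulo 2^32 */
    alu_flags f;
    f.n = r >> 31;               /* sign bit of the result */
    f.z = (r == 0u);             /* result is zero */
    f.c = (r < a);               /* unsigned overflow: carry out of bit 31 */
    /* Signed overflow: operands share a sign that the result does not. */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    *result = r;
    return f;
}
```

Adding 1 to 0x7FFFFFFF sets N and V (the signed result overflowed), while adding 1 to 0xFFFFFFFF sets Z and C (the unsigned result wrapped to zero).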
each instruction
ex.
Cortex-M4 Registers
Interrupt mask registers
1-bit PRIMASK
When set to one, blocks all interrupts apart from the non-maskable interrupt (NMI) and the
hard fault exception
1-bit FAULTMASK
When set to one, blocks all interrupts apart from the NMI
BASEPRI (up to 8 bits)
When set to a non-zero priority level, blocks all interrupts of the same or lower priority level (only
interrupts with higher priorities are allowed)
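The combined effect of the three mask registers can be written as one comparison. The sketch below is a simplified host-side model under ARMv7-M conventions, stated here as assumptions: a lower number means a higher priority, NMI and hard fault have the fixed priorities -2 and -1, and BASEPRI holds a priority threshold where 0 means disabled. is_blocked is a hypothetical helper name.

```c
#include <assert.h>

enum { PRIO_NMI = -2, PRIO_HARDFAULT = -1 };

/* Returns 1 if an exception with the given priority is blocked by the
   current mask register settings, 0 if it can be taken. */
int is_blocked(int priority, unsigned primask, unsigned faultmask,
               unsigned basepri)
{
    int threshold = 256;            /* above every configurable priority */
    if (basepri != 0u) {
        threshold = (int)basepri;   /* block this level and anything lower */
    }
    if (primask) {
        threshold = 0;              /* only NMI and hard fault get through */
    }
    if (faultmask) {
        threshold = -1;             /* only NMI gets through */
    }
    return priority >= threshold;
}
```

With BASEPRI set to 0x40, an interrupt at priority 0x20 is still taken while interrupts at 0x40 and 0x80 are held off. On a device these registers are usually written through the CMSIS-Core access functions rather than modelled like this.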
Cortex-M4 Registers
[Figure: bit layout of the PRIMASK, FAULTMASK, BASEPRI and CONTROL registers across bit31, bit24, bit16 and bit8; the unused bits of each register are reserved. CONTROL holds the stack definition.]
Cortex M4
DSP features
Thumb-2 Technology
DSP and SIMD extensions
Single cycle MAC (Up to 32 x 32 + 64 -> 64)
Optional single precision FPU
Integrated configurable NVIC
Compatible with Cortex-M3
Microarchitecture
3-stage pipeline with branch speculation
3x AHB-Lite Bus Interfaces
Cortex-M4 overview
Main Cortex-M4 processor features
ARMv7E-M architecture revision
Fully compatible with Cortex-M3 instruction set
OPERATION | INSTRUCTIONS | CM3 | CM4
32 x 32 = 32 | MUL | 1 | 1
32 ± (32 x 32) = 32 | MLA, MLS | 2 | 1
32 x 32 = 64 | SMULL, UMULL | 5-7 | 1
(32 x 32) + 64 = 64 | SMLAL, UMLAL | 5-7 | 1
(32 x 32) + 32 + 32 = 64 | UMAAL | n/a | 1
32 x 32 = 32 (upper word) | SMMUL, SMMULR | n/a | 1
[Rows for the 16-bit multiply and MAC operations: n/a on CM3, 1 cycle on CM4]
All the above operations are single cycle on the Cortex-M4 processor
Saturated arithmetic
Intrinsically prevents overflow of a variable by clipping it to the min/max
boundaries, removing the CPU burden of software range checks
Benefits
Audio applications
[Figure: a waveform on a -1.5 to 1.5 scale, shown without saturation and with saturation.]
Control applications
The PID controller's integral term is continuously accumulated over time.
Saturation automatically limits its value and saves several CPU cycles per
regulator
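The clipping behaviour can be sketched in portable C. This is a software model of what a saturating add does, in the spirit of the Cortex-M4 QADD instruction, not the hardware path itself; sat_add32 is a hypothetical helper name, and the saturated argument imitates the sticky Q flag by only ever being set, never cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two signed 32-bit values, clipping to INT32_MIN/INT32_MAX instead
   of wrapping. Sets *saturated to 1 when clipping happens and leaves it
   unchanged otherwise. */
int32_t sat_add32(int32_t a, int32_t b, int *saturated)
{
    int64_t wide = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (wide > INT32_MAX) {
        *saturated = 1;
        return INT32_MAX;
    }
    if (wide < INT32_MIN) {
        *saturated = 1;
        return INT32_MIN;
    }
    return (int32_t)wide;
}
```

A PID integrator built on such an add stays pinned at the limit when the error persists, where a plain addition would wrap around and flip sign. The software version costs a widening add and two comparisons per sample, which is the overhead the hardware instruction removes.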
Benefits
Parallelizes operations (2x to 4x speed gain)
Minimizes the number of Load/Store instructions for exchanges between memory and register file
(2 or 4 data items transferred at once), if 32-bit is not necessary
Maximizes register file use (1 register holds 2 or 4 values)
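The idea of holding two values in one register can be shown with a software model of a dual 16-bit add. The sketch below mimics what a packed add such as SADD16 computes (two independent 16-bit lanes, each wrapping on overflow); pack16, lane_lo, lane_hi and add16x2 are hypothetical names, and the casts assume the usual two's complement representation.

```c
#include <assert.h>
#include <stdint.h>

/* Packs two signed 16-bit values into one 32-bit word. */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Extracts the lower and upper 16-bit lanes as signed values. */
int16_t lane_lo(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }
int16_t lane_hi(uint32_t w) { return (int16_t)(uint16_t)(w >> 16); }

/* Adds the two lanes independently: a carry out of the low lane must not
   leak into the high lane, so each lane is summed and masked separately. */
uint32_t add16x2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;
    return hi | lo;
}
```

The model needs several shifts and masks per operation, whereas the SIMD instruction produces both lane results in a single cycle, which is where the 2x gain for 16-bit data comes from.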
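[Figure: extract and pack operations. Extract places a packed field A or B in its own register with the upper bits cleared (00......00 A, 00......00 B); pack combines fields into a single register.]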
[Figure: per-instruction cycle counts for the filter inner loop, including the loop branch.]
\[ y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] - a_1 y[n-1] - a_2 y[n-2] \]
= filtLen-1;
*stateIndexPtr,
Multiply and
accumulate
previous
sum1
sum2
sum3
        c0 = *(coeffPtr++);
        x0 = *(q31_t *)(statePtr++);
        x1 = *(q31_t *)(statePtr++);
        sum0 = SMLALD(x0, c0, sum0);
        sum1 = SMLALD(x1, c0, sum1);
        sum2 = SMLALD(x2, c0, sum2);
        sum3 = SMLALD(x3, c0, sum3);
    } while(--i);
    *pDst++ = (q15_t) (sum0>>15);
    *pDst++ = (q15_t) (sum1>>15);
    *pDst++ = (q15_t) (sum2>>15);
    *pDst++ = (q15_t) (sum3>>15);
    stateBasePtr = stateBasePtr + 4;
} while(--sample);
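The SMLALD calls in the loop above each perform two 16-bit multiplies and add both products to a 64-bit accumulator. The following host-side sketch models that behaviour so the arithmetic can be checked without the hardware; smlald_model and dot_q15 are hypothetical names, and the packing order (first sample in the low halfword) is an assumption matching a little-endian load of two adjacent 16-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a dual signed 16 x 16 multiply with 64-bit accumulate:
   acc + lo(x) * lo(y) + hi(x) * hi(y). */
int64_t smlald_model(uint32_t x, uint32_t y, int64_t acc)
{
    int16_t x_lo = (int16_t)(uint16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(uint16_t)(x >> 16);
    int16_t y_lo = (int16_t)(uint16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(uint16_t)(y >> 16);
    return acc + (int64_t)x_lo * y_lo + (int64_t)x_hi * y_hi;
}

/* Dot product of two 16-bit arrays, consuming two samples per step. */
int64_t dot_q15(const int16_t *a, const int16_t *b, unsigned pairs)
{
    int64_t sum = 0;
    for (unsigned i = 0; i < pairs; i++) {
        uint32_t pa = (uint32_t)(uint16_t)a[2u * i] |
                      ((uint32_t)(uint16_t)a[2u * i + 1u] << 16);
        uint32_t pb = (uint32_t)(uint16_t)b[2u * i] |
                      ((uint32_t)(uint16_t)b[2u * i + 1u] << 16);
        sum = smlald_model(pa, pb, sum);
    }
    return sum;
}
```

A FIR tap loop is a dot product between coefficients and the state buffer, so halving the number of multiply-accumulate steps in this way halves the inner-loop work for 16-bit data.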
Useful Resources
Architecture Reference Manual:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html