ARM Notes1

ARM Processor - Advanced RISC Machine

History
ARM was developed at Acorn Computers Ltd. of Cambridge, England (1983-85).
o RISC concept introduced in 1980 at Stanford and Berkeley.
o Key design goal was to build a compact RISC CPU with low-latency I/O
(interrupt) handling, which is required for embedded systems in real-time
environments.
o Led by Sophie Wilson and Steve Furber.
ARM Limited was founded in 1990.
ARM cores
o Licensed to partners, who develop and fabricate new controllers.
o Soft core.
ARM provides intellectual property (IP cores).
Soft intellectual property:
ARM provides the IP to licensees, together with the synthesis flows that allow the
partner to synthesise the processor for their own technology.
o Soft core: RTL and synthesis flow.
o Hard core: GDSII layout.
Note - Code Density: Space taken up in memory by the executable program.
ARM Nomenclature
ARM family
ARM [x][y][z][T][D][M][I][E][J][F][S][Z]
x  Family
y  Memory management / protection unit
z  Cache available in the processor
T  Thumb mode
D  Debug interface
M  Multiplication unit (multiplier)
I  EmbeddedICE (in-circuit emulation) logic
   - Gives in-circuit emulation.
   - Combination of software and hardware unit.
   - Sets watchpoints and breakpoints.
E  DSP enhancements
   - If E is present, [T], [D], [M] and [I] are assumed to be present.
J  Java extension
   - Jazelle (Java runtime environment).
F  Floating point extension (coprocessor)
   - If floating point is present, up to 15 coprocessors can be attached.
S  Synthesizable core
   - An FPGA can be used for the hardware implementation.
   - The core is supplied as source code that EDA tools can compile, so the
     size of the cache or any other hardware option can be selected.
   - Maps the source description onto an IC.
Z  TrustZone
   - Secures a system that is on the internet.
   - Applications run in two systems: one is the OS (non-secure) and the other
     is the kernel (secure).
ARM VERSIONS
Architecture Versions
(a) Ver. 1 (1983-85): 26-bit addressing (no multiply or coprocessor).
(b) Ver. 2: adds a 32-bit result multiply and coprocessor support; 32-bit data bus,
26-bit (64 Mbyte) address space and sixteen 32-bit registers. Simplest useful 32-bit
microprocessor in the world.
(c) Ver. 3: 32-bit addressing (4 Gbyte address space), cache memory, coprocessor 15
(CP15) for the cache control registers.
(d) Ver. 4: adds signed and unsigned halfword and signed byte load and store
instructions.
(e) Ver. 4T: 16-bit Thumb compressed form of the instruction set introduced.
(f) Ver. 5T: superset of 4T, adding new instructions.
(g) Ver. 5TE: adds signal processing instruction set extensions.
ARM6
Separate CPSR/SPSR, undefined-instruction and abort modes, MMU support
Virtual memory (extending RAM)
In the late 1980s Apple Computer and VLSI Technology started working with Acorn
on newer versions of the ARM core.
ARM7 (ARM7TDMI)
Von Neumann architecture (8k cache)
32-bit embedded processor, MMU (3-stage pipeline)
StrongARM (power saving)
Broadcom (BCM 2121 processor)
Built-in GPRS protocol
ARM9E-S
More instructions for state change
Enhanced multiplier, DSP instructions, fast MAC
ARM9
Offered Harvard architecture
Offered 5 stage pipeline
ARM9TDMI
ARM9E
1997
ARM9ES
Thumb/ARM enhanced
DSP instruction
Fast MAC
ARM IDE
1999
X-Scale (Intel) v5TE
Run at 1GHz ; MMU, Harvard Architecture.
ARM11
2003
Multiprocessor instructions
Multimedia instruction
Cortex
Pipelining Concept
ARM7 uses a 3-stage pipeline.
3-stage pipeline
a) 1st stage, Cycle 1: Fetch 1st instruction
b) 2nd stage, Cycle 2: Fetch 2nd instruction and decode 1st instruction
c) 3rd stage, Cycle 3: Fetch 3rd instruction, decode 2nd instruction and
execute 1st instruction
ARM9
5-stage pipeline
Cycle 1: Fetch 1st instruction
Cycle 2: Fetch 2nd instruction, decode 1st instruction
Cycle 3: Fetch 3rd instruction, decode 2nd instruction and execute 1st instruction
Cycle 4: Buffer data -> access data memory or buffer
Cycle 5: Write back to register file
OR
a) Fetch
b) Decode
c) Execute
d) Buffer data
e) Write back
Example (time runs downward):
Cycle   Fetch   Decode   Execute
1       ADD
2       SUB     ADD
3       CMP     SUB      ADD
Advantages:
An instruction is executed every cycle once the pipeline is full.
As pipeline length increases, the amount of work done at each stage
decreases, and hence the processor attains a higher operating frequency.
Disadvantages:
System latency also increases, as it takes time to fill the pipeline before
execution can start.
As the length of the pipeline increases, data dependency between stages increases.
Data dependency can be reduced by using instruction scheduling.
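The cycle-by-cycle tables above can be reproduced with a small simulation. This is only a sketch of an ideal pipeline: it assumes one instruction enters per cycle and that there are no stalls, branches or data dependencies. The function name `pipeline_schedule` and the stage names are illustrative choices, not ARM terminology.

```python
def pipeline_schedule(instructions, stages=("fetch", "decode", "execute")):
    """Return one dict per cycle, mapping stage name -> instruction in that stage.

    Ideal pipeline only: no stalls, branches or data hazards are modelled.
    """
    total_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            # The instruction in this stage entered the pipeline `depth` cycles ago.
            index = cycle - depth
            if 0 <= index < len(instructions):
                row[stage] = instructions[index]
        schedule.append(row)
    return schedule


program = ["ADD", "SUB", "CMP"]

# 3-stage pipeline (fetch, decode, execute).
three_stage = pipeline_schedule(program)
for cycle, row in enumerate(three_stage, start=1):
    print(cycle, row)

# 5-stage pipeline; the two extra stage names are assumptions for this sketch.
five_stage = pipeline_schedule(
    program, stages=("fetch", "decode", "execute", "memory", "write back")
)
```

With three instructions, the 3-stage pipeline needs 5 cycles to drain and the 5-stage pipeline needs 7, which illustrates the fill latency listed under the disadvantages.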
ARM architecture
Harvard architecture compared with Von Neumann architecture
1. Harvard: Used in DSPs and other processors found in the latest embedded
   systems and mobile communication systems, and in audio, speech and image
   processing systems.
   Von Neumann: Used in conventional processors found in PCs and servers, and
   in embedded systems with only control functions.
2. Harvard: The data and program memories are separate.
   Von Neumann: The data and program are stored in the same memory.
3. Harvard: The code is executed in parallel.
   Von Neumann: The code is executed serially and takes more clock cycles.
4. Harvard: It has a MAC unit (multiply-accumulator).
   Von Neumann: There is no exclusive multiplier.
5. Harvard: A barrel shifter helps in shifting and rotating operations on the data.
   Von Neumann: No barrel shifter was made available.
6. Harvard: The program tends to grow big in size.
   Von Neumann: The program can be optimized to a smaller size.
   Harvard e.g.: ARM9EJ
   Von Neumann e.g.: ARM7TDMI
Some versions of ARM
1) ARM9TDMI
Performance is quoted in Dhrystone MIPS/MHz (Dhrystone is a benchmark of processor performance)
It has a 5-stage pipeline
Simultaneous access to instruction and data memory
It offers Harvard architecture
Increases available memory bandwidth
o Instruction memory interface
o Data memory interface
2) ARM7TDMI
Mostly used in cell phones as the processor
Broadcom (BCM) chip: GPRS/GSM single-chip baseband processor
o ARM7TDMI (RISC): memory write interface, voice recording and recognition,
  services, GSM/GPRS protocol stack
o OAK (DSP): signal processing, echo cancellation, speech algorithms, noise
  suppression, equalization
OMAP, used in NOKIA series phones, consists of an ARM and a DSP
3) ARM11
ARM1136J-S
8-stage pipeline incorporating separate load-store and arithmetic pipelines
SIMD extension
Vector floating point for fast floating-point operations
4) X-Scale
Harvard architecture
Works at 1 GHz
Separate coprocessor for extensions
Implements the ARMv5TE instruction set
ARM architecture
RISC (Reduced Instruction Set Computer)
1) Fixed instruction length, 32 bit
2) Mostly single-cycle instructions
3) Large uniform register file
4) Load-store architecture
5) Good speed/power consumption ratio

Enhanced RISC
1) Control over ALU and shifter
2) Load-store multiple instructions, to maximize data throughput
3) Auto-increment and auto-decrement modes (for loops)
4) Conditional execution of most instructions
5) Sequential memory access
6) Multiple register transfer
7) Inline barrel shifter
8) Thumb mode (16-bit) instructions
9) Enhanced instructions (multiplier/saturation arithmetic)
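To make the inline barrel shifter concrete, the following sketch models the four standard ARM shift types (LSL, LSR, ASR, ROR) on a 32-bit value and then an ADD whose second operand passes through the shifter first. The function names are hypothetical, and shift amounts are restricted to 1..31 to keep the model simple; the special cases of the real encoding (amounts of 0 or 32, RRX) are not covered.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value, kind, amount):
    """Shift or rotate a 32-bit value; only amounts 1..31 are modelled here."""
    if not 1 <= amount <= 31:
        raise ValueError("this sketch only models shift amounts 1..31")
    value &= MASK32
    if kind == "LSL":  # logical shift left
        return (value << amount) & MASK32
    if kind == "LSR":  # logical shift right, zero fill
        return value >> amount
    if kind == "ASR":  # arithmetic shift right, sign fill
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ROR":  # rotate right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift type: {kind}")


def add_shifted(rn, rm, kind, amount):
    """Model of ADD Rd, Rn, Rm, <kind> #amount: Rm is shifted, then added to Rn."""
    return (rn + barrel_shift(rm, kind, amount)) & MASK32


# ADD r0, r0, r0, LSL #2 multiplies r0 by 5 in a single instruction.
times_five = add_shifted(5, 5, "LSL", 2)
```

Because the shift happens on the way into the ALU, a multiply by a small constant such as 5 costs one data-processing instruction instead of a separate shift and add.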
ARM Registers
General-purpose registers (GPRs) hold either data or an address.
All registers are 32 bit.
In user mode 16 data registers and 2 status registers are visible.
Data registers: r0 to r15
3 registers, R13, R14 and R15, perform special functions.
R13 -> SP (stack pointer)
R14 -> LR (link register)
R15 -> PC (program counter)
Link Register
1) Holds the return address when a subroutine is called.
2) The return address is not stored on the stack.
3) Storing it on the stack would take extra time to complete, which is not
acceptable in an embedded system (ES).
4) It therefore increases the processor speed.
Status Register
1) CPSR (current program status register)
2) 32-bit register
3) Contains the bit that sets or resets the Thumb mode of operation.
4) Contains the bits that change the mode of the processor, etc.
Status register fields
1) Flag
2) Status
3) Extension
4) Control
5) Mode (5 bit)
Bit assignments
1) Condition flags (bits 31 to 28)
N -> Negative flag
Z -> Zero flag
C -> Carry flag
V -> Overflow flag
2) Sticky overflow (Q flag, bit 27)
Q -> Related to saturation overflows
3) Interrupt flags (bits 7 and 6)
I = 1 disables IRQ (interrupt request)
F = 1 disables FIQ (fast interrupt)
4) State bit (bit 5, T bit)
T = 0 -> ARM mode -> 32-bit instructions
T = 1 -> Thumb mode -> 16-bit instructions
5) Mode (5 bits, bits 4 to 0)
Selects the processor mode and so the access privilege: a privileged mode or
the non-privileged (user) mode.
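The CPSR bit positions can be checked with a short decoder. The sketch below extracts the N, Z, C, V, Q, I, F and T bits and the 5-bit mode field, and then evaluates a condition mnemonic (EQ, NE, HI, and so on) against the decoded flags. The mode encodings in `MODES` are the standard ARM values and are not listed in these notes; the function names `decode_cpsr` and `condition_passed` are hypothetical.

```python
# Standard ARM mode field encodings (not listed in the notes themselves).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def bit(word, position):
    """Return True if the given bit of a 32-bit word is set."""
    return bool((word >> position) & 1)


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": bit(cpsr, 31), "Z": bit(cpsr, 30),
        "C": bit(cpsr, 29), "V": bit(cpsr, 28),
        "Q": bit(cpsr, 27),
        "I": bit(cpsr, 7), "F": bit(cpsr, 6), "T": bit(cpsr, 5),
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }


# Each condition mnemonic expressed as a test on the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "CS": lambda f: f["C"],
    "CC": lambda f: not f["C"],
    "MI": lambda f: f["N"],
    "PL": lambda f: not f["N"],
    "VS": lambda f: f["V"],
    "VC": lambda f: not f["V"],
    "HI": lambda f: f["C"] and not f["Z"],
    "LS": lambda f: (not f["C"]) or f["Z"],
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: (not f["Z"]) and f["N"] == f["V"],
    "LE": lambda f: f["Z"] or f["N"] != f["V"],
    "AL": lambda f: True,
}


def condition_passed(mnemonic, cpsr):
    """Return True if an instruction with this condition would execute."""
    return bool(CONDITIONS[mnemonic](decode_cpsr(cpsr)))


# Example value: Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode.
fields = decode_cpsr(0x600000D3)
```

For the example value, `condition_passed("EQ", 0x600000D3)` is true because Z is set, while `HI` fails because it requires C set and Z clear.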
ARM data flow model
ARM core: HARVARD Architecture (data flow representation)
(Figure. Blocks shown: code read bus, data read/write bus, instruction decoder,
sign extend, register file (r0-r15, 32 bit) with result Rd, PC, barrel shifter,
MAC, ALU, address register, incrementer.)
ARM conditional Mnemonics
(In the flag column an upper-case letter means the flag is set and a lower-case
letter means the flag is clear.)

Mnemonic  Name                            Condition flags
EQ        Equal                           Z
NE        Not equal                       z
CS        Carry set                       C
CC        Carry clear                     c
MI        Minus                           N
PL        Plus                            n
VS        Overflow                        V
VC        No overflow                     v
HI        Unsigned higher                 zC
LS        Unsigned lower or same          Z or c
GE        Signed greater than or equal    NV or nv
LT        Signed less than                Nv or nV
GT        Signed greater than             NzV or nzv
LE        Signed less than or equal       Z or Nv or nV
AL        Always                          Ignored

CACHE & TCM (tightly coupled memory)
The cache is a block of fast memory placed between main memory and the core.
It allows for more efficient fetches from some memory types.
The cache memory increases performance.
It has two forms.
1) Von Neumann architecture
2) Harvard architecture
Disadvantages
A cache requires a good predictor. When the core reads a memory location, the
contents are fetched and stored in the available cache memory, with a tag
attached.
If the processor later requires an instruction and it is in the cache, this is
called a tag hit; if it is not there, it is known as a tag miss.
A good cache memory has a high tag-hit rate.
Prediction
For this problem the LRU (least recently used) method is used.
The more frequently used data is kept in the cache and the least recently used
data is taken out.
A cache provides an overall increase in performance, but at the expense of
predictable execution.
For real-time systems, however, code execution must be
DETERMINISTIC: the time taken for loading and storing instructions or
data must be predictable. This can be achieved by using a form of
memory called tightly coupled memory (TCM).
TCM is fast SRAM located close to the core, and it guarantees the clock cycles
required to fetch instructions or data. This is critical for real-time systems,
which require deterministic code execution.
TCM appears as memory in the address map and can be accessed as fast
memory (used where predictable real-time behaviour is needed).
By combining both technologies, an ARM processor can have both
improved performance and predictable real-time response.
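The tag hit, tag miss and LRU replacement described above can be sketched as a tiny cache model. This is a simplification under stated assumptions: the cache is fully associative, each line holds a single word, and the address itself is used as the tag. The class name `LRUCache` and the dictionary standing in for main memory are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache model with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        tag = address  # assumption: one word per line, so the address is the tag
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line
            self.lines[tag] = main_memory[address]  # fetch from main memory
        return self.lines[tag]


memory = {addr: addr * 10 for addr in range(8)}
cache = LRUCache(capacity=2)
trace = [0, 1, 0, 2, 1]
values = [cache.read(addr, memory) for addr in trace]
print(cache.hits, cache.misses)
```

The access time of each read depends on what was accessed before it, which is the loss of determinism that TCM avoids by giving a fixed access time for a fixed address range.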
EMBEDDED DEVICE (ARM Based)

Bus scheduling
Address of device/memory
Control of both read and write accesses.
AMBA bus protocol

The Advanced Microcontroller Bus Architecture (AMBA) is the most popular bus for
the ARM core.
The AMBA bus introduced the ARM System Bus (ASB) and the ARM Peripheral Bus
(APB).
Later ARM introduced another bus design, called the ARM High Performance
Bus (AHB).
ASB
- It is used to interface all memory units, the core and other units.
- It is used for system purposes.
APB
- This bus interface is generally used for I/O devices, such as an Ethernet
  card, keyboard or pad.
AHB
- It provides higher data throughput than ASB because it is based
  on a centralized multiplexed bus scheme rather than the ASB bidirectional
  design.
Banked registers
(Figure: the register set R0 to R15, with a second column of banked registers
R8 and R10 to R14.)
Modes of ARM
1) ARM mode
2) Thumb mode
Thumb mode
16-bit instructions
High code density
(code density: the space taken up in memory by an executable program)
Instructions in Thumb mode

ADD R0, #3
Thumb-2
It is a combined form of Thumb mode and ARM mode
Compatible with both
Switching between ARM mode and Thumb mode
1) USER mode
In user mode the control bits of the CPSR cannot be modified, so the switch is
made with a branch instruction (BX or BLX).
Bit 0 of the branch target register (for example R0) selects the mode:
BX R0, bit 0 of R0 = 1 -> Thumb mode
BX R0, bit 0 of R0 = 0 -> ARM mode
2) Privileged mode
The CPSR register can be accessed in this mode.
The T bit in the CPSR register selects the mode.
Bit       Result
T set     Thumb mode
T reset   ARM mode
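The branch-and-exchange rule can be written out as a small model: bit 0 of the target value selects the state and the remaining bits give the new program counter. This is a sketch only; the function name `branch_exchange` is hypothetical, and alignment checks on the ARM-state target are left out.

```python
def branch_exchange(target):
    """Model of BX Rm: return (new_pc, state) for a 32-bit branch target.

    Bit 0 of the target selects the state; it is cleared to form the new PC.
    """
    state = "Thumb" if target & 1 else "ARM"
    new_pc = target & 0xFFFFFFFE  # clear bit 0
    return new_pc, state


to_thumb = branch_exchange(0x00008001)  # odd target address: enter Thumb state
to_arm = branch_exchange(0x00008000)    # even target address: enter ARM state
```

Both calls land on the same address, 0x8000; only the low bit of the register value differs, which is why an address used for a Thumb routine is usually written with bit 0 set.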
ARM Technologies
- TrustZone
- DSP enhancement
- Intelligent Energy Management
Intelligent Energy Management
OS support
o Standards are available for power saving.
o All the devices are monitored, on the assumption that at any point in time a
device may turn ON or OFF.
TRUST ZONE
Consists of 2 OS
1) Simple OS
2) Kernel
The simple OS is the non-secure process and the kernel is the secure process.
E.g.: a system connected to the internet can be corrupted by anyone. To resist
this, a secure zone, the TrustZone, is introduced.
(Figure: a monitor sits between the two sides. USER (OS), non-secure, runs the
application programs; Kernel, secure, runs the security application programs.)
Both are orthogonal to each other.
DSP enhancement
MAC unit (multiply-accumulate unit)
Option of 32x16, 16x16, 32x32 multiplies
16 instructions
Saturation arithmetic
Loading instruction.
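Saturation arithmetic and the multiply-accumulate operation can be illustrated as follows. This is a behavioural sketch, not a model of specific ARM instructions: `saturating_add` clamps a signed 32-bit sum and reports whether clamping happened (the role played by the sticky Q flag), and `mac_16x16` accumulates a signed 16x16 product into an unbounded Python integer. All function names are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturating_add(a, b):
    """Add two signed 32-bit values, clamping instead of wrapping.

    Returns (result, saturated); `saturated` is True when the sum was clamped.
    """
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True
    if total < INT32_MIN:
        return INT32_MIN, True
    return total, False


def mac_16x16(accumulator, x, y):
    """Multiply two signed 16-bit values and add the product to the accumulator."""
    for operand in (x, y):
        if not -(2**15) <= operand < 2**15:
            raise ValueError("operands must fit in signed 16 bits")
    return accumulator + x * y


def dot_product(xs, ys):
    """Dot product built from repeated multiply-accumulate steps."""
    accumulator = 0
    for x, y in zip(xs, ys):
        accumulator = mac_16x16(accumulator, x, y)
    return accumulator


clamped = saturating_add(INT32_MAX, 1)
result = dot_product([1, 2, 3], [4, 5, 6])
```

Clamping matters in signal processing because a wrapped overflow turns a large positive sample into a large negative one, whereas a saturated result only clips the peak.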
