ARM Notes1
History
-
ARM Nomenclature
ARM family
ARM [x][y][z][T][D][M][I][E][J][F][S][Z]
x Family
y Memory management/protection unit
z Cache available in the processor
T Thumb mode
D Debug interface
M Multiplication unit (multiplier)
I ICE (in-circuit emulation) logic
E DSP enhancements
If E is present, then [T] [D] [M] [I] are assumed to be present as well.
J Java extension
Jazelle (Java runtime environment)
F Floating point extension (co-processor)
If floating point is present then you can connect or attach 15 co-processors.
S Synthesized core
Z TrustZone
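The naming scheme above can be sketched as a small decoder. This is only an illustrative sketch: the function name `decode_arm_name` and the `FEATURES` table are hypothetical, the digits are returned as one string rather than being split into x, y and z, and letters not listed in the nomenclature are simply ignored.

```python
import re

# Suffix letters and their meanings, as listed in the nomenclature above.
FEATURES = {
    "T": "Thumb mode",
    "D": "Debug interface",
    "M": "Multiplier",
    "I": "ICE logic",
    "E": "DSP enhancements",
    "J": "Java extension (Jazelle)",
    "F": "Floating point extension",
    "S": "Synthesized core",
    "Z": "TrustZone",
}


def decode_arm_name(name):
    """Split a core name such as 'ARM7TDMI' into its digits and feature list."""
    match = re.fullmatch(r"ARM(\d+)([A-Z-]*)", name.upper())
    if match is None:
        raise ValueError(f"not an ARM core name: {name!r}")
    digits, letters = match.groups()
    # A hyphen (as in 'ARM926EJ-S') only separates letters, so drop it.
    letters = letters.replace("-", "")
    # Letters that are not in the table are skipped silently.
    features = [FEATURES[ch] for ch in letters if ch in FEATURES]
    return digits, features


print(decode_arm_name("ARM7TDMI"))
print(decode_arm_name("ARM926EJ-S"))
```

For ARM7TDMI the decoder reports the Thumb, debug, multiplier and ICE features. For ARM926EJ-S it reports the DSP, Java and synthesized-core features.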
ARM VERSIONS
Architecture Versions
(a) Ver. 1 (1983-85): 26 bit addressing (no multiply or co-processor).
(b) Ver. 2: adds a 32 bit result multiply instruction and co-processor support; 32 bit data bus,
26 bit (64 Mbyte) address space and sixteen 32 bit registers. Simplest useful 32 bit
microprocessor in the world.
(c) Ver. 3: 32 bit addressing (4 Gbyte address space), cache memory, co-processor 15 for cache control registers.
(d) Ver. 4: adds signed and unsigned half word and signed byte load and store
instructions.
(e) Ver. 4T: 16 bit Thumb compressed form of the instructions introduced.
(f) Ver. 5T: superset of 4T, adding new instructions.
(g) Ver. 5TE: adds signal processing (DSP) instruction set extensions.
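The address widths mentioned in the version list (26 bit and 32 bit) translate into address space sizes by simple arithmetic. A minimal sketch, assuming byte addressing; the helper name `address_space_bytes` is hypothetical.

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


for bits in (26, 32):
    size = address_space_bytes(bits)
    # 2**20 bytes = 1 Mbyte
    print(f"{bits} bit address: {size} bytes = {size // 2**20} Mbyte")
```

A 26 bit address reaches 64 Mbyte, while a 32 bit address reaches 4096 Mbyte, that is 4 Gbyte.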
ARM6
Separate CPSR/SPSR, undefined instruction and abort modes, MMU support
Virtual memory (extending RAM)
In the late 1980s Apple Computer and VLSI Technology started working with Acorn
on newer versions of the ARM core.
ARM7 ARM7TDMI
Von Neumann architecture (8k cache)
32 bit embedded processor, MMU (3 stage pipeline)
StrongARM (power saving)
Broadcom (BCM 2121 processor)
Built in GPRS protocol
ARM9E-S
More instruction for stage change
Enhanced multiplier, DSP instructions, fast MAC
ARM9
Offered Harvard architecture
Offered 5 stage pipeline
ARM9TDMI
ARM9E
1997
ARM9ES
Thumb/ARM enhanced
DSP instruction
Fast MAC
ARM10E
1999
X-Scale (Intel) v5TE
Runs at 1 GHz; MMU, Harvard architecture.
ARM11
2003
Multiprocessor instructions
Multimedia instruction
Cortex
Pipelining Concept
ARM7
It uses a 3 stage pipeline
a) 1st stage
Cycle 1:- Fetch 1st instruction
b) 2nd stage
Cycle 2:- Fetch 2nd instruction
Decode 1st instruction
c) 3rd stage
Cycle 3:- Fetch 3rd instruction
Decode 2nd instruction
Execute 1st instruction
ARM9
5 stage pipeline
Cycle 1:- Fetch 1st instruction
Cycle 2:- Fetch 2nd instruction, decode 1st instruction
Cycle 3:- Fetch 3rd instruction, decode 2nd instruction, execute 1st instruction
The five stages are:
1) Fetch
2) Decode
3) Execute
4) Buffer data
5) Write back
Example:
Time (cycle)   Fetch   Decode   Execute
1              ADD
2              SUB     ADD
3              CMP     SUB      ADD
Advantages:
Once the pipeline is full, one instruction completes every cycle.
As pipeline length increases, the amount of work done at each stage
decreases, and hence the processor can attain a higher operating frequency.
Disadvantages
System latency also increases, as it takes time to fill the pipeline before
execution can start.
As pipeline length increases, data dependency between instructions also increases.
Data dependency can be reduced by using instruction scheduling.
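The cycle-by-cycle behaviour described in this section can be sketched with a small simulation. It assumes an ideal pipeline with no stalls or data dependencies; the function name `pipeline_schedule` and the stage labels are illustrative choices, not part of any ARM tool.

```python
def pipeline_schedule(instructions, stages):
    """Return one dict per cycle mapping stage name -> instruction in that stage.

    Ideal pipeline: a new instruction enters the first stage every cycle and
    every instruction advances one stage per cycle.
    """
    n_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # which instruction has reached this stage
            if 0 <= index < len(instructions):
                row[stage] = instructions[index]
        schedule.append(row)
    return schedule


three_stage = ["fetch", "decode", "execute"]
five_stage = ["fetch", "decode", "execute", "buffer data", "write back"]

for cycle, row in enumerate(pipeline_schedule(["ADD", "SUB", "CMP"], three_stage), start=1):
    print(cycle, row)
```

With three instructions, the 3 stage pipeline needs 5 cycles in total and the 5 stage pipeline needs 7, which illustrates the extra fill latency of a longer pipeline.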
ARM architecture
Harvard architecture vs. Von Neumann architecture
Von Neumann architecture:
Used in conventional processors found in PCs and servers, and in embedded
systems with only control functions.
There is no exclusive multiplier.
e.g.: ARM7TDMI
1) ARM9TDMI
Performance is rated in Dhrystone MIPS/MHz (Dhrystone is a benchmark used to measure processor performance)
It has a 5 stage pipeline
It offers Harvard architecture
Simultaneous access to instruction and data memory
Increases available memory bandwidth
Instruction memory interface
Data memory interface
2) ARM7TDMI
Mostly used as the processor in cell phones
e.g. Broadcom (BCM) chip: GPRS/GSM single chip baseband processor
ARM7TDMI (RISC) handles:
Memory write interface
Voice recording and recognition
Services
GSM/GPRS protocol stack
OAK (DSP) handles:
Signal processing
Echo cancellation
Speech algorithms, noise suppression
Equalization
3) ARM11
ARM1136J-S
8 stage pipeline incorporating separate load/store and
arithmetic pipelines
SIMD extension
Vector floating point for fast floating point operations
4) X-Scale
Harvard Architecture
Works at 1GHz
Separate coprocessor for extension
Implements the ARMv5TE instruction set
ARM architecture
RISC (Reduced Instruction Set Computer)
ARM Registers
General purpose registers (GPR) hold either data or an address
All registers are 32 bit
In user mode 16 data registers and 2 status registers are visible
Data registers: r0 to r15
3 registers, r13, r14 and r15, perform special functions.
r13 -> SP (stack pointer)
r14 -> LR (link register)
r15 -> PC (program counter)
Link Register
1) Holds the return address when a subroutine is called.
2) The return address is not stored on the stack.
3) If it were stored on the stack, the call would take longer to complete, which is not
acceptable in an embedded system (ES).
4) It increases the processor speed.
Status Register
1) Condition flags: N, Z, C, V
V -> overflow flag
2) Sticky overflow (Q flag)
Q -> related to saturation overflows
3) Interrupt flags (7th bit and 6th bit)
I=1 disables IRQ (interrupt request)
F=1 disables FIQ (fast interrupt request)
4) State bit (5th bit, T bit)
T=0 -> ARM mode -> 32 bit instruction
T=1 -> Thumb mode -> 16 bit instruction
5) Mode (5 bits, 0th to 4th bit)
Processor mode; selects a privileged mode or the non-privileged (user) mode
[Figure: ARM core data flow. Blocks: data in/out (read, write), instruction decoder, sign extend, register file (r0-r15, with r15 as PC), Rd result, barrel shifter, MAC, ALU, address register, incrementer.]
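The status register fields listed in this section (I and F in bits 7 and 6, T in bit 5, mode in bits 0 to 4) can be pulled out of a 32 bit value with shifts and masks. The sketch below is illustrative only: the function name `decode_cpsr` is hypothetical, and the positions of N, Z, C, V and Q (bits 31 down to 27) and the mode encodings are assumptions taken from the standard ARM architecture layout, since these notes do not give them.

```python
# Mode field encodings (assumed from the standard ARM architecture).
MODES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "supervisor",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}


def decode_cpsr(value):
    """Split a 32 bit status register value into its named fields."""

    def bit(n):
        return (value >> n) & 1

    return {
        # Condition flags and sticky overflow (assumed positions 31..27).
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28), "Q": bit(27),
        # Interrupt masks: 1 means the interrupt is disabled.
        "I": bit(7), "F": bit(6),
        # State bit: 0 = ARM (32 bit instructions), 1 = Thumb (16 bit).
        "T": bit(5),
        # Processor mode from the low five bits.
        "mode": MODES.get(value & 0x1F, "unknown"),
    }


print(decode_cpsr(0x600000D3))
```

For the value 0x600000D3 this reports Z and C set, both interrupts disabled, ARM state and supervisor mode.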
Name   Condition                      Conditional flag
EQ     Equal                          Z
NE     Not equal                      z
CS     Carry set                      C
CC     Carry clear                    c
MI     Minus                          N
PL     Plus                           n
VS     Overflow                       V
VC     No overflow                    v
HI     Unsigned higher                zC
LS     Unsigned lower or same         Z or c
GE     Signed greater than or equal   NV or nv
LT     Signed less than               Nv or nV
GT     Signed greater than            NzV or nzv
LE     Signed less than or equal      Z or Nv or nV
AL     Always                         Ignored
(An upper case letter means the flag is set, a lower case letter means the flag is clear.)
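The condition mnemonics listed above can be expressed as boolean tests on the N, Z, C and V flags. The sketch below uses the standard ARM definitions of each condition; the function name `condition_passed` is a hypothetical helper for illustration.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if condition `cond` holds for the given N, Z, C, V flags."""
    checks = {
        "EQ": z,                    # equal
        "NE": not z,                # not equal
        "CS": c,                    # carry set
        "CC": not c,                # carry clear
        "MI": n,                    # minus (negative)
        "PL": not n,                # plus (positive or zero)
        "VS": v,                    # overflow
        "VC": not v,                # no overflow
        "HI": c and not z,          # unsigned higher
        "LS": z or not c,           # unsigned lower or same
        "GE": n == v,               # signed greater than or equal
        "LT": n != v,               # signed less than
        "GT": (not z) and n == v,   # signed greater than
        "LE": z or n != v,          # signed less than or equal
        "AL": True,                 # always, flags ignored
    }
    return bool(checks[cond])


# Flags after comparing 3 with 5 (3 - 5 is negative, with a borrow): N=1, Z=0, C=0, V=0
print(condition_passed("LT", True, False, False, False))
print(condition_passed("HI", True, False, False, False))
```

With these flags, LT passes and HI fails, since 3 is less than 5 in both the signed and the unsigned sense.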
The cache is a block of fast memory placed between main memory and the core.
It allows for more efficient fetches from some memory types.
The cache memory increases performance.
It has two forms.
1) Von Neumann architecture
2) Harvard architecture
Disadvantages
A cache requires a good predictor: when the core accesses a certain location in main memory,
the contents are fetched and stored in the available cache memory with a tag
attached.
If the processor then requires an instruction and it is in the cache, this is
called a tag hit; if it is not there, it is known as a tag miss.
A good cache memory has a high tag-hit rate.
Precision
For this problem we use the LRU (least recently used) method.
The least recently used data is taken out, so that more frequently used data stays in the cache.
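The tag hit, tag miss and LRU ideas above can be sketched with a tiny cache model. This is a simplified, fully associative model for illustration only, not a description of any particular ARM cache; the class name `LRUCache` and its fields are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that tracks tags only and evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest (least recently used) tag first
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        """Look up a tag; return True on a tag hit, False on a tag miss."""
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used tag
        self.lines[tag] = True
        return False


cache = LRUCache(capacity=2)
results = [cache.access(tag) for tag in (1, 2, 1, 3, 2)]
print(results, cache.hits, cache.misses)
```

In this access pattern only the third access is a hit: tag 3 evicts tag 2, because tag 1 was used more recently, so the final access to tag 2 misses again.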
The Advanced Microcontroller Bus Architecture (AMBA) is the most popular bus for
the ARM core.
The AMBA bus introduced the ARM System Bus (ASB) and the ARM Peripheral Bus
(APB).
Later on ARM introduced another bus design called the ARM High Performance
Bus (AHB).
ASB
It is used for interfacing all memory units, the core and other units.
It is used for system purposes.
APB
AHB
It provides higher data throughput than ASB because it is based
on a centralized multiplexed bus scheme rather than the ASB bidirectional
design.
Banked registers
[Figure: banked register set. Columns: registers R0 to R15, and banked registers R8 to R14.]
Modes of ARM
1) ARM mode
2) Thumb mode
Thumb mode
16 bit instructions
High code density
(code density: the space taken up in memory by an executable program)
ARM Technologies
ARM item
Trust Zone
DSP
ARM
TRUST ZONE
Consists of 2 OS
1) Simple OS
2) Kernel
The simple OS is the non-secure process and the kernel is the secure process.
E.g.: a device connected to the internet can be corrupted by anyone. To resist
this, a secure zone is introduced, known as TrustZone.
[Figure: TrustZone. A monitor sits between the non-secure user OS, which runs application programs, and the secure kernel, which runs security application programs.]
DSP enhancement