CMC Arm
Embedded system: a system whose principal function is not computational, but which is controlled by a computer embedded within it.
Examples: Refrigerator
Application Areas: Consumer Products
TV
stereo
remote control
phone / mobile phone
refrigerator
microwave
washing machine
electric toothbrush
oven / rice or bread cooker
watch
alarm clock
electronic musical instruments
electronic toys (stuffed animals, handheld toys, pinballs, etc.)
medical home equipment (e.g. blood pressure, thermometer)
[PDAs?? More like standard computer system]
System
SW system: application, OS & middleware
HW system: microprocessor, memory system, peripherals, controller
Size, power consumption, throughput, code density, code execution speed
4K ROM internal, up to 64K external
128 bytes user RAM internal, up to 64K external (addressing shared with ROM)
2 Timers
32 I/O pins
1 Serial Port
6 Interrupt Sources
ARM7 (LPC2148)
16-bit/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64 package
8 kB to 40 kB of on-chip static RAM and 32 kB to 512 kB of on-chip flash memory
2 Timers (32-bit)
2 Serial Ports
Architecture Revisions
[Figure: timeline of ARM architecture revisions (V4, ARMv5, ARMv6, ARMv7) from 1994 to 2006. Cores shown: StrongARM, ARM7TDMI-S, ARM720T, SC100, ARM102xE, XScale, ARM1026EJ-S, ARM1136JF-S, ARM1176JZF-S, ARM1156T2F-S]
ARM7 TDMI-S
NXP LPC2148
Current low-end ARM core for applications like digital mobile phones
TDMI
T: Thumb, 16-bit compressed instruction set
D: on-chip Debug support, enabling the processor to halt in response to a debug request
M: enhanced Multiplier, yields a full 64-bit result, high performance
I: Embedded ICE hardware
Byte means 8 bits
Halfword means 16 bits (two bytes)
Word means 32 bits (four bytes)
Processor Modes
[Figure: register banks by processor mode. All modes see r15 (pc) and the cpsr. FIQ mode has its own r8 to r12, r13 (sp), r14 (lr) and spsr. IRQ, SVC, Undef and Abort modes each have their own r13 (sp), r14 (lr) and spsr.]
The Registers
1 dedicated program counter
1 dedicated current program status register
5 dedicated saved program status registers
30 general purpose registers
The current processor mode governs which of several banks is accessible. Each mode can access
a particular set of r0-r12 registers
a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
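The register counts above can be cross-checked with a small model of banking. This is only a sketch: the mode abbreviations and the naming scheme for physical registers (for example r13_irq) are illustrative assumptions, and System mode is left out because it shares the User mode registers.

```python
# Sketch of ARM register banking. The labels returned here (e.g. "r13_irq")
# are an illustrative naming scheme, not taken from the slides.
MODES = ["usr", "fiq", "irq", "svc", "und", "abt"]

def physical_register(mode, n):
    """Return a label for the physical register behind rN in the given mode."""
    if n == 15:
        return "pc"                      # one program counter shared by all modes
    if n in (13, 14) and mode != "usr":
        return f"r{n}_{mode}"            # every exception mode banks sp and lr
    if 8 <= n <= 12 and mode == "fiq":
        return f"r{n}_fiq"               # FIQ additionally banks r8 to r12
    return f"r{n}"                       # everything else is shared

# Collect the distinct physical registers reachable as r0..r14 in any mode.
general = {physical_register(m, n) for m in MODES for n in range(15)}
status = {"cpsr"} | {f"spsr_{m}" for m in MODES if m != "usr"}
total = len(general) + 1 + len(status)   # +1 for the program counter

print(len(general), len(status), total)
```

Under these assumptions the model reproduces the breakdown listed above: 30 general purpose registers, one program counter, one CPSR and five SPSRs.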
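[Figure: program status register layout. Condition flags N, Z, C, V, Q in the top bits; J bit; interrupt disable bits I and F; T bit; mode bits in the lowest bits.]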
When the processor is executing in ARM state:
All instructions are 32 bits wide
All instructions must be word aligned
Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as an instruction cannot be halfword or byte aligned)
When the processor is executing in Thumb state:
All instructions are 16 bits wide
All instructions must be halfword aligned
Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as an instruction cannot be byte aligned)
When the processor is executing in Jazelle state:
All instructions are 8 bits wide
Processor performs a word access to read 4 instructions at once
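The status register fields (condition flags, interrupt disable bits, T bit and mode bits) can be pulled apart with shifts and masks. The sketch below assumes the standard ARM bit positions (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0); the function name and the sample value are hypothetical.

```python
# Mode encodings for CPSR[4:0].
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its named fields."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }

# Hypothetical sample value: Z and C set, IRQ and FIQ disabled, ARM state, SVC mode.
info = decode_cpsr(0x600000D3)
print(info)
```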
Exceptions
Exceptions are usually used to handle unexpected events which arise during the execution of a program, such as interrupts or memory faults. The term also covers software interrupts, undefined instruction traps, and the system reset.
Three groups:
Software interrupts, undefined instructions, and prefetch aborts
Data aborts
Reset, IRQ and FIQ
If the core is currently in Thumb state, then ARM state is entered.
Disable IRQs by setting bit 7 of the CPSR
If the exception is a fast interrupt, disable further fast interrupts by setting bit 6 of the CPSR
Exception                                        Mode
Reset                                            SVC
Data abort (data access memory fault)            Abort
FIQ (fast interrupt)                             FIQ
IRQ (normal interrupt)                           IRQ
Prefetch abort (instruction fetch memory fault)  Abort
Undefined instruction                            UND
Software interrupt (SWI)                         SVC
Normally the vector address contains a branch to the relevant routine
Exception handlers use r13_<mode> and r14_<mode> to hold the stack pointer and return address
On entry
4) Saves the old value of the CPSR in the SPSR of the new mode
5) Disables IRQ exceptions; sets bit 7 of the CPSR
6) If it is a FIQ exception, disables further FIQs; sets bit 6 of the CPSR
7) Forces the PC to the address of the exception handler
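The entry sequence can be modelled as a few updates to a processor state. This is a minimal sketch, assuming the standard ARM vector addresses and mode encodings; the state dictionary layout and the function name are hypothetical, and Reset is left out for simplicity.

```python
# Assumed standard ARM exception vectors and entry modes.
VECTORS = {"undef": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
           "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C}
ENTRY_MODE = {"undef": 0b11011, "swi": 0b10011, "prefetch_abort": 0b10111,
              "data_abort": 0b10111, "irq": 0b10010, "fiq": 0b10001}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5

def enter_exception(state, kind, return_address):
    """Apply the exception entry steps to a hypothetical state dictionary."""
    mode = ENTRY_MODE[kind]
    old_cpsr = state["cpsr"]
    banked = state["banked"].setdefault(mode, {})
    banked["lr"] = return_address          # return address goes to r14_<mode>
    banked["spsr"] = old_cpsr              # old CPSR saved in SPSR_<mode>
    cpsr = (old_cpsr & ~0x1F) | mode       # switch to the exception mode
    cpsr &= ~T_BIT                         # always enter ARM state
    cpsr |= I_BIT                          # disable IRQs
    if kind == "fiq":
        cpsr |= F_BIT                      # a FIQ also disables further FIQs
    state["cpsr"] = cpsr & 0xFFFFFFFF
    state["pc"] = VECTORS[kind]            # force the PC to the vector address
    return state

# User mode, Thumb state, Z and C flags set, interrupted by an IRQ.
state = {"cpsr": 0x60000030, "pc": 0x8000, "banked": {}}
enter_exception(state, "irq", 0x8004)
print(hex(state["cpsr"]), hex(state["pc"]))
```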
Exception Return
Once the exception has been handled, the user task is normally resumed
The sequence is
Any modified user registers must be restored from the handler's stack
CPSR must be restored from the appropriate SPSR
PC must be changed back to the relevant instruction address
On exit
1) Restores user registers
2) Restores the CPSR using the SPSR
3) Sets the proper return address in the PC
Steps 2) and 3) cannot be carried out independently:
If step 2) is performed prior to step 3), then since the lower bits of the CPSR determine the operating mode, restoring the CPSR makes it impossible to access the banked r14 that holds the return address
If step 3) is performed prior to step 2), the exception handler loses control and the code to perform step 2) is never reached
The two steps must therefore be performed together in a single instruction (for example MOVS pc,r14, which copies the SPSR to the CPSR while writing the PC)
Conditional Execution
Most instruction sets only allow branches to be executed conditionally.
However, by reusing the condition evaluation hardware, ARM effectively increases the number of instructions
All instructions contain a condition field which determines whether the CPU will execute them
Non-executed instructions still take up 1 cycle
To allow other stages in the pipeline to complete
This reduces the number of branches which would stall the pipeline
Allows very dense in-line code
The time penalty of not executing several conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed
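Condition code
The condition field (cond) occupies bits [31:28] of every instruction; bits [27:0] hold the rest of the instruction

Opcode [31:28]  Mnemonic extension  Interpretation                       Status flag state for execution
0000            EQ                  Equal / equals zero                  Z set
0001            NE                  Not equal                            Z clear
0010            CS/HS               Carry set / unsigned higher or same  C set
0011            CC/LO               Carry clear / unsigned lower         C clear
0100            MI                  Minus / negative                     N set
0101            PL                  Plus / positive or zero              N clear
0110            VS                  Overflow                             V set
0111            VC                  No overflow                          V clear
1000            HI                  Unsigned higher                      C set and Z clear
1001            LS                  Unsigned lower or same               C clear or Z set
1010            GE                  Signed greater than or equal         N equals V
1011            LT                  Signed less than                     N is not equal to V
1100            GT                  Signed greater than                  Z clear and N equals V
1101            LE                  Signed less than or equal            Z set or N is not equal to V
1110            AL                  Always                               any
1111            NV                  Never                                none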
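The flag tests in the condition code table translate directly into boolean expressions. The following is a sketch only: the function names are hypothetical, and cmp_flags models the flags produced by a CMP (a 32-bit subtraction whose result is discarded).

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags produced by CMP a,b, modelled as a 32-bit subtraction a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = bool(result >> 31)
    z = result == 0
    c = a >= b                                     # C set means no borrow
    v = bool(((a ^ b) & (a ^ result)) >> 31)       # signed overflow
    return n, z, c, v

def condition_passed(cond, n, z, c, v):
    """Evaluate a condition mnemonic against the N, Z, C, V flags."""
    checks = {
        "EQ": z, "NE": not z,
        "CS": c, "HS": c, "CC": not c, "LO": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True, "NV": False,
    }
    return bool(checks[cond])

flags = cmp_flags(3, 5)
print(condition_passed("LT", *flags), condition_passed("GT", *flags))
```

Comparing 0xFFFFFFFF with 1 shows why both signed and unsigned conditions exist: HI passes (unsigned 0xFFFFFFFF is higher than 1) and LT also passes (signed -1 is less than 1).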
Use a sequence of several conditional instructions
if (a==0) func(1);
CMP   r0,#0
MOVEQ r0,#1
BLEQ  func
Set the flags, then use various condition codes
if (a==0) x=0;
if (a>0) x=1;
CMP   r0,#0
MOVEQ r1,#0
MOVGT r1,#1
Use conditional compare instructions
if (a==4 || a==10) x=0;
CMP   r0,#4
CMPNE r0,#10
MOVEQ r1,#0
An unusual feature of the ARM instruction set is that conditional execution applies not only to branches but to all ARM instructions
With a branch:
       CMP r0,#5
       BEQ Bypass     ;if (r0!=5)
       ADD r1,r1,r0   ;{r1=r1+r0}
       SUB r1,r1,r2   ;{r1=r1-r2}
Bypass
With conditional execution:
       CMP   r0,#5
       ADDNE r1,r1,r0
       SUBNE r1,r1,r2
Whenever the conditional sequence is 3 instructions or fewer, it is better (smaller and faster) to exploit conditional execution than to use a branch
if((a==b)&&(c==d)) e++;
CMP   r0,r1
CMPEQ r2,r3
ADDEQ r4,r4,#1
To cause the condition flags to be updated, the S bit of the instruction needs to be set by postfixing the instruction (and any condition codes) with an S.
For example to add two numbers and set the condition flags:
ARM instruction set
Thumb instruction set: a compressed form of the instruction set, in which a subset of the full ARM instruction set is encoded into 16-bit instructions
Some ARM cores support instruction set extensions to enhance signal processing capabilities
Load-store architecture
Process values which are in registers
Load, store instructions for memory data accesses
3-address data processing instructions
Conditional execution of every instruction
Load and store multiple registers
Shift, ALU operation in a single instruction
Open instruction set extension through the coprocessor instructions
Very dense 16-bit compressed instruction set (Thumb)
Instructions
Consist of
Arithmetic (ADD, SUB, RSB)
Logical (BIC, AND)
Compare (CMP, TST)
Register movement (MOV, MVN)
All operands are 32 bits wide; they come from registers or are specified as literals in the instruction itself
Second operand is sent to the ALU via the barrel shifter
32-bit result is placed in a register; long multiply instructions produce a 64-bit result
3-address instruction format
2 source operands and 1 destination register
One source is always a register; the second may be a register, a shifted register, or an immediate value
Allows direct control of whether or not the condition codes are affected, via the S bit (condition codes unchanged when S = 0)
N = 1 if the result is negative; 0 otherwise (i.e. N = bit 31 of the result)
Z = 1 if the result is zero; 0 otherwise
C = carry out from the ALU for ADD, ADC, SUB, SBC, RSB, RSC, CMP, or CMN; otherwise carry out from the shifter
V = 1 if overflow from bit 30 to bit 31; 0 if no overflow (V is preserved in non-arithmetic operations)
PC may be used as a source operand (address of the instruction plus 8) except when a register-specified shift amount is used
PC may be specified as the destination register; the instruction is then a form of branch (e.g. return from a subroutine)
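The flag definitions above can be illustrated for an add with the S bit set. This is a sketch of the flag arithmetic only, assuming 32-bit two's complement operands; the function name adds is hypothetical and simply mirrors the ADDS mnemonic.

```python
MASK = 0xFFFFFFFF

def adds(a, b):
    """Model a 32-bit add that also updates N, Z, C and V."""
    a &= MASK
    b &= MASK
    full = a + b                    # unbounded sum, used to detect carry out
    result = full & MASK
    flags = {
        "N": result >> 31,                                 # bit 31 of the result
        "Z": int(result == 0),
        "C": int(full > MASK),                             # carry out of bit 31
        # Overflow: operands share a sign that differs from the result's sign.
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags

print(adds(0x7FFFFFFF, 1))   # largest positive value plus one
print(adds(0xFFFFFFFF, 1))   # -1 plus 1
```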
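[Figure: data processing instruction binary encoding. Fields: cond, 00, opcode (arithmetic/logic function), S (set condition codes), Rn (first operand register), Rd (destination register), operand 2. When bit 25 is 1, operand 2 is an immediate: #rot (immediate alignment) in bits [11:8] and an 8-bit immediate in bits [7:0]. Otherwise operand 2 is register Rm with shift type Sh, shifted either by an immediate shift length #shift or by the register shift length in Rs.]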
Comparison Operations
Do not produce a result; the destination is omitted from the format
Just set the condition code bits (N, Z, C and V) in the CPSR
Immediate Operands
Replace the second source operand with an immediate operand, which is a literal constant, preceded by #
ADD r3,r3,#1     ;r3:=r3+1
AND r8,r7,#&FF   ;r8:=r7[7:0], &: hexadecimal
Since the immediate value is coded within the 32 bits of the instruction, it is not possible to enter every possible 32-bit value as an immediate. Immediates of the form (0 to 255) × 2^(2n), with 0 ≤ n ≤ 12, can be encoded.
ADD r3,r2,r1,LSL #3   ;r3 := r2 + 8 * r1
[Figure: effect of LSL #5, LSR #5, ASR #5 (for a positive and a negative value), ROR #5 and RRX on a 32-bit word]
LSL: Logical Shift Left by 0 to 31 places, 0 filled at the lsb end
LSR, ASL (Arithmetic Shift Left), ASR, ROR (Rotate Right), RRX (Rotate Right eXtended by 1 place)
ADD r5,r5,r3,LSL r2   ;r5:=r5+r3*2^r2
MOV r12,r4,ROR r3     ;r12:=r4 rotated right by value of r3
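The shift operations can be modelled on 32-bit values as follows. This is a sketch with hypothetical function names; it covers shift amounts from 0 to 31 only and, apart from RRX, ignores the shifter carry out.

```python
MASK = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros fill the least significant end."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros fill the most significant end."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit fills the most significant end."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32            # reinterpret as a negative Python integer
    return (x >> n) & MASK

def ror(x, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right extended by 1 place through carry; returns (result, new carry)."""
    x &= MASK
    return (carry << 31) | (x >> 1), x & 1

# ADD r3,r2,r1,LSL #3 computes r2 + 8 * r1:
r1, r2 = 5, 100
r3 = (r2 + lsl(r1, 3)) & MASK
print(r3)
```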
Using the Barrel Shifter: The Second Operand
[Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the barrel shifter before reaching the ALU, which produces the Result]
Register, optionally with shift operation
Shift value can be either:
a 5-bit unsigned integer
specified in the bottom byte of another register
Used for multiplication by a constant
Immediate value
8-bit number, with a range of 0 to 255
Rotated right through an even number of positions
Allows an increased range of 32-bit constants to be loaded directly into registers
32-bit product (least significant)
MUL{<cond>}{S} Rd,Rm,Rs
MLA{<cond>}{S} Rd,Rm,Rs,Rn
MUL r4,r3,r2      ;r4:=(r3*r2)[31:0]
MLA r4,r3,r2,r1   ;r4:=(r3*r2+r1)[31:0]
64-bit product
<mul>{<cond>}{S} RdLo,RdHi,Rm,Rs
<mul> is UMULL, UMLAL, SMULL, SMLAL

Mnemonic  Meaning                               Effect
MUL       Multiply (32-bit result)              Rd := (Rm * Rs) [31:0]
MLA       Multiply-accumulate (32-bit result)   Rd := (Rm * Rs + Rn) [31:0]
UMULL     Unsigned multiply long                RdHi:RdLo := Rm * Rs
UMLAL     Unsigned multiply-accumulate long     RdHi:RdLo += Rm * Rs
SMULL     Signed multiply long                  RdHi:RdLo := Rm * Rs
SMLAL     Signed multiply-accumulate long       RdHi:RdLo += Rm * Rs

Multiplies terminate early wherever possible
On the ARM7TDMI, MUL executes in a minimum of 2 clock cycles and a maximum of 5 clock cycles
Restrictions on use:
Rd and Rm cannot be the same register (this can be avoided by swapping over Rm and Rs, since multiplication is commutative)
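The effects listed for the multiply instructions can be checked with a short model. This is a sketch with hypothetical function names; each long multiply returns the pair (RdHi, RdLo).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rm, rs):
    return (rm * rs) & MASK32                 # Rd := (Rm * Rs)[31:0]

def mla(rm, rs, rn):
    return (rm * rs + rn) & MASK32            # Rd := (Rm * Rs + Rn)[31:0]

def umull(rm, rs):
    product = (rm & MASK32) * (rs & MASK32)   # unsigned 64-bit product
    return product >> 32, product & MASK32    # (RdHi, RdLo)

def smull(rm, rs):
    product = (to_signed(rm) * to_signed(rs)) & MASK64   # signed 64-bit product
    return product >> 32, product & MASK32    # (RdHi, RdLo)

print(umull(0xFFFFFFFF, 0xFFFFFFFF))
print(smull(0xFFFFFFFF, 2))
```

The same bit pattern 0xFFFFFFFF gives different high words depending on whether it is treated as unsigned (4294967295) or signed (-1), which is why separate unsigned and signed long multiplies exist.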
Multiplication by a Constant
ADD r0,r1,r1,LSL #2   ;r0:=r1+r1*4
Can combine several instructions to carry out other multiplies
Example: r2 = r3 * 119 = r3 * 17 * 7
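The decomposition 119 = 17 × 7 maps onto two shift-and-add steps. The sketch below checks the arithmetic in Python; the ARM instructions named in the comments are one plausible pairing, given as an assumption rather than taken from the slides.

```python
MASK = 0xFFFFFFFF

def times_5(r1):
    # ADD r0,r1,r1,LSL #2 : r0 := r1 + r1*4
    return (r1 + (r1 << 2)) & MASK

def times_119(r3):
    # Assumed step 1: ADD r2,r3,r3,LSL #4 : r2 := r3 + r3*16 = r3*17
    r2 = (r3 + (r3 << 4)) & MASK
    # Assumed step 2: RSB r2,r2,r2,LSL #3 : r2 := r2*8 - r2 = r2*7
    r2 = ((r2 << 3) - r2) & MASK
    return r2

print(times_5(6), times_119(3))
```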
No single ARM instruction can load a 32-bit immediate constant directly into a register
All ARM instructions are 32 bits long
ARM instructions do not use the instruction stream as data
The data processing instruction format has 12 bits available for operand 2
Rather than holding a 12-bit number directly, they are used to store 8-bit constants, giving a range of 0 to 255
These 8 bits can then be rotated right through an even number of positions
This gives a much larger range of constants that can be directly loaded, though some constants will still need to be loaded from memory
To load a constant, simply move the required value into a register; the assembler will convert it to the rotate form for us
MOV r0,#4096        ;MOV r0,#&1000 (0x40 ror 26)
The bitwise complements can also be formed using MVN:
MOV r0,#&FFFFFFFF   ;MVN r0,#0
Values that cannot be generated in this way will cause an error
LDR Rd,=const
Produces a MOV or MVN instruction to generate the value (if possible), or
generates an LDR instruction with a PC-relative address to read the constant from a literal pool (constant data area embedded in the code)
For example
LDR r0,=&FF         ;MOV r0,#0xFF
LDR r0,=&55555555   ;LDR r0,[PC,#Imm10]
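Whether a constant fits the "8-bit value rotated right by an even amount" form can be tested by trying all 16 rotations. The following is a minimal sketch with hypothetical function names; a real assembler may pick a different (rot, imm8) pair when several encode the same value.

```python
MASK = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= MASK
    return ((value >> amount) | (value << (32 - amount))) & MASK

def encode_immediate(value):
    """Return (rot, imm8) such that imm8 ROR (2*rot) == value, or None."""
    value &= MASK
    for rot in range(16):
        imm8 = ror32(value, 32 - 2 * rot)   # undo a right rotation of 2*rot
        if imm8 <= 0xFF:
            return rot, imm8
    return None

def decode_immediate(rot, imm8):
    return ror32(imm8, 2 * rot)

def pick_load(value):
    """Mirror the choice described for LDR Rd,=const."""
    if encode_immediate(value) is not None:
        return "MOV"
    if encode_immediate(~value) is not None:
        return "MVN"
    return "LDR from literal pool"

print(encode_immediate(4096), hex(decode_immediate(13, 0x40)))
print(pick_load(0xFF), pick_load(0xFFFFFFFF), pick_load(0x55555555))
```

In this model 4096 has more than one valid encoding: the search returns the first one it finds, while 0x40 rotated right by 26 (rot = 13), as quoted above, decodes to the same value.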
Three basic forms to move data between ARM registers and memory
Word transfer: LDR / STR
Byte transfer: LDRB / STRB
Halfword transfer: LDRH / STRH
Signed byte or halfword load: LDRSB / LDRSH
All of these can be conditionally executed by inserting the appropriate condition code after STR/LDR
e.g. LDREQB
Addressing mode
Base register: r0 to r15
Register-Indirect Addressing
Other forms
Adding
As an example, a program which must copy data from TABLE1 to TABLE2, both of which are near to the code
Copy   ADR r1,TABLE1   ;r1 points to TABLE1
       ADR r2,TABLE2   ;r2 points to TABLE2
TABLE1                 ;<source>
TABLE2                 ;<destination>
Pre-indexing
LDR r0,[r1,#4]    ;r0:=mem32[r1+4]
Offset up to 4K, added or subtracted (#-4)
Post-indexing
LDR r0,[r1],#4    ;r0:=mem32[r1], r1:=r1+4
Equivalent to a simple register-indirect load followed by a separate update of the base register, but faster and with less code space
Auto-indexing
LDR r0,[r1,#4]!   ;r0:=mem32[r1+4], r1:=r1+4
No extra time; auto-indexing is performed while the data is being fetched from memory
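The three indexing forms differ only in which address is used and whether the base register is updated. The sketch below uses a dictionary as word-addressed memory; the function names and register dictionary are hypothetical, and the case where the destination is also the base register is not handled.

```python
def ldr_pre(mem, regs, rd, rn, offset, writeback=False):
    """LDR rd,[rn,#offset] and, with writeback, LDR rd,[rn,#offset]!"""
    address = regs[rn] + offset
    regs[rd] = mem[address]
    if writeback:
        regs[rn] = address          # auto-indexing keeps the computed address

def ldr_post(mem, regs, rd, rn, offset):
    """LDR rd,[rn],#offset"""
    regs[rd] = mem[regs[rn]]        # use the unmodified base as the address
    regs[rn] += offset              # then step the base

mem = {0x200: 11, 0x204: 22}

pre = {"r0": 0, "r1": 0x200}
ldr_pre(mem, pre, "r0", "r1", 4)

auto = {"r0": 0, "r1": 0x200}
ldr_pre(mem, auto, "r0", "r1", 4, writeback=True)

post = {"r0": 0, "r1": 0x200}
ldr_post(mem, post, "r0", "r1", 4)

print(pre, auto, post)
```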
[Figure: pre-indexed and post-indexed STR with r0 = 0x5 (source register for STR), base register r1 = 0x200 and offset 12. Pre-indexed: 0x5 is stored at 0x20c. Post-indexed: 0x5 is stored at 0x200 and r1 is updated to 0x20c.]
The load and store multiple instructions (LDM/STM) allow between 1 and 16 registers to be transferred to or from memory
Order of register transfer cannot be specified; order in the list is insignificant
Lowest register number is always transferred to/from the lowest memory location accessed
The transferred registers can be either:
Any subset of the current bank of registers (default)
Any subset of the user mode bank of registers when in a privileged mode (postfix instruction with a ^)
4 different addressing modes
Base register can be optionally updated following the transfer (using !)
Used for moving blocks of data around memory and for saving and restoring context (stack)
The direction that the base pointer moves through memory is given by the postfix to the STM/LDM instruction
STMIA/LDMIA: Increment After
STMIB/LDMIB: Increment Before
STMDA/LDMDA: Decrement After
STMDB/LDMDB: Decrement Before
Allow any subset (or all, r0 to r15) of the 16 registers to be transferred with a single instruction
LDMIA r1,{r0,r2,r5}   ;r0:=mem32[r1]
                      ;r2:=mem32[r1+4]
                      ;r5:=mem32[r1+8]
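The four addressing modes determine which block of addresses is used and where the base ends up. The following is a sketch only: registers are keyed by number in a dictionary, memory is a dictionary of words, and the function names are hypothetical. Loading the base register itself with writeback is not modelled.

```python
def block_addresses(mode, base, count):
    """Return (ascending word addresses, updated base) for IA, IB, DA or DB."""
    size = 4 * count
    if mode == "IA":
        start, new_base = base, base + size
    elif mode == "IB":
        start, new_base = base + 4, base + size
    elif mode == "DA":
        start, new_base = base - size + 4, base - size
    elif mode == "DB":
        start, new_base = base - size, base - size
    else:
        raise ValueError(mode)
    return [start + 4 * i for i in range(count)], new_base

def stm(mode, mem, regs, base_reg, reg_list, writeback=False):
    ordered = sorted(reg_list)      # lowest register goes to the lowest address
    addresses, new_base = block_addresses(mode, regs[base_reg], len(ordered))
    for reg, address in zip(ordered, addresses):
        mem[address] = regs[reg]
    if writeback:
        regs[base_reg] = new_base

def ldm(mode, mem, regs, base_reg, reg_list, writeback=False):
    ordered = sorted(reg_list)
    addresses, new_base = block_addresses(mode, regs[base_reg], len(ordered))
    for reg, address in zip(ordered, addresses):
        regs[reg] = mem[address]
    if writeback:
        regs[base_reg] = new_base

# STMIA r9!,{r0,r1,r5} with r9 = 0x100c
regs = {0: 10, 1: 11, 5: 15, 9: 0x100C}
mem = {}
stm("IA", mem, regs, 9, [5, 0, 1], writeback=True)
print({hex(a): v for a, v in mem.items()}, hex(regs[9]))

# LDMDB r9!,{r0,r1,r5} reads the same block back and restores the base
regs.update({0: 0, 1: 0, 5: 0})
ldm("DB", mem, regs, 9, [0, 1, 5], writeback=True)
print(regs)
```

Pairing STMIA with LDMDB in this way behaves like a stack that grows upward with the base pointing at the next free word.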
Stack Processing
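STMFD/LDMFD: Full Descending stack
STMFA/LDMFA: Full Ascending stack
STMED/LDMED: Empty Descending stack
STMEA/LDMEA: Empty Ascending stack
Note: ARM compilers will always use a Full Descending stack
Addressing modes

Stack type        Store (push)    Load (pop)
Full Descending   STMFD = STMDB   LDMFD = LDMIA
Full Ascending    STMFA = STMIB   LDMFA = LDMDA
Empty Descending  STMED = STMDA   LDMED = LDMIB
Empty Ascending   STMEA = STMIA   LDMEA = LDMDB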
[Figure: multiple register transfer addressing modes (including Increment After). Memory between 0x1000 and 0x1018, showing where r0, r1 and r5 are stored relative to the base r9 (0x100c) and the updated base r9' for each mode.]
Syntax
SWP{<cond>}{B} Rd,Rm,[Rn]
Rd <- [Rn], [Rn] <- Rm
Combines a load and a store of a word or an unsigned byte in a single instruction
Example
ADR  r0,SEMAPHORE
SWPB r1,r1,[r0]   ;exchange byte
Syntax
MRS{<cond>} Rd,CPSR|SPSR
The CPSR or the current mode SPSR is copied into the destination register. All 32 bits are copied.
Example
MRS r0,CPSR
MRS r3,SPSR
Syntax
c: the control field PSR[7:0]
x: the extension field PSR[15:8]
s: the status field PSR[23:16]
f: the flag field PSR[31:24]
Example
Set N, Z, C, V flags
Branch Instructions
Syntax
[Figure: branch instruction encoding with the condition field (cond), the fixed bits 101 and the link bit L]
Link bit: 0 = Branch, 1 = Branch with Link
The branch has a condition associated with it and is only executed if the condition codes have the correct value (taken or not taken)
       MOV r0,#0      ;initialize counter
Loop   ADD r0,r0,#1   ;increment loop counter
       CMP r0,#10     ;compare with limit
       BNE Loop       ;repeat if not equal
                      ;else fall through
Call a subroutine
SUB
SWI{<cond>}<24-bit immediate>
Used for calls to the operating system and is often called a supervisor call
It puts the processor into supervisor mode and begins executing instructions from address 0x08 (refer to P.21)
Save the address of the instruction after SWI in r14_svc
Save the CPSR in SPSR_svc
Enter supervisor mode and disable IRQs by setting CPSR[4:0] to 10011 (binary) and CPSR[7] to 1
Set PC to 0x08 and begin executing the instruction there
The 24-bit immediate does not influence the operation of the instruction but may be interpreted by the system code
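System code that wants to interpret the 24-bit immediate typically reads the SWI instruction word back and masks off the low 24 bits. The sketch below assumes the ARM-state SWI encoding (condition in bits [31:28], 1111 in bits [27:24], immediate in bits [23:0]); the function names and the sample instruction word are hypothetical.

```python
def is_swi(instruction):
    """True if bits [27:24] of an ARM instruction word are 1111."""
    return ((instruction >> 24) & 0xF) == 0xF

def swi_number(instruction):
    """Extract the 24-bit immediate from an SWI instruction word."""
    return instruction & 0x00FFFFFF

# Hypothetical instruction word: condition AL (1110), SWI, immediate 0x11.
word = 0xEF000011
print(is_swi(word), hex(swi_number(word)))
```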
Supervisor Calls
The supervisor is a program which operates at a privileged level, which means that it can do things that a user-level program cannot do directly (e.g. input or output)
SWI instruction
Software
Instruction Pipeline
The ARM7TDMI-S core uses a pipeline to increase the speed of the flow of instructions to the processor. This enables several operations to take place simultaneously
The Program Counter (PC) points to the instruction being fetched rather than to the instruction being executed
During normal operation, while one instruction is being executed, its successor is being decoded, and a third instruction is being fetched from memory

ARM      Thumb    Stage
PC       PC       Fetch: instruction fetched from memory
PC - 4   PC - 2   Decode: instruction decoded (Thumb only: Thumb instruction decompressed to ARM instruction)
PC - 8   PC - 4   Execute: registers read from register bank, shift and ALU operations performed, registers written back to register bank
Optimal Pipelining
In this example it takes 6 clock cycles to execute 6 instructions
All operations are on registers (single cycle instructions)
Clock cycles per instruction (CPI) = 1
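The overlap of Fetch, Decode and Execute can be tabulated with a small model. This is a sketch assuming every instruction spends exactly one cycle in each stage; the function names are hypothetical and the instruction list is just an example sequence of register-only operations.

```python
STAGES = ["Fetch", "Decode", "Execute"]

def pipeline_timeline(instructions):
    """Map each clock cycle (starting at 1) to the (instruction, stage) pairs active in it."""
    timeline = {}
    for index, instruction in enumerate(instructions):
        for offset, stage in enumerate(STAGES):
            timeline.setdefault(index + offset + 1, []).append((instruction, stage))
    return timeline

def total_cycles(count):
    """Cycles from the first fetch to the last execute, including pipeline fill."""
    return count + len(STAGES) - 1

program = ["ADD", "SUB", "MOV", "AND", "ORR", "EOR"]
timeline = pipeline_timeline(program)
for cycle in sorted(timeline):
    print(cycle, timeline[cycle])

# Cycles in which some instruction is in Execute.
execute_cycles = [c for c in sorted(timeline)
                  if any(stage == "Execute" for _, stage in timeline[c])]
print(len(execute_cycles), total_cycles(len(program)))
```

In this model the six instructions execute in six consecutive cycles (cycles 3 to 8), giving a CPI of 1 once the pipeline is full; the two extra cycles at the start are the fill time.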
[Figure: pipeline diagram for the sequence ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB, each instruction passing through Fetch, Decode and Execute in successive 1-cycle slots]
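[Figure: pipeline diagram for a branch. BL at 0x8000 passes through Fetch, Decode and Execute, followed by Linkret and Adjust cycles; the instructions at 0x8004 and 0x8008 are fetched behind it; fetching then continues at the branch target with ADD (0x8FEC), SUB (0x8FF0), MOV (0x8FF4) and AND (0x8FF8). Each slot is 1 cycle.]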