MIPS R4000 Microprocessor User's Manual: Joe Heinrich
User’s Manual
Second Edition
Joe Heinrich
1994 MIPS Technologies, Inc. All Rights Reserved.
First of all, special thanks go to Duk Chun for his patient help in supplying and
verifying the content of this manual; that this manual is technically correct is, in a
very large part, directly attributable to him.
Thanks also to the following people for supplying portions of this book: Shabbir
Latif, for, among other things, the exception handler flow charts, the description
of the output buffer edge-control logic, and the interrupts; once again, Duk Chun,
for his paper on R4000 processor synchronization support; Paul Ries, for
confirming the accuracy of sections describing the memory management and the
caches; John Mashey, for verifying the R4000 processor actually does employ the
64-bit architecture; Dave Ditzel, for raising the issue in the first place; and Mike
Gupta, for substantiating various aspects of the errata. Finally, thanks to Ed
Reidenbach for supplying a large portion of the parity and ECC sections of this
manual, and Michael Ngo for checking their accuracy.
Thanks also to the following folks for their technical assistance: Andy Keane,
Keith Garrett, Viggy Mokkarala, Charles Price, Ali Moayedian, George Hsieh,
Peter Fu, Stephen Przybylski, Michael Woodacre, and Earl Killian. Also to be
thanked are the people at fvn@world.std.com: Bill Tuthill, Barry Shein, Bob
Devine, and Alan Marr, for helping place RISC in a pecuniary perspective. Also,
thanks to the following people at the mystery_train@swim2birds news group: toma,
dan_sears, jharris@garnet, tut@cairo (again), and elvis@dalkey(mateo_b). Their night-
for-day netversations, fueled by caffeine, concerning the viability of the
cyberpsykinetic compute-core model helped form an important basis of this book.
On the editorial front, thanks once again to Ms. Robin Cowan, of the Consortium
of Editorial Arts for her labors in editing this manual. Thanks to Evelyn Spire for
slaving over that bottomless black well we refer to as an “Index.” Thanks also,
once again, to Karen Gettman, and Lisa Iarkowski at Prentice-Hall for their help.
On the artistic side, thanks to Jeanne Simonian, of the Creative department here
at Silicon Graphics, for the book cover design; and thanks to Pam Flanders for
providing MarCom tactical support.
Have we missed anyone? If so, here is where we apologize for doing so.
Joe Heinrich
April 1, 1993
Mt. View, California
Thanks go to Shabbir Latif, from whose errata the major part of this second
edition is derived. Thanks also to Charlie Price for, among other things, making
available his revision of the ISA.
On the production side, thanks to Kay Maitz, Beth Fraker, Molly Castor, Lynnea
Humphries, and Claudia Lohnes for their assistance at the center of the hurricane.
Joe Heinrich
joeh@sgi.com
April 1, 1994
Mt. View, California
This book describes the MIPS R4000 and R4400 family of RISC
microprocessors (referred to in this book simply as the processor).
Chapter 8 describes the signals that pass between the R4000 processor
and other components in a system. The signals discussed include the
System interface, the Clock/Control interface, the Secondary Cache
interface, the Interrupt interface, the Initialization interface, and the
JTAG interface.
Appendix A describes the R4000 CPU instructions, in both 32- and 64-bit
modes. The instruction list is given in alphabetical order.
A Note on Style
A brief note on some of the stylistic conventions used in this book: bits,
fields, and registers of interest from a software perspective are
italicized (such as Config register); signal names of more importance
from a hardware point of view are rendered in bold (such as Reset*).
A range of bits uses a colon as a separator; for instance, (15:0)
represents the 16-bit range that runs from bit 0 through bit 15,
inclusive. (In some places an ellipsis may be used in place of the colon
for visibility: (15...0).)
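As an illustration of the bit-range notation, the following C sketch extracts the field (hi:lo) from a 64-bit value. The helper name bits() and its parameters are assumptions made for this example only; they do not appear elsewhere in this book.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the field (hi:lo) of value, with both
   bit positions inclusive and bit 0 the least-significant bit. */
static uint64_t bits(uint64_t value, unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi < 64);

    unsigned width = hi - lo + 1;
    /* Shifting a 64-bit value by 64 is undefined in C, so the
       full-width case is handled separately. */
    uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return (value >> lo) & mask;
}
```

With this helper, bits(0x12345678, 15, 0) yields 0x5678, the 16-bit range (15:0), and bits(0x12345678, 31, 16) yields 0x1234.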
Preface
1 Introduction
  Benefits of RISC Design
    Shorter Design Cycle
    Effective Utilization of Chip Area
    User (Programmer) Benefits
    Advanced Semiconductor Technologies
    Optimizing Compilers
    MIPS RISCompiler Language Suite
  Compatibility
  Processor General Features
  R4000 Processor Configurations
  R4400 Processor Enhancements
  R4000 Processor
    64-bit Architecture
    Superpipeline Architecture
    System Interface
    CPU Register Overview
    CPU Instruction Set Overview
    Data Formats and Addressing
    Coprocessors (CP0-CP2)
      System Control Coprocessor, CP0
      Floating-Point Unit (FPU), CP1
    Memory Management System (MMU)
      The Translation Lookaside Buffer (TLB)
      Operating Modes
      Cache Memory Hierarchy
      Primary Caches
      Secondary Cache Interface
2 CPU Instruction Set Summary
  CPU Instruction Formats
    Load and Store Instructions
      Scheduling a Load Delay Slot
      Defining Access Types
    Computational Instructions
      64-bit Operations
      Cycle Timing for Multiply and Divide Instructions
    Jump and Branch Instructions
      Overview of Jump Instructions
      Overview of Branch Instructions
    Special Instructions
    Exception Instructions
    Coprocessor Instructions
3 The CPU Pipeline
  CPU Pipeline Operation
  CPU Pipeline Stages
  Branch Delay
  Load Delay
  Interlock and Exception Handling
    Exception Conditions
    Stall Conditions
    Slip Conditions
    External Stalls
    Interlock and Exception Timing
      Backing Up the Pipeline
      Aborting an Instruction Subsequent to an Interlock
    Pipelining the Exception Handling
    Special Cases
      Performance Considerations
      Correctness Considerations
  R4400 Processor Uncached Store Buffer
4 Memory Management
  Translation Lookaside Buffer (TLB)
    Hits and Misses
    Multiple Matches
  Address Spaces
    Virtual Address Space
    Physical Address Space
    Virtual-to-Physical Address Translation
    32-bit Mode Address Translation
    64-bit Mode Address Translation
    Operating Modes
      User Mode Operations
      Supervisor Mode Operations
      Kernel Mode Operations
  System Control Coprocessor
    Format of a TLB Entry
    CP0 Registers
      Index Register (0)
      Random Register (1)
      EntryLo0 (2), and EntryLo1 (3) Registers
      PageMask Register (5)
      Wired Register (6)
      EntryHi Register (CP0 Register 10)
      Processor Revision Identifier (PRId) Register (15)
      Config Register (16)
      Load Linked Address (LLAddr) Register (17)
      Cache Tag Registers [TagLo (28) and TagHi (29)]
    Virtual-to-Physical Address Translation Process
    TLB Misses
    TLB Instructions
5 CPU Exception Processing
  How Exception Processing Works
  Exception Processing Registers
    Context Register (4)
    Bad Virtual Address Register (BadVAddr) (8)
    Count Register (9)
    Compare Register (11)
    Status Register (12)
      Status Register Format
      Status Register Modes and Access States
      Status Register Reset
    Cause Register (13)
    Exception Program Counter (EPC) Register (14)
    WatchLo (18) and WatchHi (19) Registers
    XContext Register (20)
    Error Checking and Correcting (ECC) Register (26)
    Cache Error (CacheErr) Register (27)
    Error Exception Program Counter (Error EPC) Register (30)
  Processor Exceptions
    Exception Types
      Reset Exception Process
      Cache Error Exception Process
      Soft Reset and NMI Exception Process
      General Exception Process
    Exception Vector Locations
    Priority of Exceptions
    Reset Exception
    Soft Reset Exception
    Address Error Exception
    TLB Exceptions
      TLB Refill Exception
      TLB Invalid Exception
      TLB Modified Exception
    Cache Error Exception
    Virtual Coherency Exception
    Bus Error Exception
    Integer Overflow Exception
6 Floating-Point Unit
  Overview
  FPU Features
  FPU Programming Model
    Floating-Point General Registers (FGRs)
    Floating-Point Registers
    Floating-Point Control Registers
    Implementation and Revision Register, (FCR0)
    Control/Status Register (FCR31)
      Accessing the Control/Status Register
      IEEE Standard 754
      Control/Status Register FS Bit
      Control/Status Register Condition Bit
      Control/Status Register Cause, Flag, and Enable Fields
      Control/Status Register Rounding Mode Control Bits
  Floating-Point Formats
  Binary Fixed-Point Format
  Floating-Point Instruction Set Overview
    Floating-Point Load, Store, and Move Instructions
      Transfers Between FPU and Memory
      Transfers Between FPU and CPU
      Load Delay and Hardware Interlocks
      Data Alignment
      Endianness
    Floating-Point Conversion Instructions
    Floating-Point Computational Instructions
    Branch on FPU Condition Instructions
    Floating-Point Compare Operations
  FPU Instruction Pipeline Overview
    Instruction Execution
    Instruction Execution Cycle Time
    Scheduling FPU Instructions
    FPU Pipeline Overlapping
      Instruction Scheduling Constraints
      Instruction Latency, Repeat Rate, and Pipeline Stage Sequences
      Resource Scheduling Rules
7 Floating-Point Exceptions
  Exception Types
  Exception Trap Processing
  Flags
  FPU Exceptions
    Inexact Exception (I)
    Invalid Operation Exception (V)
    Division-by-Zero Exception (Z)
    Overflow Exception (O)
    Underflow Exception (U)
    Unimplemented Instruction Exception (E)
  Saving and Restoring State
  Trap Handlers for IEEE Standard 754 Exceptions
8 R4000 Processor Signal Descriptions
  System Interface Signals
  Clock/Control Interface Signals
  Secondary Cache Interface Signals
  Interrupt Interface Signals
  JTAG Interface Signals
  Initialization Interface Signals
  Signal Summary
9 Initialization Interface
  Functional Overview
  Reset Signal Description
    Power-on Reset
    Cold Reset
    Warm Reset
  Initialization Sequence
  Boot-Mode Settings
10 Clock Interface
  Signal Terminology
  Basic System Clocks
    MasterClock
    MasterOut
    SyncIn/SyncOut
    PClock
    SClock
    TClock
    RClock
    PClock-to-SClock Division
  System Timing Parameters
    Alignment to SClock
    Alignment to MasterClock
    Phase-Locked Loop (PLL)
  Connecting Clocks to a Phase-Locked System
  Connecting Clocks to a System without Phase Locking
    Connecting to a Gate-Array Device
    Connecting to a CMOS Logic System
  Processor Status Outputs
11 Cache Organization, Operation, and Coherency
  Memory Organization
  Overview of Cache Operations
  R4000 Cache Description
    Secondary Cache Size
    Variable-Length Cache Lines
    Cache Organization and Accessibility
      Organization of the Primary Instruction Cache (I-Cache)
      Organization of the Primary Data Cache (D-Cache)
      Accessing the Primary Caches
      Organization of the Secondary Cache
      Accessing the Secondary Cache
  Cache States
    Primary Cache States
    Secondary Cache States
    Mapping States Between Caches
  Cache Line Ownership
  Cache Write Policy
  Cache State Transition Diagrams
  Cache Coherency Overview
    Cache Coherency Attributes
      Uncached
      Noncoherent
      Sharable
      Update
      Exclusive
    Cache Operation Modes
      Secondary-Cache Mode
      No-Secondary-Cache Mode
    Strong Ordering
      An Example of Strong Ordering
      Testing for Strong Ordering
      Restarting the Processor
  Maintaining Coherency on Loads and Stores
  Manipulation of the Cache by an External Agent
    Invalidate
    Update
12 System Interface
Terminology
System Interface Description
Interface Buses
Address and Data Cycles
Issue Cycles
Handshake Signals
System Interface Protocols
Master and Slave States
Moving from Master to Slave State
External Arbitration
Uncompelled Change to Slave State
Processor and External Requests
Rules for Processor Requests
Processor Requests
Processor Read Request
Processor Write Request
Processor Invalidate Request
Processor Update Request
Clusters
External Requests
External Read Request
External Write Request
External Invalidate Request
External Update Request
External Snoop Request
External Intervention Request
Read Response
Handling Requests
Load Miss
Secondary-Cache Mode
No-Secondary-Cache Mode
Store Miss
Secondary-Cache Mode
No-Secondary-Cache Mode
Store Hit
Secondary-Cache Mode
13 Secondary Cache Interface
Data Transfer Rates
Duplicating Signals
Accessing a Split Secondary Cache
SCDChk Bus
SCTAG Bus
Operation of the Secondary Cache Interface
Read Cycles
4-Word Read Cycle
8-Word Read Cycle
Notes on a Secondary Cache Read Cycle
Write Cycles
4-Word Write Cycle
8-Word Write Cycle
Notes on a Secondary Cache Write Cycle
14 JTAG Interface
What Boundary Scanning Is
Signal Summary
JTAG Controller and Registers
Instruction Register
Bypass Register
Boundary-Scan Register
Test Access Port (TAP)
TAP Controller
Controller Reset
Controller States
Implementation-Specific Details
15 R4000 Processor Interrupts
Hardware Interrupts
Nonmaskable Interrupt (NMI)
Asserting Interrupts
16 Error Checking and Correcting
Error Checking in the Processor
Types of Error Checking
Parity Error Detection
SECDED ECC Code
Error Checking Operation
System Interface
Secondary Cache Data Bus
System Interface and Secondary Cache Data Bus
Secondary Cache Tag Bus
System Interface Command Bus
SECDED ECC Matrices for Data and Tag Buses
ECC Check Bits
Data ECC Generation
Detecting Data Transmission Errors
Single Data Bit ECC Error
Single Check Bit ECC Error
Double Data Bit ECC Errors
Three Data Bit ECC Errors
Four Data Bit ECC Errors
Tag ECC Generation
Summary of ECC Operations
R4400 Master/Checker Mode
Connecting a System in Lock Step
Master-Listener Configuration
Cross-Coupled Checking Configuration
Fault Detection
Reset Operation
Fault History
A CPU Instruction Set Details
B FPU Instruction Set Details
C Subblock Ordering
F Coprocessor 0 Hazards
G R4000 Pinouts
Index
Optimizing Compilers
RISC architecture is designed so that the compilers, not assembly
languages, have the optimal working environment. RISC philosophy
assumes that high-level language programming is used, which contradicts
the older CISC philosophy that assumes assembly language programming
is of primary importance.
The trend toward high-level language instructions has led to the
development of more efficient compilers to convert high-level language
instructions to machine code. Primary measures of compiler efficiency are
the compactness of its generated code and the shortness of its execution
time.
During the development of more efficient compilers, analysis of
instruction streams revealed that the greatest amount of time was spent
executing simple instructions and performing load and store operations,
while the more complex instructions were used less frequently. It was also
learned that compilers produce code that is often a narrow subset of the
processor instruction set architecture (ISA). A compiler works more
efficiently with instructions that perform simple, well-defined operations
and generate minimal side-effects. Compilers do not use complex
instructions and features; the more complex, powerful instructions are
either too difficult for the compiler to employ or those instructions do not
precisely fit high-level language requirements.
Thus, a natural match exists between RISC architectures and efficient,
optimizing compilers. This match makes it easier for compilers to
generate the most effective sequences of machine instructions to
accomplish tasks defined by the high-level language.
1.2 Compatibility
The R4000 processor provides complete application software
compatibility with the MIPS R2000, R3000, and R6000 processors.
Although the MIPS processor architecture has evolved in response to a
compromise between software and hardware resources in the computer
system, the R4000 processor implements the MIPS ISA for user-mode
programs. This guarantees that user programs conforming to the ISA
execute on any MIPS hardware implementation.
† Features of the R4400 processor that differ from the R4000 processor are noted throughout
this book; for instance, R4400 processor enhancements are listed in the next section.
Otherwise, references to the R4000 processor may be taken to include the R4400 processor.
64-bit Architecture
The natural mode of operation for the R4000 processor is as a 64-bit
microprocessor; however, 32-bit applications maintain compatibility even
when the processor operates as a 64-bit processor.
The R4000 processor provides the following:
• 64-bit on-chip floating-point unit (FPU)
• 64-bit integer arithmetic logic unit (ALU)
• 64-bit integer registers
• 64-bit virtual address space
• 64-bit system bus
Figure 1-1 is a block diagram of the R4000 processor internals.
[Figure 1-1: block diagram of the R4000 processor internals. Blocks shown include Memory Management, Registers, Load Aligner/Store Driver, FP Multiplier, and Pipeline Control.]
Superpipeline Architecture
The R4000 processor exploits instruction parallelism by using an eight-
stage superpipeline which places no restrictions on the instruction issued.
Under normal circumstances, two instructions are issued each cycle.
The internal pipeline of the R4000 processor operates at twice the
frequency of the master clock, as discussed in Chapter 3. The processor
achieves high throughput by pipelining cache accesses, shortening
register access times, implementing virtual-indexed primary caches, and
allowing the latency of functional units to span more than one pipeline
clock cycle.
System Interface
The R4000 processor supports a 64-bit System interface that can construct
uniprocessor systems with a direct DRAM interface—with or without a
secondary cache—or cache-coherent multiprocessor systems. The System
interface includes:
• a 64-bit multiplexed address and data bus
• 8 check bits
• a 9-bit parity-protected command bus
• 8 handshake signals
The interface is capable of transferring data between the processor and
memory at a peak rate of 400 Mbytes/second, when running at 50 MHz.
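The peak rate quoted above follows from the bus width and the clock frequency. The short Python sketch below works through that arithmetic. It assumes one full-width data transfer per 50 MHz cycle and 1 Mbyte = 10^6 bytes; the function name is illustrative and does not come from this manual.

```python
def peak_bandwidth_mbytes(bus_width_bits, clock_hz):
    """Peak transfer rate in Mbytes/second (1 Mbyte = 10**6 bytes).

    Assumes one full-width data transfer on every clock cycle.
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_hz / 1e6


# 64-bit System interface data bus clocked at 50 MHz
print(peak_bandwidth_mbytes(64, 50e6))
```

Sustained rates are lower than this peak, because address cycles share the same multiplexed bus with data cycles.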
I-Type (Immediate)
    Bits    31-26   25-21   20-16   15-0
    Field   op      rs      rt      immediate
J-Type (Jump)
    Bits    31-26   25-0
    Field   op      target
R-Type (Register)
    Bits    31-26   25-21   20-16   15-11   10-6    5-0
    Field   op      rs      rt      rd      sa      funct
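The three formats share the op field in bits 31-26 and differ only in how the remaining 26 bits are divided. As an illustration, the following Python sketch extracts every field from a 32-bit instruction word by shifting and masking. The function name and the dictionary layout are hypothetical; which fields are meaningful depends on the format of the instruction being examined.

```python
def decode_fields(instr):
    """Split a 32-bit instruction word into the fields of all three formats."""
    instr &= 0xFFFFFFFF
    return {
        "op":        (instr >> 26) & 0x3F,   # bits 31-26, all formats
        "rs":        (instr >> 21) & 0x1F,   # bits 25-21, I- and R-type
        "rt":        (instr >> 16) & 0x1F,   # bits 20-16, I- and R-type
        "rd":        (instr >> 11) & 0x1F,   # bits 15-11, R-type
        "sa":        (instr >> 6) & 0x1F,    # bits 10-6,  R-type
        "funct":     instr & 0x3F,           # bits 5-0,   R-type
        "immediate": instr & 0xFFFF,         # bits 15-0,  I-type
        "target":    instr & 0x03FFFFFF,     # bits 25-0,  J-type
    }


# An R-type word assembled from op=0, rs=1, rt=2, rd=3, sa=0, funct=0x21
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode_fields(word)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```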
The instruction set can be further divided into the following groupings:
• Load and Store instructions move data between memory and
general registers. They are all immediate (I-type) instructions,
since the only addressing mode supported is base register plus
16-bit, signed immediate offset.
• Computational instructions perform arithmetic, logical, shift,
multiply, and divide operations on values in registers. They
include register (R-type, in which both the operands and the
result are stored in registers) and immediate (I-type, in which
one operand is a 16-bit immediate value) formats.
• Jump and Branch instructions change the control flow of a
program. Jumps are always made to a paged, absolute address
formed by combining a 26-bit target address with the high-
order bits of the Program Counter (J-type format) or register
address (R-type format). Branches have 16-bit offsets relative
to the program counter (I-type). Jump And Link instructions
save their return address in register 31.
• Coprocessor instructions perform operations in the
coprocessors. Coprocessor load and store instructions are
I-type.
• Coprocessor 0 (system coprocessor) instructions perform
operations on CP0 registers to control the memory
management and exception handling facilities of the processor.
These are listed in Table 1-18.
• Special instructions perform system calls and breakpoint
operations. These instructions are always R-type.
• Exception instructions cause a branch to the general exception-
handling vector based upon the result of a comparison. These
instructions occur in both R-type (both the operands and the
result are registers) and I-type (one operand is a 16-bit
immediate value) formats.
Chapter 2 provides a more detailed summary and Appendix A gives a
complete description of each instruction.
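The jump and branch address calculations described in the list above can be sketched in Python as follows. This is a simplified 32-bit model, not a definitive implementation. It assumes that the 26-bit target and the 16-bit offset are word quantities (shifted left two bits), and that both are applied to the address of the instruction in the delay slot, as the MIPS ISA defines them. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def jump_target(delay_slot_pc, target26):
    """J-type: high-order PC bits combined with the shifted 26-bit target."""
    region = delay_slot_pc & 0xF0000000          # upper 4 bits of the PC
    return region | ((target26 & 0x03FFFFFF) << 2)


def branch_target(delay_slot_pc, offset16):
    """I-type: PC-relative, signed 16-bit word offset."""
    return (delay_slot_pc + (sign_extend16(offset16) << 2)) & MASK32


print(hex(jump_target(0x80001004, 0x100)))
print(hex(branch_target(0x80001004, 0xFFFF)))
```

Because the upper bits come from the Program Counter, a J-type jump cannot leave the current 256-Mbyte region in this model; a register jump (JR, JALR) is needed to reach an arbitrary address.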
Tables 1-2 through 1-17 list CPU instructions common to MIPS R-Series
processors, along with those instructions that are extensions to the
instruction set architecture. The extensions result in code space
reductions, multiprocessor support, and improved performance in
operating system kernel code sequences—for instance, in situations where
run-time bounds-checking is frequently performed. Table 1-18 lists CP0
instructions.
OpCode Description
LB Load Byte
LBU Load Byte Unsigned
LH Load Halfword
LHU Load Halfword Unsigned
LW Load Word
LWL Load Word Left
LWR Load Word Right
SB Store Byte
SH Store Halfword
SW Store Word
SWL Store Word Left
SWR Store Word Right
OpCode Description
ADDI Add Immediate
ADDIU Add Immediate Unsigned
SLTI Set on Less Than Immediate
SLTIU Set on Less Than Immediate Unsigned
ANDI AND Immediate
ORI OR Immediate
XORI Exclusive OR Immediate
LUI Load Upper Immediate
OpCode Description
ADD Add
ADDU Add Unsigned
SUB Subtract
SUBU Subtract Unsigned
SLT Set on Less Than
SLTU Set on Less Than Unsigned
AND AND
OR OR
XOR Exclusive OR
NOR NOR
OpCode Description
MULT Multiply
MULTU Multiply Unsigned
DIV Divide
DIVU Divide Unsigned
MFHI Move From HI
MTHI Move To HI
MFLO Move From LO
MTLO Move To LO
OpCode Description
J Jump
JAL Jump And Link
JR Jump Register
JALR Jump And Link Register
BEQ Branch on Equal
BNE Branch on Not Equal
BLEZ Branch on Less Than or Equal to Zero
BGTZ Branch on Greater Than Zero
BLTZ Branch on Less Than Zero
BGEZ Branch on Greater Than or Equal to Zero
BLTZAL Branch on Less Than Zero And Link
BGEZAL Branch on Greater Than or Equal to Zero And Link
OpCode Description
SLL Shift Left Logical
SRL Shift Right Logical
SRA Shift Right Arithmetic
SLLV Shift Left Logical Variable
SRLV Shift Right Logical Variable
SRAV Shift Right Arithmetic Variable
OpCode Description
LWCz Load Word to Coprocessor z
SWCz Store Word from Coprocessor z
MTCz Move To Coprocessor z
MFCz Move From Coprocessor z
CTCz Move Control to Coprocessor z
CFCz Move Control From Coprocessor z
COPz Coprocessor Operation z
BCzT Branch on Coprocessor z True
BCzF Branch on Coprocessor z False
OpCode Description
SYSCALL System Call
BREAK Break
OpCode Description
LD Load Doubleword
LDL Load Doubleword Left
LDR Load Doubleword Right
LL Load Linked
LLD Load Linked Doubleword
LWU Load Word Unsigned
SC Store Conditional
SCD Store Conditional Doubleword
SD Store Doubleword
SDL Store Doubleword Left
SDR Store Doubleword Right
SYNC Sync
OpCode Description
DADDI Doubleword Add Immediate
DADDIU Doubleword Add Immediate Unsigned
OpCode Description
DMULT Doubleword Multiply
DMULTU Doubleword Multiply Unsigned
DDIV Doubleword Divide
DDIVU Doubleword Divide Unsigned
OpCode Description
BEQL Branch on Equal Likely
BNEL Branch on Not Equal Likely
BLEZL Branch on Less Than or Equal to Zero Likely
BGTZL Branch on Greater Than Zero Likely
BLTZL Branch on Less Than Zero Likely
BGEZL Branch on Greater Than or Equal to Zero Likely
BLTZALL Branch on Less Than Zero And Link Likely
BGEZALL Branch on Greater Than or Equal to Zero And Link Likely
BCzTL Branch on Coprocessor z True Likely
BCzFL Branch on Coprocessor z False Likely
OpCode Description
DADD Doubleword Add
DADDU Doubleword Add Unsigned
DSUB Doubleword Subtract
DSUBU Doubleword Subtract Unsigned
OpCode Description
DSLL Doubleword Shift Left Logical
DSRL Doubleword Shift Right Logical
DSRA Doubleword Shift Right Arithmetic
DSLLV Doubleword Shift Left Logical Variable
DSRLV Doubleword Shift Right Logical Variable
DSRAV Doubleword Shift Right Arithmetic Variable
DSLL32 Doubleword Shift Left Logical + 32
DSRL32 Doubleword Shift Right Logical + 32
DSRA32 Doubleword Shift Right Arithmetic + 32
OpCode Description
TGE Trap if Greater Than or Equal
TGEU Trap if Greater Than or Equal Unsigned
TLT Trap if Less Than
TLTU Trap if Less Than Unsigned
TEQ Trap if Equal
TNE Trap if Not Equal
TGEI Trap if Greater Than or Equal Immediate
TGEIU Trap if Greater Than or Equal Immediate Unsigned
TLTI Trap if Less Than Immediate
TLTIU Trap if Less Than Immediate Unsigned
TEQI Trap if Equal Immediate
TNEI Trap if Not Equal Immediate
OpCode Description
DMFCz Doubleword Move From Coprocessor z
DMTCz Doubleword Move To Coprocessor z
LDCz Load Double Coprocessor z
SDCz Store Double Coprocessor z
OpCode Description
DMFC0 Doubleword Move From CP0
DMTC0 Doubleword Move To CP0
MTC0 Move to CP0
MFC0 Move from CP0
TLBR Read Indexed TLB Entry
TLBWI Write Indexed TLB Entry
TLBWR Write Random TLB Entry
TLBP Probe TLB for Matching Entry
CACHE Cache Operation
ERET Exception Return
Word                Bit #
Address             31-24   23-16   15-8    7-0
12 (higher address) 15      14      13      12
8                   11      10      9       8
4                   7       6       5       4
0 (lower address)   3       2       1       0
Figure 1-5 Little-Endian Byte Ordering
In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit
designations are always little-endian (although no instructions explicitly
designate bit positions within words).
Figures 1-6 and 1-7 show little-endian and big-endian byte ordering in
doublewords.
Bit #    63-56   55-48   47-40   39-32   31-24   23-16   15-8    7-0
Byte #   7       6       5       4       3       2       1       0
Bits in a byte are numbered 7 through 0, left to right.
Figure 1-6 Little-Endian Data in a Doubleword
Bit #    63-56   55-48   47-40   39-32   31-24   23-16   15-8    7-0
Byte #   0       1       2       3       4       5       6       7
Bits in a byte are numbered 7 through 0, left to right.
Figure 1-7 Big-Endian Data in a Doubleword
The CPU uses byte addressing for halfword, word, and doubleword
accesses with the following alignment constraints:
• Halfword accesses must be aligned on an even byte boundary
(0, 2, 4...).
• Word accesses must be aligned on a byte boundary divisible by
four (0, 4, 8...).
• Doubleword accesses must be aligned on a byte boundary
divisible by eight (0, 8, 16...).
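Each of these constraints says that the byte address must be a multiple of the access size. Because the sizes are powers of two, this can be tested by checking that the low-order address bits are zero. A minimal Python sketch, with hypothetical names:

```python
ACCESS_SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


def is_aligned(address, access):
    """True if address satisfies the alignment constraint for the access type."""
    size = ACCESS_SIZES[access]
    # A power-of-two size is a divisor of address exactly when the
    # low-order log2(size) address bits are all zero.
    return (address & (size - 1)) == 0


for addr in (0, 2, 4, 6, 8):
    print(addr, is_aligned(addr, "halfword"), is_aligned(addr, "word"),
          is_aligned(addr, "doubleword"))
```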
The following special instructions load and store words that are not
aligned on 4-byte (word) or 8-byte (doubleword) boundaries:
LWL LWR SWL SWR
LDL LDR SDL SDR
These instructions are used in pairs to provide addressing of misaligned
words. Addressing misaligned data incurs one additional instruction
cycle over that required for addressing aligned data.
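To show how such a pair cooperates, the Python sketch below models LWL followed by LWR for a big-endian configuration. It is an illustrative model only: it treats the register as 32 bits wide, ignores sign extension into a 64-bit register, and uses function names that are assumptions rather than anything defined in this manual. LWL merges the bytes from the addressed byte to the end of its word into the left (high-order) part of the register; LWR merges the bytes from the start of its word through the addressed byte into the right (low-order) part.

```python
MASK32 = 0xFFFFFFFF


def read_word_be(mem, aligned_addr):
    """Read the aligned big-endian word that starts at aligned_addr."""
    return int.from_bytes(mem[aligned_addr:aligned_addr + 4], "big")


def lwl(mem, addr, reg):
    """Big-endian Load Word Left: fill the high-order bytes of reg."""
    shift = 8 * (addr & 3)
    word = read_word_be(mem, addr & ~3)
    keep = (1 << shift) - 1                 # low-order bytes left unchanged
    return ((word << shift) & MASK32) | (reg & keep)


def lwr(mem, addr, reg):
    """Big-endian Load Word Right: fill the low-order bytes of reg."""
    shift = 8 * (3 - (addr & 3))
    word = read_word_be(mem, addr & ~3)
    keep = MASK32 & ~(MASK32 >> shift)      # high-order bytes left unchanged
    return (reg & keep) | (word >> shift)


def load_unaligned_word(mem, addr):
    """LWL at the first byte, then LWR at the last byte of the word."""
    reg = lwl(mem, addr, 0)
    return lwr(mem, addr + 3, reg)


memory = bytes(range(8))                    # byte n holds the value n
print(hex(load_unaligned_word(memory, 3)))  # word made of bytes 3, 4, 5, 6
```

In a little-endian configuration the roles of the two instructions are exchanged, with LWR applied at the lower address and LWL at the higher one.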
Figures 1-8 and 1-9 show the access of a misaligned word that has byte
address 3.
                 Bit #
                 31-24   23-16   15-8    7-0
Higher address   4       5       6
Lower address                            3
Figure 1-8 Big-Endian Misaligned Word Addressing
                 Bit #
                 31-24   23-16   15-8    7-0
Higher address           6       5       4
Lower address    3
Figure 1-9 Little-Endian Misaligned Word Addressing
Coprocessors (CP0-CP2)
The MIPS ISA defines three coprocessors (designated CP0 through CP2):
• Coprocessor 0 (CP0) is incorporated on the CPU chip and
supports the virtual memory system and exception handling.
CP0 is also referred to as the System Control Coprocessor.
• Coprocessor 1 (CP1) is reserved for the on-chip, floating-point
coprocessor, the FPU.
• Coprocessor 2 (CP2) is reserved for future definition by MIPS.
CP0 and CP1 are described in the sections that follow.
Register    No.     Register    No.
Index       0       Config      16
Random      1       LLAddr      17
EntryLo0    2       WatchLo     18
EntryLo1    3       WatchHi     19
Context     4       XContext    20
PageMask    5       -           21
Wired       6       -           22
-           7       -           23
BadVAddr    8       -           24
Count       9       -           25
EntryHi     10      ECC         26
Compare     11      CacheErr    27
SR          12      TagLo       28
Cause       13      TagHi       29
EPC         14      ErrorEPC    30
PRId        15      -           31
Instruction TLB
The R4000 processor has a two-entry instruction TLB (ITLB) which assists
in instruction address translation. The ITLB is completely invisible to
software and exists only to increase performance.
Joint TLB
An address translation value is tagged with the most-significant bits of its
virtual address (the number of these bits depends upon the size of the
page) and a per-process identifier. If there is no matching entry in the TLB,
an exception is taken and software refills the on-chip TLB from a page
table resident in memory; this TLB is referred to as the joint TLB (JTLB)
because it holds translations for both instruction and data addresses. The
JTLB entry to be rewritten is selected at random.
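The tag match and random replacement described above can be sketched in Python as follows. This is a deliberately simplified model under stated assumptions: each entry maps a single page, the page size is carried per entry as a bit count, and the class and function names are hypothetical. It omits the details of the actual JTLB entry format, which Chapter 4 describes.

```python
import random
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int             # most-significant virtual address bits (the tag)
    asid: int            # per-process identifier
    pfn: int             # physical frame number
    page_bits: int = 12  # log2(page size); 12 gives 4-Kbyte pages


def tlb_lookup(entries, vaddr, asid):
    """Return the physical address, or None on a miss (refill exception)."""
    for entry in entries:
        if entry.asid == asid and (vaddr >> entry.page_bits) == entry.vpn:
            offset = vaddr & ((1 << entry.page_bits) - 1)
            return (entry.pfn << entry.page_bits) | offset
    return None


def tlb_refill(entries, new_entry, rng=random):
    """Overwrite a randomly selected entry, as a software refill would."""
    index = rng.randrange(len(entries))
    entries[index] = new_entry
    return index


tlb = [TlbEntry(vpn=0x12345, asid=7, pfn=0x00ABC)]
print(tlb_lookup(tlb, 0x12345678, asid=7))   # hit
print(tlb_lookup(tlb, 0x12345678, asid=8))   # miss: different process
```

The number of tag bits compared shrinks as the page grows, which is why the sketch shifts the virtual address by the entry's own page_bits before comparing.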
Operating Modes
The R4000 processor has three operating modes:
• User mode
• Supervisor mode
• Kernel mode
The manner in which memory addresses are translated or mapped depends
on the operating mode of the CPU; this is described in Chapter 4.
Primary Caches
The R4000 processor incorporates separate on-chip primary instruction
and data caches to fill the high-performance pipeline. Each cache has its
own 64-bit data path, and each can be accessed in parallel.
The R4000 processor primary caches hold from 8 Kbytes to 32 Kbytes; the
R4400 processor primary caches are fixed at 16 Kbytes.
Cache accesses can occur up to twice each cycle. This provides the integer
and floating-point units with an aggregate bandwidth of 1.6 Gbytes per
second at a MasterClock frequency of 50 MHz.
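One way to arrive at the aggregate figure is to multiply out the quantities given in this section: two primary caches, a 64-bit (8-byte) data path for each, and up to two accesses per MasterClock cycle. The Python sketch below works through that arithmetic; the decomposition and the function name are an interpretation offered for illustration, and 1 Gbyte is taken as 10^9 bytes.

```python
def aggregate_cache_bandwidth(caches, path_bytes, accesses_per_cycle, clock_hz):
    """Peak aggregate bandwidth in bytes/second across parallel caches."""
    return caches * path_bytes * accesses_per_cycle * clock_hz


# Instruction and data caches, 8-byte paths, 2 accesses per cycle, 50 MHz
rate = aggregate_cache_bandwidth(2, 8, 2, 50e6)
print(rate / 1e9, "Gbytes/second")
```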
I-Type (Immediate)
    Bits    31-26   25-21   20-16   15-0
    Field   op      rs      rt      immediate
J-Type (Jump)
    Bits    31-26   25-0
    Field   op      target
R-Type (Register)
    Bits    31-26   25-21   20-16   15-11   10-6    5-0
    Field   op      rs      rt      rd      sa      funct
Computational Instructions
Computational instructions can be either in register (R-type) format, in
which both operands are registers, or in immediate (I-type) format, in
which one operand is a 16-bit immediate.
Computational instructions perform the following operations on register
values:
• arithmetic
• logical
• shift
• multiply
• divide
These operations fit in the following four categories of computational
instructions:
• ALU immediate instructions
• three-operand register-type instructions
• shift instructions
• multiply and divide instructions
64-bit Operations
When operating in 64-bit mode, 32-bit operands must be sign extended.
The result of operations that use incorrectly sign-extended 32-bit values is
unpredictable.
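Sign extension copies bit 31 of the 32-bit operand into bits 63 through 32 of the 64-bit register. The following Python sketch shows the operation and a check for whether a 64-bit value is a correctly sign-extended 32-bit operand. The function names are illustrative.

```python
def sign_extend32(value):
    """Return the 64-bit register image of a 32-bit operand."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:               # bit 31 set: replicate it upward
        value |= 0xFFFFFFFF00000000
    return value


def is_sign_extended32(reg64):
    """True if bits 63-32 of reg64 all equal bit 31."""
    reg64 &= 0xFFFFFFFFFFFFFFFF
    return sign_extend32(reg64) == reg64


print(hex(sign_extend32(0x80000000)))
print(is_sign_extended32(0x0000000080000000))   # upper bits do not match bit 31
```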
† Taken branches have a 3 cycle penalty in this implementation. See Chapter 3 for more
information.
Special Instructions
Special instructions allow the software to initiate traps; they are always
R-type. For more information about special instructions, refer to the
individual instruction as described in Appendix A.
Exception Instructions
Exception instructions are extensions to the MIPS ISA. For more
information about exception instructions, refer to the individual
instruction as described in Appendix A.
Coprocessor Instructions
Coprocessor instructions perform operations in their respective
coprocessors. Coprocessor loads and stores are I-type, and coprocessor
computational instructions have coprocessor-dependent formats.
Individual coprocessor instructions are described in Appendices A (for
CP0) and B (for the FPU, CP1).
CP0 instructions perform operations specifically on the System Control
Coprocessor registers to manipulate the memory management and
exception handling facilities of the processor. Appendix A details CP0
instructions.
This chapter describes the basic operation of the CPU pipeline, which
includes descriptions of the delay instructions (instructions that follow a
branch or load instruction in the pipeline), interruptions to the pipeline
flow caused by interlocks and exceptions, and R4400 implementation of an
uncached store buffer.
The FPU pipeline is described in Chapter 6.
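[Figure: eight-deep instruction pipeline. A new instruction enters the pipeline every PCycle; two PCycles make up one MasterClock cycle. In the current CPU cycle, eight instructions occupy the eight stages.]
IF IS RF EX DF DS TC WB
   IF IS RF EX DF DS TC WB
      IF IS RF EX DF DS TC WB
         IF IS RF EX DF DS TC WB
            IF IS RF EX DF DS TC WB
               IF IS RF EX DF DS TC WB
                  IF IS RF EX DF DS TC WB
                     IF IS RF EX DF DS TC WB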
RF - Register Fetch
During the RF stage, the following occurs:
• The instruction decoder (IDEC) decodes the instruction and
checks for interlock conditions.
• The instruction cache tag is checked against the page frame
number obtained from the ITLB.
• Any required operands are fetched from the register file.
EX - Execution
During the EX stage, one of the following occurs:
• The arithmetic logic unit (ALU) performs the arithmetic or
logical operation for register-to-register instructions.
• The ALU calculates the data virtual address for load and store
instructions.
• The ALU determines whether the branch condition is true and
calculates the virtual branch target address for branch
instructions.
TC - Tag Check
For load and store instructions, the cache performs the tag check during
the TC stage. The physical address from the TLB is checked against the
cache tag to determine if there is a hit or a miss.
WB - Write Back
For register-to-register instructions, the instruction result is written back
to the register file during the WB stage. Branch instructions perform no
operation during this stage.
Figure 3-2 shows the activities occurring during each ALU pipeline stage,
for load, store, and branch instructions.
[Figure 3-2: CPU pipeline activities. Each stage (IF IS RF EX DF DS TC WB) spans clock phases 1 and 2.]
IFetch and Decode:  IC1, IC2, ITLB1, ITLB2, ITC, IDEC, RF
ALU:                ALU, WB
Load/Store:         DVA, DC1, DC2, LSA, JTLB1, JTLB2, DTC, WB
Branch:             IVA
branch          IF IS RF EX DF DS TC WB
three branch       IF IS RF EX DF DS TC WB
delay                 IF IS RF EX DF DS TC WB
instructions             IF IS RF EX DF DS TC WB
target                      IF IS RF EX DF DS TC WB
[Figure: Branch Delay]
load            IF IS RF EX DF DS TC WB
two load delay     IF IS RF EX DF DS TC WB
instructions          IF IS RF EX DF DS TC WB
f(load)                  IF IS RF EX DF DS TC WB
[Figure: Load Delay]
Faults
    Software: Exceptions
    Hardware: Interlocks
        Stalls
        Slips
[Figure: correspondence of pipeline stages (IF IS RF EX DF DS TC WB; each PCycle has clock phases 1 and 2) to interlock and exception conditions]
Stall*:      ITM, ICM, CPBE, DCM, SXT, WA, STI
Slip:        LDI, MultB, DivB, MDOne, ShSlip, FCBsy
Exceptions:  ITLB, Intr, OVF, DTLB, DBE, IBE, FPE, TLBMod, Watch, IVACoh, ExTrap, DVACoh, II, DECCErr, BP, NMI, SC, Reset, CUn, IECCErr
*MP stalls can occur at any stage; they are not associated with any instruction or pipe stage.
Exception Description
ITLB Instruction Translation or Address Exception
Intr External Interrupt
IBE IBus Error
IVACoh IVA Coherent
II Illegal Instruction
BP Breakpoint
SC System Call
CUn Coprocessor Unusable
IECCErr Instruction ECC Error
OVF Integer Overflow
FPE FP Interrupt
ExTrap EX Stage Traps
DTLB Data Translation or Address Exception
TLBMod TLB Modified
DBE Data Bus Error
Watch Memory Reference Address Compare
DVACoh DVA Coherent
DECCErr Data ECC Error
NMI Non-maskable Interrupt
Reset Reset
Interlock Description
ITM Instruction TLB Miss
ICM Instruction Cache Miss
CPBE Coprocessor Possible Exception
SXT Integer Sign Extend
STI Store Interlock
DCM Data Cache Miss
WA Watch Address Exception
LDI Load Interlock
MultB Multiply Unit Busy
DivB Divide Unit Busy
MDOne Mult/Div One Cycle Slip
ShSlip Var Shift or Shift > 32 bits
FCBsy FP Busy
Exception Conditions
When an exception condition occurs, the relevant instruction and all those
that follow it in the pipeline are cancelled. Accordingly, any stall
conditions and any later exception conditions that may have referenced
this instruction are inhibited; there is no benefit in servicing stalls for a
cancelled instruction.
After instruction cancellation, a new instruction stream begins, starting
execution at a predefined exception vector. System Control Coprocessor
registers are loaded with information that identifies the type of exception
and auxiliary information such as the virtual address at which translation
exceptions occur.
Stall Conditions
Often, a stall condition is only detected after parts of the pipeline have
advanced using incorrect data; this is called a pipeline overrun. When a stall
condition is detected, all eight instructions, each in a different stage of the
pipeline, are frozen at once. In this stalled state, no pipeline stages can
advance until the interlock condition is resolved.
Once the interlock is removed, the restart sequence begins two cycles
before the pipeline resumes execution. The restart sequence reverses the
pipeline overrun by inserting the correct information into the pipeline.
Slip Conditions
When a slip condition is detected, pipeline stages that must advance to
resolve the dependency continue to be retired (completed), while
dependent stages are held until the required data is available.
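The difference between a stall and a slip can be illustrated with a small Python model of the eight pipeline stages. This is a sketch under simplifying assumptions, not a description of the actual control logic: a slip is modeled as holding one stage together with every earlier stage, while later stages advance and a bubble (None) fills the gap. The function and parameter names are hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def step(pipeline, next_instr, stall=False, slip_at=None):
    """Advance an 8-slot pipeline by one cycle.

    pipeline[i] is the instruction in STAGES[i] (None for a bubble).
    stall:   freeze every stage.
    slip_at: index of the dependent stage; it and all earlier stages hold,
             later stages advance, and a bubble is inserted behind them.
    Returns (new_pipeline, retired_instruction).
    """
    if stall:
        return list(pipeline), None
    retired = pipeline[-1]
    if slip_at is None:
        return [next_instr] + pipeline[:-1], retired
    if not 0 <= slip_at < len(pipeline) - 1:
        raise ValueError("slip_at must name a stage before WB")
    held = pipeline[:slip_at + 1]
    advanced = pipeline[slip_at + 1:-1]
    return held + [None] + advanced, retired


pipe = ["i7", "i6", "i5", "i4", "i3", "i2", "i1", "i0"]
print(step(pipe, "i8"))                 # normal run cycle
print(step(pipe, "i8", stall=True))     # stall: nothing moves
print(step(pipe, "i8", slip_at=2))      # slip: RF and earlier stages hold
```

In the slip case the instruction in WB still retires, which matches the statement that stages needed to resolve the dependency continue to complete.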
External Stalls
External stalls are another class of interlock. An external stall originates
outside the processor and is not referenced to a particular pipeline stage.
This interlock is not affected by exceptions.
[Figure: pipeline stall example. Cycle states: seven Run cycles, five Stl (stall) cycles, five Run cycles. Stage sequences for five successive instructions:]
Load    IF IS RF EX DF DS TC DF DS TC WB
        IF IS RF EX DF DS DF DS TC WB
ALU     IF IS RF EX DF DF DS TC WB
        IF IS RF EX- RF EX+ DF DS TC WB
        IF IS RF EX DF DS TC WB
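[Figure: exception (OVF) detected together with an instruction cache miss stall (ICM). Cycle states: four Run cycles, five Stl (stall) cycles for InstrCacheMiss, seven Run cycles. Stage sequences for four successive instructions:]
ALU (OVF)   IF IS RF EX DF DS TC WB*
(ICM)       IF IS RF IF IS RF EX DF DS TC WB*
            IF IS IF IS RF EX DF DS TC WB*
            IF IF IS RF EX DF DS TC WB*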
Even though the line brought in by the instruction cache could have been
replaced by a line of the exception handler, no performance loss occurs,
since the instruction cache miss would have been serviced anyway, after
returning from the exception handler. Handling of the exception is done
in this fashion because the frequency of an exception occurring is, by
definition, relatively low.
[Figure: three back-to-back loads; each clock cycle has phases 1 and 2]
Load1:  DF DS TC WB
Load2:     DF DS TC WB
Load3:        DF DS TC WB
The decision whether or not to advance the pipeline is derived from these
three rules:
• All possible fault-causing events, such as cache misses,
translation exceptions, load interlocks, etc., must be
individually evaluated.
• The fault to be serviced is selected, based on a predefined
priority as determined by the pipeline stage of the asserted
faults.
• Pipeline advance control signals are buffered and distributed.
Figure 3-10 illustrates this process.
[Figure 3-10: timing diagram shown against clock phases 1 and 2]
Special Cases
In some instances, the pipeline control state machine is bypassed. This
occurs due to performance considerations or to correctness
considerations, which are described in the following sections.
Performance Considerations
A performance consideration occurs when there is a cache load miss. By
bypassing the pipeline state machine, it is possible to eliminate up to two
cycles of load miss latency. Two techniques, address acceleration and
address prediction, increase performance.
Address Acceleration
Address acceleration bypasses a potential cache miss address. It is relatively
straightforward to perform this bypass since sending the cache miss
address to the secondary cache has no negative impact even if a
subsequent exception nullifies the effect of this cache access. Power is
wasted when the miss is inhibited by some fault, but this is a minor effect.
Address Prediction
Another technique used to reduce miss latency is the automatic increment
and transmission of instruction miss addresses following an instruction
cache miss. This form of latency reduction is called address prediction: the
subsequent instruction miss address is predicted to be a simple increment
of the previous miss address. Figure 3-11 shows a cache miss in which the
cache miss address is changed based on the detection of the miss.
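[Figure 3-11: pipeline timing for a Load that misses in the cache. The pipeline runs for seven cycles, stalls (Stl) for eight cycles, then runs again; the Load passes through IF, IS, RF, EX, DF, DS, TC, repeats DF, DS, TC, and completes in WB.]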
Correctness Considerations
An example in which bypassing is necessary to guarantee correctness is a
cache write.
If the two uncached stores execute within a loop, the two killed
instructions which are part of the loop branch latency are included in the
count of seven interpolated cycles. Figure 3-13 shows the four NOP
instructions that need to be scheduled in this case.
Multiple Matches
If more than one entry in the TLB matches the virtual address being
translated, the operation is undefined. To prevent permanent damage to
the part, the TLB may be disabled if more than several entries match. The
TLB-Shutdown (TS) bit in the Status register is set to 1 if the TLB is
disabled.
† There are virtual-to-physical address translations that occur outside of the TLB. For
example, addresses in the kseg0 and kseg1 spaces are unmapped translations. In these
spaces the physical address is derived by subtracting the base address of the space from
the virtual address.
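The unmapped translation described in this footnote can be sketched in C. This is a minimal software model rather than a description of the hardware: the function name is hypothetical, the segment bases are the 32-bit mode values from the Kernel mode address map, and no operating-mode check is performed.

```c
#include <stdint.h>

/* 32-bit mode segment bases taken from the Kernel mode address map. */
#define KSEG0_BASE 0x80000000u  /* unmapped, cached         */
#define KSEG1_BASE 0xA0000000u  /* unmapped, uncached       */
#define KSSEG_BASE 0xC0000000u  /* first mapped kernel base */

/* Hypothetical helper: returns 1 and stores the physical address when
   vaddr lies in kseg0 or kseg1; returns 0 when the address is mapped
   and would have to be translated by the TLB instead. */
static int unmapped_to_phys(uint32_t vaddr, uint32_t *paddr)
{
    if (vaddr >= KSEG0_BASE && vaddr < KSEG1_BASE) {
        *paddr = vaddr - KSEG0_BASE;   /* subtract the kseg0 base */
        return 1;
    }
    if (vaddr >= KSEG1_BASE && vaddr < KSSEG_BASE) {
        *paddr = vaddr - KSEG1_BASE;   /* subtract the kseg1 base */
        return 1;
    }
    return 0;
}
```

Both segments cover the same low 0.5 GB of physical memory, so a kseg0 address and the kseg1 address 0x2000 0000 above it refer to the same physical location, one cached and one uncached.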
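[Figure: virtual-to-physical address translation. The virtual address, with fields G, ASID, VPN, and Offset, is presented to the TLB, which produces the physical address.]
1. Virtual address (VA) represented by the virtual page number (VPN) is compared
with tag in TLB.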
† Figure 4-8 shows the 32-bit and 64-bit versions of the processor TLB entry.
As shown in Figures 4-2 and 4-3, the virtual address is extended with an
8-bit address space identifier (ASID), which reduces the frequency of TLB
flushing when switching contexts. This 8-bit ASID is in the CP0 EntryHi
register, described later in this chapter. The Global bit (G) is in the EntryLo0
and EntryLo1 registers, described later in this chapter.
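[Figures 4-2 and 4-3: virtual address translation in 32-bit and 64-bit modes. The virtual address, extended with the 8-bit ASID, is divided into a VPN and an offset. Two page sizes are illustrated: a 12-bit offset (bits 11:0) with a 20-bit VPN (20 bits = 1M pages), and a 24-bit offset (bits 23:0). The VPN undergoes virtual-to-physical translation in the TLB to produce the PFN; the offset is passed unchanged to physical memory. The result is a 36-bit physical address (bits 35:0) made up of the PFN and the offset. In 64-bit mode, bits 62 and 63 of the virtual address select user, supervisor, or kernel address spaces.]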
Operating Modes
The processor has three operating modes that function in both 32- and 64-
bit operations:
• User mode
• Supervisor mode
• Kernel mode
These modes are described in the next three sections.
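User mode address space

32-bit* mode
  Start address             Segment   Size   Access
  0x 8000 0000              (none)           Address error (up to 0x FFFF FFFF)
  0x 0000 0000              useg      2 GB   Mapped

64-bit mode
  Start address             Segment   Size   Access
  0x 0000 0100 0000 0000    (none)           Address error (up to 0x FFFF FFFF FFFF FFFF)
  0x 0000 0000 0000 0000    xuseg     1 TB   Mapped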
*NOTE: The R4000 uses 64-bit addresses internally. When the kernel
is running in Kernel mode, it initializes registers before switching
modes, and saves (or restores, whichever is appropriate) register
values on context switches. In 32-bit mode, a valid address must be a
32-bit signed number, where bits 63:32 = bit 31. In normal operation
it is not possible for a 32-bit User-mode program to produce invalid
addresses. However, although it would be an error, it is possible for a
Kernel-mode program to erroneously place a value that is not a 32-bit
signed number into a 64-bit register, in which case the User-mode
program generates an invalid address.
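The validity rule in this note (bits 63:32 must all equal bit 31) can be written as a small C check. This is an illustrative sketch; the function name is hypothetical and does not come from the manual.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the 64-bit register value is a
   properly sign-extended 32-bit address (bits 63:32 all equal bit 31),
   and 0 otherwise. */
static int is_valid_32bit_address(uint64_t reg)
{
    uint64_t upper = reg >> 32;          /* bits 63:32 */
    uint64_t bit31 = (reg >> 31) & 1u;   /* bit 31     */
    return upper == (bit31 ? 0xFFFFFFFFu : 0u);
}
```

For example, 0xFFFF FFFF 8000 0000 passes the check, while 0x0000 0000 8000 0000 does not, because bit 31 is set but bits 63:32 are zero.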
The User segment starts at address 0 and the current active user process
resides in either useg (in 32-bit mode) or xuseg (in 64-bit mode). The TLB
identically maps all references to useg/xuseg from all modes, and controls
cache accessibility.†
The processor operates in User mode when the Status register contains the
following bit-values:
• KSU bits = binary 10
• EXL = 0
• ERL = 0
In conjunction with these bits, the UX bit in the Status register selects
between 32- or 64-bit User mode addressing as follows:
• when UX = 0, 32-bit useg space is selected and TLB misses are
handled by the 32-bit TLB refill exception handler
• when UX = 1, 64-bit xuseg space is selected and TLB misses are
handled by the 64-bit XTLB refill exception handler
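The mode selection described above can be modeled in C as follows. This is a sketch under stated assumptions: the bit positions (KSU in bits 4:3, ERL in bit 2, EXL in bit 1, UX in bit 5) are taken from the Status register format in Chapter 5, the names are hypothetical, and treating ERL = 1 the same way as EXL = 1 (forcing Kernel mode) is an assumption consistent with the User mode conditions listed here.

```c
#include <stdint.h>

#define SR_EXL        (1u << 1)   /* Exception Level       */
#define SR_ERL        (1u << 2)   /* Error Level           */
#define SR_KSU_SHIFT  3           /* KSU occupies bits 4:3 */
#define SR_KSU_MASK   0x3u
#define SR_UX         (1u << 5)   /* 64-bit User mode      */

enum r4k_mode { MODE_KERNEL, MODE_SUPERVISOR, MODE_USER, MODE_RESERVED };

/* Hypothetical helper: decode the operating mode from a Status value. */
static enum r4k_mode status_mode(uint32_t sr)
{
    if (sr & (SR_EXL | SR_ERL))
        return MODE_KERNEL;              /* EXL or ERL set: Kernel mode */
    switch ((sr >> SR_KSU_SHIFT) & SR_KSU_MASK) {
    case 0:  return MODE_KERNEL;         /* KSU = binary 00 */
    case 1:  return MODE_SUPERVISOR;     /* KSU = binary 01 */
    case 2:  return MODE_USER;           /* KSU = binary 10 */
    default: return MODE_RESERVED;       /* KSU = binary 11 is not listed */
    }
}

/* Returns 1 when User mode TLB misses go to the XTLB refill handler. */
static int user_uses_xtlb_refill(uint32_t sr)
{
    return (sr & SR_UX) != 0;
}
```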
Table 4-1 lists the characteristics of the two user mode segments, useg and
xuseg.
Table 4-1 column headings: Address Bit Values | Status Register Bit Values (KSU, EXL, ERL, UX) | Segment Name | Address Range | Segment Size
† The cached (C) field in a TLB entry determines whether the reference is cached; see Figure
4-8.
Figure 4-5 shows Supervisor mode address mapping. Table 4-2 lists the
characteristics of the supervisor mode segments; descriptions of the
address spaces follow.
Supervisor mode address space

32-bit* mode
  Start address             Segment   Size     Access
  0x E000 0000              (none)             Address error (up to 0x FFFF FFFF)
  0x C000 0000              sseg      0.5 GB   Mapped
  0x A000 0000              (none)             Address error
  0x 8000 0000              (none)             Address error
  0x 0000 0000              suseg     2 GB     Mapped

64-bit mode
  Start address             Segment   Size     Access
  0x FFFF FFFF E000 0000    (none)             Address error (up to 0x FFFF FFFF FFFF FFFF)
  0x FFFF FFFF C000 0000    csseg     0.5 GB   Mapped
  0x 4000 0100 0000 0000    (none)             Address error
  0x 4000 0000 0000 0000    xsseg     1 TB     Mapped
  0x 0000 0100 0000 0000    (none)             Address error
  0x 0000 0000 0000 0000    xsuseg    1 TB     Mapped
Kernel mode address space

32-bit* mode
  Start address             Segment   Size     Access
  0x E000 0000              kseg3     0.5 GB   Mapped (up to 0x FFFF FFFF)
  0x C000 0000              ksseg     0.5 GB   Mapped
  0x A000 0000              kseg1     0.5 GB   Unmapped, uncached
  0x 8000 0000              kseg0     0.5 GB   Unmapped, cached
  0x 0000 0000              kuseg     2 GB     Mapped

64-bit mode
  Start address             Segment   Size     Access
  0x FFFF FFFF E000 0000    ckseg3    0.5 GB   Mapped (up to 0x FFFF FFFF FFFF FFFF)
  0x FFFF FFFF C000 0000    cksseg    0.5 GB   Mapped
  0x FFFF FFFF A000 0000    ckseg1    0.5 GB   Unmapped, uncached
  0x FFFF FFFF 8000 0000    ckseg0    0.5 GB   Unmapped, cached
  0x C000 00FF 8000 0000    (none)             Address error
  0x C000 0000 0000 0000    xkseg              Mapped
  0x 8000 0000 0000 0000    xkphys             Unmapped
  0x 4000 0100 0000 0000    (none)             Address error
  0x 4000 0000 0000 0000    xksseg    1 TB     Mapped
  0x 0000 0100 0000 0000    (none)             Address error
  0x 0000 0000 0000 0000    xkuseg    1 TB     Mapped
Column headings: Address Bit Values | Status Register Is One Of These Values (KSU, EXL, ERL, KX) | Segment Name | Address Range | Segment Size
  A(31) = 0 | 0 | kuseg | 0x0000 0000 through 0x7FFF FFFF | 2 Gbytes (2^31 bytes)
† For a description of CP0 data dependencies and hazards, please see Appendix F.
32-bit Mode (128-bit TLB entry in 32-bit mode of R4000 processor)
  Bits      Field   Width
  127:121   0       7
  120:109   MASK    12
  108:96    0       13
  95:77     VPN2    19
  76        G       1
  75:72     0       4
  71:64     ASID    8
  63:62     0       2
  61:38     PFN     24
  37:35     C       3
  34        D       1
  33        V       1
  32        0       1
  31:30     0       2
  29:6      PFN     24
  5:3       C       3
  2         D       1
  1         V       1
  0         0       1

64-bit Mode
  Bits      Field   Width
  255:217   0       39
  216:205   MASK    12
  204:192   0       13
  191:190   R       2
  189:168   0       22
  167:141   VPN2    27
  140       G       1
  139:136   0       4
  135:128   ASID    8
  127:0     two 64-bit words, each laid out as 0 (34 bits), PFN (24), C (3), D (1), V (1), 0 (1)
The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers are
nearly the same as that of the TLB entry. The one exception is the Global field
(G bit), which is used in the TLB, but is reserved in the EntryHi register.
Figures 4-9 and 4-10 describe the TLB entry fields shown in Figure 4-8.
PageMask Register (32-bit Mode)
  Bits    Field   Width
  31:25   0       7
  24:13   MASK    12
  12:0    0       13
Mask..... Page comparison mask.
0 ........... Reserved. Must be written as zeroes, and returns zeroes when read.
EntryHi Register (32-bit Mode)
  Bits    Field   Width
  31:13   VPN2    19
  12:8    0       5
  7:0     ASID    8

EntryHi Register (64-bit Mode)
  Bits    Field   Width
  63:62   R       2
  61:40   FILL    22
  39:13   VPN2    27
  12:8    0       5
  7:0     ASID    8
VPN2 ... Virtual page number divided by two (maps to two pages).
ASID .... Address space ID field. An 8-bit field that lets multiple processes share the TLB;
each process has a distinct mapping of otherwise identical virtual page numbers.
R .......... Region. (00 → user, 01 → supervisor, 11 → kernel) used to match vAddr63...62
Fill ........ Reserved. 0 on read; ignored on write.
0........... Reserved. Must be written as zeroes, and returns zeroes when read.
The TLB page coherency attribute (C) bits specify whether references to
the page should be cached; if cached, the algorithm selects between several
coherency attributes. Table 4-6 shows the coherency attributes selected by
the C bits.
0 Reserved
1 Reserved
2 Uncached
CP0 Registers
The following sections describe the CP0 registers, shown in Figure 4-7,
that are assigned specifically as a software interface with memory
management (each register is followed by its register number in
parentheses).
• Index register (CP0 register number 0)
• Random register (1)
• EntryLo0 (2) and EntryLo1 (3) registers
• PageMask register (5)
• Wired register (6)
• EntryHi register (10)
• PRId register (15)
• Config register (16)
• LLAddr register (17)
• TagLo (28) and TagHi (29) registers
Index Register
  Bits    Field   Width
  31      P       1
  30:6    0       25
  5:0     Index   6
Figure 4-11 Index Register
Field   Description
P       Probe failure. Set to 1 when the previous TLBProbe (TLBP) instruction was unsuccessful.
Index   Index to the TLB entry affected by the TLBRead and TLBWrite instructions.
0       Reserved. Must be written as zeroes, and returns zeroes when read.
Random Register
  Bits    Field    Width
  31:6    0        26
  5:0     Random   6
Figure 4-12 Random Register
Field    Description
Random   TLB Random index
0        Reserved. Must be written as zeroes, and returns zeroes when read.
Page Size    Bit: 24 23 22 21 20 19 18 17 16 15 14 13
4 Kbytes 0 0 0 0 0 0 0 0 0 0 0 0
16 Kbytes 0 0 0 0 0 0 0 0 0 0 1 1
64 Kbytes 0 0 0 0 0 0 0 0 1 1 1 1
256 Kbytes 0 0 0 0 0 0 1 1 1 1 1 1
1 Mbyte 0 0 0 0 1 1 1 1 1 1 1 1
4 Mbytes 0 0 1 1 1 1 1 1 1 1 1 1
16 Mbytes 1 1 1 1 1 1 1 1 1 1 1 1
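The mask patterns in this table follow a simple rule: the MASK field, which sits in bits 24:13 of the PageMask register, equals the page size divided by 4 Kbytes, minus one. The C sketch below computes the register value under that observation; the function name and the error return value are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the PageMask register value for one of
   the seven page sizes in the table (4 Kbytes to 16 Mbytes, in steps
   of a factor of 4), or 0xFFFFFFFF for any other size. */
static uint32_t pagemask_for_size(uint32_t page_bytes)
{
    uint32_t size;
    for (size = 4096u; size <= 16u * 1024u * 1024u; size *= 4u) {
        if (size == page_bytes)
            return ((size >> 12) - 1u) << 13;  /* place MASK at bits 24:13 */
    }
    return 0xFFFFFFFFu;                        /* size not in the table */
}
```

For a 16-Kbyte page this gives (4 - 1) << 13, which sets bits 14 and 13 as in the table row for 16 Kbytes.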
[Figure: the Wired register marks the boundary of the range of wired entries in the TLB; the highest TLB entry is 47.]
The Wired register is set to 0 upon system reset. Writing this register also
sets the Random register to the value of its upper bound (see Random
register, above). Figure 4-14 shows the format of the Wired register; Table
4-10 describes the register fields.
Wired Register
  Bits    Field   Width
  31:6    0       26
  5:0     Wired   6
Field   Description
Wired   TLB Wired boundary
0       Reserved. Must be written as zeroes, and returns zeroes when read.
PRId Register
  Bits    Field   Width
  31:16   0       16
  15:8    Imp     8
  7:0     Rev     8
The low-order byte (bits 7:0) of the PRId register is interpreted as a revision
number, and the high-order byte (bits 15:8) is interpreted as an
implementation number. The implementation number of the R4000
processor is 0x04. The contents of the high-order halfword (bits 31:16) of
the register are reserved.
The revision number is stored as a value in the form y.x, where y is a major
revision number in bits 7:4 and x is a minor revision number in bits 3:0.
The revision number can distinguish some chip revisions; however, there
is no guarantee that changes to the chip will necessarily be reflected in the
PRId register, or that changes to the revision number necessarily reflect
real chip changes. For this reason, these values are not listed and software
should not rely on the revision number in the PRId register to characterize
the chip.
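For diagnostic display only (not for characterizing the chip, per the caution above), the PRId fields can be separated as in this C sketch. The structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded PRId fields. */
struct prid_fields {
    unsigned imp;        /* implementation number, bits 15:8 */
    unsigned rev_major;  /* major revision y, bits 7:4       */
    unsigned rev_minor;  /* minor revision x, bits 3:0       */
};

/* Splits a PRId register value into its fields; bits 31:16 are
   reserved and ignored. */
static struct prid_fields prid_decode(uint32_t prid)
{
    struct prid_fields f;
    f.imp       = (prid >> 8) & 0xFFu;
    f.rev_major = (prid >> 4) & 0xFu;
    f.rev_minor = prid & 0xFu;
    return f;
}
```

A value of 0x0000 0430, for instance, decodes to implementation 0x04 and revision 3.0.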
Config Register
  Bits    Field   Width
  31      CM      1
  30:28   EC      3
  27:24   EP      4
  23:22   SB      2
  21      SS      1
  20      SW      1
  19:18   EW      2
  17      SC      1
  16      SM      1
  15      BE      1
  14      EM      1
  13      EB      1
  12      0       1
  11:9    IC      3
  8:6     DC      3
  5       IB      1
  4       DB      1
  3       CU      1
  2:0     K0      3
Field   Description
CM      Master-Checker Mode (1 → Master/Checker Mode is enabled).
EC      System clock ratio:
        0 → processor clock frequency divided by 2
        1 → processor clock frequency divided by 3
        2 → processor clock frequency divided by 4
        3 → processor clock frequency divided by 6 (R4400 processor only)
        4 → processor clock frequency divided by 8 (R4400 processor only)
EP      Transmit data pattern (pattern for write-back data):
        0 → D          Doubleword every cycle
        1 → DDx        2 Doublewords every 3 cycles
        2 → DDxx       2 Doublewords every 4 cycles
        3 → DxDx       2 Doublewords every 4 cycles
        4 → DDxxx      2 Doublewords every 5 cycles
        5 → DDxxxx     2 Doublewords every 6 cycles
        6 → DxxDxx     2 Doublewords every 6 cycles
        7 → DDxxxxxx   2 Doublewords every 8 cycles
        8 → DxxxDxxx   2 Doublewords every 8 cycles
SB      Secondary Cache line size:
        0 → 4 words
        1 → 8 words
        2 → 16 words
        3 → 32 words
SS      Split Secondary Cache Mode
        0 → instruction and data mixed in secondary cache (joint cache)
        1 → instruction and data separated by SCAddr(17)
SW      Secondary Cache port width
        0 → 128-bit data path to S-cache
        1 → Reserved
EW      System Port width
        0 → 64-bit
        1, 2, 3 → Reserved
SC      Secondary Cache present
        0 → S-cache present
        1 → no S-cache present
LLAddr Register
  Bits    Field         Width
  31:0    PAddr(35:4)   32
Figure 4-17 LLAddr Register Format
TagLo and TagHi Register (P-cache) Formats
  TagLo
  Bits    Field       Width
  31:8    PTagLo      24
  7:6     PState      2
  5:1     0           5
  0       P           1
  TagHi
  31:0    Undefined   32

  TagLo
  Bits    Field       Width
  31:13   STagLo      19
  12:10   SState      3
  9:7     VIndex      3
  6:0     ECC         7
  TagHi
  31:0    Undefined   32
Figure 4-19 TagLo and TagHi Register (S-cache) Formats
Field       Description
PTagLo      Specifies the physical address bits 35:12
PState      Specifies the primary cache state
P           Specifies the primary tag even parity bit
STagLo      Specifies the physical address bits 35:17
SState      Specifies the secondary cache state
VIndex      Specifies the virtual index of the associated Primary cache line, vAddr(14:12)
ECC         ECC for the STag, SState, and VIndex fields
0           Reserved. Must be written as zeroes, and returns zeroes when read.
Undefined   The TagHi register should not be used.
[Figure: TLB address translation flowchart. Decision points, in order: valid address? unmapped access? VPN match? Global (G = 1)? ASID match? Valid (V = 1)? write? Dirty (D = 1)? noncacheable (C = binary 010)? A failed match leads to a TLB Refill or XTLB Refill exception, depending on whether the address is a 32-bit address; V = 0 leads to a TLB Invalid exception; a write with D = 0 leads to a TLB Mod exception. Otherwise the access goes to main memory (noncacheable) or to the cache.]
TLB Misses
If there is no TLB entry that matches the virtual address, a TLB miss
exception occurs.† If the access control bits (D and V) indicate that the
access is not valid, a TLB modification or TLB invalid exception occurs. If
the C bits equal binary 010, the physical address that is retrieved accesses main
memory, bypassing the cache.
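The matching rules in this section can be summarized with a small software model in C. This is only a sketch under simplifying assumptions, not the hardware algorithm: it assumes 32-bit mode, a fixed 4-Kbyte page size (the PageMask register is ignored), and at most one matching entry. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum tlb_result {
    TLB_REFILL,        /* no entry matches: TLB miss               */
    TLB_INVALID,       /* entry matches but V = 0                  */
    TLB_MODIFIED,      /* store to a matching page with D = 0      */
    TLB_HIT_UNCACHED,  /* C = binary 010: access main memory       */
    TLB_HIT_CACHED     /* any other C value: access the cache      */
};

struct tlb_page  { uint32_t pfn; unsigned c, d, v; };
struct tlb_entry {
    uint32_t vpn2;            /* virtual address bits 31:13        */
    uint8_t  asid;
    unsigned g;               /* Global bit                        */
    struct tlb_page page[2];  /* even page [0] and odd page [1]    */
};

/* Looks up vaddr; on a hit, stores the physical address in *paddr. */
static enum tlb_result tlb_lookup(const struct tlb_entry *tlb, size_t n,
                                  uint32_t vaddr, uint8_t asid,
                                  int is_store, uint64_t *paddr)
{
    uint32_t vpn2 = vaddr >> 13;          /* one entry maps a page pair */
    unsigned odd  = (vaddr >> 12) & 1u;   /* bit 12 picks even or odd   */
    size_t i;

    for (i = 0; i < n; i++) {
        const struct tlb_entry *e = &tlb[i];
        const struct tlb_page *p;
        if (e->vpn2 != vpn2)
            continue;                     /* VPN does not match         */
        if (!e->g && e->asid != asid)
            continue;                     /* not global, ASID differs   */
        p = &e->page[odd];
        if (!p->v)
            return TLB_INVALID;
        if (is_store && !p->d)
            return TLB_MODIFIED;
        *paddr = ((uint64_t)p->pfn << 12) | (vaddr & 0xFFFu);
        return (p->c == 2u) ? TLB_HIT_UNCACHED : TLB_HIT_CACHED;
    }
    return TLB_REFILL;
}
```

The order of the checks mirrors the text: a failed match is a refill, then the V bit is examined, then the D bit for stores, and finally the C bits decide between main memory and the cache.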
TLB Instructions
Table 4-14 lists the instructions that the CPU provides for working with
the TLB. See Appendix A for a detailed description of these instructions.
CPU general registers are interlocked and the result of an instruction can
normally be used by the next instruction; if the result is not available right
away, the processor stalls until it is available. CP0 registers and the TLB
are not interlocked, however; there may be some delay before a value
written by one instruction is available to following instructions. For more
information please see Appendix F.
Context Register
32-bit Mode
  Bits    Field     Width
  31:23   PTEBase   9
  22:4    BadVPN2   19
  3:0     0         4
64-bit Mode
  Bits    Field     Width
  63:23   PTEBase   41
  22:4    BadVPN2   19
  3:0     0         4
Figure 5-1 Context Register Format
The 19-bit BadVPN2 field contains bits 31:13 of the virtual address that
caused the TLB miss; bit 12 is excluded because a single TLB entry maps
to an even-odd page pair. For a 4-Kbyte page size, this format can directly
address the pair-table of 8-byte PTEs. For other page and PTE sizes,
shifting and masking this value produces the appropriate address.
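As an illustration of how the Context value is assembled, the following C sketch models the 32-bit format. It is an assumption-laden model of what the register would contain, not code that reads the register; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the 32-bit Context register contents:
   pte_base_field is the 9-bit PTEBase field written by the operating
   system, and bad_vaddr is the virtual address that missed. */
static uint32_t context_value(uint32_t pte_base_field, uint32_t bad_vaddr)
{
    uint32_t bad_vpn2 = (bad_vaddr >> 13) & 0x7FFFFu;  /* vaddr bits 31:13 */
    return ((pte_base_field & 0x1FFu) << 23)           /* bits 31:23       */
         | (bad_vpn2 << 4);                            /* bits 22:4        */
}
```

Because BadVPN2 starts at bit 4, the low 23 bits equal the page-pair number times 16, which is the byte offset of a pair of 8-byte PTEs. That is why the value can be used directly as a pointer for 4-Kbyte pages.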
BadVAddr Register
32-bit Mode
  Bits    Field                 Width
  31:0    Bad Virtual Address   32
64-bit Mode
  Bits    Field                 Width
  63:0    Bad Virtual Address   64
Figure 5-2 BadVAddr Register Format
Note: The BadVAddr register does not save any information for bus errors,
since bus errors are not addressing errors.
Count Register
  Bits    Field     Width
  31:0    Count     32
Compare Register
  Bits    Field     Width
  31:0    Compare   32
Figure 5-4 Compare Register Format
Status Register
  Bits    Field             Width
  31:28   CU (CU3:CU0)      4
  27      RP                1
  26      FR                1
  25      RE                1
  24:16   DS                9
  15:8    IM (IM7 - IM0)    8
  7       KX                1
  6       SX                1
  5       UX                1
  4:3     KSU               2
  2       ERL               1
  1       EXL               1
  0       IE                1
Field   Description
CU      Controls the usability of each of the four coprocessor unit
        numbers. CP0 is always usable when in Kernel mode,
        regardless of the setting of the CU0 bit.
        1 → usable
        0 → unusable
RP      Enables reduced-power operation by reducing the internal
        clock frequency. The clock divisor is programmable at boot
        time.
        0 → full speed
        1 → reduced clock
FR      Enables additional floating-point registers
        0 → 16 registers
        1 → 32 registers
RE      Reverse-Endian bit, valid in User mode.
DS      Diagnostic Status field (see Figure 5-6).
IM      Interrupt Mask: controls the enabling of each of the external,
        internal, and software interrupts. An interrupt is taken if
        interrupts are enabled, and the corresponding bits are set in
        both the Interrupt Mask field of the Status register and the
        Interrupt Pending field of the Cause register.
        0 → disabled
        1 → enabled
KX      Enables 64-bit addressing in Kernel mode. The extended-
        addressing TLB refill exception is used for TLB misses on
        kernel addresses.
        0 → 32-bit
        1 → 64-bit
SX      Enables 64-bit addressing and operations in Supervisor
        mode. The extended-addressing TLB refill exception is used
        for TLB misses on supervisor addresses.
        0 → 32-bit
        1 → 64-bit
UX      Enables 64-bit addressing and operations in User mode.
        The extended-addressing TLB refill exception is used for
        TLB misses on user addresses.
        0 → 32-bit
        1 → 64-bit
KSU     Mode bits
        binary 10 → User
        binary 01 → Supervisor
        binary 00 → Kernel
ERL     Error Level; set by the processor when a Reset, Soft Reset,
        NMI, or Cache Error exception is taken.
        0 → normal
        1 → error
EXL     Exception Level; set by the processor when any exception
        other than a Reset, Soft Reset, NMI, or Cache Error exception
        is taken.
        0 → normal
        1 → exception
IE      Interrupt Enable
        0 → disable interrupts
        1 → enable interrupts
Diagnostic Status (DS) field, Status register bits 24:16
  Bits    Field   Width
  24:23   0       2
  22      BEV     1
  21      TS      1
  20      SR      1
  19      0       1
  18      CH      1
  17      CE      1
  16      DE      1
Bit   Description
BEV   Controls the location of TLB refill and general exception
      vectors.
      0 → normal
      1 → bootstrap
TS    1 → Indicates TLB shutdown has occurred (read-only).
SR    1 → Indicates a Reset* signal or NMI has caused a Soft Reset
      exception.
CH    Hit (tag match and valid state) or miss indication for last
      CACHE Hit Invalidate, Hit Write Back Invalidate, Hit Write
      Back, Hit Set Virtual, or Create Dirty Exclusive for a
      secondary cache.
      0 → miss
      1 → hit
CE    Contents of the ECC register set or modify the check bits of the
      caches when CE = 1; see description of the ECC register.
DE    Specifies that cache parity or ECC errors cannot cause
      exceptions.
      0 → parity/ECC remain enabled
      1 → disables parity/ECC
0     Reserved. Must be written as zeroes, and returns zeroes
      when read.
User Address Space Accesses: Access to the user address space is allowed
in any of the three operating modes.
Cause Register
  Bits    Field     Width
  31      BD        1
  30      0         1
  29:28   CE        2
  27:16   0         12
  15:8    IP        8
  7       0         1
  6:2     ExcCode   5
  1:0     0         2
Field     Description
BD        Indicates whether the last exception taken occurred in a branch delay slot.
          1 → delay slot
          0 → normal
CE        Coprocessor unit number referenced when a Coprocessor Unusable
          exception is taken.
IP        Indicates an interrupt is pending.
          1 → interrupt pending
          0 → no interrupt
ExcCode   Exception code field (see Table 5-6)
0         Reserved. Must be written as zeroes, and returns zeroes when read.
Exception Code Value | Mnemonic | Description
0 Int Interrupt
1 Mod TLB modification exception
2 TLBL TLB exception (load or instruction fetch)
3 TLBS TLB exception (store)
4 AdEL Address error exception (load or instruction fetch)
5 AdES Address error exception (store)
6 IBE Bus error exception (instruction fetch)
7 DBE Bus error exception (data reference: load or store)
8 Sys Syscall exception
9 Bp Breakpoint exception
10 RI Reserved instruction exception
11 CpU Coprocessor Unusable exception
12 Ov Arithmetic Overflow exception
13 Tr Trap exception
14 VCEI Virtual Coherency Exception instruction
15 FPE Floating-Point exception
16–22 – Reserved
23 WATCH Reference to WatchHi/WatchLo address
24–30 – Reserved
31 VCED Virtual Coherency Exception data
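A handler typically begins by extracting these fields from the Cause register. The C sketch below shows one way to do so; the bit positions follow the Cause register format (BD in bit 31, CE in bits 29:28, IP in bits 15:8, ExcCode in bits 6:2), and the function names are hypothetical.

```c
#include <stdint.h>

/* Field extractors for a Cause register value. */
static unsigned cause_bd(uint32_t cause)      { return (cause >> 31) & 0x1u; }
static unsigned cause_ce(uint32_t cause)      { return (cause >> 28) & 0x3u; }
static unsigned cause_ip(uint32_t cause)      { return (cause >> 8) & 0xFFu; }
static unsigned cause_exccode(uint32_t cause) { return (cause >> 2) & 0x1Fu; }

/* Maps an exception code value to the mnemonic listed in the table. */
static const char *exccode_mnemonic(unsigned code)
{
    static const char *const names[16] = {
        "Int",  "Mod", "TLBL", "TLBS", "AdEL", "AdES", "IBE",  "DBE",
        "Sys",  "Bp",  "RI",   "CpU",  "Ov",   "Tr",   "VCEI", "FPE"
    };
    if (code < 16u)  return names[code];
    if (code == 23u) return "WATCH";
    if (code == 31u) return "VCED";
    return "Reserved";   /* codes 16-22 and 24-30 */
}
```

For example, a Cause value of 0x8000 0008 decodes to BD = 1 and ExcCode = 2 (TLBL).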
EPC Register
32-bit Mode
  Bits    Field   Width
  31:0    EPC     32
64-bit Mode
  Bits    Field   Width
  63:0    EPC     64
WatchLo Register
  Bits    Field    Width
  31:3    PAddr0   29
  2       0        1
  1       R        1
  0       W        1
WatchHi Register
  Bits    Field    Width
  31:4    0        28
  3:0     PAddr1   4
Field    Description
PAddr1   Bits 35:32 of the physical address
PAddr0   Bits 31:3 of the physical address
R        Trap on load references if set to 1
W        Trap on store references if set to 1
0        Reserved. Must be written as zeroes, and returns zeroes when read.
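The two Watch registers together hold one 36-bit physical address with doubleword granularity. The C sketch below packs an address and the R and W flags into the two register values; the structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the two register values. */
struct watch_regs {
    uint32_t lo;   /* WatchLo: PAddr0 in bits 31:3, R in bit 1, W in bit 0 */
    uint32_t hi;   /* WatchHi: PAddr1 in bits 3:0                          */
};

/* Encodes a 36-bit physical address and the trap-on-load (R) and
   trap-on-store (W) flags. Address bits 2:0 are dropped, since the
   registers hold only bits 35:3. */
static struct watch_regs watch_encode(uint64_t paddr, int on_load, int on_store)
{
    struct watch_regs w;
    w.lo = ((uint32_t)paddr & 0xFFFFFFF8u)
         | (on_load  ? 0x2u : 0x0u)
         | (on_store ? 0x1u : 0x0u);
    w.hi = (uint32_t)(paddr >> 32) & 0xFu;
    return w;
}
```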
XContext Register
  Bits    Field     Width
  63:33   PTEBase   31
  32:31   R         2
  30:4    BadVPN2   27
  3:0     0         4
Figure 5-10 XContext Register Format
The 27-bit BadVPN2 field has bits 39:13 of the virtual address that caused
the TLB miss; bit 12 is excluded because a single TLB entry maps to an
even-odd page pair. For a 4-Kbyte page size, this format may be used
directly to address the pair-table of 8-byte PTEs. For other page and PTE
sizes, shifting and masking this value produces the appropriate address.
Field     Description
BadVPN2   The Bad Virtual Page Number/2 field is written by hardware on a miss. It
          contains the VPN of the most recent invalidly translated virtual address.
R         The Region field contains bits 63:62 of the virtual address.
          binary 00 = user
          binary 01 = supervisor
          binary 11 = kernel
PTEBase   The Page Table Entry Base read/write field is normally written with a value
          that allows the operating system to use the Context register as a pointer into
          the current PTE array in memory.
ECC Register
  Bits    Field   Width
  31:8    0       24
  7:0     ECC     8
Figure 5-11 ECC Register Format
Field   Description
ECC     An 8-bit field specifying the ECC bits read from or
        written to a secondary cache, or the even byte parity bits
        to be read from or written to a primary cache.
0       Reserved. Must be written as zeroes, and returns zeroes
        when read.
CacheErr Register
  Bits    Field   Width
  31      ER      1
  30      EC      1
  29      ED      1
  28      ET      1
  27      ES      1
  26      EE      1
  25      EB      1
  24      EI      1
  23      EW      1
  22      0       1
  21:3    SIdx    19
  2:0     PIdx    3
Figure 5-12 CacheErr Register Format
Field   Description
ER      Type of reference
        0 → instruction
        1 → data
EC      Cache level of the error
        0 → primary
        1 → secondary
ED      Indicates if a data field error occurred
        0 → no error
        1 → error
ET      Indicates if a tag field error occurred
        0 → no error
        1 → error
ES      Indicates the error occurred while accessing primary or secondary cache in
        response to an external request.
        0 → internal reference
        1 → external reference
EE      This bit is set if the error occurred on the SysAD bus.
EB      This bit is set if a data error occurred in addition to the instruction error
        (indicated by the remainder of the bits). If so, this requires flushing the
        data cache after fixing the instruction error.
EI      This bit is set on a secondary data cache ECC error while refilling the
        primary cache on a store miss. The ECC handler must first do an Index
        Store Tag to invalidate the incorrect data from the primary data cache.
EW      This bit is only available on the R4400 processor. It is set on a
        multiprocessor cache error when the CacheErr register is already holding
        the values of a previous cache error. This bit could be set by the processor
        from the time the CacheErr register is loaded due to an error until the time
        that an ERET instruction is executed. Once the EW bit is set, it can only be
        cleared by a reset. The following errors set the EW bit:
        • Secondary cache tag errors arising from an external request
          (multibit errors only)
        • Secondary cache data errors arising from an external update
        • Primary cache tag errors arising from an external request
SIdx    Bits pAddr(21:3) of the reference that encountered the error (which is not
        necessarily the same as the address of the doubleword in error, but is
        sufficient to locate that doubleword in the secondary cache).
PIdx    Bits vAddr(14:12) of the doubleword in error (used with SIdx to construct
        a virtual index for the primary caches).
0       Reserved. Must be written as zeroes, and returns zeroes when read.
ErrorEPC Register
32-bit Mode
  Bits    Field      Width
  31:0    ErrorEPC   32
64-bit Mode
  Bits    Field      Width
  63:0    ErrorEPC   64
Exception Types
This section gives sample exception handler operations for the following
exception types:
• reset
• soft reset
• nonmaskable interrupt (NMI)
• cache error
• remaining processor exceptions
When the EXL bit in the Status register is 0, either User, Supervisor, or
Kernel operating mode is specified by the KSU bits in the Status register.
When the EXL bit is a 1, the processor is in Kernel mode.
When the processor takes an exception, the EXL bit is set to 1, which means
the system is in Kernel mode. After saving the appropriate state, the
exception handler typically changes KSU to Kernel mode and resets the
EXL bit back to 0. When restoring the state and restarting, the handler
restores the previous value of the KSU field and sets the EXL bit back to 1.
Returning from an exception also resets the EXL bit to 0 (see the ERET
instruction in Appendix A).
In the following sections, sample hardware processes for various
exceptions are shown, together with the servicing required by the handler
(software).
(In the following, undefined^n denotes n undefined bits and SR[a:b] denotes bits a through b of the Status register.)
T: undefined
   Random ← TLBENTRIES - 1
   Wired ← 0
   Config ← CM || EC || EP || SB || SS || SW || EW || SC || SM || BE || EM || EB || 0 || IC
            || DC || undefined^6
   ErrorEPC ← RestartPC   /* If the instruction is in a branch delay slot, RestartPC */
                          /* holds the value of PC-4, otherwise RestartPC = PC */
   If R4400 then
      CacheErr ← undefined^8 || 0 || undefined^23   /* Set EW bit to 0 */
   endif
   SR ← SR[31:23] || 1 || 0 || 0 || SR[19:3] || 1 || SR[1:0]
   PC ← 0xFFFF FFFF BFC0 0000
Figure 5-17 General Exception Processing (Except Reset, Soft Reset, NMI, and Cache Error)
Exception                BEV = 0                  BEV = 1
Cache Error              0xFFFF FFFF A000 0000    0xFFFF FFFF BFC0 0200
Others                   0xFFFF FFFF 8000 0000    0xFFFF FFFF BFC0 0200
Reset, NMI, Soft Reset   0xFFFF FFFF BFC0 0000 (for either BEV value)
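The vector base selection in this table can be expressed as a small C function. This is a sketch: the enumeration and function names are hypothetical, the addresses are the 64-bit mode values from the table, and any per-exception offset added to the base is outside what the table shows.

```c
#include <stdint.h>

/* Hypothetical exception classes matching the rows of the table. */
enum exc_class { EXC_RESET_NMI_SOFT, EXC_CACHE_ERROR, EXC_OTHER };

/* Returns the vector base address for the given exception class and
   the current value of the BEV bit in the Status register. */
static uint64_t vector_base(enum exc_class cls, int bev)
{
    switch (cls) {
    case EXC_RESET_NMI_SOFT:
        return 0xFFFFFFFFBFC00000ull;          /* same for either BEV value */
    case EXC_CACHE_ERROR:
        return bev ? 0xFFFFFFFFBFC00200ull     /* bootstrap                 */
                   : 0xFFFFFFFFA0000000ull;    /* normal                    */
    default:
        return bev ? 0xFFFFFFFFBFC00200ull     /* bootstrap                 */
                   : 0xFFFFFFFF80000000ull;    /* normal                    */
    }
}
```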
Priority of Exceptions
The remainder of this chapter describes exceptions in the order of their
priority shown in Table 5-13 (with certain of the exceptions, such as the
TLB exceptions and Instruction/Data exceptions, grouped together for
convenience). While more than one exception can occur for a single
instruction, only the exception with the highest priority is reported.
Reset Exception
Cause
The Reset exception occurs when the ColdReset*† signal is asserted and
then deasserted. This exception is not maskable.
Processing
The CPU provides a special interrupt vector for this exception:
• location 0xBFC0 0000 in 32-bit mode
• location 0xFFFF FFFF BFC0 0000 in 64-bit mode
The Reset vector resides in unmapped and uncached CPU address space,
so the hardware need not initialize the TLB or the cache to process this
exception. It also means the processor can fetch and execute instructions
while the caches and virtual memory are in an undefined state.
The contents of all registers in the CPU are undefined when this exception
occurs, except for the following register fields:
• In the Status register, SR and TS are cleared to 0, and ERL and
BEV are set to 1. All other bits are undefined.
• Config register is initialized with the boot mode bits read from
the serial input (see Figure 5-14).
• The Random register is initialized to the value of its upper
bound.
• The Wired register is initialized to 0.
• The EW bit in the CacheErr register is cleared (R4400 only).
Reset exception processing is shown in Figure 5-14.
Servicing
The Reset exception is serviced by:
• initializing all processor registers, coprocessor registers, caches,
and the memory system
• performing diagnostic tests
• bootstrapping the operating system
Soft Reset Exception
Cause
The Soft Reset exception occurs in response to either the Reset* input
signal or a Nonmaskable Interrupt (NMI)†.
The NMI is caused either by an assertion of the NMI* signal or an external
write to the Int*[6] bit of the Interrupt register.
This exception is not maskable.
Processing
Regardless of the cause, when this exception occurs the SR bit of the Status
register is set, distinguishing this exception from a Reset exception.
The processor does not indicate any distinction between an exception
caused by the Reset* signal or the NMI* signal.
• An exception caused by an NMI can only be taken if the
processor is processing instructions; it is taken at the
instruction boundary. It does not abort any state machines,
preserving the state of the processor for diagnosis.
• An exception caused by assertion of Reset* performs a subset
of the full reset initialization. After a processor is completely
initialized by a Reset exception (caused by ColdReset* or
Power-On), Reset* can be asserted on the processor in any
state, even if the processor is no longer processing instructions.
In this situation the processor does not read or set processor
configuration parameters. It does, however, initialize all other
processor state that requires hardware initialization (for
instance, the state machines and registers), in order that the
CPU can fetch and execute the Reset exception handler located
in uncached and unmapped space. Although no other
processor state is unnecessarily changed, a soft reset sequence
may be forced to alter some state since the exception can be
invoked arbitrarily on a cycle boundary, and abort any
multicycle operation in progress. Since bus, cache, or other
operations may be interrupted, portions of the cache, memory,
or other processor state may be inconsistent.
† In this book, a Soft Reset exception caused by assertion of the Reset* signal is referred to
as a “soft reset” or “warm reset.” A Soft Reset exception caused by a nonmaskable
interrupt (NMI) is referred to as a “nonmaskable interrupt exception.”
In both the Reset* and NMI cases the processor jumps to the Reset
exception vector located in unmapped and uncached address space, so
that the cache and TLB contents need not be initialized to service this
exception. Typically, the Reset exception vector is located in PROM, and
system memory does not need to be initialized to handle the exception.
As previously noted, state machines interrupted by Reset* may cause
some register contents to be inconsistent with the other processor state.
Otherwise, on an exception caused by Reset* or NMI the contents of all
registers are preserved, except for:
• EW bit in the CacheErr register, which is reset to 0 (R4400 only)
• ErrorEPC register, which contains the restart PC
• ERL bit of the Status register, which is set to 1
• SR bit of the Status register, which is set to 1
• BEV bit of the Status register, which is set to 1
• TS bit of the Status register, which is set to 0
• PC is set to the reset vector 0xFFFF FFFF BFC0 0000
Soft reset exception processing is shown in Figure 5-16.
Servicing
The exception initiated by Reset* is intended to quickly reinitialize a
previously operating processor after a fatal error such as a Master/
Checker mismatch. The NMI can be used for purposes other than resetting
the processor while preserving cache and memory contents. For example,
the system might use an NMI to cause an immediate, controlled shutdown
when it detects an impending power failure.
The exceptions due to Reset* and NMI appear identical to software; both
exceptions jump to the Reset exception vector and have the Status register
SR bit set. Unless external hardware provides a way to distinguish
between the two, they are serviced by saving the current user-visible
processor state for diagnostic purposes and reinitializing as for the Reset
exception. It is not normally possible to continue program execution after
returning from this exception, since a Reset* signal can be accepted
anytime and an NMI can occur in the midst of another error exception.
Address Error Exception
Cause
The Address Error exception occurs when an attempt is made to execute
one of the following:
• load or store a doubleword that is not aligned on a doubleword
boundary
• load, fetch, or store a word that is not aligned on a word
boundary
• load or store a halfword that is not aligned on a halfword
boundary
• reference the kernel address space from User or Supervisor
mode
• reference the supervisor address space from User mode
This exception is not maskable.
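The alignment conditions in the list above reduce to a mask test on the low-order address bits. The following is a minimal sketch with a hypothetical helper name; it covers only the alignment cases, not the address-space privilege checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when vaddr is naturally aligned for an access of `size`
   bytes (2 = halfword, 4 = word, 8 = doubleword). A false result
   corresponds to one of the alignment cases of the Address Error exception. */
static bool is_naturally_aligned(uint64_t vaddr, unsigned size)
{
    assert(size == 2 || size == 4 || size == 8);
    return (vaddr & (uint64_t)(size - 1)) == 0;
}
```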
Processing
The common exception vector is used for this exception. The AdEL or
AdES code in the Cause register is set, indicating whether the instruction
(as shown by the EPC register and the BD bit in the Cause register) caused
the exception with an instruction reference or load operation (AdEL), or
with a store operation (AdES).
When this exception occurs, the BadVAddr register retains the virtual
address that was not properly aligned or that referenced protected
address space. The contents of the VPN field of the Context and EntryHi
registers are undefined, as are the contents of the EntryLo register.
The EPC register contains the address of the instruction that caused the
exception, unless this instruction is in a branch delay slot. If it is in a
branch delay slot, the EPC register contains the address of the preceding
branch instruction and the BD bit of the Cause register is set as indication.
Address Error exception processing is shown in Figure 5-17.
Servicing
The process executing at the time is handed a UNIX SIGSEGV
(segmentation violation) signal. This error is usually fatal to the process
incurring the exception.
TLB Exceptions
Three types of TLB exceptions can occur:
• TLB Refill occurs when there is no TLB entry that matches an
attempted reference to a mapped address space.
• TLB Invalid occurs when a virtual address reference matches a
TLB entry that is marked invalid.
• TLB Modified occurs when a store operation virtual address
reference to memory matches a TLB entry which is marked
valid but is not dirty (the entry is not writable).
The following three sections describe these TLB exceptions.
Cause
The TLB refill exception occurs when there is no TLB entry to match a
reference to a mapped address space. This exception is not maskable.
Processing
There are two special exception vectors for this exception; one for
references to 32-bit address spaces, and one for references to 64-bit address
spaces. The UX, SX, and KX bits of the Status register determine whether
the user, supervisor or kernel address spaces referenced are 32-bit or 64-
bit spaces. All references use these vectors when the EXL bit is set to 0 in
the Status register. This exception sets the TLBL or TLBS code in the
ExcCode field of the Cause register. This code indicates whether the
instruction, as shown by the EPC register and the BD bit in the Cause
register, caused the miss by an instruction reference, load operation, or
store operation.
When this exception occurs, the BadVAddr, Context, XContext and EntryHi
registers hold the virtual address that failed address translation. The
EntryHi register also contains the ASID from which the translation fault
occurred. The Random register normally contains a valid location in which
to place the replacement TLB entry. The contents of the EntryLo register
are undefined. The EPC register contains the address of the instruction
that caused the exception, unless this instruction is in a branch delay slot,
in which case the EPC register contains the address of the preceding
branch instruction and the BD bit of the Cause register is set.
TLB Refill exception processing is shown in Figure 5-17.
Servicing
To service this exception, the contents of the Context or XContext register
are used as a virtual address to fetch memory locations containing the
physical page frame and access control bits for a pair of TLB entries. The
two entries are placed into the EntryLo0 and EntryLo1 registers; the EntryHi,
EntryLo0, and EntryLo1 registers are then written into the TLB.
It is possible that the virtual address used to obtain the physical address
and access control information is on a page that is not resident in the TLB.
This condition is processed by allowing a TLB refill exception in the TLB
refill handler. This second exception goes to the common exception vector
because the EXL bit of the Status register is set.
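To illustrate why the Context register can be used directly as the address of a page-table entry pair, the sketch below models the value that register would hold after a miss. The field layout (PTEBase in bits 63:23, BadVPN2 in bits 22:4, with BadVPN2 taken from virtual address bits 31:13) is an assumption not stated in this section, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed Context layout: PTEBase in bits 63:23, BadVPN2 in bits 22:4,
   bits 3:0 zero. BadVPN2 is taken from virtual address bits 31:13, so
   both pages of an even/odd pair select the same 16-byte slot. */
static uint64_t context_value(uint64_t pte_base, uint64_t bad_vaddr)
{
    uint64_t bad_vpn2 = (bad_vaddr >> 13) & UINT64_C(0x7FFFF); /* 19 bits */
    return (pte_base & ~UINT64_C(0x7FFFFF)) | (bad_vpn2 << 4);
}
```

Under these assumptions the refill handler does no address arithmetic of its own: it loads the two entries from the address in Context and writes them to the TLB.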
Cause
The TLB invalid exception occurs when a virtual address reference
matches a TLB entry that is marked invalid (TLB valid bit cleared). This
exception is not maskable.
Processing
The common exception vector is used for this exception. The TLBL or
TLBS code in the ExcCode field of the Cause register is set. This indicates
whether the instruction, as shown by the EPC register and BD bit in the
Cause register, caused the miss by an instruction reference, load operation,
or store operation.
When this exception occurs, the BadVAddr, Context, XContext and EntryHi
registers contain the virtual address that failed address translation. The
EntryHi register also contains the ASID from which the translation fault
occurred. The Random register normally contains a valid location in which
to put the replacement TLB entry. The contents of the EntryLo register are
undefined.
The EPC register contains the address of the instruction that caused the
exception unless this instruction is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction and
the BD bit of the Cause register is set.
TLB Invalid exception processing is shown in Figure 5-17.
Servicing
A TLB entry is typically marked invalid when one of the following is true:
• a virtual address does not exist
• the virtual address exists, but is not in main memory (a page
fault)
• a trap is desired on any reference to the page (for example, to
maintain a reference bit)
After servicing the cause of a TLB Invalid exception, the TLB entry is
located with TLBP (TLB Probe), and replaced by an entry with that entry’s
Valid bit set.
Cause
The TLB modified exception occurs when a store operation virtual address
reference to memory matches a TLB entry that is marked valid but is not
dirty and therefore is not writable. This exception is not maskable.
Processing
The common exception vector is used for this exception, and the Mod code
in the Cause register is set.
When this exception occurs, the BadVAddr, Context, XContext and EntryHi
registers contain the virtual address that failed address translation. The
EntryHi register also contains the ASID from which the translation fault
occurred. The contents of the EntryLo register are undefined.
The EPC register contains the address of the instruction that caused the
exception unless that instruction is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction and
the BD bit of the Cause register is set.
TLB Modified exception processing is shown in Figure 5-17.
Servicing
The kernel uses the failed virtual address or virtual page number to
identify the corresponding access control information. The page
identified may or may not permit write accesses; if writes are not
permitted, a write protection violation occurs.
If write accesses are permitted, the page frame is marked dirty/writable
by the kernel in its own data structures. The TLBP instruction places the
index of the TLB entry that must be altered into the Index register. The
EntryLo register is loaded with a word containing the physical page frame
and access control bits (with the D bit set), and the EntryHi and EntryLo
registers are written into the TLB.
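The servicing decision above can be sketched as follows. The page descriptor structure and function name are hypothetical, and the EntryLo bit positions (V in bit 1, D in bit 2) are assumptions from the usual R4000 layout; the TLBP and TLB write steps are left out because they need privileged instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed EntryLo bit positions. */
#define ENTRYLO_V (UINT64_C(1) << 1)  /* Valid */
#define ENTRYLO_D (UINT64_C(1) << 2)  /* Dirty (writable) */

/* Hypothetical kernel-side descriptor for one page. */
struct page_info {
    bool     write_permitted;  /* access control information kept by the kernel */
    bool     dirty;            /* kernel's own record that the page was written */
    uint64_t entrylo;          /* value to load into EntryLo for this page */
};

/* Returns false for a write protection violation. Otherwise marks the
   page dirty/writable and sets the D bit in the EntryLo image that the
   handler would then write into the TLB. */
static bool service_tlb_modified(struct page_info *page)
{
    if (!page->write_permitted)
        return false;
    page->dirty = true;
    page->entrylo |= ENTRYLO_D;
    return true;
}
```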
Cause
The Cache Error exception occurs when either a secondary cache ECC
error, primary cache parity error, or SysAD bus parity/ECC error
condition occurs and error detection is enabled. This exception is not
maskable, but error detection is disabled when either the ERL or DE bit of
the Status register is set to 1.
Processing
The processor sets the ERL bit in the Status register, saves the exception
restart address in the ErrorEPC register, records information about the
error in the CacheErr register, and then transfers to a special vector that is
always in uncached space (Tables 5-11 and 5-12). No other registers are
changed. Cache Error exception processing is shown in Figure 5-15.
Servicing
Unlike other exception conditions, cache errors cannot be avoided while
operating at exception level, so Cache Error exceptions must be handled
from exception level. Any general register used by the handler must be
saved before use and restored before return; this includes the registers
available to regular exception handlers without save/restore. When
ERL=1 in the Status register, the user address region becomes a 2^31-byte
uncached space mapped directly to physical addresses, allowing the
Cache Error handler to save registers to memory without using a register
to construct the address. The handler can save and restore registers using
operating system-reserved locations in low physical memory by using R0
as the base register for load and store instructions. All errors should be
logged. To correct single-bit ECC errors in the secondary cache, the
system uses the CACHE instruction. Execution then resumes through an
ERET instruction. To correct cache parity errors and non-single-bit ECC
errors in unmodified cache blocks, the system uses the CACHE instruction
to invalidate the cache block, overwrites the old data through a cache miss,
and resumes execution with an ERET. Other errors are not correctable and
are likely to be fatal to the current process. The exception handler cannot
be interrupted by another Cache Error exception because error detection
is disabled while ERL = 1, so the handler should avoid actions which
might cause an unnoticed cache error. The R4400 (but not R4000)
implements the EW bit in the CacheErr register to record a nonrecoverable
error occurring while ERL = 1.
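The recovery choices described above can be summarized as a small decision function. This is a sketch only: the enumerations and function name are hypothetical, and the actual repair is done with CACHE and ERET instructions that are not modeled here.

```c
#include <stdbool.h>

enum cache_error_kind {
    CACHE_ERR_SINGLE_BIT_ECC,  /* single-bit ECC error in the secondary cache */
    CACHE_ERR_MULTI_BIT_ECC,   /* non-single-bit ECC error */
    CACHE_ERR_PARITY           /* cache parity error */
};

enum cache_error_action {
    CACHE_ACT_CORRECT,     /* correct with CACHE, resume with ERET */
    CACHE_ACT_INVALIDATE,  /* invalidate with CACHE, refill by a miss, ERET */
    CACHE_ACT_FATAL        /* not correctable */
};

/* Chooses the recovery path from the error kind and whether the affected
   cache block has been modified. */
static enum cache_error_action
choose_cache_error_action(enum cache_error_kind kind, bool block_modified)
{
    if (kind == CACHE_ERR_SINGLE_BIT_ECC)
        return CACHE_ACT_CORRECT;
    if (!block_modified)
        return CACHE_ACT_INVALIDATE;
    return CACHE_ACT_FATAL;
}
```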
Cause
A Virtual Coherency exception occurs when all of the following conditions
are true:
• a primary cache miss hits in the secondary cache
• bits 14:12 of the virtual address were not equal to the
corresponding bits of the PIdx field of the secondary cache tag
• the cache algorithm for the page (from the C field in the TLB)
specifies that the page is cached
This exception is not maskable.
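The three conditions above combine into a single predicate. The sketch below uses hypothetical parameter names and treats the PIdx field as a 3-bit value compared against virtual address bits 14:12.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when all conditions for a Virtual Coherency exception hold:
   a primary cache miss that hits in the secondary cache, on a cached
   page, where virtual address bits 14:12 differ from the PIdx field of
   the secondary cache tag. */
static bool virtual_coherency_fault(uint64_t vaddr, unsigned pidx,
                                    bool primary_miss, bool secondary_hit,
                                    bool page_cached)
{
    unsigned vindex = (unsigned)((vaddr >> 12) & 0x7u);
    return primary_miss && secondary_hit && page_cached
        && vindex != (pidx & 0x7u);
}
```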
Processing
The common exception vector is used for this exception.
The VCEI or VCED code in the Cause register is set for instruction and data
cache misses respectively.
The BadVAddr register holds the virtual address that caused the exception.
Virtual Coherency exception processing is shown in Figure 5-17.
Servicing
Using the appropriate CACHE instruction(s), the primary cache line at
both the previous and the new virtual index should be invalidated† (and
written back, if necessary), and the PIdx field of the secondary cache tag
should be written with the new virtual index. Once completed, the
program continues.
Software can avoid the cost of this exception by using consistent virtual
primary cache indexes to access the same physical data.
† When a cache miss occurs, the processor refills the primary cache line at the present virtual
index before taking an exception.
Cause
A Bus Error exception is raised by board-level circuitry for events such as
bus time-out, backplane bus parity errors, and invalid physical memory
addresses or access types. This exception is not maskable.
A Bus Error exception occurs either when the SysCmd(5) bit indicates the
data is erroneous (see Chapter 12) or the IvdErr* signal is asserted
(Chapter 12). This can only occur when a cache miss refill, uncached
reference, or an unbuffered write occurs synchronously; a Bus Error
exception resulting from a buffered write transaction must be reported
using the general interrupt mechanism.
Processing
The common exception vector is used for a Bus Error exception. The IBE
or DBE code in the ExcCode field of the Cause register is set, signifying
whether the instruction (as indicated by the EPC register and BD bit in the
Cause register) caused the exception by an instruction reference, load
operation, or store operation.
The EPC register contains the address of the instruction that caused the
exception, unless it is in a branch delay slot, in which case the EPC register
contains the address of the preceding branch instruction and the BD bit of
the Cause register is set. Bus Error processing is shown in Figure 5-17.
Servicing
The physical address at which the fault occurred can be computed from
information available in the CP0 registers.
• If the IBE code in the Cause register is set (indicating an
instruction fetch reference), the virtual address is contained in
the EPC register.
• If the DBE code is set (indicating a load or store reference), the
instruction that caused the exception is located at the virtual
address contained in the EPC register (or 4+ the contents of the
EPC register if the BD bit of the Cause register is set).
The virtual address of the load and store reference can then be obtained by
interpreting the instruction. The physical address can be obtained by
using the TLBP instruction and reading the EntryLo register to compute
the physical page number. The process executing at the time of this
exception is handed a UNIX SIGBUS (bus error) signal, which is usually
fatal.
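Locating the faulting instruction from EPC and the BD bit, as described above, can be sketched as follows. The BD bit position (bit 31 of the Cause register) is an assumption from the usual R4000 layout, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed position of the BD (branch delay) bit in the Cause register. */
#define CAUSE_BD (UINT32_C(1) << 31)

/* Returns the virtual address of the instruction that took the exception:
   EPC itself, or EPC + 4 when the instruction sits in a branch delay slot
   (EPC then points at the preceding branch). */
static uint64_t faulting_instruction_address(uint64_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}
```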
Cause
An Integer Overflow exception occurs when an ADD, ADDI, SUB, DADD,
DADDI or DSUB† instruction results in a 2’s complement overflow. This
exception is not maskable.
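As an illustration of the 2's complement overflow condition for a 32-bit add, the sketch below computes the sum in a wider type and checks whether it fits. The helper name is hypothetical; it models the condition only, not the exception itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a + b does not fit in a signed 32-bit result, which is the
   condition under which a 32-bit ADD would raise Integer Overflow. */
static bool add32_overflows(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide > INT32_MAX || wide < INT32_MIN;
}
```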
Processing
The common exception vector is used for this exception, and the OV code
in the Cause register is set.
The EPC register contains the address of the instruction that caused the
exception unless the instruction is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction and
the BD bit of the Cause register is set.
Integer Overflow exception processing is shown in Figure 5-17.
Servicing
The process executing at the time of the exception is handed a UNIX
SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer
overflow) signal. This error is usually fatal to the current process.
Trap Exception
Cause
The Trap exception occurs when a TGE, TGEU, TLT, TLTU, TEQ, TNE,
TGEI, TGEUI, TLTI, TLTUI, TEQI, or TNEI† instruction results in a TRUE
condition. This exception is not maskable.
Processing
The common exception vector is used for this exception, and the Tr code
in the Cause register is set.
The EPC register contains the address of the instruction causing the
exception unless the instruction is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction and
the BD bit of the Cause register is set.
Trap exception processing is shown in Figure 5-17.
Servicing
The process executing at the time of a Trap exception is handed a UNIX
SIGFPE/FPE_INTOVF_TRAP (floating-point exception/integer
overflow) signal. This error is usually fatal.
Cause
A System Call exception occurs during an attempt to execute the
SYSCALL instruction. This exception is not maskable.
Processing
The common exception vector is used for this exception, and the Sys code
in the Cause register is set.
The EPC register contains the address of the SYSCALL instruction unless
it is in a branch delay slot, in which case the EPC register contains the
address of the preceding branch instruction.
If the SYSCALL instruction is in a branch delay slot, the BD bit of the Cause
register is set; otherwise this bit is cleared.
System Call exception processing is shown in Figure 5-17.
Servicing
When this exception occurs, control is transferred to the applicable system
routine.
To resume execution, the EPC register must be altered so that the
SYSCALL instruction does not re-execute; this is accomplished by adding
a value of 4 to the EPC register (EPC register + 4) before returning.
If a SYSCALL instruction is in a branch delay slot, a more complicated
algorithm, beyond the scope of this description, may be required.
Breakpoint Exception
Cause
A Breakpoint exception occurs when an attempt is made to execute the
BREAK instruction. This exception is not maskable.
Processing
The common exception vector is used for this exception, and the BP code
in the Cause register is set.
The EPC register contains the address of the BREAK instruction unless it
is in a branch delay slot, in which case the EPC register contains the
address of the preceding branch instruction.
If the BREAK instruction is in a branch delay slot, the BD bit of the Cause
register is set; otherwise the bit is cleared.
Breakpoint exception processing is shown in Figure 5-17.
Servicing
When the Breakpoint exception occurs, control is transferred to the
applicable system routine. Additional distinctions can be made by
analyzing the unused bits of the BREAK instruction (bits 25:6), and
loading the contents of the instruction whose address the EPC register
contains. A value of 4 must be added to the contents of the EPC register
(EPC register + 4) to locate the instruction if it resides in a branch delay
slot.
To resume execution, the EPC register must be altered so that the BREAK
instruction does not re-execute; this is accomplished by adding a value of
4 to the EPC register (EPC register + 4) before returning.
If a BREAK instruction is in a branch delay slot, interpretation of the
branch instruction is required to resume execution.
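Extracting the unused bits (25:6) of a BREAK instruction, once the handler has loaded the instruction word, might look like the sketch below. The encoding constants (SPECIAL major opcode 0, BREAK function code 0x0D) are assumptions from the MIPS instruction encoding rather than from this section, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_SPECIAL 0x00u  /* major opcode, bits 31:26 (assumed) */
#define FUNCT_BREAK    0x0Du  /* function code, bits 5:0 (assumed) */

/* True when the instruction word encodes BREAK. */
static bool is_break(uint32_t insn)
{
    return (insn >> 26) == OPCODE_SPECIAL && (insn & 0x3Fu) == FUNCT_BREAK;
}

/* Returns the 20-bit code held in bits 25:6 of a BREAK instruction. */
static uint32_t break_code(uint32_t insn)
{
    assert(is_break(insn));
    return (insn >> 6) & UINT32_C(0xFFFFF);
}
```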
Cause
The Reserved Instruction exception occurs when one of the following
conditions occurs:
• an attempt is made to execute an instruction with an undefined
major opcode (bits 31:26)
• an attempt is made to execute a SPECIAL instruction with an
undefined minor opcode (bits 5:0)
• an attempt is made to execute a REGIMM instruction with an
undefined minor opcode (bits 20:16)
• an attempt is made to execute 64-bit operations in 32-bit mode
when in User or Supervisor modes
64-bit operations are always valid in Kernel mode regardless of the value
of the KX bit in the Status register.
This exception is not maskable.
Reserved Instruction exception processing is shown in Figure 5-17.
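The opcode fields named in the list above can be extracted with shifts and masks. This is a minimal sketch with hypothetical helper names; deciding which values are undefined requires the opcode tables, which are not reproduced here.

```c
#include <stdint.h>

/* Major opcode, bits 31:26. */
static unsigned major_opcode(uint32_t insn)
{
    return (unsigned)(insn >> 26);
}

/* Minor opcode of a SPECIAL instruction, bits 5:0. */
static unsigned special_minor_opcode(uint32_t insn)
{
    return (unsigned)(insn & 0x3Fu);
}

/* Minor opcode of a REGIMM instruction, bits 20:16. */
static unsigned regimm_minor_opcode(uint32_t insn)
{
    return (unsigned)((insn >> 16) & 0x1Fu);
}
```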
Processing
The common exception vector is used for this exception, and the RI code
in the Cause register is set.
The EPC register contains the address of the reserved instruction unless it
is in a branch delay slot, in which case the EPC register contains the
address of the preceding branch instruction.
Servicing
No instructions in the MIPS ISA are currently interpreted. The process
executing at the time of this exception is handed a UNIX SIGILL/
ILL_RESOP_FAULT (illegal instruction/reserved operand fault) signal.
This error is usually fatal.
Cause
The Coprocessor Unusable exception occurs when an attempt is made to
execute a coprocessor instruction for either:
• a corresponding coprocessor unit that has not been marked
usable, or
• CP0 instructions, when the unit has not been marked usable
and the process executes in either User or Supervisor mode.
This exception is not maskable.
Processing
The common exception vector is used for this exception, and the CPU code
in the Cause register is set. The Coprocessor Usage Error (CE) field of the
Cause register indicates which of the four coprocessors was referenced.
The EPC register contains the address of the unusable
coprocessor instruction unless it is in a branch delay slot, in which case the
EPC register contains the address of the preceding branch instruction.
Coprocessor Unusable exception processing is shown in Figure 5-17.
Servicing
The coprocessor unit to which an attempted reference was made is
identified by the Coprocessor Usage Error field, which results in one of the
following situations:
• If the process is entitled access to the coprocessor, the
coprocessor is marked usable and the corresponding user state
is restored to the coprocessor.
• If the process is entitled access to the coprocessor, but the
coprocessor does not exist or has failed, interpretation of the
coprocessor instruction is possible.
• If the BD bit is set in the Cause register, the branch instruction
must be interpreted; then the coprocessor instruction can be
emulated and execution resumed with the EPC register
advanced past the coprocessor instruction.
• If the process is not entitled access to the coprocessor, the
process executing at the time is handed a UNIX SIGILL/
ILL_PRIVIN_FAULT (illegal instruction/privileged instruction
fault) signal. This error is usually fatal.
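Identifying the referenced coprocessor from the Coprocessor Usage Error field can be sketched as below. The field position (CE, bits 29:28 of the Cause register) is an assumption from the usual R4000 layout, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed position of the CE field in the Cause register: bits 29:28. */
#define CAUSE_CE_SHIFT 28
#define CAUSE_CE_MASK  0x3u

/* Returns the coprocessor unit number (0 to 3) recorded when a
   Coprocessor Unusable exception was taken. */
static unsigned coprocessor_unit(uint32_t cause)
{
    return (unsigned)((cause >> CAUSE_CE_SHIFT) & CAUSE_CE_MASK);
}
```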
Floating-Point Exception
Cause
The Floating-Point exception is used by the floating-point coprocessor.
This exception is not maskable.
Processing
The common exception vector is used for this exception, and the FPE code
in the Cause register is set.
The contents of the Floating-Point Control/Status register indicate the cause
of this exception.
Floating-Point exception processing is shown in Figure 5-17.
Servicing
This exception is cleared by clearing the appropriate bit in the Floating-
Point Control/Status register.
For an unimplemented instruction exception, the kernel should emulate
the instruction; for other exceptions, the kernel should pass the exception
to the user program that caused the exception.
Watch Exception
Cause
A Watch exception occurs when a load or store instruction references the
physical address specified in the WatchLo/WatchHi System Control
Coprocessor (CP0) registers. The WatchLo register specifies whether a
load or store initiated this exception.
The CACHE instruction never causes a Watch exception.
The Watch exception is postponed if the EXL bit is set in the Status register,
and Watch is only maskable by setting the EXL bit in the Status register.
Processing
The common exception vector is used for this exception, and the Watch
code in the Cause register is set.
Watch exception processing is shown in Figure 5-17.
Servicing
The Watch exception is a debugging aid; typically the exception handler
transfers control to a debugger, allowing the user to examine the situation.
To continue, the Watch exception must be disabled to execute the faulting
instruction. The Watch exception must then be reenabled. The faulting
instruction can be executed either by interpretation or by setting
breakpoints.
Interrupt Exception
Cause
The Interrupt exception occurs when one of the eight interrupt conditions
is asserted. The significance of these interrupts is dependent upon the
specific system implementation.
Each of the eight interrupts can be masked by clearing the corresponding
bit in the Int-Mask field of the Status register, and all of the eight interrupts
can be masked at once by clearing the IE bit of the Status register.
Processing
The common exception vector is used for this exception, and the Int code
in the Cause register is set.
The IP field of the Cause register indicates current interrupt requests. It is
possible that more than one of the bits can be simultaneously set (or even
no bits may be set) if the interrupt is asserted and then deasserted before
this register is read.
Interrupt exception processing is shown in Figure 5-17.
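The masking rules above can be sketched as a function that reports which pending interrupts are enabled. The bit positions (IE in Status bit 0, the interrupt mask in Status bits 15:8, IP in Cause bits 15:8) are assumptions from the usual R4000 layout, and the function name is hypothetical. Only the IE and mask bits described in this section are modeled.

```c
#include <stdint.h>

#define STATUS_IE      UINT32_C(0x00000001)  /* assumed: Status bit 0 */
#define INT_FIELD_MASK UINT32_C(0x0000FF00)  /* assumed: bits 15:8 in both registers */

/* Returns an 8-bit set of interrupts that are both pending in the IP
   field of Cause and enabled in the mask field of Status, or 0 when IE
   is clear. Bit n of the result corresponds to interrupt n. */
static uint32_t enabled_pending_interrupts(uint32_t status, uint32_t cause)
{
    if (!(status & STATUS_IE))
        return 0;
    return (cause & status & INT_FIELD_MASK) >> 8;
}
```

A result of 0 with IE set matches the case noted above in which an interrupt is asserted and then deasserted before the register is read.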
Servicing
If the interrupt is caused by one of the two software-generated exceptions
(SW1 or SW0), the interrupt condition is cleared by setting the
corresponding Cause register bit to 0.
If the interrupt is hardware-generated, the interrupt condition is cleared
by correcting the condition causing the interrupt pin to be asserted.
Figure (flowchart not reproduced): hardware handling of exceptions other
than Reset, Soft Reset, NMI, CacheError, or first-level miss. The flowchart
and its comments record the following:
• Interrupts can be masked by IE or the IM bits, and Watch is masked if
EXL = 1.
• The Watch and FP Control/Status registers are set only if the
respective exception occurs.
• EntryHi receives VPN2 and ASID, and Context/XContext receive VPN2,
only for TLB Invalid, Modified, and Refill exceptions.
• The ExcCode and CE fields of the Cause register are set.
• BadVAddr is set only for TLB Invalid, Modified, and Refill exceptions
and for VCED/VCEI exceptions; it is not set for a Bus Error.
• The EXL bit (Status bit 1) is tested to check whether the exception
occurred within another exception, and the instruction is tested for
being in a branch delay slot.
• If BEV = 0 (normal), PC <- 0xFFFF FFFF 8000 0000 + 180 (unmapped,
cached); if BEV = 1 (bootstrap), PC <- 0xFFFF FFFF BFC0 0200 + 180
(unmapped, uncached).
Figure (flowchart not reproduced): servicing guidelines (SW) and TLB/XTLB
miss handling. The surviving labels record the following:
• Status bit 21 (TS) is optionally checked, only on a second-level TLB
miss; if TS = 1 the processor is reset, and if TS = 0 the service code
runs.
• With EXL = 1, MTC0 is used to write the EPC and Status registers.
• The flowchart tests "XTLB Instruction?" and "Instr. in Br.Dly. Slot?",
then sets PC <- 0xFFFF FFFF 8000 0000 + Vec.Off. (unmapped, cached) or
PC <- 0xFFFF FFFF BFC0 0200 + Vec.Off. (unmapped, uncached).
• LLbit <- 0.
Figure 5-22 Cache Error Exception Handling (HW) and Servicing Guidelines (SW)
(Flowchart not reproduced.) The hardware tests whether the instruction is
in a branch delay slot and sets ERL <- 1. If BEV = 0 (normal),
PC <- 0xFFFF FFFF A000 0000 + 100 (unmapped, uncached); if BEV = 1
(bootstrap), PC <- 0xFFFF FFFF BFC0 0200 + 100 (unmapped, uncached).
Figure 5-23 Reset, Soft Reset & NMI Exception Handling (HW) and Servicing Guidelines (SW)
(Flowchart not reproduced.) The hardware sets ErrorEPC <- PC. The servicing
guidelines test Status bit 20 (SR) and whether an NMI occurred, and branch
to the NMI service code when it did.
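The vector addresses that appear in the flowcharts above can be collected into one function. This is a sketch with hypothetical names; it covers only the general exception vector (offset 0x180) and the Cache Error vector (offset 0x100), using the base addresses quoted in the figures.

```c
#include <stdbool.h>
#include <stdint.h>

enum vector_kind { VECTOR_GENERAL, VECTOR_CACHE_ERROR };

/* Returns the address at which the handler is entered, selected by the
   exception kind and the BEV bit of the Status register. */
static uint64_t exception_vector(enum vector_kind kind, bool bev)
{
    uint64_t offset = (kind == VECTOR_CACHE_ERROR) ? 0x100 : 0x180;
    uint64_t base;

    if (bev)
        base = UINT64_C(0xFFFFFFFFBFC00200);  /* bootstrap, uncached */
    else if (kind == VECTOR_CACHE_ERROR)
        base = UINT64_C(0xFFFFFFFFA0000000);  /* normal, uncached */
    else
        base = UINT64_C(0xFFFFFFFF80000000);  /* normal, cached */
    return base + offset;
}
```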
6.1 Overview
The FPU operates as a coprocessor for the CPU (it is assigned coprocessor
label CP1), and extends the CPU instruction set to perform arithmetic
operations on floating-point values.
Figure 6-1 illustrates the functional organization of the FPU.
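Figure 6-1 (block diagram not reproduced) shows the following blocks joined
by 64-bit data paths: Data Cache, FCU, Control, FP Bypass, Pipeline Chain,
FAdd, FP Mul, FP Div, FP Sqrt, and FP Reg File.
Figure 6-2 (register diagram not reproduced) shows the FPRs, the FGRs, and
the Floating-Point Control registers (FCRs). In one view an FPR spans two
FGRs, for example FPR28 with FGR28 as the least significant half and FGR29
as the most significant half; in the other view FPR28 and FPR29 correspond
to FGR28 and FGR29 individually. The two FCRs shown are the Control/Status
register (FCR31, bits 31:0) and the Implementation/Revision register (FCR0,
bits 31:0).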
Floating-Point Registers
The FPU provides:
• 16 Floating-Point registers (FPRs) when the FR bit in the Status
register equals 0, or
• 32 Floating-Point registers (FPRs) when the FR bit in the Status
register equals 1.
These 64-bit registers hold floating-point values during floating-point
operations and are physically formed from the Floating-Point General
Purpose registers (FGRs). When the FR bit in the Status register equals 1,
each FPR references a single 64-bit FGR.
The FPRs hold values in either single- or double-precision floating-point
format. If the FR bit equals 0, only even numbers (the least register, as
shown in Figure 6-2) can be used to address FPRs. When the FR bit is set
to a 1, all FPR register numbers are valid.
If the FR bit equals 0 during a double-precision floating-point operation,
the general registers are accessed in double pairs. Thus, in a double-
precision operation, selecting Floating-Point Register 0 (FPR0) actually
addresses adjacent Floating-Point General Purpose registers FGR0 and
FGR1.
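The register numbering rules above can be sketched as two small helpers. The names are hypothetical; the least/most ordering of the pair follows the labels in Figure 6-2.

```c
#include <assert.h>
#include <stdbool.h>

/* True when `fpr` is a usable FPR number for the given FR setting:
   any of 0..31 when FR = 1, only even numbers when FR = 0. */
static bool fpr_number_valid(bool fr, unsigned fpr)
{
    if (fpr > 31)
        return false;
    return fr || (fpr % 2 == 0);
}

/* The pair of FGRs that back one double-precision FPR when FR = 0. */
struct fgr_pair {
    unsigned least;  /* FGR holding the least significant word */
    unsigned most;   /* FGR holding the most significant word */
};

static struct fgr_pair fgrs_for_double(unsigned fpr)
{
    assert(fpr <= 30 && fpr % 2 == 0);
    struct fgr_pair pair = { fpr, fpr + 1 };
    return pair;
}
```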
Field   Description
Imp     Implementation number (0x05)
Rev     Revision number in the form of y.x
0       Reserved. Must be written as zeroes, and returns zeroes when read.
Field     Description
FS        When set, denormalized results are flushed to 0 instead of causing
          an unimplemented operation exception.
C         Condition bit. See description of Control/Status register
          Condition bit.
Cause     Cause bits. See Figure 6-5 and the description of Control/Status
          register Cause, Flag, and Enable bits.
Enables   Enable bits. See Figure 6-5 and the description of Control/Status
          register Cause, Flag, and Enable bits.
Flags     Flag bits. See Figure 6-5 and the description of Control/Status
          register Cause, Flag, and Enable bits.
RM        Rounding mode bits. See Table 6-4 and the description of
          Control/Status register Rounding Mode Control bits.
Cause bits    Bit #   17  16  15  14  13  12
                       E   V   Z   O   U   I
Enable bits   Bit #       11  10   9   8   7
                           V   Z   O   U   I
Flag bits     Bit #        6   5   4   3   2
                           V   Z   O   U   I

E   Unimplemented Operation
V   Invalid Operation
Z   Division by Zero
O   Overflow
U   Underflow
I   Inexact Operation
Cause Bits
Bits 17:12 in the Control/Status register contain Cause bits, as shown in
Figure 6-5, which reflect the results of the most recently executed
instruction. The Cause bits are a logical extension of the CP0 Cause register;
they identify the exceptions raised by the last floating-point operation and
raise an interrupt or exception if the corresponding enable bit is set. If
more than one exception occurs on a single instruction, each appropriate
bit is set.
The Cause bits are written by each floating-point operation (but not by
load, store, or move operations). The Unimplemented Operation (E) bit is
set to a 1 if software emulation is required, otherwise it remains 0. The
other bits are set to 1 or 0 to indicate the occurrence or non-occurrence
(respectively) of an IEEE 754 exception.
Enable Bits
A floating-point exception is generated any time a Cause bit and the
corresponding Enable bit are set. A floating-point operation that sets an
enabled Cause bit forces an immediate exception, as does setting both
Cause and Enable bits with CTC1.
There is no enable for Unimplemented Operation (E). Setting
Unimplemented Operation always generates a floating-point exception.
Before returning from a floating-point exception, software must first clear
the enabled Cause bits with a CTC1 instruction to prevent a repeat of the
exception. Thus, User mode programs can never observe enabled Cause
bits set; if this information is required in a User mode handler, it must be
passed somewhere other than the Control/Status register.
For a floating-point operation that sets only unenabled Cause bits, no
exception occurs and the default result defined by IEEE 754 is stored. In
this case, the exceptions that were caused by the immediately previous
floating-point operation can be determined by reading the Cause field.
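The rule that an exception is generated whenever a Cause bit and its Enable bit are both set, or whenever the E bit is set, can be sketched as below. The bit positions follow Figure 6-5; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions in the Control/Status register, per Figure 6-5. */
#define FCSR_CAUSE_SHIFT  12                   /* bits 16:12 = V Z O U I */
#define FCSR_ENABLE_SHIFT 7                    /* bits 11:7  = V Z O U I */
#define FCSR_CAUSE_E      (UINT32_C(1) << 17)  /* Unimplemented Operation */

/* True when the register value would generate a floating-point
   exception: E is set (it has no enable), or some Cause bit is set
   together with the corresponding Enable bit. */
static bool fp_exception_raised(uint32_t fcsr)
{
    uint32_t cause  = (fcsr >> FCSR_CAUSE_SHIFT) & 0x1Fu;
    uint32_t enable = (fcsr >> FCSR_ENABLE_SHIFT) & 0x1Fu;

    return (fcsr & FCSR_CAUSE_E) != 0 || (cause & enable) != 0;
}
```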
Flag Bits
The Flag bits are cumulative and indicate that an exception was raised by
an operation that was executed since they were explicitly reset. Flag bits
are set to 1 if an IEEE 754 exception is raised, otherwise they remain
unchanged. The Flag bits are never cleared as a side effect of floating-point
operations; however, they can be set or cleared by writing a new value into
the Control/Status register, using a CTC1 (Move Control Word To FPU)
instruction.
When a floating-point exception is taken, the flag bits are not set by the
hardware; floating-point exception software is responsible for setting
these bits before invoking a user handler.
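Because the hardware does not set the Flag bits when an exception is taken, exception software has to fold the Cause bits into them itself. A minimal sketch of that step, using the bit positions from Figure 6-5 and a hypothetical function name:

```c
#include <stdint.h>

/* Returns the Control/Status value with each Flag bit (bits 6:2) set for
   which the matching Cause bit (bits 16:12, V Z O U I) is set. Flag bits
   that are already set stay set, since the flags are cumulative. */
static uint32_t accumulate_flags(uint32_t fcsr)
{
    uint32_t cause = (fcsr >> 12) & 0x1Fu;
    return fcsr | (cause << 2);
}
```

The result would still need to be written back with CTC1, after the enabled Cause bits have been cleared as described earlier.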
Rounding Mode
RM(1:0)   Mnemonic   Description
0         RN         Round result to nearest representable value; round to
                     value with least-significant bit 0 when the two nearest
                     representable values are equally near.
1         RZ         Round toward 0: round to value closest to and not
                     greater in magnitude than the infinitely precise result.
2         RP         Round toward +∞: round to value closest to and not less
                     than the infinitely precise result.
3         RM         Round toward -∞: round to value closest to and not
                     greater than the infinitely precise result.
Bits    31      30:23      22:0
Field   s       e          f
        Sign    Exponent   Fraction
Width   1       8          23
Figure 6-6 Single-Precision Floating-Point Format
Bits    63      62:52      51:0
Field   s       e          f
        Sign    Exponent   Fraction
Width   1       11         52
Figure 6-7 Double-Precision Floating-Point Format
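The single-precision layout in Figure 6-6 can be unpacked with shifts and masks. The structure and function names below are hypothetical; the sketch operates on the raw 32-bit word rather than on a float value.

```c
#include <stdint.h>

/* The three fields of a single-precision word, per Figure 6-6. */
struct single_fields {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 30:23, biased */
    uint32_t fraction;  /* bits 22:0 */
};

static struct single_fields decode_single(uint32_t word)
{
    struct single_fields f;

    f.sign     = (unsigned)(word >> 31);
    f.exponent = (unsigned)((word >> 23) & 0xFFu);
    f.fraction = word & UINT32_C(0x7FFFFF);
    return f;
}
```

For example, the word 0x3F800000 (the encoding of 1.0) decodes to sign 0, biased exponent 127, and fraction 0.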
Bits    31      30:0
Field   Sign    Integer
Width   1       31
Field Description
sign sign bit
integer integer value
Table 6-9 FPU Instruction Summary: Load, Move and Store Instructions
OpCode Description
LWC1 Load Word to FPU
SWC1 Store Word from FPU
LDC1 Load Doubleword to FPU
SDC1 Store Doubleword From FPU
MTC1 Move Word To FPU
MFC1 Move Word From FPU
CTC1 Move Control Word To FPU
CFC1 Move Control Word From FPU
DMTC1 Doubleword Move To FPU
DMFC1 Doubleword Move From FPU
OpCode Description
CVT.S.fmt Floating-point Convert to Single FP
CVT.D.fmt Floating-point Convert to Double FP
CVT.W.fmt Floating-point Convert to 32-bit Fixed Point
CVT.L.fmt Floating-point Convert to 64-bit Fixed Point
ROUND.W.fmt Floating-point Round to 32-bit Fixed Point
ROUND.L.fmt Floating-point Round to 64-bit Fixed Point
TRUNC.W.fmt Floating-point Truncate to 32-bit Fixed Point
TRUNC.L.fmt Floating-point Truncate to 64-bit Fixed Point
CEIL.W.fmt Floating-point Ceiling to 32-bit Fixed Point
CEIL.L.fmt Floating-point Ceiling to 64-bit Fixed Point
FLOOR.W.fmt Floating-point Floor to 32-bit Fixed Point
FLOOR.L.fmt Floating-point Floor to 64-bit Fixed Point
OpCode Description
ADD.fmt Floating-point Add
SUB.fmt Floating-point Subtract
MUL.fmt Floating-point Multiply
DIV.fmt Floating-point Divide
ABS.fmt Floating-point Absolute Value
MOV.fmt Floating-point Move
NEG.fmt Floating-point Negate
SQRT.fmt Floating-point Square Root
OpCode Description
C.cond.fmt Floating-point Compare
BC1T Branch on FPU True
BC1F Branch on FPU False
BC1TL Branch on FPU True Likely
BC1FL Branch on FPU False Likely
Data Alignment
All coprocessor loads and stores reference the following aligned data
items:
• For word loads and stores, the access type is always WORD,
and the low-order 2 bits of the address must always be 0.
• For doubleword loads and stores, the access type is always
DOUBLEWORD, and the low-order 3 bits of the address must
always be 0.
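The two alignment rules above reduce to a mask test on the low-order address bits. The following Python sketch is illustrative only; the names WORD, DOUBLEWORD, and is_aligned are hypothetical and are not part of the R4000 architecture.

```python
WORD = 4        # access size in bytes for word loads and stores
DOUBLEWORD = 8  # access size in bytes for doubleword loads and stores

def is_aligned(address: int, size: int) -> bool:
    """Return True when the low-order log2(size) address bits are all 0."""
    return address & (size - 1) == 0

print(is_aligned(0x1004, WORD))        # low 2 bits are 00
print(is_aligned(0x1004, DOUBLEWORD))  # low 3 bits are 100
```

Address 0x1004 satisfies the word rule but not the doubleword rule, since its low-order 3 bits are not all 0.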
Endianness
Regardless of byte-numbering order (endianness) of the data, the address
specifies the byte that has the smallest byte address in the addressed field.
For a big-endian system, it is the leftmost byte; for a little-endian system,
it is the rightmost byte.
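A short Python sketch can make the byte-address rule concrete. It uses the standard struct module to lay out a 32-bit value in both byte orders; the value 0x11223344 is an arbitrary example, and "leftmost" is read here as the most significant byte of the value as written.

```python
import struct

value = 0x11223344

# Each bytes object lists the bytes in increasing address order.
big = struct.pack(">I", value)     # big-endian layout
little = struct.pack("<I", value)  # little-endian layout

# The byte at the smallest address is the one the word address names.
print(hex(big[0]))     # leftmost (most significant) byte of the value
print(hex(little[0]))  # rightmost (least significant) byte of the value
```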
Instruction Execution
Figure 6-9 illustrates the 8-instruction overlap in the FPU pipeline.
(Figure 6-9, diagram not reproduced: eight instructions, each starting one
PCycle after the previous one, pass through the stages IF, IS, RF, EX, DF, DS,
TC, and WB, so that eight instructions overlap in the current CPU cycle.)
Figure 6-9 assumes that one instruction is completed every PCycle. Most
FPU instructions, however, require more than one cycle in the EX stage.
This means the FPU must stall the pipeline if an instruction execution
cannot proceed because of register or resource conflicts.
Figure 6-10 illustrates the effect of a three-cycle stall on the FPU pipeline.
To lessen the performance impact that results from stalling the instruction
pipeline, the FPU allows instructions to overlap so that instruction
execution can proceed as long as there are no resource conflicts, data
dependencies, or exception conditions. The following sections describe
the timing and overlapping of FPU instructions.
Stage Description
A FPU Adder Mantissa Add stage
E FPU Adder Exception Test stage
EX CPU EX stage
M FPU Multiplier 1st stage
N FPU Multiplier 2nd stage
R FPU Adder Result Round stage
S FPU Adder Operand Shift stage
U FPU Unpack stage
MUL.[S,D] I2 U M M M M N N/A R    No
MUL.[S,D] I3 U M M M M N N/A R    No
MUL.[S,D] I3 U M M M M N N/A R    No
MUL.[S,D] I4 U M M M M N N/A R    No
NEG.[S,D] U S
ADD.[S,D] U S+A A+R R+S
NOP U
NOP U
C.COND.[S,D] U A R
NOP U
SQRT.[S,D] U E A+R . . . A+R R
NOP U
...
...
NOP U
ADD.[S,D] U S A R
Resource Conflict. The adder must allow the cleanup stages (A, R) of a
multiplication instruction to be pipelined with the execution of an
ADD.[S,D], SUB.[S,D], or C.COND.[S,D] instruction, as long as no two
instructions simultaneously attempt to use the same A and R pipe stages.
For instance, Figure 6-14 shows a resource conflict between the mantissa
add (A, stage 7) of instructions 1, 5, and 6. This figure also shows the
resource conflict between result round (R), stage 8, of instructions 1, 5, and
6. The multiplication cleanup cycles (A, R) can neither overlap nor
pipeline with any other instruction currently in the adder pipe.
Figures 6-14 through 6-17 show these constraints.
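As an illustration, the A and R stage constraint can be modeled by listing which adder units each instruction occupies in each cycle and checking for overlap. This is a sketch, not a description of the hardware interlock: the stage sequences are transcribed from Figures 6-14 through 6-17, the reading of N/A as "N and A in the same cycle" is an assumption, and the function names are hypothetical. The model does not capture the cases the figures mark with a dagger, where the hardware refuses to issue even though no conflict exists.

```python
# Stage sequences transcribed from Figures 6-14 through 6-17.
MUL_D = ["U", "M", "M", "M", "M", "N", "N/A", "R"]
MUL_S = ["U", "M", "M", "M", "N", "N/A", "R"]
ADD = ["U", "S+A", "A+R", "R+S"]
CMP = ["U", "A", "R"]

ADDER_UNITS = {"A", "R"}  # only the mantissa-add and result-round units are checked

def units(stage):
    """Split a stage label such as 'S+A' or 'N/A' into the units it names."""
    return set(stage.replace("/", "+").split("+"))

def busy(sequence, start):
    """Map each cycle number to the checked adder units in use during that cycle."""
    return {start + i: units(s) & ADDER_UNITS for i, s in enumerate(sequence)}

def conflicts(first, second, issue_cycle):
    """Cycles in which 'second' (U stage at issue_cycle) collides with 'first' (U stage at 1)."""
    a = busy(first, 1)
    b = busy(second, issue_cycle)
    return sorted(c for c in b if a.get(c, set()) & b[c])

print(conflicts(MUL_D, ADD, 4))  # [] : instruction I4 in Figure 6-14 is legal
print(conflicts(MUL_D, ADD, 5))  # [7, 8] : instruction I5 collides in stages 7 and 8
print(conflicts(MUL_D, CMP, 5))  # [] : no conflict, yet the hardware disallows it (dagger)
```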
Stage#        1   2   3   4   5   6   7   8   9   10  11  Legal to Issue?
MUL.D     I1  U   M   M   M   M   N   N/A R
ADD.[S,D] I2      U   S+A A+R R+S                         Yes
          I3          U   S+A A+R R+S                     Yes
          I4              U   S+A A+R R+S                 Yes
          I5                  U   S+A A+R R+S             No
          I6                      U   S+A A+R R+S         No
          I7                          U   S+A A+R R+S     Yes
          I8                              U   S+A A+R R+S Yes
Marked stages in the original figure indicate a resource conflict.
Stage#        1   2   3   4   5   6   7   8   9   10  11  Legal to Issue?
MUL.S     I1  U   M   M   M   N   N/A R
ADD.[S,D] I2      U   S+A A+R R+S                         Yes
          I3          U   S+A A+R R+S                     Yes
          I4              U   S+A A+R R+S                 No
          I5                  U   S+A A+R R+S             No
          I6                      U   S+A A+R R+S         Yes
          I7                          U   S+A A+R R+S     Yes
          I8                              U   S+A A+R R+S Yes
Marked stages in the original figure indicate a resource conflict.
Stage#        1   2   3   4   5   6   7   8   9   10  Legal to Issue?
MUL.D     I1  U   M   M   M   M   N   N/A R
CMP.[S,D] I2      U   A   R                           Yes
          I3          U   A   R                       Yes
          I4              U   A   R                   Yes
          I5                  U   A   R               No†
          I6                      U   A   R           No
          I7                          U   A   R       Yes
          I8                              U   A   R   Yes
Marked stages in the original figure indicate a resource conflict.
† While there is no resource conflict in issuing this CMP.[S,D] instruction, the hardware does
not allow it.
Figure 6-16 MUL.D and CMP.[S,D] Cleanup Cycle Conflict in FPU Adder
Stage#        1   2   3   4   5   6   7   8   9   10  Legal to Issue?
MUL.S     I1  U   M   M   M   N   N/A R
CMP.[S,D] I2      U   A   R                           Yes
          I3          U   A   R                       Yes
          I4              U   A   R                   No†
          I5                  U   A   R               No
          I6                      U   A   R           Yes
          I7                          U   A   R       Yes
          I8                              U   A   R   Yes
Marked stages in the original figure indicate a resource conflict.
† While there is no resource conflict in issuing this CMP.[S,D] instruction, the hardware does
not allow it.
Figure 6-17 MUL.S and CMP.[S,D] Cleanup Cycle Conflict in FPU Adder
Prep and Cleanup Cycle Overlap. The adder does not allow the
preparation (U stage) and cleanup cycles (N, A, R) of a division instruction
to be pipelined with any other instruction; however, the adder does allow
the last cycle of preparation or cleanup to be overlapped one clock by the
following instruction’s U stage (the CPU EX cycle). Figure 6-18 shows this
process.
Table 6-16 Latency, Repeat Rate, and Pipe Stages of FPU Instructions
MUL.[S,D] can start only when all of the following conditions are met in
the RF stage:
• The multiplier is one of the following:
- idle, or in its second-to-last execution cycle.
- not within the first two execution cycles (EX, EX+1) if the
most recent instruction in the multiplier pipe is MUL.S
- not within the first three execution cycles (EX...EX+2) if
the most recent instruction in the multiplier pipe is
MUL.D
• The adder is one of the following:
- idle, or in its second-to-last execution cycle.
- not processing the first execution cycle (EX) of CVT.S.L
• The adder is not processing a square root instruction
• The divider is one of the following:
- idle, or in its second-to-last execution cycle.
- in the first 8 execution cycles (EX...EX+7) of a DIV.S
- in the first 21 execution cycles, except for the second
execution cycle (cycles EX, EX+2...EX+20), of a DIV.D
SQRT.[S,D] can start only when all of the following conditions are met in
the RF stage:
• The divider is either idle, or in its second-to-last execution cycle.
• The adder is either idle, or in its second-to-last execution cycle.
• The multiplier is either idle, or in its second-to-last execution
cycle.
CVT.fmt, NEG.[S,D] or ABS.[S,D] instructions can only start when all of
the following conditions are met in the RF stage:
• The adder is either idle, or in its second-to-last execution cycle.
• The multiplier is either idle, or in its second-to-last execution
cycle.
• The divider is one of the following:
- idle, or in its second-to-last execution cycle.
- in the third through eighth execution cycle (EX+2...EX+7)
of a DIV.S
- in the third through twenty-first execution cycle
(EX+2...EX+20) of a DIV.D
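The divider condition for CVT.fmt, NEG.[S,D], and ABS.[S,D] can be written as a small predicate. The Python sketch below is an assumption-laden illustration: cycles are numbered from 0 for EX, and the index of the divider's second-to-last execution cycle is passed in as a hypothetical parameter rather than taken from Table 6-16. The adder and multiplier conditions are not modeled.

```python
# Windows in which the divider permits a CVT/NEG/ABS to start, as offsets from EX.
DIVIDER_WINDOW = {
    "DIV.S": range(2, 8),   # EX+2 ... EX+7
    "DIV.D": range(2, 21),  # EX+2 ... EX+20
}

def divider_permits_start(div_op, cycle, second_to_last):
    """div_op is None when the divider is idle, otherwise 'DIV.S' or 'DIV.D'.

    cycle is the divider's current execution cycle (0 for EX, 1 for EX+1, ...).
    second_to_last is the offset of its second-to-last execution cycle
    (hypothetical parameter; the real value depends on the divide latency).
    """
    if div_op is None:
        return True
    if cycle == second_to_last:
        return True
    return cycle in DIVIDER_WINDOW[div_op]

# The value 30 below is a made-up placeholder, not a documented latency.
print(divider_permits_start("DIV.S", 1, 30))  # EX+1 is outside the window
print(divider_permits_start("DIV.S", 2, 30))  # EX+2 is inside the window
```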
Field        Bits   Bit assignments
Cause bits   17:12  E (17), V (16), Z (15), O (14), U (13), I (12)
Enable bits  11:7   V (11), Z (10), O (9), U (8), I (7)
Flag bits    6:2    V (6), Z (5), O (4), U (3), I (2)

E  Unimplemented Operation
V  Invalid Operation
Z  Division by Zero
O  Overflow
U  Underflow
I  Inexact Operation
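The Cause, Enable, and Flag bit positions shown above (bits 17:12, 11:7, and 6:2) can be decoded with shifts and masks. The Python sketch below is illustrative; the dictionary and function names are hypothetical, and the input is simply an integer holding the register word that contains these fields.

```python
CAUSE = {"E": 17, "V": 16, "Z": 15, "O": 14, "U": 13, "I": 12}
ENABLE = {"V": 11, "Z": 10, "O": 9, "U": 8, "I": 7}
FLAG = {"V": 6, "Z": 5, "O": 4, "U": 3, "I": 2}

def set_bits(word, layout):
    """Return the names of the bits in 'layout' that are 1 in 'word'."""
    return [name for name, bit in layout.items() if (word >> bit) & 1]

def decode(word):
    """Split a register word into its Cause, Enable, and Flag bit names."""
    return {
        "cause": set_bits(word, CAUSE),
        "enable": set_bits(word, ENABLE),
        "flag": set_bits(word, FLAG),
    }

# Example word: Division by Zero cause and enable set, Inexact flag set.
example = (1 << 15) | (1 << 10) | (1 << 2)
print(decode(example))
```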
7.3 Flags
A Flag bit is provided for each IEEE exception. This Flag bit is set to a 1 on
the assertion of its corresponding exception, with no corresponding
exception trap signaled.
The Flag bit is reset by writing a new value into the Status register; flags
can be saved and restored by software either individually or as a group.
When no exception trap is signaled, the floating-point coprocessor takes a
default action, providing a substitute value for the exception-causing
result of the floating-point operation. The particular default action taken
depends upon the type of exception. Table 7-1 lists the default action taken
by the FPU for each of the IEEE exceptions.
Field  Description          Rounding Mode  Default action
I      Inexact exception    Any            Supply a rounded result
U      Underflow exception  RN             Modify underflow values to 0 with the sign of the intermediate result
                            RZ             Modify underflow values to 0 with the sign of the intermediate result
                            RP             Modify positive underflows to the format’s smallest positive finite number; modify negative underflows to -0
                            RM             Modify negative underflows to the format’s smallest negative finite number; modify positive underflows to 0
O      Overflow exception   RN             Modify overflow values to ∞ with the sign of the intermediate result
                            RZ             Modify overflow values to the format’s largest finite number with the sign of the intermediate result
                            RP             Modify negative overflows to the format’s most negative finite number; modify positive overflows to +∞
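The underflow rows of the table can be expressed as a lookup keyed by rounding mode and by the sign of the intermediate result. The Python sketch below is illustrative only. The strings "+MIN" and "-MIN" are hypothetical placeholders for the format's smallest positive and smallest negative finite numbers, and the positive RM result, which the table gives as 0, is written here as "+0".

```python
# Default underflow result by rounding mode and sign of the intermediate result.
UNDERFLOW_DEFAULT = {
    "RN": {"+": "+0", "-": "-0"},
    "RZ": {"+": "+0", "-": "-0"},
    "RP": {"+": "+MIN", "-": "-0"},
    "RM": {"+": "+0", "-": "-MIN"},
}

def underflow_default(mode, negative):
    """Return the substitute value supplied when no underflow trap is signaled."""
    return UNDERFLOW_DEFAULT[mode]["-" if negative else "+"]

print(underflow_default("RP", negative=False))
print(underflow_default("RM", negative=True))
```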
The FPU detects the eight exception causes internally. When the FPU
encounters one of these unusual situations, it causes either an IEEE
exception or an Unimplemented Operation exception (E).
Table 7-2 lists the exception-causing situations and contrasts the behavior
of the FPU with the requirements of the IEEE Standard 754.
This chapter describes the signals used by and in conjunction with the
R4000 processor. The signals include the System interface, the Clock/
Control interface, the Secondary Cache interface, the Interrupt interface,
the Joint Test Action Group (JTAG) interface, and the Initialization
interface.
Signals are listed in bold, and low-active signals have a trailing asterisk;
for instance, the low-active Read Ready signal is RdRdy*. The signal
description also tells if the signal is an input (the processor receives it) or
output (the processor sends it out).
Figure 8-1 illustrates the functional groupings of the processor signals.
(Figure 8-1, diagram not reproduced. The signals shown around the R4000 logic
symbol are grouped as follows.)
System interface: SysAD(63:0), SysADC(7:0), ValidIn*, ValidOut*, ExtRqst*,
Release*, RdRdy*, WrRdy*, IvdAck* (3), IvdErr* (3)
Secondary Cache interface: SCData(127:0), SCDChk(15:0), SCAddr(17:1),
SCAddr0(w,x,y,z), SCAPar(2:0), SCOE*, SCWr(w,x,y,z)*, SCDCS*, SCTCS*
Clock/Control interface: TClock(1:0), RClock(1:0), MasterClock, MasterOut,
SyncOut, SyncIn, IOOut, IOIn, Fault*, VccP, VssP, Status(7:0) (4),
VccSense (1), VssSense (1)
Interrupt interface: Int(5:1)* (2), Int0*, NMI*
Initialization interface: ModeClock, ModeIn, VCCOk, ColdReset*, Reset*
JTAG interface: JTDI, JTDO, JTMS, JTCK
(1) = R4000SC and R4000MC only (2) = R4000PC only
(3) = R4000MC only (4) = R4400 only
Description Name I/O Asserted State 3-State
Secondary cache data bus SCData(127:0) I/O High Yes
Secondary cache data ECC bus SCDChk(15:0) I/O High Yes
Secondary cache tag bus SCTag(24:0) I/O High Yes
Secondary cache tag ECC bus SCTChk(6:0) I/O High Yes
Secondary cache address bus SCAddr(17:1) O High No
Secondary cache address LSB SCAddr0Z O High No
Secondary cache address LSB SCAddr0Y O High No
Secondary cache address LSB SCAddr0X O High No
Secondary cache address LSB SCAddr0W O High No
Secondary cache address parity bus SCAPar(2:0) O High No
Secondary cache output enable SCOE* O Low No
Secondary cache write enable SCWrZ* O Low No
Secondary cache write enable SCWrY* O Low No
Secondary cache write enable SCWrX* O Low No
Secondary cache write enable SCWrW* O Low No
Secondary cache data chip select SCDCS* O Low No
Secondary cache tag chip select SCTCS* O Low No
System address/data bus SysAD(63:0) I/O High Yes
System address/data check bus SysADC(7:0) I/O High Yes
System command/data identifier bus SysCmd(8:0) I/O High Yes
System command/data identifier bus parity SysCmdP I/O High Yes
Valid input ValidIn* I Low No
Valid output ValidOut* O Low No
External request ExtRqst* I Low No
Release interface Release* O Low No
Read ready RdRdy* I Low No
Write ready WrRdy* I Low No
Invalidate acknowledge IvdAck* I Low No
Invalidate error IvdErr* I Low No
Interrupt Int*(0) I Low No
Nonmaskable interrupt NMI* I Low No
Boot mode data in ModeIn I High No
Boot mode clock ModeClock O High No
JTAG data in JTDI I High No
JTAG data out JTDO O High No
JTAG command JTMS I High No
JTAG clock input JTCK I High No
Transmit clocks TClock(1:0) O High No
Receive clocks RClock(1:0) O High No
Master clock MasterClock I High No
Master clock out MasterOut O High No
Synchronization clock out SyncOut O High No
Synchronization clock in SyncIn I High No
I/O output IOOut O High No
I/O input IOIn I High No
Vcc is OK VCCOk I High No
Cold reset ColdReset* I Low No
Reset Reset* I Low No
Fault Fault* O Low No
Quiet Vcc for PLL VccP I High No
Quiet Vss for PLL VssP I High No
Status Status(7:0) O High No
Vcc sense VccSense I/O N/A No
Vss sense VssSense I/O N/A No
Description Name I/O Asserted State 3-State
System address/data bus SysAD(63:0) I/O High Yes
System address/data check bus SysADC(7:0) I/O High Yes
System command/data identifier bus SysCmd(8:0) I/O High Yes
System command/data identifier bus parity SysCmdP I/O High Yes
Valid input ValidIn* I Low No
Valid output ValidOut* O Low No
External request ExtRqst* I Low No
Release interface Release* O Low No
Read ready RdRdy* I Low No
Write ready WrRdy* I Low No
Interrupts Int*(5:1) I Low No
Interrupt Int*(0) I Low No
Nonmaskable interrupt NMI* I Low No
Boot mode data in ModeIn I High No
Boot mode clock ModeClock O High No
JTAG data in JTDI I High No
JTAG data out JTDO O High No
JTAG command JTMS I High No
JTAG clock input JTCK I High No
Transmit clocks TClock(1:0) O High No
Receive clocks RClock(1:0) O High No
Master clock MasterClock I High No
Master clock out MasterOut O High No
Synchronization clock out SyncOut O High No
Synchronization clock in SyncIn I High No
I/O output IOOut O High No
I/O input IOIn I High No
Vcc is OK VCCOk I High No
Cold reset ColdReset* I Low No
Reset Reset* I Low No
Fault Fault* O Low No
Quiet Vcc for PLL VccP I High No
Quiet Vss for PLL VssP I High No
This chapter describes the R4000 Initialization interface. This includes the
reset signal description and types, initialization sequence, with signals
and timing dependencies, and boot modes, which are set at initialization
time.
Signal names are listed in bold letters; for instance, the signal VCCOk
indicates that the +5 volt supply is stable. Low-active signals are indicated by a
trailing asterisk, such as ColdReset*, the power-on/cold reset signal.
† Asserted means the signal is true, or in its valid state. For example, the low-active Reset*
signal is said to be asserted when it is in a low (true) state; the high-active VCCOk signal
is true when it is asserted high.
Power-on Reset
The sequence for a power-on reset is listed below.
1. Power-on reset applies a stable Vcc of at least 4.75 volts from the
+5 volt power supply to the processor. It also supplies a stable,
continuous system clock at the processor operational frequency.
2. After at least 100 ms of stable Vcc and MasterClock, the VCCOk
signal is asserted to the processor. The assertion of VCCOk
initializes the processor operating parameters. After the mode
bits have been read in, the processor allows its internal phase
locked loops to lock, stabilizing the processor internal clock,
PClock, the SyncOut-SyncIn clock path (described in Chapter
10), and the master clock output, MasterOut. Note that when
JTAG is not used, JTCK must be tied low at the rising edge of
VCCOk for the processor to properly reset. If JTAG is used, JTCK
may be toggled during power-up.
3. ColdReset* is asserted for at least 64K (2^16) MasterClock cycles
after the assertion of VCCOk. Once the processor reads the boot-
time mode control serial data stream, ColdReset* can be
deasserted. ColdReset* must be deasserted synchronously with
MasterClock.
4. The deassertion of ColdReset* synchronizes the rising edges of
SClock and TClock with the rising edge of the next MasterClock,
aligning SClock, TClock, and RClock (which leads SClock and
TClock by 90 degrees of phase) of all processors in a
multiprocessor system. However, these clocks are only
guaranteed to be stabilized 64 MasterClock cycles after
ColdReset* is deasserted.
5. After ColdReset* is deasserted synchronously and SClock,
TClock, and RClock have stabilized, Reset* is deasserted to allow
the processor to begin running. (Reset* must be held asserted for
at least 64 MasterClock cycles after the deassertion of
ColdReset*.) Reset* must be deasserted synchronously with
MasterClock.
NOTE: ColdReset* must be asserted when VCCOk asserts. The
behavior of the processor is undefined if VCCOk asserts while
ColdReset* is deasserted.
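The minimum durations in the power-on sequence are stated partly in time (100 ms) and partly in MasterClock cycles (64K and 64), so it can help to convert them all to seconds for a given clock. The Python sketch below does that arithmetic; the 50 MHz MasterClock frequency and all names are assumptions chosen only for illustration.

```python
def reset_minimums(master_clock_hz):
    """Minimum durations of the power-on reset steps, in seconds."""
    cycle = 1.0 / master_clock_hz
    return {
        # Step 2: stable Vcc and MasterClock before VCCOk is asserted.
        "stable_before_vccok": 0.100,
        # Step 3: ColdReset* asserted after VCCOk, 64K = 2**16 cycles.
        "coldreset_after_vccok": 2**16 * cycle,
        # Step 5: Reset* held after ColdReset* is deasserted.
        "reset_after_coldreset": 64 * cycle,
    }

# Assumed example frequency: 50 MHz.
for name, seconds in reset_minimums(50_000_000).items():
    print(f"{name}: {seconds * 1e3:.5f} ms")
```

At the assumed 50 MHz, the 64K-cycle requirement is about 1.31 ms and the 64-cycle requirement is 1.28 microseconds, so the 100 ms supply-stability wait dominates the sequence.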
Cold Reset
A cold reset can begin anytime after the processor has read the
initialization data stream, causing the processor to start with the Reset
exception. For information about saving processor states, see the
description of the Reset exception in Chapter 5.
A cold reset requires the same sequence as a power-on reset except that the
power is presumed to be stable before the assertion of the reset inputs and
the deassertion of VCCOk.
To begin the reset sequence, VCCOk must be deasserted for at least
64 MasterClock cycles before reassertion.
Warm Reset
To execute a warm reset, the Reset* input is asserted synchronously with
MasterClock. It is then held asserted for at least 64 MasterClock cycles
before being deasserted synchronously with MasterClock. The processor
internal clocks, PClock and SClock, and the System interface clocks,
TClock and RClock, are not affected by a warm reset. The boot-time
mode control serial data stream is not read by the processor on a warm
reset. A warm reset forces the processor to start with a Soft Reset
exception. For information about saving processor states, see the
description of the Soft Reset exception in Chapter 5.
The master clock output, MasterOut, can be used to generate any reset-
related signals for the processor that must be synchronous with
MasterClock.†
After a power-on reset, cold reset, or warm reset, all processor internal
state machines are reset, and the processor begins execution at the reset
vector. All processor internal states are preserved during a warm reset,
although the precise state of the caches depends on whether or not a cache
miss sequence has been interrupted by resetting the processor state
machines.
† Since MasterOut is undefined until after the serial PROM is read, reset logic must not
depend on MasterOut before the boot PROM is read.
(Figure 9-1 Power-on Reset, timing diagram not reproduced. Signals shown: Vcc
(4.75V to 5.25V), MasterClock (MClk), VCCOk (asserted after > 100 ms),
ModeClock (256 MClk cycles), ModeIn (Bit 0, Bit 1, through Bit 255, with
timing parameters TMDS and TMDH), ColdReset* (> 64K MClk cycles*), the
following interval of > 64 MClk cycles, and MasterOut, SyncOut, TClock, and
RClock, each initially undefined. TClock and RClock are stable after 64 MClk
cycles. Setup times are labeled TDS. Wavy lines indicate one or more identical
cycles, not shown due to space constraints.)
*Considering multiple processing variables and systems-
(Figure 9-2 Cold Reset, timing diagram not reproduced. Signals shown:
MasterClock (MClk), VCCOk (deasserted for > 64 MClk cycles), ModeClock
(256 MClk cycles), ModeIn (Bit 0, Bit 1, through Bit 255, with timing
parameters TMDS and TMDH), ColdReset* (> 64K MClk cycles*), the following
interval of > 64 MClk cycles, and MasterOut, SyncOut, TClock, and RClock, each
initially undefined. TClock and RClock are stable after 64 MClk cycles. Setup
times are labeled TDS.)
*Considering multiple processing variables and systems-
(Figure 9-3 Warm Reset, timing diagram not reproduced. Signals shown:
MasterClock (MClk), VCCOk, ModeClock (256 MClk cycles), ModeIn, ColdReset*,
Reset* (asserted for > 64 MClk cycles, with setup times labeled TDS),
MasterOut, SyncOut, TClock, and RClock.)
10
This chapter describes the clock signals (“clocks”) used in the R4000
processor and the processor status reporting mechanism.
The subject matter includes basic system clocks, system timing
parameters, connecting clocks to a phase-locked system, connecting clocks
to a system without phase locking, and processor status outputs.
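(Figures not reproduced: a signal transition diagram over cycles 1 through 4,
showing a high-to-low transition and a low-to-high transition, and a
clock-to-Q delay diagram relating the clock input and data in of a register to
its data out, Q.)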
MasterClock
The processor bases all internal and external clocking on the single
MasterClock input signal. The processor generates the clock output
signal, MasterOut, at the same frequency as MasterClock and aligns
MasterOut with MasterClock, if SyncIn is connected to SyncOut.
MasterOut
The processor generates the clock output signal, MasterOut, at the same
frequency as MasterClock and aligns MasterOut with MasterClock, if
SyncIn is connected to SyncOut. MasterOut clocks external logic, such as
the reset logic.
SyncIn/SyncOut
The processor generates SyncOut at the same frequency as MasterClock
and aligns SyncIn with MasterClock.
SyncOut must be connected to SyncIn either directly, or through an
external buffer. The processor can compensate for both output driver and
input buffer delays (and, when necessary, delay caused by an external
buffer) when aligning SyncIn with MasterClock. Figure 10-7 gives an
illustration of SyncOut connected to SyncIn through an external buffer.
PClock
The processor generates an internal clock, PClock, at twice the frequency
of MasterClock and precisely aligns every other rising edge of PClock
with the rising edge of MasterClock.
All internal registers and latches use PClock.
SClock
The R4000 processor divides PClock by 2, 3, or 4 (as programmed at boot-
mode initialization) to generate the internal clock signal, SClock. The
R4400 processor divides PClock by 2, 3, 4, 6 or 8 (as programmed at boot-
mode initialization) to generate SClock. The processor uses SClock to
sample data at the system interface and to clock data into the processor
system interface output registers.
The first rising edge of SClock, after ColdReset* is deasserted, is aligned
with the first rising edge of MasterClock.
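The relationships among MasterClock, PClock, and SClock described above are simple ratios. The Python sketch below computes them; the function and table names are hypothetical, and the 50 MHz input is an assumed example rather than a specified operating frequency.

```python
from fractions import Fraction

# Divisors selectable at boot-mode initialization, per the text above.
DIVISORS = {"R4000": (2, 3, 4), "R4400": (2, 3, 4, 6, 8)}

def derive_clocks(master_hz, divisor, model="R4000"):
    """Return (PClock, SClock) frequencies in Hz for a MasterClock frequency."""
    if divisor not in DIVISORS[model]:
        raise ValueError(f"divisor {divisor} is not available on the {model}")
    pclock = 2 * master_hz               # PClock runs at twice MasterClock
    sclock = Fraction(pclock, divisor)   # SClock is PClock divided by 2, 3, 4, ...
    return pclock, sclock

pclock, sclock = derive_clocks(50_000_000, 4)
print(pclock, sclock)
```

Fraction keeps the divide-by-3 case exact; TClock and RClock then run at the same frequency as SClock.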
TClock
TClock (transmit clock) clocks the output registers of an external agent,†
and can be a global system clock for any other logic in the external agent.
TClock is the same frequency as SClock. When SyncIn is shorted to
SyncOut, the edges of TClock align precisely with the edges of SClock
and MasterClock.
When a delay is added between SyncIn and SyncOut, the TClock at the
pins leads SClock (and thus MasterClock) by the same amount of delay.
If the delay between SyncIn and SyncOut is matched to an external delay
between TClock at the processor and TClock at the external logic, the
TClock at the external logic aligns to SClock and MasterClock.
RClock
The external agent uses RClock (receive clock) to clock its input registers.
The processor generates RClock at the same frequency as TClock, but
RClock always leads TClock and SClock by 25 percent of SClock cycle
time. The relationship between RClock and TClock is independent of the
delay between SyncIn and SyncOut.
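The two phase relationships above can be put into numbers. In the Python sketch below, the function name, the 50 MHz SClock, and the 1.5 ns SyncOut-to-SyncIn delay are all assumed example values; the sketch only restates that TClock at the pins leads SClock by the added delay and that RClock leads TClock by a quarter of the SClock period.

```python
def clock_leads_ns(sclock_hz, sync_delay_ns=0.0):
    """Return (TClock lead over SClock, RClock lead over TClock) in ns."""
    period_ns = 1e9 / sclock_hz
    tclock_lead = sync_delay_ns      # equals the SyncOut-to-SyncIn delay
    rclock_lead = 0.25 * period_ns   # 25 percent of the SClock cycle time
    return tclock_lead, rclock_lead

print(clock_leads_ns(50e6, sync_delay_ns=1.5))
```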
PClock-to-SClock Division
Figure 10-3 shows the clocks for a PClock-to-SClock division by 2; Figure
10-4 shows the clocks for a PClock-to-SClock division by 4.
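(Figure 10-3, timing diagram not reproduced, PClock-to-SClock division by 2:
cycles 1 through 4 of MasterClock (with tMCkHigh, tMCkLow, and tMCkP),
MasterOut, PClock, SClock, TClock, RClock, SysAD Driven (with tDM and tDO),
and SysAD Received (with tDS and tDH).)
(Figure 10-4, timing diagram not reproduced, PClock-to-SClock division by 4:
cycles 1 through 4 of MasterClock, SyncOut, PClock, SClock, TClock, RClock,
SysAD Driven (with tDM and tDO), and SysAD Received (with tDS and tDH).)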
Alignment to SClock
Processor data becomes stable a minimum of tDM ns and a maximum of
tDO ns after the rising edge of SClock. This drive-time is the sum of the
maximum delay through the processor output drivers together with the
maximum clock-to-Q delay of the processor output registers.
Alignment to MasterClock
Certain processor inputs (specifically VCCOk, ColdReset*, and Reset*)
are sampled based on MasterClock, while others (specifically, Status(7:0))
are output based on MasterClock. The same setup, hold, and drive-off
parameters, tDS, tDH, tDM, and tDO, shown in Figures 10-3 and 10-4, apply
to these inputs and outputs, but they are measured by MasterClock
instead of SClock.
(Block diagram not reproduced. Labels: MasterClock, SysCmd, SysAD, SyncOut,
SyncIn, RClock, TClock.)
Figure 10-6 is a block diagram of a system without phase lock, using the
R4000 processor with an external agent implemented as a gate array.
(Block diagram not reproduced. Labels: MasterClock, R4000, Gate Array, SysCmd,
SysAD, SyncOut, SyncIn, RClock, TClock, Sampling Register, Staging Register,
CE.)
Figure 10-6 Gate-Array System without Phase Lock, using the R4000 Processor
In a system without phase lock, the transmission time for a signal from the
processor to an external agent composed of gate arrays can be calculated
from the following equation:
Transmission Time = (75 percent of TClock period) – (tDO for R4000)
+ (Minimum External Clock Buffer Delay)
– (External Sample Register Setup Time)
– (Maximum Clock Jitter for R4000 Internal Clocks)
– (Maximum Clock Jitter for RClock)
The transmission time for a signal from an external agent composed of gate
arrays to the processor in a system without phase lock can be calculated
from the following equation:
Transmission Time = (TClock period) – (tDS for R4000)
– (Maximum External Clock Buffer Delay)
– (Maximum External Output Register Clock-to-Q Delay)
– (Maximum Clock Jitter for TClock)
– (Maximum Clock Jitter for R4000 Internal Clocks)
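The two transmission-time equations above are straightforward budgets and can be evaluated directly. The Python sketch below mirrors them term by term; the function and parameter names are hypothetical, and every number in the example calls is invented for illustration, not taken from a data sheet. All values are in nanoseconds.

```python
def time_to_agent(tclock_period, t_do, min_buffer_delay, sample_setup,
                  jitter_internal, jitter_rclock):
    """Processor-to-external-agent transmission time (first equation)."""
    return (0.75 * tclock_period - t_do + min_buffer_delay
            - sample_setup - jitter_internal - jitter_rclock)

def time_to_processor(tclock_period, t_ds, max_buffer_delay, max_clock_to_q,
                      jitter_tclock, jitter_internal):
    """External-agent-to-processor transmission time (second equation)."""
    return (tclock_period - t_ds - max_buffer_delay - max_clock_to_q
            - jitter_tclock - jitter_internal)

# Invented example values for a 20 ns TClock period.
print(time_to_agent(20.0, 5.0, 1.0, 2.0, 0.5, 0.5))
print(time_to_processor(20.0, 3.0, 2.0, 4.0, 0.5, 0.5))
```

A negative result from either function would mean the assumed parameters leave no time for the signal to travel between the parts.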
(Block diagram not reproduced. Labels: MasterClock, R4000, Control Gate Array,
SysCmd, SysAD, SyncOut, SyncIn, RClock, TClock, Sample Registers, CE, Memory.)
Figure 10-7 Gate Array and CMOS System without Phase Lock, using the R4000 Processor
In this clocking methodology, the hold time of data driven from the
processor to an external sampling register is a critical parameter. To
guarantee hold time, the minimum output delay of the processor, tDM,
must be greater than the sum of:
minimum hold time for the external sampling register
+ maximum clock jitter for R4000 internal clocks
+ maximum clock jitter for TClock
+ maximum delay mismatch of the external clock buffers
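The hold-time condition above is an inequality between tDM and a sum of four terms, so the margin can be computed directly. In the Python sketch below the function and parameter names are hypothetical and the example numbers are invented; all values are in nanoseconds.

```python
def hold_margin(t_dm, min_hold, jitter_internal, jitter_tclock, buffer_mismatch):
    """Return tDM minus the required sum; hold time is met when this is > 0."""
    required = min_hold + jitter_internal + jitter_tclock + buffer_mismatch
    return t_dm - required

# Invented example values, not data-sheet figures.
margin = hold_margin(2.0, 0.5, 0.25, 0.25, 0.5)
print(margin, margin > 0)
```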
Table 10-1 shows the encoding of the processor’s status for pins Status(7:4) or
Status(3:0).
11
This chapter describes in detail the cache memory: its place in the R4000
memory organization, individual operations of the primary and
secondary caches, cache interactions, and an example of a cache coherency
request cycle. The chapter concludes with a description of R4000
processor synchronization in a multiprocessor environment.
This chapter uses the following terminology:
• The primary cache may also be referred to as the P-cache.
• The secondary cache may also be referred to as the S-cache.
• The primary data cache may also be referred to as the D-cache.
• The primary instruction cache may also be referred to as the
I-cache.
These terms are used interchangeably throughout this book.
(Figure not reproduced: the memory hierarchy, from the R4000 CPU registers
through the caches (primary I-cache and D-cache) and main memory to
peripherals such as disk, CD-ROM, and tape.)
The R4000 processor has two on-chip primary caches: one holds
instructions (the instruction cache), the other holds data (the data cache).
Off-chip, the R4000 processor supports a secondary cache on the R4000SC
and MC models.
(Figure not reproduced: the R4000PC and the R4000SC/MC, each with primary
caches consisting of an I-cache and a D-cache.)
† Primary and secondary cache tags are described in the following sections.
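(Primary cache line format, figure not reproduced.)
Tag fields:
Field  Bits   Width (bits)  Description
P      25     1             Even parity for the PTag and V fields
V      24     1             Valid bit
PTag   23:0   24            Physical tag (bits 35:12 of the physical address)
Data fields (repeated for each doubleword of the line):
Field  Bits   Width (bits)  Description
DataP  71:64  8             Even parity; 1 parity bit per byte of data
Data   63:0   64            Cache data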
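(Primary cache line format with write-back bits, figure not reproduced.)
Tag fields:
Field  Bits   Width (bits)
W’     28     1
W      27     1
P      26     1
CS     25:24  2
PTag   23:0   24
Data fields (repeated for each doubleword of the line):
Field  Bits   Width (bits)
DataP  71:64  8
Data   63:0   64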
In all R4000 processors, the W (write-back) bit, not the cache state,
indicates whether or not the primary cache contains modified data that
must be written back to memory or to the secondary cache.
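(Figure not reproduced: cache organization showing tags and data, where each
tag line holds the W, W’, State, Tag, and P fields alongside 64-bit data.)
Secondary cache tag fields:
Field  Bits   Width (bits)
ECC    31:25  7
CS     24:22  3
PIdx   21:19  3
STag   18:0   19
(Figure not reproduced: cache organization showing tags, tag line, and data.)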
(State diagram not reproduced: transitions among the Invalid, Shared, Clean
Exclusive, Dirty Shared, and Dirty Exclusive cache states. The transition
labels are read hit, write hit, write hit [update], write hit [invalidate],
bus read [intervention], update received, invalidate received, and I/O
invalidate received.)
The state of a secondary cache line is provided by the external agent and
is set as follows:
Case 1. If the cache line is not present in another cache, it should be loaded
in the clean exclusive state.
Case 2. If the cache line is retained by another cache and the state of the
line in that cache remains shared or dirty shared, the line should
be loaded in the shared state.
Case 3. If the cache line is retained by another cache and the cache
relinquishes ownership to the processor making the read request,
the line should be returned in the dirty shared state.
Case 4. If the cache line is retained by another cache and ownership is
relinquished to memory, the line should be loaded in the shared
state.
Case 5. If the cache line is relinquished by another cache and ownership
is transferred to the processor making the read request, the line
should be loaded in the dirty exclusive or dirty shared state.
For case 1, if the refill occurs on a store miss, the processor changes the
cache line state to dirty exclusive. For each of the remaining cases listed
above, the R4000 processor passes the state received from the external
agent to the secondary cache.
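The five cases and the store-miss rule above can be summarized as a lookup from case number to the states the external agent may supply. The Python sketch below is only an illustration of that summary; the names are hypothetical, and the case numbers are the ones used in the list above.

```python
# States the external agent may supply for each refill case listed above.
REFILL_STATES = {
    1: ("clean exclusive",),
    2: ("shared",),
    3: ("dirty shared",),
    4: ("shared",),
    5: ("dirty exclusive", "dirty shared"),
}

def secondary_state_after_refill(case, supplied_state, store_miss=False):
    """Return the state written to the secondary cache for a refill."""
    if supplied_state not in REFILL_STATES[case]:
        raise ValueError(f"case {case} does not allow state {supplied_state!r}")
    if case == 1 and store_miss:
        # The processor itself upgrades the line on a store miss.
        return "dirty exclusive"
    # Otherwise the processor passes the supplied state through unchanged.
    return supplied_state

print(secondary_state_after_refill(1, "clean exclusive", store_miss=True))
print(secondary_state_after_refill(5, "dirty shared"))
```

The invalid state is deliberately absent from the table, since it is never used for a refill.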
The invalid state is never used for a refill. Software, however, should
initialize the secondary cache to the invalid state after the system is
powered up.
Uncached
Lines within an uncached page are never in a cache. When a page has the
uncached coherency attribute, the processor issues a doubleword, partial-
doubleword, word, or partial-word read or write request directly to main
memory (bypassing the cache) for any load or store to a location within
that page.
Noncoherent
Lines with a noncoherent attribute can reside in a cache; a load or store miss
causes the processor to issue a noncoherent block read request to a
location within the cached page.
Sharable
Lines with a sharable attribute must be in a multiprocessor environment
(using the R4000MC), since shared lines can be in more than one cache at
a time. When the coherency attribute is sharable, the processor operates as
follows:
• a coherent block read request is issued for a load miss to a
location within the page, or
• a coherent block read request that requests exclusivity is issued
for a store miss to a location within the page.
In most systems, coherent read requests require snoops or directory
checks, and noncoherent read requests do not.† Cache lines within the
page are managed with a write invalidate protocol; that is, the processor
issues an invalidate request on a store hit to a shared cache line.
Update
Lines with an update coherency attribute must be in a multiprocessor
environment and can reside in more than one cache at a time. When the
coherency attribute is update, the processor issues a coherent block read
request for a load or store miss to a location within the page. Cache lines
within the page are managed with a write update protocol; that is, the
processor issues an update request on a store hit to a shared cache line.
† A coherent read that requests exclusivity implies that the processor functions most
efficiently if the requested cache line is returned to it in an exclusive state, but the
processor still performs correctly if the cache line is returned in a shared state.
Exclusive
Lines with an exclusive coherency attribute must be in a multiprocessor
environment. When the coherency attribute is exclusive, the processor
issues a coherent block read request that requests exclusivity for a load or
store miss to a location within the page.
Cache lines within the page are managed with a write invalidate protocol.
NOTE: Load Linked-Store Conditional instruction sequences must
ensure that the link location is not in a page managed with the
exclusive coherency attribute.
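The five coherency attributes described above differ mainly in the request issued on a miss and in the request issued on a store hit to a shared line. The Python sketch below collects those differences in two lookup tables. It is a summary aid with hypothetical names, not a model of the processor; the uncached entry stands for the direct memory access described under Uncached, since uncached lines never reside in a cache.

```python
COHERENT = "coherent block read"
EXCLUSIVE = "coherent block read that requests exclusivity"
NONCOHERENT = "noncoherent block read"
DIRECT = "read or write request directly to main memory"

# Request issued for a load or store that cannot be satisfied from the cache.
MISS_REQUEST = {
    "uncached":    {"load": DIRECT,      "store": DIRECT},
    "noncoherent": {"load": NONCOHERENT, "store": NONCOHERENT},
    "sharable":    {"load": COHERENT,    "store": EXCLUSIVE},
    "update":      {"load": COHERENT,    "store": COHERENT},
    "exclusive":   {"load": EXCLUSIVE,   "store": EXCLUSIVE},
}

# Request issued on a store hit to a shared line (write invalidate or write update).
STORE_HIT_SHARED = {
    "sharable": "invalidate",
    "update": "update",
    "exclusive": "invalidate",
}

def miss_request(attribute, access):
    """Look up the request for a 'load' or 'store' under a coherency attribute."""
    return MISS_REQUEST[attribute][access]

print(miss_request("sharable", "store"))
print(STORE_HIT_SHARED["update"])
```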
Secondary-Cache Mode
In its secondary-cache mode, an R4000MC model provides a set of cache
states and mechanisms that implement a variety of cache coherency
protocols. In particular, the processor simultaneously supports both the
write-invalidate and write-update protocols.
No-Secondary-Cache Mode
A processor in no-secondary-cache mode supports the uncached and
noncoherent coherency attributes. These two attributes are described in
the section titled Cache Coherency Attributes in this chapter.
Strong Ordering
Cache-coherent multiprocessor systems must obey ordering constraints
on stores to shared data. A multiprocessor system that exhibits the same
behavior as a uniprocessor system in a multiprogramming environment is
said to be strongly ordered.
For this algorithm to succeed, stores must have a global ordering in time;
that is, every processor in the system must agree that either the store to
location X precedes the store to location Y, or vice versa. If this global
ordering is enforced, the test algorithm for strong ordering succeeds.
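The effect of a global store order can be illustrated with a short sketch. The two-processor test used here is an assumption made for illustration, not a reproduction of the manual's own algorithm: processor A stores to X and then loads Y, while processor B stores to Y and then loads X. The sketch enumerates every global order that preserves each processor's program order and collects the values the two loads can observe.

```python
from itertools import combinations

# Hypothetical two-processor test; the program contents are an assumption.
PROGRAM_A = [("store", "X"), ("load", "Y")]
PROGRAM_B = [("store", "Y"), ("load", "X")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each program's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [("A", next(ia)) if i in slots else ("B", next(ib))
               for i in range(n)]


def run(schedule):
    """Execute one global order against a single shared memory."""
    memory = {"X": 0, "Y": 0}
    loaded = {}
    for cpu, (op, loc) in schedule:
        if op == "store":
            memory[loc] = 1
        else:
            loaded[cpu] = memory[loc]
    return loaded["A"], loaded["B"]


# Set of (value A loaded from Y, value B loaded from X) over all global orders.
outcomes = {run(s) for s in interleavings(PROGRAM_A, PROGRAM_B)}
```

Under a single global order, at least one of the two loads always sees the other processor's store, so the outcome in which both loads return the old value never appears in `outcomes`.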
Invalidate
An invalidate request causes the processor to change the state of the
specified cache line to invalid in both the primary and secondary caches.
Update
An update request causes the processor to write the specified data element
into the specified cache line, and either change the state of the cache line
to shared in both the primary and secondary caches, or leave the state of
the cache line unchanged, depending on the nature of the update request.
An external agent can issue updates to cache lines that are in either the
exclusive or shared states without changing the state of the cache line (see
the SysCmd(3) bit description in Chapter 12).
NOTE: If there is an update to a line in the primary instruction cache,
the line in the secondary cache is updated and the primary instruction
cache line is invalidated.
Snoop
A snoop request to the processor causes the processor to return the
secondary cache state of the specified cache line.
At the same time, the processor atomically† sets the state of the specified
cache line in both the primary and secondary caches according to the value
of the SysCmd(2:0) bits, which define cache state change, and are supplied
by the external agent.
† An atomic operation is one that cannot be split, or portions of it deferred. In this case, the
processor sets the state of both secondary and primary caches in an indivisible action; it
cannot set the state of one cache line, allow another process to interrupt, and then
complete the first process by setting the state of the remaining cache line.
Intervention
An intervention request causes the processor to return the secondary
cache state of the specified cache line and, under certain conditions related
to the state of the cache line and the nature of the intervention request, the
contents of the specified secondary cache line.
At the same time, the processor atomically sets the state of the specified
cache line in both the primary and secondary caches according to the value
of the SysCmd(2:0) bits which define cache state change, and are supplied
by an external agent.
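The request descriptions above can be summarized in a small model. This is a simplified sketch, not the processor's implementation: the class and method names are hypothetical, the primary and secondary caches are assumed to hold the same state names, and `new_state` stands for an already decoded SysCmd(2:0) value whose bit encoding is not modeled.

```python
from dataclasses import dataclass

STATES = {"invalid", "shared", "dirty shared",
          "clean exclusive", "dirty exclusive"}


@dataclass
class Line:
    """One cache line as seen by external coherency requests (hypothetical)."""
    primary: str = "invalid"
    secondary: str = "invalid"
    data: int = 0

    def _set_both(self, state):
        # Both caches change in one step, modeling the atomic state change.
        assert state in STATES
        self.primary = self.secondary = state

    def invalidate(self):
        """Invalidate request: the line becomes invalid in both caches."""
        self._set_both("invalid")

    def update(self, value, make_shared):
        """Update request: write the data, optionally changing state to shared."""
        self.data = value
        if make_shared:
            self._set_both("shared")

    def snoop(self, new_state=None):
        """Snoop request: report the secondary state, then apply the change."""
        old = self.secondary
        if new_state is not None:
            self._set_both(new_state)
        return old
```

For example, a snoop of a dirty exclusive line that requests a change to shared reports `"dirty exclusive"` and leaves both caches holding the line as shared.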
System Model
To describe the implications of a coherency conflict, this section uses a
system model that is snooping, split-read, and bus-based; I/O is not
considered in this model.
The system model used in this example has the following components:
• Four processor subsystems, each consisting of an R4000MC
processor, a secondary cache, and an external agent (shown in
Figure 11-12). The external agent communicates with the
R4000MC processor, accepting processor requests and issuing
external requests. Likewise, the system bus issues and receives
bus requests.
• A memory subsystem that communicates with main memory
and the system bus.
• A system bus that has the following characteristics:
- It is a multiple master, request-based, arbitrated bus.
When an agent wishes to perform a transaction on the
bus, it must request the bus and wait for global
arbitration logic to assert a grant signal before assuming
mastership of the bus. Once mastership has been
granted, the agent can begin a transaction.
- It supports read transactions, read exclusive transactions,
write transactions, and invalidate transactions.
- It is a split-read bus. This means bus operations can
separate a read request from the return of its data.
- It is a snooping bus. All agents connected to the bus
must monitor all bus traffic to correctly maintain cache
coherency.
• All of the TLB pages in the system have either a noncoherent or
a sharable coherency attribute. (Noncoherent data is not
allowed; noncoherent page attributes are used for instructions
only.)
• The sharable coherency attribute allows data to be shared
between the four caches in the system by using a write
invalidate cache coherency protocol.
• The secondary cache states used are invalid, shared, clean
exclusive, and dirty exclusive; the dirty shared secondary
cache state is not allowed.
[Figure 11-12: system model. Four processor subsystems (Subsystems 1 through 4), each containing an R4000MC, an S-cache, and an External Agent, connect through the System Bus to Main Memory.]
Load
As shown in Figure 11-12, when a processor misses in the primary and
secondary caches on a load, the processor issues a read request. The
subsystem external agent translates this to a read request on the bus. The
returned data is loaded in either the clean exclusive or shared state, based
on the shared indication returned with the read response data.†
Store
In this system model, when a processor misses in the primary and
secondary caches on a store, it issues a read request with exclusivity; this
is translated to a read exclusive on the bus and data is loaded in the dirty
exclusive state.
When a processor hits in the cache on a store to shared data, it issues an
invalidate request that must be forwarded to the system bus. Before the
store can be completed and the state changed to dirty exclusive, the
invalidate request must be acknowledged.
† The shared indication is the result of an intervention request to another processor, and is
supplied by an external agent that is a part of the other three processor subsystems.
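The load and store behavior of this system model can be sketched as a single transition function. The function name and the string labels are hypothetical. The two cases marked as assumptions (hits that need no bus transaction) are not spelled out in the text and are included only so that the function is total.

```python
def access(op, state, shared_indication=False):
    """Return (bus_transaction, new_state) for one processor access.

    state is the secondary cache state of the target line: "invalid",
    "shared", "clean exclusive", or "dirty exclusive".
    """
    if state == "invalid":  # miss in both the primary and secondary caches
        if op == "load":
            loaded = "shared" if shared_indication else "clean exclusive"
            return "read", loaded
        return "read exclusive", "dirty exclusive"
    if op == "store" and state == "shared":
        # The store completes only after the invalidate is acknowledged.
        return "invalidate", "dirty exclusive"
    if op == "store":
        # Assumption: a store hit on an exclusive line needs no bus traffic.
        return None, "dirty exclusive"
    # Assumption: a load hit needs no bus traffic and keeps the state.
    return None, state
```

A load miss therefore produces either a shared or a clean exclusive line depending on the shared indication, while every completed store leaves the line dirty exclusive.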
Processor Invalidate
In this system model, an invalidate request is considered complete as soon
as it appears on the system bus. When an external agent observes an
invalidate request on the system bus, it reacts as if the invalidate has
changed the state of all caches at that instant.
Processor Write
In this system model, an external agent takes no action in response to a
write request on the bus.
Invalidate Conflicts
From the time the processor issues an invalidate request until that request
is acknowledged, any external coherency request issued to the processor
that conflicts with the unacknowledged invalidate must include a
cancellation.
In the model system shown in Figure 11-12, an acknowledge for the
invalidate is sent to the processor as soon as the invalidate is forwarded to
the system bus. Therefore, while the external agent is waiting to become
a bus master to forward the invalidate request, the external agent must
detect, by using comparators, any external coherency request that conflicts
with the unacknowledged invalidate. If a conflict is detected, the external
agent must not forward the invalidate request to the system bus; instead,
it must rescind the invalidate request and submit the conflicting external
request to the processor, with a cancellation for the invalidate request.
If the response to a coherent read request conflicts with a waiting
unacknowledged processor invalidate request, the external agent detects
this conflict and does not forward the processor invalidate request to the
bus. Instead, it discards the processor invalidate request and issues to the
processor an intervention request that includes a cancellation. The
processor then reevaluates its cache state and either reissues the invalidate
request or issues a coherent read request.
If an invalidate request appears on the bus while the external agent has a
processor invalidate request waiting, and the external agent detects the
conflict, the external agent does not forward the processor invalidate
request. Instead, it discards the processor invalidate request and issues an
external invalidate request that includes a cancellation to the processor.
The processor then reevaluates its cache state and either reissues the
invalidate request or issues a coherent read request.
It is not possible for a write request that conflicts with a waiting processor
invalidate request to appear on the system bus. To issue an invalidate
request, the state of the cache line must be shared with every cache in the
system that contains the line.
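The comparator logic described above can be sketched as follows. This is a minimal model under stated assumptions: the class, its method names, and the 32-byte line size are hypothetical, and bus arbitration is reduced to a single `bus_granted` call.

```python
LINE_BYTES = 32  # assumed line size, for illustration only


class ExternalAgent:
    """Holds a processor invalidate until the bus is granted (hypothetical)."""

    def __init__(self):
        self.pending = None  # line index of the unacknowledged invalidate

    @staticmethod
    def line(addr):
        return addr // LINE_BYTES

    def processor_invalidate(self, addr):
        """The processor issued an invalidate; wait for bus mastership."""
        self.pending = self.line(addr)

    def bus_granted(self):
        """Forward the waiting invalidate and acknowledge the processor."""
        if self.pending is None:
            return None  # nothing to forward (it may have been rescinded)
        self.pending = None
        return "IvdAck*"

    def observe(self, kind, addr):
        """Map observed bus traffic to an external request for the processor.

        Returns (request, cancellation). A conflicting request rescinds
        the waiting invalidate, which is then never forwarded to the bus.
        """
        request = {"read": "intervention",
                   "read exclusive": "intervention",
                   "invalidate": "invalidate"}[kind]
        conflict = self.pending is not None and self.pending == self.line(addr)
        if conflict:
            self.pending = None
        return request, conflict
```

After a cancellation, `bus_granted` returns `None`: the processor is expected to reevaluate its cache state and reissue either the invalidate or a coherent read request.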
[Figure: Processor A (PA) with External Agent A (EA) and Secondary Cache A (SA, line in state DE), and Processor B (PB) with External Agent B (EB) and Secondary Cache B (SB, line in state INV), connected through the System Bus to Memory.]
[Figure 11-14: EA issues an External Intervention Request (EIR) to PA, which examines SA (line in state DE); the numbered steps through 7 are marked on the diagram.]
4. As shown in Figure 11-14, external agent EA reads the CRR from the
bus.
5. To service this CRR, EA issues an external intervention request (EIR)
to processor A, PA.
6. PA receives the EIR and examines its secondary cache, SA.
7. Depending on the type of intervention request—based on the state of
the SysCmd(3) bit—one of the following actions is taken:
• If the cache line in SA is in the dirty exclusive state, the entire
cache line is returned.
• Otherwise, PA just returns the state of the secondary cache line.
In Figure 11-14 the retrieved data is in the dirty exclusive state (DE),
servicing a load miss, when the state of cache line SA goes from dirty
exclusive to dirty shared (DS),† indicating PA is owner of the line.
[Figure 11-15: PA returns the cache state and data through EA to the System Bus and on to EB, which issues a Read Response to PB; SA is left in state DS and SB in state S. The numbered steps through 11 are marked on the diagram.]
8. Figure 11-15 shows the cache state and cache data returned from PA,
through EA to the bus.
9. This cache state and data are returned to EB.
10. EB issues a read response to PB.
11. PA remains owner of the cache line.
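Steps 4 through 11 can be traced with a short sketch. The function names are hypothetical, and two behaviors are assumptions rather than statements from the text: a clean exclusive line is taken to drop to shared on an intervention, and a requester that finds no other copy is taken to load the line clean exclusive.

```python
def intervene(owner_state):
    """Steps 5 to 7: the owner answers an intervention for a load miss.

    Returns (reported_state, line_returned, owner_new_state).
    """
    if owner_state == "DE":
        # A dirty exclusive line is returned whole; the owner keeps it as DS.
        return "DE", True, "DS"
    if owner_state == "CE":
        return "CE", False, "S"  # assumption, not stated in the text
    return owner_state, False, owner_state  # only the state is returned


def load_miss(caches, requester, owner):
    """Steps 4 to 11: service a load miss and update both cache states."""
    reported, line_returned, caches[owner] = intervene(caches[owner])
    # Assumption: the requester loads the line shared if any copy exists.
    caches[requester] = "S" if reported != "INV" else "CE"
    return "owner" if line_returned else "memory"


caches = {"SA": "DE", "SB": "INV"}
source = load_miss(caches, requester="SB", owner="SA")
```

With the starting states of the example (SA dirty exclusive, SB invalid), the data comes from the owner, SA ends in the dirty shared state, and SB ends in the shared state.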
Test-and-Set (Spinlock)
Test-and-set† uses a variable called the semaphore, which protects data
from being simultaneously modified by more than one processor.
In other words, a processor can lock out other processors from accessing
shared data when the processor is in a critical section, a part of a program in
which no more than a fixed number of processors is allowed to execute. In
the case of test-and-set, only one processor can enter the critical section.
Figure 11-16 illustrates a test-and-set synchronization procedure that uses
a semaphore; when the semaphore is set to 0, the shared data is unlocked,
and when the semaphore is set to 1, the shared data is locked.
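[Figure 11-16: test-and-set flowchart]
1. Load semaphore
2. Unlocked? (=0?) If no, return to step 1.
3. Try locking semaphore
4. Successful? If no, return to step 1.
5. Execute critical section
6. Unlock semaphore
Continue processing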
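The test-and-set procedure can be sketched with threads standing in for processors. This is an illustration only: the class name is hypothetical, and the atomic test-and-set primitive is imitated with an internal guard lock rather than a hardware instruction.

```python
import threading
import time


class Semaphore01:
    """0 = unlocked, 1 = locked. The guard lock stands in for the atomic
    hardware primitive (an assumption of this sketch)."""

    def __init__(self):
        self.value = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        """Atomically read the old value and set the semaphore to 1."""
        with self._guard:
            old, self.value = self.value, 1
            return old

    def unlock(self):
        self.value = 0


def worker(sem, shared, rounds):
    for _ in range(rounds):
        # Load the semaphore, test it, and try to lock it; retry on failure.
        while sem.value != 0 or sem.test_and_set() != 0:
            time.sleep(0)  # yield while spinning
        shared["count"] += 1  # critical section
        sem.unlock()


sem, shared = Semaphore01(), {"count": 0}
threads = [threading.Thread(target=worker, args=(sem, shared, 200))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because only the thread that reads back 0 from `test_and_set` enters the critical section, the unprotected increment of `shared["count"]` is never executed by two threads at once.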
Counter
Another common synchronization technique uses a counter. A counter is a
designated memory location that can be incremented or decremented.
In the test-and-set method, only one processor at a time is permitted to
enter the critical section. Using a counter, up to N processors are allowed
to concurrently execute the critical section. All processors after the Nth
processor must wait until one of the N processors exits the critical section
and a space becomes available.
The counter works by not allowing more than one processor to modify it
at any given time. Conceptually, the counter can be viewed as a variable
that counts the number of limited resources (for example, the number of
processes, or software licenses, etc.). Figure 11-17 shows this process.
[Figure 11-17: counter flowchart. Load counter; try decrementing counter, repeating until successful; execute the critical section; try incrementing counter, repeating until successful; continue processing.]
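A counter that admits at most N processors can be sketched in the same style. The names below are hypothetical, threads stand in for processors, and the "try" steps of the flowchart are imitated with a conditional store that succeeds only if the counter still holds the value that was loaded.

```python
import threading
import time


class Counter:
    """Counts free slots; the guard lock imitates hardware atomicity."""

    def __init__(self, n):
        self.value = n
        self._guard = threading.Lock()

    def try_store(self, expected, new):
        """Store new only if the counter is unchanged since it was loaded."""
        with self._guard:
            if self.value != expected:
                return False
            self.value = new
            return True


def enter(counter):
    """Wait for a free slot, then take it by decrementing the counter."""
    while True:
        seen = counter.value
        if seen > 0 and counter.try_store(seen, seen - 1):
            return
        time.sleep(0)


def leave(counter):
    """Give the slot back by incrementing the counter."""
    while True:
        seen = counter.value
        if counter.try_store(seen, seen + 1):
            return
        time.sleep(0)


N = 2
counter = Counter(N)
stats = {"inside": 0, "peak": 0}
stats_lock = threading.Lock()


def worker():
    for _ in range(50):
        enter(counter)
        with stats_lock:  # bookkeeping only: record the occupancy
            stats["inside"] += 1
            stats["peak"] = max(stats["peak"], stats["inside"])
        with stats_lock:
            stats["inside"] -= 1
        leave(counter)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The recorded peak occupancy never exceeds N, and the counter returns to N once every thread has left the critical section.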
LL and SC
MIPS instructions Load Linked (LL) and Store Conditional (SC) provide
support for processor synchronization. These two instructions work very
much like their simpler counterparts, load and store. The LL instruction,
in addition to doing a simple load, has the side effect of setting a bit called
the link bit. This link bit forms a breakable link between the LL instruction
and the subsequent SC instruction. The SC performs a simple store if the
link bit is set when the store executes. If the link bit is not set, then the store
fails to execute. The success or failure of the SC is indicated in the target
register of the store.
The link is broken in the following circumstances:†
• when any external request (invalidate, snoop, or intervention)
changes the state of the line containing the lock variable to
invalid
• upon completion of an ERET (return from exception)
instruction
• when an external update is made to the cache line containing
the lock variable
The most important features of LL and SC are:
• They provide a mechanism for generating all of the common
synchronization primitives including test-and-set, counters,
sequencers, etc., with no additional overhead.
• When they operate, bus traffic is generated only if the state of
the cache line changes; lock words stay in the cache until some
other processor takes ownership of that cache line.
† The most obvious case where the link is broken occurs when the cache line that is the
subject of the load is invalidated. In this case, some other processor has successfully
completed a store to that line.
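The link-bit behavior can be modeled in a few lines. This is a simplified sketch, not the processor's implementation: the class and method names are hypothetical, memory is a plain dictionary shared by both processors, and the link is tracked per address rather than per cache line.

```python
class Cpu:
    """One processor's link state (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.link_bit = False
        self.link_addr = None

    def ll(self, addr):
        """Load Linked: a simple load that also sets the link bit."""
        self.link_bit, self.link_addr = True, addr
        return self.memory[addr]

    def sc(self, addr, value):
        """Store Conditional: store and return 1 if the link bit is set,
        otherwise return 0 without storing."""
        if not self.link_bit:
            return 0
        self.memory[addr] = value
        return 1

    def external_invalidate(self, addr):
        """An external request that invalidates the linked line."""
        if self.link_addr == addr:
            self.link_bit = False

    def eret(self):
        """Return from exception breaks the link."""
        self.link_bit = False


LOCK = 0x100
memory = {LOCK: 0}
pa, pb = Cpu(memory), Cpu(memory)

pa.ll(LOCK)                    # both processors see the lock as free
pb.ll(LOCK)
first = pb.sc(LOCK, 1)         # PB stores first and succeeds
pa.external_invalidate(LOCK)   # PB's store invalidates PA's copy
second = pa.sc(LOCK, 1)        # PA's link is broken, so its store fails
```

The value returned by `sc` plays the role of the target register: the processor that receives 0 must branch back and repeat the sequence from the Load Linked.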
[Figures: test-and-set and counter implemented with LL and SC; flowchart steps and the instructions recovered for each]
Test-and-set:
  Unlocked? (=0?)                 ORI  r3,r2,1
                                  BEQ  r3,r2,Loop
                                  NOP
Counter:
  Counter > 0?                    BLEZ r2,Loop1
                                  NOP
  Successful? (r3=0?)             BEQ  r3,0,Loop1
                                  NOP
  Execute critical section
  Load counter             Loop2: LL   r2,(r1)
  Successful?                     BEQ  r3,0,Loop2
                                  NOP
  Continue processing
12
12.1 Terminology
The following terms are used in this chapter:
• An external agent is any logic device connected to the processor,
over the System interface, that allows the processor to issue
requests.
• A system event is an event that occurs within the processor and
requires access to external system resources.
• Sequence refers to the precise series of requests that a processor
generates to service a system event.
• Protocol refers to the cycle-by-cycle signal transitions that occur
on the System interface pins to assert a processor or external
request.
• Syntax refers to the precise definition of bit patterns on
encoded buses, such as the command bus.
Interface Buses
Figure 12-1 shows the primary communication paths for the System
interface: a 64-bit address and data bus, SysAD(63:0), and a 9-bit
command bus, SysCmd(8:0). The SysAD and SysCmd buses are
bidirectional; that is, they are driven by the processor to issue a processor
request, and by the external agent to issue an external request (see
Processor and External Requests, in this chapter, for more information).
A request through the System interface consists of:
• an address
• a System interface command that specifies the precise nature of
the request
• a series of data elements if the request is for a write, read
response, or update.
[Figure 12-1: System interface buses SysAD(63:0) and SysCmd(8:0)]
Issue Cycles
There are two types of processor issue cycles:
• processor read, invalidate, and update request issue cycles
• processor write request issue cycles.
The processor samples the signal RdRdy* to determine the issue cycle for
a processor read, invalidate, or update request; the processor samples the
signal WrRdy* to determine the issue cycle of a processor write request.
As shown in Figure 12-2, RdRdy* must be asserted two cycles prior to the
address cycle of the processor read/invalidate/update request to define
the address cycle as the issue cycle.
[Figure 12-2 State of RdRdy* Signal for Read, Invalidate, or Update Requests: timing diagram of SClock and RdRdy* over SCycles 1 through 6]
As shown in Figure 12-3, WrRdy* must be asserted two cycles prior to the
first address cycle of the processor write request to define the address
cycle as the issue cycle.
[Figure 12-3: timing diagram of SClock and WrRdy* over SCycles 1 through 6]
The processor repeats the address cycle for the request until the conditions
for a valid issue cycle are met. After the issue cycle, if the processor
request requires data to be sent, the data transmission begins. There is
only one issue cycle for any processor request.
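The two-cycle rule can be expressed as a small function. This is a sketch under stated assumptions: the function name is hypothetical, cycles are list indices, and the ready signal is treated as active low, so a sampled 0 means asserted.

```python
def issue_cycle(rdy_n, first_addr_cycle):
    """Return the index of the issue cycle, or None if none occurs.

    rdy_n[i] is the sampled level of RdRdy* (or WrRdy* for a write) in
    cycle i; 0 means asserted. The processor repeats the address cycle,
    starting at first_addr_cycle, until the ready signal was asserted
    two cycles before the current address cycle.
    """
    for cycle in range(first_addr_cycle, len(rdy_n)):
        if cycle >= 2 and rdy_n[cycle - 2] == 0:
            return cycle
    return None
```

For example, with `rdy_n = [1, 1, 0, 1, 1, 1]` and the first address cycle at index 2, the address cycle is repeated at indices 2 and 3 and the request issues at index 4.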
The processor accepts external requests, even while attempting to issue a
processor request, by releasing the System interface to slave state in
response to an assertion of ExtRqst* by the external agent.
Note that the rules governing the issue cycle of a processor request are
strictly applied to determine the action the processor takes. The processor
either:
• completes the issuance of the processor request in its entirety
before the external request is accepted, or
• releases the System interface to slave state without completing
the issuance of the processor request.
In the latter case, the processor issues the processor request (provided the
processor request is still necessary) after the external request is complete.
The rules governing an issue cycle again apply to the processor request.
Handshake Signals
The processor manages the flow of requests through the following eight
control signals:
• RdRdy*, WrRdy* are used by the external agent to indicate
when it can accept a new read (RdRdy*) or write (WrRdy*)
transaction.
• ExtRqst*, Release* are used to transfer control of the SysAD
and SysCmd buses. ExtRqst* is used by an external agent to
indicate a need to control the interface. Release* is asserted by
the processor when it transfers the mastership of the System
interface to the external agent.
• The R4000 processor uses ValidOut* and the external agent
uses ValidIn* to indicate valid command/data on the
SysCmd/SysAD buses.
• IvdAck*, IvdErr* are used in multiprocessor systems; they are
asserted by the external agent to indicate the successful
completion (IvdAck*) or the unsuccessful completion (IvdErr*)
of a pending processor invalidate or update request.†
† When using the R4000SC processor, IvdAck* and IvdErr* must be connected to Vcc.
[Figure: R4000 output data and input data, clocked by SClock]
† SClock is an internal clock used by the processor to sample data at the System interface
and to clock data into the processor System interface output registers; see Chapter 10 for
more details.
External Arbitration
The System interface must be in slave state for the external agent to issue
an external request through the System interface. The transition from
master state to slave state is arbitrated by the processor using the System
interface handshake signals ExtRqst* and Release*. This transition is
described by the following procedure:
1. An external agent signals that it wishes to issue an external request by
asserting ExtRqst*.
2. When the processor is ready to accept an external request, it releases
the System interface from master to slave state by asserting Release*
for one cycle.
3. The System interface returns to master state as soon as the issue of the
external request is complete.
This process is described in External Arbitration Protocol, later in this
chapter.
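The master-to-slave handoff can be sketched as a two-state machine. The class name and the `processor_ready` and `external_request_done` inputs are hypothetical simplifications, and the signals are modeled as active low (0 means asserted).

```python
class SystemInterface:
    """Master/slave arbitration of the System interface (hypothetical)."""

    def __init__(self):
        self.state = "master"
        self.release_n = 1  # Release* deasserted

    def cycle(self, ext_rqst_n, processor_ready, external_request_done=False):
        """Advance one cycle and return the level driven on Release*."""
        self.release_n = 1
        if self.state == "master" and ext_rqst_n == 0 and processor_ready:
            self.release_n = 0  # asserted for exactly one cycle
            self.state = "slave"
        elif self.state == "slave" and external_request_done:
            self.state = "master"  # return as soon as the issue completes
        return self.release_n
```

The interface stays in master state while ExtRqst* is asserted but the processor is not ready, pulses Release* for one cycle when it gives up the interface, and returns to master state once the external request has been issued.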
[Figure: requests and system events]
Processor Requests
• Read
• Write
• Null write
• Invalidate
• Update
External Requests
• Read
• Write
• Null
• Invalidate
• Update
• Snoop
• Intervention
System Events
• Load Miss
• Store Miss
• Store Hit
• Uncached Load/Store
• CACHE operations
[Figure: timing diagram of two processor write requests (Write #1 and Write #2) over SCycles 1 through 10, showing SClock, the SysAD bus (Addr, Data, Unused, Unused, Addr, Data), and WrRdy*]
Processor Requests
A processor request is a request or a series of requests, through the System
interface, to access some external resource. As shown in Figure 12-7,
processor requests include read, write, null write, invalidate, and update.
This section also describes clusters.
[Figure 12-7: processor requests (read, write, null write, invalidate, update)]
[Figure: invalidate with cancellation on the system bus. 1. Processor issues invalidate request; 2. Invalidate arrives from the system; 3. External invalidate with cancellation sent to processor; 5. IvdAck* or IvdErr*.]
Clusters
A cluster consists of a single processor read request, followed by one or
two additional processor requests that are issued while the initial read
request is pending.
The processor supports three types of clusters:
• a processor read request, followed by a write request
• a processor read request, followed by a potential update request
• a processor read request, followed by a potential update
request, followed by a write request.
In secondary-cache mode, the processor issues individual requests (as in
no-secondary-cache mode), or cluster requests. All requests in the cluster
must be accepted before the response to the read request that began the
cluster can be returned to the processor.
Potential update requests within a cluster can be disabled through the
boot-time mode control interface.
External Requests
External requests include read, write, invalidate, update, snoop,
intervention, and null requests, as shown in Figure 12-11. External
invalidate, update, snoop and intervention requests, as a group, are
referred to as external coherence requests. This section also includes a
description of read response, a special case of an external request.
[Figure 12-11: external requests (read, write, null, invalidate, update, snoop, intervention)]
Read request asks for a word of data from the processor’s internal resource.
Write request provides a word of data to be written to the processor’s
internal resource.
Invalidate request specifies a cache line, in the primary and secondary
caches of the processor, that must be marked invalid.
Update request provides a doubleword, partial doubleword, word, or
partial word of data to be written to the processor’s primary and
secondary caches.
Snoop request checks the processor’s secondary cache to see if a valid copy
of a particular cache line exists. If a valid copy exists, the processor returns
the state of the cache line at the specified physical address in the secondary
cache, and can modify the state of the cache line.
Intervention request requires the processor to return the state of the
secondary cache line at the specified physical address. Under certain
conditions related to the state of the cache line and the nature of the
intervention request, the contents of the primary and secondary cache line
can be returned. The state of the line can also be modified by this request.
Read Response
A read response returns data in response to a processor read request, as
shown in Figure 12-13. While a read response is technically an external
request, it has one characteristic that differentiates it from all other
external requests—it does not perform System interface arbitration. For
this reason, read responses are handled separately from all other external
requests, and are simply called read responses.
[Figure 12-13: 1. Read request from the processor; 2. Read response from the external agent]
Load Miss
When a processor load misses in both the primary and secondary caches,
before the processor can proceed it must obtain the cache line that contains
the data element to be loaded from the external agent.
If the new cache line replaces a current dirty exclusive or dirty shared
cache line, the current cache line must be written back before the new line
can be loaded in the primary and secondary caches.
The processor examines the coherency attribute (cache coherency
attributes are described in Chapter 11) in the TLB entry for the page that
contains the requested cache line, and executes one of the following
requests:
• If the coherency attribute is exclusive, the processor issues a
coherent read request that also requests exclusivity.
• If the coherency attribute is sharable or update, the processor
issues a coherent read request.
• If the coherency attribute is noncoherent, the processor issues a
noncoherent read request.
Table 12-3 shows the actions taken on a load miss to primary and
secondary caches.
Secondary-Cache Mode
In secondary-cache mode, if the current cache line does not have to be
written back and the coherency attribute for the page that contains the
requested cache line is not exclusive, the processor issues a coherent block
read request for the cache line that contains the data element to be loaded.
If the current cache line needs to be written back and the coherency
attribute for the requested cache line is sharable or update, the processor
issues a cluster. The cluster consists of a coherent block read-with-write-
forthcoming request for the cache line that contains the data element to be
loaded, followed by a block write request for the current cache line.
If the current cache line needs to be written back and the coherency attribute
for the page containing the requested cache line is exclusive, the processor
issues a cluster consisting of an exclusive read-with-write-forthcoming
request, followed by a write request for the current cache line.
Table 12-3 lists these actions.
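The secondary-cache-mode cases above can be collected into one function. This is a sketch: the function name and the request labels are hypothetical strings, not SysCmd encodings, and the case of a noncoherent page with a pending write back is rejected because the text above does not describe it.

```python
def load_miss_requests(attribute, writeback_needed):
    """Return the processor requests issued for a load miss
    in secondary-cache mode, in issue order."""
    if attribute == "noncoherent":
        if writeback_needed:
            raise ValueError("case not described in the text above")
        return ["noncoherent block read"]
    if attribute not in ("sharable", "update", "exclusive"):
        raise ValueError("unknown coherency attribute: " + attribute)
    read = "coherent block read"
    if attribute == "exclusive":
        read += " requesting exclusivity"
    if not writeback_needed:
        return [read]
    # Cluster: the read announces the write, then the dirty line follows.
    return [read + " with write forthcoming", "block write"]
```

A result with more than one entry corresponds to a cluster: the read request is issued first, and the write of the displaced dirty line follows while the read is still pending.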
No-Secondary-Cache Mode
In no-secondary-cache mode, if the cache line must be written back on a
load miss, the read request is issued and completed before the write
request is handled. The processor takes the following steps:
1. The processor issues a noncoherent read request† for the cache line
that contains the data element to be loaded.
2. The processor then waits for an external agent to provide the read
response.
If the current cache line must be written back, the processor issues a write
request to save the dirty cache line in memory.
Store Miss
When a processor store misses in both the primary and secondary caches,
the processor must obtain, from the external agent, the cache line that
contains the target location of the store. The processor examines the
coherency attribute in the TLB entry for the page (TLB page coherency
attributes are listed in Chapter 4) that contains the requested cache line to
see if the cache line is being maintained with either a write invalidate or a
write update cache coherency protocol.
The processor then executes one of the following requests:
• If the coherency attribute is either sharable or exclusive, a write
invalidate protocol is in effect, and a coherent block read that
requests exclusivity is issued.
• If the coherency attribute is update, a write update protocol is in
effect and a coherent block read request is issued.
• If the coherency attribute is noncoherent, a noncoherent block
read request is issued.
Table 12-4 shows the actions taken on a store miss to primary and
secondary caches.
Secondary-Cache Mode
In secondary-cache mode, if the new cache line replaces a current cache
line that is in either the dirty exclusive or dirty shared state, the current
cache line must be written back before the new line can be loaded in the
primary and secondary caches. The processor requests issued are a
function of the page attributes listed below.
If the current cache line does not need to be written back, the coherency
attribute for the page that contains the requested cache line is update, and
potential updates are enabled, the processor issues a cluster consisting of
a read request, followed by a potential update request.
In an update protocol, the cache line requested by a processor coherent
read request can be returned in a shared state; the processor then has to
issue an update request before it can complete a store instruction. A
potential update issued with a read request in a cluster allows the external
agent to anticipate the read response on the system bus. If the read
response is in a shared state, the required update is quickly transmitted to
the rest of the system. This provides the processor with the acknowledge
and allows the processor to complete the store instruction as rapidly as
possible.
Without the potential update request, the response data must be returned
to the processor. If the line is returned in the shared or dirty shared state,
the processor issues an update request, which must then be forwarded to
the system bus before an acknowledge can be returned to the processor.
Note that potential updates behave as if they have not yet been issued by
the processor. Potential updates are not subject to cancellation, and do not
require an acknowledge. When a potential update is nullified, the
processor behaves as if no update request was ever issued; when a
potential update becomes compulsory, the processor behaves as if it had
issued an update request at that instant.
Compulsory Update: If the processor issues a cluster that contains a
potential update, and the response data for the read request is
returned with an indication that it must be placed in the cache in either
a shared or dirty shared state, the potential update then becomes
compulsory. Once a potential update becomes compulsory, it is
subject to cancellation, and the processor requires an acknowledge for
the update request. The external agent must forward the update to the
system, then signal the acknowledge to the processor when the update
is complete. The processor will not complete the store until it has
received an acknowledge for the update request.
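The store-miss request selection and the fate of a potential update can be sketched as two functions. The names and string labels are hypothetical, the sketch assumes that no write back is needed, and it treats any returned state other than shared or dirty shared as nullifying the potential update.

```python
def store_miss_requests(attribute, potential_updates_enabled=True):
    """Requests issued for a store miss when no write back is needed."""
    if attribute in ("sharable", "exclusive"):  # write invalidate protocol
        return ["coherent block read requesting exclusivity"]
    if attribute == "update":                   # write update protocol
        requests = ["coherent block read"]
        if potential_updates_enabled:
            requests.append("potential update")  # forms a cluster
        return requests
    if attribute == "noncoherent":
        return ["noncoherent block read"]
    raise ValueError("unknown coherency attribute: " + attribute)


def resolve_potential_update(returned_state):
    """Decide what happens to a potential update once the read response
    arrives with the state in which the line must be cached."""
    if returned_state in ("shared", "dirty shared"):
        return {"update": "compulsory",
                "needs_acknowledge": True,
                "cancellable": True}
    return {"update": "nullified",
            "needs_acknowledge": False,
            "cancellable": False}
```

Only a compulsory update holds up the store: the processor waits for the acknowledge in that case, and proceeds as if no update had been issued when the potential update is nullified.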
No-Secondary-Cache Mode
The processor issues a read request for the cache line that contains the
target location of the store, then waits for the external agent to provide read
data in response to the read request. Then, if the current cache line must be
written back, the processor issues a write request for the current cache line.
In no-secondary-cache mode, if the new cache line replaces a current cache
line whose Write back (W) bit is set, the current cache line moves to an
internal write buffer before the new cache line is loaded in the primary
cache.
Store Hit
This section describes store hits in both secondary-cache and no-
secondary-cache mode.
Secondary-Cache Mode
When the processor hits in the secondary cache, on a line that is marked
either shared or dirty shared, the processor must issue an update or
invalidate request and then wait to receive an acknowledge, before the
store is complete. The processor checks the coherency attribute in the TLB
for the page containing the cache line that is the target of the store, to
determine if the cache line is managed by either a write invalidate or write
update cache coherency protocol.
• If the coherency attribute is sharable or exclusive, a write
invalidate protocol is in effect, and the processor issues an
invalidate request. The processor cannot complete the store
until the external agent signals an acknowledge for this
invalidate request.
• If the coherency attribute is update, a write update protocol is
in effect, and the processor issues an update request. The
processor cannot complete the store until the external agent
signals an acknowledge for this update request.
No-Secondary-Cache Mode
In no-secondary-cache mode, all lines are set to the dirty exclusive state.
This means store hits cause no bus transactions.
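The store-hit rules for both modes fit in one small function. The function name and labels are hypothetical, and the branch for hits on lines that are neither shared nor dirty shared is an assumption (such hits are taken to need no request).

```python
def store_hit_request(line_state, attribute, secondary_cache=True):
    """Return the request the processor must issue, and have acknowledged,
    before a store hit can complete; None means no bus transaction."""
    if not secondary_cache:
        return None  # every line is dirty exclusive in this mode
    if line_state not in ("shared", "dirty shared"):
        return None  # assumption: exclusive lines need no request
    if attribute in ("sharable", "exclusive"):
        return "invalidate"  # write invalidate protocol
    if attribute == "update":
        return "update"      # write update protocol
    raise ValueError("unexpected coherency attribute: " + attribute)
```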
CACHE Operations
The processor provides a variety of CACHE operations to maintain the
state and contents of the primary and secondary caches. During the
execution of the CACHE operation instructions, the processor can issue
either write requests or invalidate requests.
location is replaced by the instruction line containing the code. The link
address is kept in a register separate from the cache, and remains active as
long as the link bit, set by the Load Linked instruction, is set.
The link bit, which is set by the load linked instruction, is cleared by a
change of cache state for the line containing the link address, or by a
Return From Exception.
In order for the Load Linked Store Conditional instruction sequence to
work correctly, all coherency traffic targeting the link address must be
visible to the processor, and the cache line containing the link location
must remain in a shared state in every cache in the system. This
guarantees that a Store Conditional executed by some other processor is
visible to the processor as a coherence request, changing the state of the
cache line containing the link location.
To accomplish this, when a read request issued by the processor causes the
cache line containing the link location to be replaced, the link address
retained bit is set, indicating the link address is being retained. This
informs the external agent that, although the processor has
replaced this cache line, the processor must still see any coherence traffic
that targets this cache line.
Any snoop or intervention request that targets a cache line which is not
present in the cache—but for which the snoop or intervention address
matches the current link address while the link bit is set—returns an
indication that the cache line is present in the cache in a shared state. This
is consistent with the coherency model, since the processor never returns
data, in response to an intervention request, for a cache line that is in the
shared state. The shared response guarantees that the cache line
containing the link location remains in a shared state in all other
processors' caches, and therefore that any other processor attempting a
Store Conditional to this link location must issue a coherence request in
order to complete the Store Conditional.
For more information, refer to Chapter 11, or see the specific Load Linked
and Store Conditional instructions described in Appendix A.
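The snoop and intervention response rule described above can be sketched as follows. This is a minimal illustration, not the processor's actual logic: the 32-byte line size, the line-granular address match, and the names `line_of` and `snoop_state` are assumptions made for the example.

```python
LINE_BYTES = 32  # hypothetical cache line size, for illustration only


def line_of(addr, line_bytes=LINE_BYTES):
    """Return the line-aligned address of the cache line containing addr."""
    return addr & ~(line_bytes - 1)


def snoop_state(cache, addr, link_addr, link_bit):
    """State reported for a snoop or intervention request that targets addr.

    cache maps line-aligned addresses to a state string.
    """
    line = line_of(addr)
    if line in cache:
        return cache[line]           # line is present: report its real state
    if link_bit and line == line_of(link_addr):
        return "shared"              # line was replaced, but the link is retained
    return "invalid"                 # not present and no link address match


cache = {0x1000: "dirty exclusive"}
print(snoop_state(cache, 0x2008, link_addr=0x2000, link_bit=True))   # shared
print(snoop_state(cache, 0x2008, link_addr=0x2000, link_bit=False))  # invalid
```

Reporting "shared" for the retained link address forces any other processor to issue a coherence request before its Store Conditional can complete, which is what clears the link bit.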
NOTE: The external agent must not assert the signal ExtRqst* for the
purposes of returning a read response, but rather must wait for the
uncompelled change to slave state. The signal ExtRqst* can be asserted
before or during a read response to perform an external request other
than a read response.
5. The processor releases the SysCmd and the SysAD buses one SCycle
after the assertion of Release*.
6. The external agent drives the SysCmd and the SysAD buses within
two cycles after the assertion of Release*.
Once in slave state (starting at cycle 5 in Figure 12-17), the external agent
can return the requested data through a read response. The read response
can return the requested data or, if the requested data could not be
successfully retrieved, an indication that the returned data is erroneous. If
the returned data is erroneous, the processor takes a bus error exception.
Figure 12-17 illustrates a processor read request, coupled with an
uncompelled change to slave state, that occurs as the read request is
issued. Figure 12-18 illustrates a processor read request, and the
subsequent uncompelled change to slave state, that occurs sometime after
the read request is issued.
NOTE: Timings for the SysADC and SysCmdP buses are the same as
those of the SysAD and SysCmd buses, respectively.
[Timing diagram, master state changing to slave state: SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
[Timing diagram, master state changing to slave state: SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
Figure 12-18 Processor Read Request Protocol, Change to Slave State Delayed
[Timing diagram, master state: SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
Figure 12-19 Processor Noncoherent Single Word Write Request Protocol
† Called word to distinguish it from block request protocol. Data transferred can actually be
doubleword, partial doubleword, word, or partial word.
Processor block write requests are issued with the System interface in
master state, as described below; a processor coherent block request for
eight words of data is illustrated in Figures 12-20 and 12-21.
1. The processor issues a write command on the SysCmd bus and a write
address on the SysAD bus.
2. The processor asserts ValidOut*.
3. The processor drives a data identifier on the SysCmd bus and data on
the SysAD bus.
4. The processor asserts ValidOut* for a number of cycles sufficient to
transmit the block of data.
5. The data identifier associated with the last data cycle must contain a
last data cycle indication.
NOTE: As shown in Figure 12-21, however, the first data cycle does
not have to immediately follow the address cycle.
Figures 12-20 and 12-21 illustrate a processor coherent block request for
eight words of data.
[Timing diagram, master state: SClock, WrRdy*, and Release* over SCycles 1 to 12.]
[Timing diagram, master state: SysAD bus carries Addr, Data0, Data1, Data2, Data3; also SClock, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
[Timing diagram, master state: SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
Figure 12-22 Processor Null Write Request Protocol
[Timing diagram, master state changing to slave state: SysAD bus carries Addr, Addr, Data0, Addr, Data0, Data1, Data2, Data3; SysCmd bus carries RwWF, Upd, CEOD, Write, CData, CData, CData, CEOD; also SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
[Timing diagram: SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
Figure 12-24 Two Processor Write Requests, Second Write Delayed for the Assertion of WrRdy*
[Timing diagram: SysAD bus carries Addr, Addr, Data0, Addr, Data0, Data1, Data2, Data3; SysCmd bus carries Read, Upd, CEOD, Write, CData, CData, CData, CEOD; also SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
Figure 12-25 Processor Read Request within a Cluster Delayed for the Assertion of RdRdy*
[Timing diagram: SysAD bus carries Addr, Addr, Data0, Addr, Data0, Data1, Data2, Data3; SysCmd bus carries Read, Upd, CEOD, Write, CData, CData, CData, CEOD; also SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release* over SCycles 1 to 12.]
Figure 12-26 Processor Write Request within a Cluster Delayed for the Assertion of WrRdy*
[Timing diagram: SClock, ValidOut*, ValidIn*, RdRdy*, WrRdy*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-27 Processor Write Request Delayed for the Assertion of WrRdy* and the Completion of an External Invalidate Request
[Timing diagram: SClock, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
[Timing diagram: SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
NOTE: The processor does not contain any resources that are
readable by an external read request; in response to an external read
request the processor returns undefined data and a data identifier
with its Erroneous Data bit, SysCmd(5), set.
Any time the processor releases the System interface to slave state to
accept an external request, it also allows the external agent to use the
secondary cache, in anticipation of a cache coherence request. When the
external agent uses the SysAD bus for a transfer unrelated to the processor
(for example, a DMA transfer), this ownership of the secondary cache
prevents the processor from satisfying subsequent primary cache misses.
To satisfy such a primary cache miss, the external agent issues a secondary
cache release external null request, returning ownership of the secondary
cache to the processor.
External null requests require no action from the processor other than to
return the System interface to master state, or to regain ownership of the
secondary cache.
Figures 12-30 and 12-31 show timing diagrams of the two external null
request cycles, which consist of the following steps:
1. The external agent asserts ExtRqst* to arbitrate for the System
interface.
2. The processor releases the System interface to slave state by asserting
Release*.
3. The external agent drives a secondary cache release external null
request command on the SysCmd bus, and asserts ValidIn* for one
cycle to return the secondary cache interface ownership to the
processor.
4. The SysAD bus is unused (does not contain valid data) during the
address cycle associated with an external null request.
5. After the address cycle is issued, the null request is complete.
For a secondary cache release external null request, the System interface
remains in slave state.
For a System interface release external null request, the external agent releases
the SysCmd and SysAD buses, and expects the System interface to return
to master state.
[Timing diagram, master state changing to slave state: SysAD bus unused (Unsd); also SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
[Timing diagram, slave state changing to master state: SysAD bus unused (Unsd); SysCmd bus carries SINull; also SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
[Timing diagram: SClock, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-32 External Write Request, with System Interface initially a Bus Master
[Timing diagram, master state changing to slave state and back to master state: SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-33 External Invalidate Request following an Uncompelled Change to Slave State
† If the cache line that is the target of the intervention request is not present in the cache—
that is, the tag comparison for the cache line at the target cache address fails—the cache
line that is the target of the intervention request is considered to be in the invalid state.
[Timing diagram: SysAD bus carries Addr, then is unused (Unsd); also SClock, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-34 External Intervention Request, Shared Line, System Interface in Master State
The case in which the processor returns cache line contents is described in
the steps below. In this example, the system is already in slave state.
1. The external intervention request is driven onto the SysCmd bus and
the address onto the SysAD bus. ValidIn* is asserted for one cycle.
2. The processor drives data on the SysAD bus and a data identifier on
the SysCmd bus. The processor asserts ValidOut* for each data cycle.
3. The data identifier associated with the last data cycle must contain a
last data cycle indicator.
[Timing diagram, slave state changing to master state: SysAD bus carries Addr, Data0, Data1, Data2, Data3; also SClock, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-35 External Intervention Request, Dirty Exclusive Line, System Interface in Slave State
The processor returns the contents of a cache line, along with an indication
of the cache state in which it was found, by issuing a sequence of data
cycles sufficient to transmit the contents of the cache line, as shown in
Figure 12-35. The data identifier transmitted with each data cycle
indicates the cache state in which the cache line was found, together with
an indication that this data is response data. The data identifier associated
with the last data cycle contains a last data cycle indication.
If the contents of a cache line are returned in response to an intervention
request, they are returned in subblock order starting with the doubleword
at the address supplied with the intervention request. Note, however, that
if the intervention address targets the doubleword at the beginning of the
block, subblock ordering is equivalent to sequential ordering.
[Timing diagram: SysAD bus carries Addr, then is unused (Unsd); SysCmd bus carries Snoop, then CEOD; also SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
[Timing diagram: SysAD bus carries Addr, then is unused (Unsd); SysCmd bus carries Snoop, then CEOD; also SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
[Timing diagram: SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-38 Processor Word Read Request, followed by a Word Read Response
[Timing diagram, slave state changing to master state: SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-39 Block Read Response, System Interface already in Slave State
Maximum Data Rate            Data Pattern   Maximum Secondary Cache Access
1 Double/1 SClock Cycle      D              4 PCycles
2 Doubles/3 SClock Cycles    DDx            6 PCycles
1 Double/2 SClock Cycles     DDxx           8 PCycles
1 Double/2 SClock Cycles     DxDx           8 PCycles
2 Doubles/5 SClock Cycles    DDxxx          10 PCycles
1 Double/3 SClock Cycles     DDxxxx         12 PCycles
1 Double/3 SClock Cycles     DxxDxx         12 PCycles
1 Double/4 SClock Cycles     DDxxxxxx       16 PCycles
1 Double/4 SClock Cycles     DxxxDxxx       16 PCycles
In Tables 12-6 and 12-7, data patterns are specified using the letters D and
x; D indicates a data cycle and x indicates an unused cycle. Figure 12-40
shows a read response in which data is provided to the processor at a rate
of two doublewords every three cycles using the data pattern DDx.
[Timing diagram: SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Figure 12-40 Read Response, Reduced Data Rate, System Interface in Slave State
[Timing diagram: SClock, ValidOut*, ValidIn*, ExtRqst*, and Release* over SCycles 1 to 12.]
Table 12-7 shows the maximum transmit data rates for a given set of
secondary cache parameters, based on a PClock-to-SClock divisor of 2. To
find the maximum allowable secondary cache write cycle time and
secondary cache access time, multiply the maximum secondary cache
numbers for each pattern by:
(PClock_to_SClock_Divisor)/2
The minimum number for these parameters is always the minimum access
time supported by the processor.
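The data-pattern notation and the divisor scaling rule can be worked through in a few lines. This is a sketch under stated assumptions: the PCycle values are copied from the data-rate table earlier in this section and are assumed to be quoted for a PClock-to-SClock divisor of 2; the names `data_rate` and `scaled_max_access` are hypothetical.

```python
from fractions import Fraction

# Data pattern -> maximum secondary cache access in PCycles,
# copied from the data-rate table (assumed to be for a divisor of 2).
MAX_ACCESS_PCYCLES = {
    "D": 4, "DDx": 6, "DDxx": 8, "DxDx": 8, "DDxxx": 10,
    "DDxxxx": 12, "DxxDxx": 12, "DDxxxxxx": 16, "DxxxDxxx": 16,
}


def data_rate(pattern):
    """Doublewords per SClock cycle: D is a data cycle, x an unused cycle."""
    return Fraction(pattern.count("D"), len(pattern))


def scaled_max_access(pattern, divisor):
    """Table value multiplied by (PClock_to_SClock_Divisor)/2."""
    return Fraction(MAX_ACCESS_PCYCLES[pattern] * divisor, 2)


print(data_rate("DDx"))               # 2/3, two doublewords every three cycles
print(scaled_max_access("DDx", 4))    # 12
```

Using `Fraction` keeps the result exact for odd divisors, where the scaled value is not a whole number of PCycles.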
Release Latency
Release latency is generally defined as the number of cycles the processor
can wait to release the System interface to slave state for an external
request. When no processor requests are in progress, internal activity—
such as refilling the primary cache from the secondary cache—can cause
the processor to wait some number of cycles before releasing the System
interface. Release latency is therefore more specifically defined as the
number of cycles that occur between the assertion of ExtRqst* and the
assertion of Release*.
There are three categories of release latency:
• Category 1: when the external request signal is asserted two
cycles before the last cycle of a processor request, or two cycles
before the last cycle of the last request in a cluster.
• Category 2: when the external request signal is not asserted
during a processor request or cluster, or is asserted during the
last cycle of a processor request or cluster.
• Category 3: when the processor makes an uncompelled change
to slave state.
Table 12-9 summarizes the minimum and maximum release latencies for
requests that fall into categories 1, 2, 3a and 3b. Note that the maximum
and minimum cycle count values are subject to change.
[SysCmd bus command format: fields SysCmd(8), SysCmd(7:5), and SysCmd(4:0).]
SysCmd(7:5) Command
0 Read Request
1 Read-With-Write-Forthcoming Request
2 Write Request
3 Null Request
4 Invalidate Request
5 Update Request
6 Intervention Request
7 Snoop Request
SysCmd(4:0) are specific to each type of request and are defined in each of
the following sections.
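The SysCmd(7:5) encoding listed above maps directly onto a lookup table. The following is a minimal sketch; the function names are hypothetical, and only the field positions and command names come from the table.

```python
# SysCmd(7:5) command encodings, as listed in the table above.
SYSCMD_COMMANDS = {
    0: "Read Request",
    1: "Read-With-Write-Forthcoming Request",
    2: "Write Request",
    3: "Null Request",
    4: "Invalidate Request",
    5: "Update Request",
    6: "Intervention Request",
    7: "Snoop Request",
}


def command_type(syscmd):
    """Return the command name encoded in SysCmd(7:5)."""
    return SYSCMD_COMMANDS[(syscmd >> 5) & 0x7]


def request_specific(syscmd):
    """Return the request-specific field SysCmd(4:0)."""
    return syscmd & 0x1F


print(command_type(0b0_010_00000))   # Write Request
```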
Read Requests
Figure 12-43 shows the format of a SysCmd read request.
[SysCmd read request format: SysCmd(8) = 0; SysCmd(7:5) = 000 or 001; SysCmd(4:0) = read request specific (see tables).]
Tables 12-12 through 12-14 list the encodings of SysCmd(4:0) for read
requests.
Write Requests
Figure 12-44 shows the format of a SysCmd write request.
Table 12-15 lists the write attributes encoded in bits SysCmd(4:3). Table
12-16 lists the block write replacement attributes encoded in bits
SysCmd(2:0). Table 12-17 lists the write request bit encodings in
SysCmd(2:0).
Null Requests
Figure 12-45 shows the format of a SysCmd null request.
Invalidate Requests
Figure 12-46 shows the format for an invalidate request, and Table 12-20
lists the encodings of SysCmd(4:0) for an external invalidate request.
SysCmd(4:0) are reserved on a processor invalidate request.
[SysCmd invalidate request format: SysCmd(8) = 0; SysCmd(7:5) = 100; SysCmd(4:0) = invalidate request specific (see table).]
Update Requests
Figure 12-47 shows the format for a SysCmd update request.
SysCmd(4) Reserved
SysCmd(3) Update type
0 Compulsory
1 Potential
SysCmd(2:0) Update Data Size
0 1 byte valid (Byte)
1 2 bytes valid (Halfword)
2 3 bytes valid (Tribyte)
3 4 bytes valid (Word)
4 5 bytes valid (Quintibyte)
5 6 bytes valid (Sextibyte)
6 7 bytes valid (Septibyte)
7 8 bytes valid (Doubleword)
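The update request fields listed above can be decoded as shown below. This is an illustrative sketch: `decode_update` is a hypothetical name, and the byte count is simply the SysCmd(2:0) code plus one, as the table implies.

```python
# SysCmd(3) update type encodings, from the table above.
UPDATE_TYPES = {0: "Compulsory", 1: "Potential"}


def decode_update(syscmd):
    """Return (update type, number of valid bytes) for an update request."""
    update_type = UPDATE_TYPES[(syscmd >> 3) & 0x1]   # SysCmd(3)
    valid_bytes = (syscmd & 0x7) + 1                  # SysCmd(2:0): code n means n+1 bytes
    return update_type, valid_bytes


# SysCmd(7:5) = 101 (update), SysCmd(3) = 1, SysCmd(2:0) = 011
print(decode_update(0b101_0_1_011))   # ('Potential', 4)
```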
[SysCmd data identifier format: SysCmd(8) = 1; SysCmd(7) = Last Data; SysCmd(6) = Resp Data; SysCmd(5) = Err Data; SysCmd(4) = see note below; SysCmd(3) = Reserved; SysCmd(2:0) = Cache State.]
Coherent Data
Coherent data is defined as follows:
• data that is returned in response to a processor coherent block
read request
• data that is returned in response to an external intervention
request.
Noncoherent Data
Noncoherent data is defined as follows:
• data that is associated with processor block write requests and
processor doubleword, partial doubleword, word, or partial
word write requests
• data that is returned in response to a processor noncoherent
block read request or a processor doubleword, partial
doubleword, word, or partial word read request
• data that is associated with external update requests
• data that is associated with external write requests
• data that is returned in response to an external read request
• data that is associated with processor update requests.
Addressing Conventions
Addresses associated with doubleword, partial doubleword, word, or
partial word transactions and update requests, are aligned for the size of
the data element. The system uses the following address conventions:
• Addresses associated with block requests are aligned to
double-word boundaries; that is, the low-order 3 bits of
address are 0.
• Doubleword requests set the low-order 3 bits of address to 0.
• Word requests set the low-order 2 bits of address to 0.
• Halfword requests set the low-order bit of address to 0.
• Byte, tribyte, quintibyte, sextibyte, and septibyte requests use
the byte address.
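The alignment conventions above amount to a check on the low-order address bits. A minimal sketch follows; the function name `is_aligned` and the string keys are hypothetical.

```python
# Number of low-order address bits that must be 0 for each request size.
LOW_ZERO_BITS = {"block": 3, "doubleword": 3, "word": 2, "halfword": 1}


def is_aligned(addr, kind):
    """True if addr follows the convention for the given request size.

    Byte, tribyte, quintibyte, sextibyte, and septibyte requests use the
    byte address, so any address is accepted for them.
    """
    bits = LOW_ZERO_BITS.get(kind, 0)
    return (addr & ((1 << bits) - 1)) == 0


print(is_aligned(0x1008, "doubleword"))   # True
print(is_aligned(0x1004, "doubleword"))   # False
print(is_aligned(0x1003, "tribyte"))      # True
```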
13
† Other cache designs within this constraint are also acceptable. For example, a smaller
cache design can use 22 8-Kbyte-by-8-bit static RAMs; this design presents less load on the
address pins and control signals, and reduces the overall parts count.
24 22 21 19 18 0
CS PIdx Physical_Tag
3 3 19
Figure 13-1 SCTag Fields
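The field boundaries in Figure 13-1 can be expressed as shifts and masks. The sketch below assumes the 25-bit tag is held in an integer with bit 0 as the least-significant bit; `decode_sctag` is a hypothetical helper name.

```python
def decode_sctag(tag):
    """Split a 25-bit SCTag value into the fields of Figure 13-1."""
    return {
        "CS": (tag >> 22) & 0x7,          # bits 24:22, 3 bits
        "PIdx": (tag >> 19) & 0x7,        # bits 21:19, 3 bits
        "Physical_Tag": tag & 0x7FFFF,    # bits 18:0, 19 bits
    }


tag = (0b101 << 22) | (0b011 << 19) | 0x12345
print(decode_sctag(tag))
```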
The SCDCS* and SCTCS* signals disable reads or writes of either the data
array or tag array when the opposite array is being accessed. These signals
are useful for saving power on snoop and invalidate requests since access
to the data array is not necessary. These signals also write data from the
primary data cache to the secondary cache.
Read Cycles
There are two basic read cycles: 4-word read and 8-word read.
Each secondary cache read cycle begins by driving an address out on the
address pins. The output enable signal SCOE* is asserted at the same
time.
This section describes both 4-word and 8-word read cycles, including
timing diagrams.
[Timing diagram, PCycles 1 to 6: SCAddr(17:0) drives Address; SCData(127:0), SCTag(24:0), SCDChk(15:0), and SCTChk(6:0) return Data; also SCOE*, SCAPar(2:0), SCDCS*, and SCTCS*; timing parameters tRd1Cyc and tDis.]
[Timing diagram, PCycles 1 to 9: SCAddr(17:1) drives Address; SCData(127:0), SCTag(24:0), SCDChk(15:0), and SCTChk(6:0) return Data twice; also SCAPar(2:0), SCOE*, SCDCS*, and SCTCS*; timing parameters tRd1Cyc and tDis.]
Write Cycles
There are two basic write cycles: a 4-word write cycle and an 8-word write
cycle. The secondary cache write cycle begins with the assertion of an
address onto the address pins.
This section describes both 4-word and 8-word write cycles, including
timing diagrams.
[Timing diagram, PCycles 1 to 4: SCAddr(17:0) drives Address; Data is driven on SCData(63:0)/SCDChk(7:0) or SCData(127:64)/SCDChk(15:8), on SCTag(24:0)/SCTChk(6:0), and on SCData(127:64)/SCDChk(15:8) or SCData(63:0)/SCDChk(7:0); also SCAPar(2:0), SCWR*, SCOE*, SCDCS*, and SCTCS*; timing parameters tWrSUp, tWr1Dly, and tWrRc.]
[Timing diagram, PCycles 1 to 8: SCAddr(17:1) drives Address; SCAddr(0) drives First_Address, then Second_Address; SCData(63:0)/SCDChk(7:0) and SCTag(24:0)/SCTChk(6:0) carry First_Data, then Second_Data; the row labeled SCDChk(15:8) carries First_Data_MS/DTag_Chk, then Second_Data_MS/DTag_Chk; also SCAPar(2:0), SCWR*, SCOE*, SCDCS*, and SCTCS*; timing parameters tWrSUp, tWr1Dly, tWr2Dly, and tWrRc.]
14
[Block diagram of the JTAG boundary-scan logic: an integrated circuit with boundary-scan cells at the IC package pins; the Instruction register (bits 2:0), Bypass register (bit 0), and Boundary-scan register, together with the JTDI, JTDO, and JTCK pins.]
Instruction Register
The JTAG Instruction register includes three shift register-based cells; this
register is used to select the test to be performed and/or the test data
register to be accessed. As listed in Table 14-1, this encoding selects either
the Boundary-scan register or the Bypass register.
[Instruction register bit layout: bit 2 (MSB) through bit 0 (LSB).]
Bypass Register
The Bypass register is 1 bit wide. When the TAP controller is in the Shift-
DR (Bypass) state, the data on the JTDI pin is shifted into the Bypass
register, and the Bypass register output shifts to the JTDO output pin.
In essence, the Bypass register is a short-circuit which allows bypassing of
board-level devices, in the serial boundary-scan chain, which are not
required for a specific test. The logical location of the Bypass register in the
boundary-scan chain is shown in Figure 14-4. Use of the Bypass register
speeds up access to boundary-scan registers in those ICs that remain
active in the board-level test datapath.
[Diagram: board-level boundary-scan chain linking the JTDI and JTDO pins of several IC packages between the board input and the board output, showing the Bypass register and the Boundary-scan register pad cells.]
Boundary-Scan Register
The Boundary-scan register is a single, 319-bit-wide, shift register-based
path containing cells connected to all input and output pads on the R4000
processor. Figure 14-5 shows the three most-significant bits of the
Boundary-scan register; these three bits control the output enables on the
various bidirectional buses.
The most-significant bit, OE3 (bit 319), is the JTAG output enable bit for
the SysAD, SysADC, SysCmd, and SysCmdP buses. Output is enabled
when this bit is set to 1.
OE2 (bit 318) is the JTAG output enable for the SCData and SCDChk
buses. Output is enabled when this bit is set to 1.
OE1 (bit 317) is the JTAG output enable for the SCTag and SCTChk buses.
Output is enabled when this bit is set to 1.
The remaining 316 bits correspond to 316 signal pads of the processor.
At the end of this chapter, Table 14-2 lists the scan order of these 316 scan
bits, starting from JTDI and ending with JTDO.
[Timing and block diagram: JTMS and JTDI are sampled on the rising edge of JTCK; JTDO is sampled on the falling edge of JTCK. Data is scanned in serially at the JTDI pin and scanned out serially at the JTDO pin, through the Instruction register (bits 2:0, MSB to LSB), the Bypass register (bit 0), or the Boundary-scan register (bits 319:1); the JTMS pin is also shown.]
Data on the JTDI and JTMS pins is sampled on the rising edge of the
JTCK input clock signal. Data on the JTDO pin changes on the falling
edge of the JTCK clock signal.
TAP Controller
The processor implements the 16-state TAP controller as defined in the
IEEE JTAG specification.
Controller Reset
The TAP controller state machine can be put into Reset state by one of the
following:
• deasserting the VCCOk input
• keeping the JTMS input signal asserted through five consecutive
rising edges of the JTCK input.
In either case, keeping JTMS asserted maintains the Reset state.
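The five-clock reset rule can be checked against the standard 16-state TAP state machine. The sketch below is an assumption-laden illustration: the state names and transitions are those of IEEE 1149.1 (this manual calls Test-Logic-Reset simply "Reset"), "asserted" is taken to mean JTMS = 1, and the table and function names are hypothetical.

```python
# state: (next state when JTMS = 0, next state when JTMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock(state, jtms):
    """Advance the TAP controller by one rising edge of JTCK."""
    return TAP_NEXT[state][jtms]


def after_five_high_clocks(state):
    """State reached after five JTCK rising edges with JTMS held at 1."""
    for _ in range(5):
        state = clock(state, 1)
    return state


# Every starting state ends in the Reset state.
print(all(after_five_high_clocks(s) == "Test-Logic-Reset" for s in TAP_NEXT))  # True
```

Because Test-Logic-Reset loops to itself while JTMS stays at 1, holding JTMS asserted also maintains the Reset state.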
Controller States
The TAP controller states described here are of four kinds: Reset,
Capture, Shift, and Update. They can reflect either instructions (as in the
Shift-IR state) or data (as in the Capture-DR state).
• When the TAP controller is in the Reset state, the value 0x7 is
loaded into the parallel output latch, selecting the Bypass
register as default. The three most significant bits of the
Boundary-scan register are cleared to 0, disabling the outputs.
• When the TAP controller is in the Capture-IR state, the value
0x4 is loaded into the shift register stage.
• When the TAP controller is in the Capture-DR (Boundary-scan)
state, the data currently on the processor input and I/O pins is
latched into the Boundary-scan register. In this state, the
Boundary-scan register bits corresponding to output pins are
arbitrary and cannot be checked during the scan out process.
• When the TAP controller is in the Shift-IR state, data is loaded
serially into the shift register stage of the Instruction register
from the JTDI input pin, and the MSB of the Instruction
register’s shift register stage is shifted onto the JTDO pin.
† See the section titled Boundary-Scan Register earlier in this chapter, for a
description of the last three output enable bits, 319:317.
15
[Diagram of the Interrupt register: bits 6:0 take the Interrupt Value from SysAD(6:0), with Write Enables from SysAD(22:16). See Figures 15-1, 15-2, and 15-3.]
Figure 15-2 shows how the R4000SC and R4000MC interrupts are readable
through the Cause register.
• Bit 5 of the Interrupt register in the R4000SC and R4000MC is
multiplexed with the TimerInterrupt signal and the result is
directly readable as bit 15 of the Cause register.
• Bits 4:1 of the Interrupt register are directly readable as bits
14:11 of the Cause register.
• Bit 0 of the Interrupt register is latched into the internal register
by the rising edge of SClock, then ORed with the Int*(0) pin,
and the result is directly readable as bit 10 of the Cause register.
[Diagram: Cause register(15:10) bits 10 through 15 correspond to IP2 through IP7; the Timer Interrupt and TimerIntDis feed a multiplexer; Int*(0) and the internal register, latched by SClock, feed an OR gate. See Figure 15-5.]
The select line for the Timer Interrupt multiplexer is enabled by boot-
mode bit 19, TimerIntDis, as described in Chapter 9. The Timer Interrupt
input to the multiplexer is asserted when the Count register equals the
Compare register.
Figure 15-3 shows how the R4000PC interrupts are readable through the
Cause register. The interrupt bits, Int*(5:0), are latched into the internal
register by the rising edge of SClock.
• Bit 5 of the Interrupt register in the R4000PC is ORed with the
Int*(5) pin and then multiplexed with the TimerInterrupt
signal. This result is directly readable as bit 15 of the Cause
register.
• Bits 4:0 of the Interrupt register are bit-wise ORed with the
current value of the interrupt pins Int*(4:0) and the result is
directly readable as bits 14:10 of the Cause register.
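The R4000PC derivation of Cause(15:10) can be sketched as below. Several points are assumptions made for illustration: inputs are modeled as logical values (1 means asserted, regardless of the active-low pin polarity), TimerIntDis = 1 is taken to select the interrupt input rather than the timer, and the function name is hypothetical.

```python
def cause_ip_r4000pc(int_reg, int_pins, timer_interrupt, timer_int_dis):
    """Return Cause(15:10) as a 6-bit value (bit 5 here is Cause bit 15).

    int_reg  -- Interrupt register bits 5:0
    int_pins -- logical state of Int*(5:0), 1 meaning asserted (assumption)
    """
    combined = (int_reg | int_pins) & 0x3F       # bit-wise OR of register and pins
    low = combined & 0x1F                        # Cause bits 14:10
    if timer_int_dis:
        top = (combined >> 5) & 0x1              # timer disabled: use bit 5 OR Int*(5)
    else:
        top = 1 if timer_interrupt else 0        # timer enabled: use TimerInterrupt
    return (top << 5) | low


print(bin(cause_ip_r4000pc(0b000001, 0b000010, True, False)))   # 0b100011
```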
[Diagram: Cause register bits 10 through 15 correspond to IP2 through IP7; internal register bits 5:0, latched by SClock, feed OR gates; the Timer Interrupt feeds a multiplexer. See Figure 15-5.]
Figure 15-4 shows the internal derivation of the NMI signal, for all
versions of the R4000 processor.
The NMI* pin is latched by the rising edge of SClock; however, the NMI
exception occurs in response to the falling edge of the NMI* signal, and is
not level-sensitive.
Bit 6 of the Interrupt register is then ORed with the inverted value of NMI*
to form the nonmaskable interrupt.
[Diagram: the NMI* pin passes through an edge-triggered flip-flop clocked by SClock (internal register) and an inverter to an OR gate that produces the internal NMI.]
[Diagram: Status register SR(0) (IE) and SR(15:8) (IM0 through IM7) are combined with Cause register(15:8) (IP0 through IP7) by an 8-bit AND function and an AND-OR function to produce the R4000 Interrupt.]
16
The example above shows a single bit in Data(3:0) with a value of 1; this
bit is Data(1).
• In even parity, the parity bit is set to 1. This makes 2 (an even
number) the total number of bits with a value of 1.
• Odd parity makes the parity bit a 0 to keep the total number of
1-value bits an odd number—in the case shown above, the
single bit Data(1).
The example below shows odd and even parity bits for various data
values:
Data(3:0) Odd Parity Bit Even Parity Bit
0 1 1 0 1 0
0 0 0 0 1 0
1 1 1 1 1 0
1 1 0 1 0 1
Parity allows single-bit error detection, but it does not indicate which bit
is in error. For example, suppose an odd-parity value of 00011 arrives.
The last bit is the parity bit, and since odd parity demands an odd number
(1, 3, 5) of 1s, this data is in error: it has an even number of 1s. However, it
is impossible to tell which bit is in error. To resolve this problem, SECDED
ECC was developed.
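The parity rules above can be reproduced with a short calculation. This is a minimal sketch; `parity_bits` and `odd_parity_ok` are hypothetical names, and the data width defaults to the 4-bit Data(3:0) used in the examples.

```python
def parity_bits(data, width=4):
    """Return (odd parity bit, even parity bit) for a data value."""
    ones = bin(data & ((1 << width) - 1)).count("1")
    even = ones % 2        # makes the total number of 1s even
    odd = even ^ 1         # makes the total number of 1s odd
    return odd, even


def odd_parity_ok(word):
    """True if a received word (data plus parity bit) has an odd number of 1s."""
    return bin(word).count("1") % 2 == 1


for data in (0b0110, 0b0000, 0b1111, 0b1101):
    print(format(data, "04b"), parity_bits(data))

print(odd_parity_ok(0b00011))   # False: two 1s, so an error is detected
```

The check reports that an error occurred but carries no information about which of the five bits flipped.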
† The 64-bit data code is a modification of one of the 64-bit codes proposed by M. Y. Hsiao,
to include the ability to detect 3- and 4-bit errors within a nibble. The 25-bit tag code was
created using the patterns observed in the 64-bit data code.
† A nibble is defined here as any group of four bits located within the vertical rules of Figure
16-1.
‡ This makes it possible to decode the syndrome to find which data bit is in error, using 4-
input NAND gates, provided a pre-decode AND of bits 0-3 and bits 4-7 of the syndrome
is available. For the check bits, a full 8-bit decode of the syndrome is required.
System Interface
The processor generates correct check bits for doubleword, word, or
partial-word data transmitted to the System interface. While it checks the
data for correctness, the processor passes data check bits from the
secondary cache directly to the System interface, unchanged, if the
interface is set to ECC mode. If the System interface is set to parity mode,
the processor indicates a secondary cache ECC error by corrupting the
state of the SysCmdP signal.
The processor does not check data received from the System interface for
external updates and external writes. By setting the SysCmd(4) bit in the
data identifier, it is possible to prevent the processor from checking read
response data from the System interface.
The processor does not check addresses received from the System
interface, but does generate correct check bits for addresses transmitted to
the System interface.
The processor does not contain a data corrector; instead, the processor
takes a cache error exception when it detects an error based on data check
bits. Software, in conjunction with an off-processor data corrector, is
responsible for correcting the data when SECDED code is employed.
Check Bit 43 52 70 61
6666 55 5555 55 5544 4444 4444 3333 3333 3322 2222 2222 1111 1111 11
Data Bit
3210 98 7654 32 1098 7654 3210 9876 5432 1098 7654 3210 9876 5432 10 9876 54 3210
Number of
1s in 3333 5511 3333 5511 3333 3333 3333 3333 3333 3333 3333 3333 3333 3333 5511 3333 5511 3333
syndrome*
NOTE: * This row indicates the number of 1s in the generated syndrome for each data
bit in error.
4. This even parity value, 0001 0011₂, is sent out over the bus as ECC
check bits, ECC(7:0).
The following example uses data with several 1-value bits: Data(63:0) =
0x0000 0000 0000 0043.
1. Expand the data to its binary equivalent in order to generate the
ECC check bits.
0x0000 0000 0000 0043 has 1s in the last byte only. The last byte
binary value is: 0x43 = 0100 0011₂.
column # 7654 3210
0x0043 = 0100 0011₂
Since only columns 0, 1, and 6 have 1s, they are the only columns
that can generate the even parity bits.
2. Using Figure 16-1, generate even parity for the ECC check codes
in columns 0, 1, and 6:
Column 0 ECC Column 1 ECC Column 6 ECC Parity (even)
0 0 0 0
0 0 0 0
0 1 0 1
1 0 0 1
0 0 1 1
0 0 1 1
1 1 0 0
1 1 1 1
3. This parity value, 0011 1101₂, is sent out over the ECC(7:0) check
bus.
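Generating even parity across the selected columns is the same as XORing their column codes. The sketch below uses only the three column codes shown in the step 2 table, read with the top row as the most-significant bit (an assumption); the codes for the other columns come from Figure 16-1 and are not reproduced here. The name `check_bits` is hypothetical.

```python
# ECC column codes from the step 2 table, top row taken as ECC(7).
COLUMN_CODES = {
    0: 0b00010011,
    1: 0b00100011,
    6: 0b00001101,
}


def check_bits(data_byte):
    """XOR the column codes of every data bit that is 1 (columns 0, 1, 6 only)."""
    ecc = 0
    for column in range(8):
        if (data_byte >> column) & 1:
            if column not in COLUMN_CODES:
                raise ValueError("column code not listed in this example")
            ecc ^= COLUMN_CODES[column]
    return ecc


# Reproduces the Parity (even) column of the step 2 table, top to bottom.
print(format(check_bits(0x43), "08b"))
```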
[Block diagram: System A sends Data(63:0) to System B; an ECC checker and an exclusive OR produce the syndrome.]
Check Bit                  0    12    34    56
Data Bit                222   22    11    11    1111  11
                        432   10    98    76    5432  1098  7654  3210
MSB             11      .1..  1...  1...  ...1  1111  1...  1...  1...
                13      1...  .1..  .1..  ..1.  1111  1111  ....  .1..
ECC             10      ..1.  1...  ...1  1...  ....  1111  .1..  ..1.
Code            10      .1..  .1..  ..1.  .1..  1...  .1..  1111  ....
Bits            13      1...  ...1  1...  1...  .1..  ....  1111  1111
                11      ..1.  ..1.  .1..  .1..  ..1.  ..1.  ..1.  1111
LSB             14      1111  11..  11..  11..  ...1  ...1  ...1  ...1
Number of 1s in
syndrome*               3331  3311  3311  3311  3333  3333  3333  3333
Figure 16-4 Check Matrix for the Tag ECC Code
NOTE: * This row indicates the number of 1s in the generated syndrome for each data
bit in error.
Table 16-1 Error Checking and Correcting Summary for Internal Transactions
Secondary Primary
Cache to Cache to Uncached Uncached
Bus
Primary Secondary Load Store
Cache Cache
Processor or Checked; Primary From Not
Secondary Cache Trap on Error Cache parity System Checked
Data checked; Trap Interface
on Error
Secondary Cache Checked; Generated NA NA
Data Check Bits Trap on Error
Secondary Cache Tag Checked; not NA NA NA
and Check Bits corrected in
Secondary
cache; Trap on
error
System Interface NA NA Generated Generated
Address/Command
and Check Bits:
Transmit
System Interface NA NA Not NA
Address/Command Checked;
and Check Bits: reported to
Receive the Fault*
pin
System Interface Data NA NA Checked From
Trap on Processor
error†
System Interface Data NA NA Checked; Generated
Check Bits Trap on
Error†
† If error level (ERL bit of the Status register) is 1, the error is reported to the Fault* pin.
Table 16-2 Error Checking and Correcting Summary for Internal Transactions
Bus | Store to Shared Cache Line | Cache Instruction | Secondary Cache Load from System Interface | Secondary Cache Write to System Interface
Processor or Secondary Cache Data | NA | Check on cache writeback; Trap on Error | From System Interface unchanged | Checked; Trap on Error
Secondary Cache Data Check Bits | NA | Check on cache writeback; Trap on Error | From System Interface unchanged | Checked; Trap on Error
Secondary Cache Tag and Check Bits | Checked on read part of RMW†; correct Secondary cache tag; Trap on Error | Checked; corrected Secondary cache tag*; Trap on Error | Generated | Checked; not corrected; Trap on Error
System Interface Address, Command, and Check Bits: Transmit | Generated | Generated | Generated | Generated
System Interface Address, Command, and Check Bits: Receive | NA | NA | Not Checked | NA
System Interface Data | From Processor | From Primary or Secondary Cache | Checked; Trap on Error‡ | From Secondary Cache
System Interface Data Check Bits | Generated | From Primary or Secondary Cache | Checked; Trap on Error‡ | From Secondary Cache (SysCmdP signal corrupted if System interface set to parity mode)
† Read-Modify-Write cycle
‡ If error level (ERL bit of the Status register) is 1, the error is reported to the Fault* pin.
* Only if the current CACHE op needs to modify and write back the tag.
Table 16-3 Error Checking and Correcting Summary for External Transactions
Table 16-4 Error Checking and Correcting Summary for External Transactions
Mode                                                      SCMasterMd (Bit 42)   SIMasterMd (Bit 18)
Complete Master (required for single-chip operation)      0                     0
Complete Listener (paired with Complete Master)           1                     1
System Interface Master (SIMaster)                        1                     0
Secondary Cache Master (SCMaster, paired with SIMaster)   0                     1
For a non-fault-tolerant system, these bits must be set to 00₂. This is the
Complete Master mode.
In a fault tolerant system, there are two possible configurations using the
Master-Listener and Cross-Coupled modes described in Table 16-5. These
are referred to as lock-step configurations, and are described later in this
section.
† Fault* is a non-persistent signal which is synchronous with the System interface. Fault*
signal timing is determined by the PClock-to-SClock divisor from boot-time mode bit
settings.
Master-Listener Configuration
As shown in Figure 16-5, the Master-Listener lock-step configuration pairs
a Complete Master (mode bits 42 and 18 = 00₂) with a Complete Listener
(mode bits 42 and 18 = 11₂). In this configuration, the Complete Listener
has disabled output drivers; otherwise, the two R4400 processors operate
identically, both receiving the same inputs. On all output cycles, the
Complete Listener compares data on the output and I/O buses with
expected data, and asserts the Fault* signal in the event of a
miscomparison.
[Figure 16-5: block diagram of the Master-Listener configuration. An R4400 Complete Master drives the System Interface bus (SysAD/SysCmd, SysADC/SysCmdP) to the External Agent and the Secondary cache bus (SCAddr, SCData/SCTag, Data Chk/Tag Chk) to the Secondary cache. An R4400 Complete Listener compares (=?) the same signals and drives Fault* to the Maintenance Processor.]
[Figure 16-6: block diagram showing an R4400 SI Master. The System Interface bus (SysAD/SysCmd, SysADC/SysCmdP) connects to the External Agent and the Secondary cache bus (Address, SCData/SCTag, Data Chk/Tag Chk) connects to the Secondary cache; comparators (=?) check the signals and Fault* goes to the Maintenance Processor.]
The signals that are connected in parallel and driven from the System
Interface Master (1 in Figure 16-6) include:
• SysAD(63:0)
• SysCmd(8:0)
• SCAPar(2:0)
Signals that are connected in parallel and driven from the Secondary
Cache Master (2 in Figure 16-6) include:
• SysADC(7:0)
• SysCmdP
• ValidOut*
• Release*
• SCAddr(17:1)
• SCAddr0(W:Z)
• SCOE*
• SCWr(W:Z)*
• SCData(127:0)
• SCDChk(15:0)
• SCTag(24:0)
• SCTChk(6:0)
• SCDCS*
• SCTCS*
The fault detection mechanism associated with the Fault* pin does not
cause any exceptions; the processor continues to run normally regardless
of the state of the Fault* signal. External logic must handle an asserted
Fault* signal.
Fault Detection
Fault detection of an output miscomparison occurs at the end of the bus
cycle (the length of the cycle is programmed at boot-mode time; see
Chapter 9). When the R4400 processor is in master state, outputs at the
System interface are checked at the end of every System interface cycle. At
the Secondary Cache interface, outputs are checked at the end of each read
or write cycle.
SCAPar(2:0) transition and check times are delayed from the rest of the
Secondary Cache interface by one PClock. SCAPar(2:0) transitions occur
one PClock after SCAddr transitions, or when the R4400 is changing from
a read cycle to a write cycle without an address change. SCAPar(2:0)
signals do not follow the timing of SCWr* signals, which are set separately
through the programming of the boot-time mode bits.
The R4400 processor has an internal fault detection latency of 4 PClocks
(clock cycles are described in Chapter 10), whereupon Fault* is
synchronized with the System interface. An output fault detected and
propagated through the R4400 processor internal fault logic in a prior
System interface cycle is reported in the current cycle.
In Complete Master mode, output fault reporting is disabled for the
Secondary Cache interface, but enabled for the following System interface
signals: SysCmd, SysCmdP, SysAD, SysADC, ValidOut*, and Release*.
Reset Operation
When the R4400 processor is a Complete Listener, SIMaster, or SCMaster,
an assertion of Reset* after the initial boot sequence is significant.
If Reset* is asserted a second time and subsequently deasserted, the R4400
processor changes to Forced Complete Master mode and drives all
outputs.
If Reset* is asserted and deasserted a third time, the R4400 processor
returns to its prior mode, as programmed by the boot-mode bits.
On each subsequent assertion and deassertion of Reset*, the processor
alternates between these two modes: Forced Complete Master mode, and
the mode determined by the boot-time mode bits (Complete Listener,
SIMaster, or SCMaster).
In Forced Complete Master mode, the Fault* pin reports all output faults,
not just faults of the System interface as are reported in Complete Master
mode.
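The alternation described above can be sketched as a small state function. This is an illustrative Python model only: the function name `mode_after_resets` and the mode strings are hypothetical, and the sketch assumes that a processor booted as Complete Master stays in that mode on every reset, which the text does not state explicitly.

```python
def mode_after_resets(boot_mode: str, reset_count: int) -> str:
    """Return the operating mode after `reset_count` assert/deassert pulses of Reset*.

    Pulse 1 is the initial boot sequence. `boot_mode` is the mode selected
    by the boot-time mode bits (hypothetical string labels).
    """
    if reset_count < 1:
        raise ValueError("the processor must be reset at least once")
    # Only these boot modes alternate on later resets (per the text above).
    alternates = boot_mode in {"Complete Listener", "SIMaster", "SCMaster"}
    # Even-numbered resets (2nd, 4th, ...) select Forced Complete Master;
    # odd-numbered resets return to the boot-time mode.
    if alternates and reset_count % 2 == 0:
        return "Forced Complete Master"
    return boot_mode
```

For example, `mode_after_resets("SIMaster", 2)` gives `"Forced Complete Master"`, and a third reset returns `"SIMaster"`.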
Fault History
Two internal fault history bits, Output Fault History and Input Fault
History, record output faults and certain input faults reported through the
Fault* pin. These bits are cleared with each deassertion of Reset*.
The two fault history bits are readable while Reset* is asserted: the
Fault* pin changes from reporting live faults to indicating which fault
history bit was set when Reset* was deasserted in the previous cycle. The
ModeIn pin acts as a selector. If ModeIn = 0, Fault* indicates the inverted
state of the Output Fault History bit. If ModeIn = 1, Fault* indicates the
inverted state of the Input Fault History bit.
The fault history bits can be reset (cleared) while the R4400 processor is
running by driving the ModeIn pin to 1. Consequently, ModeIn must
be held at 0 to maintain the state of the fault history bits. Table 16-6
presents this information in tabular form.
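The readout rule above can be sketched in Python. This is a minimal model under stated assumptions: the function name and arguments are hypothetical, and the return value is simply the logic level described in the text (the inverted state of the selected history bit) while Reset* is asserted.

```python
def fault_pin_during_reset(mode_in: int,
                           output_fault_history: bool,
                           input_fault_history: bool) -> int:
    """Logic level on Fault* while Reset* is asserted.

    ModeIn = 0 selects the Output Fault History bit,
    ModeIn = 1 selects the Input Fault History bit.
    """
    if mode_in not in (0, 1):
        raise ValueError("ModeIn is a single-bit signal")
    selected = input_fault_history if mode_in == 1 else output_fault_history
    # Fault* carries the inverted state of the selected history bit.
    return 0 if selected else 1
```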
I-Type (Immediate)
31    26 25    21 20    16 15                 0
op       rs       rt       immediate

J-Type (Jump)
31    26 25                                   0
op       target

R-Type (Register)
31    26 25    21 20    16 15    11 10     6 5      0
op       rs       rt       rd       shamt    funct
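The three formats above share the same bit boundaries, so one helper can extract every field from a 32-bit instruction word. The following Python sketch is illustrative; the function name `decode_fields` and the dictionary layout are not from the manual.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the fields of all three formats."""
    word &= 0xFFFFFFFF
    return {
        "op":        (word >> 26) & 0x3F,   # bits 31..26 (all formats)
        "rs":        (word >> 21) & 0x1F,   # bits 25..21 (I-type, R-type)
        "rt":        (word >> 16) & 0x1F,   # bits 20..16 (I-type, R-type)
        "rd":        (word >> 11) & 0x1F,   # bits 15..11 (R-type)
        "shamt":     (word >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct":     word & 0x3F,           # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,         # bits 15..0  (I-type)
        "target":    word & 0x03FFFFFF,     # bits 25..0  (J-type)
    }
```

Which fields are meaningful depends on the format selected by `op`; the helper returns all of them and leaves that choice to the caller.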
Example #1:
Example #2:
(immediate_15)^16 || immediate_15...0
Function             Meaning
AddressTranslation   Uses the TLB to find the physical address given the
                     virtual address. The function fails and an exception is
                     taken if the required translation is not present in the
                     TLB.
LoadMemory           Uses the cache and main memory to find the contents of
                     the word containing the specified physical address. The
                     low-order two bits of the address and the Access Type
                     field indicate which of the four bytes within the data
                     word need to be returned. If the cache is enabled for
                     this access, the entire word is returned and loaded into
                     the cache.
StoreMemory          Uses the cache, write buffer, and main memory to store
                     the word or part of word specified as data in the word
                     containing the specified physical address. The low-order
                     two bits of the address and the Access Type field
                     indicate which of the four bytes within the data word
                     should be stored.
As shown in Table A-3, the Access Type field indicates the size of the data
item to be loaded or stored. Regardless of access type or byte-numbering
order (endianness), the address specifies the byte which has the smallest
byte address in the addressed field. For a big-endian machine, this is the
leftmost byte and contains the sign for a 2’s complement number; for a
little-endian machine, this is the rightmost byte.
The bytes within the addressed doubleword which are used can be
determined directly from the access type and the three low-order bits of
the address.
Format:
ADD rd, rs, rt
Description:
The contents of general register rs and the contents of general register rt
are added to form the result. The result is placed into general register rd.
In 64-bit mode, the operands must be valid sign-extended, 32-bit values.
An overflow exception occurs if the carries out of bits 30 and 31 differ (2’s
complement overflow). The destination register rd is not modified when
an integer overflow exception occurs.
Operation:
32 T: GPR[rd] ←GPR[rs] + GPR[rt]
Exceptions:
Integer overflow exception
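The overflow rule in the description (carries out of bits 30 and 31 differ) can be checked directly. The Python sketch below models the 32-bit case only; the function name `add32` and the `(result, overflow)` return convention are illustrative assumptions, with `None` standing in for "rd is not modified".

```python
MASK32 = 0xFFFFFFFF

def add32(rs_value: int, rt_value: int):
    """Model a 32-bit ADD: return (result, overflow).

    On overflow the result is None, mirroring the rule that the
    destination register is not modified when the exception occurs.
    """
    a = rs_value & MASK32
    b = rt_value & MASK32
    total = a + b
    # Carry out of bit 31: bit 32 of the full sum.
    carry_out_31 = (total >> 32) & 1
    # Carry out of bit 30: add only bits 30..0 and inspect bit 31 of that sum.
    carry_out_30 = (((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF)) >> 31) & 1
    overflow = carry_out_30 != carry_out_31
    return (None if overflow else total & MASK32), overflow
```

Adding 0x7FFFFFFF and 1 overflows (the carries differ), while adding 0xFFFFFFFF (that is, -1) and 1 does not, because both carries are 1.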
Format:
ADDI rt, rs, immediate
Description:
The 16-bit immediate is sign-extended and added to the contents of general
register rs to form the result. The result is placed into general register rt.
In 64-bit mode, the operand must be a valid sign-extended, 32-bit value.
An overflow exception occurs if carries out of bits 30 and 31 differ (2’s
complement overflow). The destination register rt is not modified when
an integer overflow exception occurs.
Operation:
32    T:    GPR[rt] ← GPR[rs] + (immediate_15)^16 || immediate_15...0
Exceptions:
Integer overflow exception
Format:
ADDIU rt, rs, immediate
Description:
The 16-bit immediate is sign-extended and added to the contents of general
register rs to form the result. The result is placed into general register rt.
No integer overflow exception occurs under any circumstances. In 64-bit
mode, the operand must be a valid sign-extended, 32-bit value.
The only difference between this instruction and the ADDI instruction is
that ADDIU never causes an overflow exception.
Operation:
Exceptions:
None
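As a rough illustration of the description above, the following Python sketch models ADDIU in 32-bit mode: the immediate is sign-extended, the sum wraps modulo 2^32, and no overflow is ever reported. The helper names are hypothetical, and the sketch does not cover 64-bit mode.

```python
def sign_extend16(immediate: int) -> int:
    """Interpret the low 16 bits as a two's complement value."""
    immediate &= 0xFFFF
    return immediate - 0x10000 if immediate & 0x8000 else immediate

def addiu32(rs_value: int, immediate: int) -> int:
    """32-bit ADDIU: add the sign-extended immediate and wrap silently."""
    return (rs_value + sign_extend16(immediate)) & 0xFFFFFFFF
```

Despite the "unsigned" in the name, an immediate of 0xFFFF still acts as -1; the only difference from ADDI is that the wraparound at 0x7FFFFFFF + 1 is not trapped.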
SPECIAL   rs   rt   rd   0       ADDU
000000                   00000   100001
6         5    5    5    5       6
Format:
ADDU rd, rs, rt
Description:
The contents of general register rs and the contents of general register rt
are added to form the result. The result is placed into general register rd.
No overflow exception occurs under any circumstances. In 64-bit mode,
the operands must be valid sign-extended, 32-bit values.
The only difference between this instruction and the ADD instruction is
that ADDU never causes an overflow exception.
Operation:
32 T: GPR[rd] ←GPR[rs] + GPR[rt]
Exceptions:
None
SPECIAL   rs   rt   rd   0       AND
000000                   00000   100100
6         5    5    5    5       6
Format:
AND rd, rs, rt
Description:
The contents of general register rs are combined with the contents of
general register rt in a bit-wise logical AND operation. The result is placed
into general register rd.
Operation:
Exceptions:
None
ANDI     rs   rt   immediate
001100
6        5    5    16
Format:
ANDI rt, rs, immediate
Description:
The 16-bit immediate is zero-extended and combined with the contents of
general register rs in a bit-wise logical AND operation. The result is placed
into general register rt.
Operation:
Exceptions:
None
Format:
BCzF offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If coprocessor z’s condition signal (CpCond), as sampled
during the previous instruction, is false, then the program branches to the
target address with a delay of one instruction.
Because the condition line is sampled during the previous instruction,
there must be at least one instruction between this instruction and a
coprocessor instruction that changes the condition line.
Operation:
32    T–1:  condition ← not COC[z]
      T:    target ← (offset_15)^14 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            endif
64    T–1:  condition ← not COC[z]
      T:    target ← (offset_15)^46 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            endif
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
Exceptions:
Coprocessor unusable exception
Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
BC0F     0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0
BC1F     0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  0
BC2F     0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0
BCzFL   Branch On Coprocessor z False Likely
31 26 25 21 20 16 15 0
Format:
BCzFL offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the contents of coprocessor z’s condition line, as
sampled during the previous instruction, is false, the target address is
branched to with a delay of one instruction.
If the conditional branch is not taken, the instruction in the branch delay
slot is nullified.
Because the condition line is sampled during the previous instruction,
there must be at least one instruction between this instruction and a
coprocessor instruction that changes the condition line.
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
Operation:
32    T–1:  condition ← not COC[z]
      T:    target ← (offset_15)^14 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T–1:  condition ← not COC[z]
      T:    target ← (offset_15)^46 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
Coprocessor unusable exception
Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
BC0FL    0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  0
BC1FL    0  1  0  0  0  1  0  1  0  0  0  0  0  0  1  0
BC2FL    0  1  0  0  1  0  0  1  0  0  0  0  0  0  1  0
Field labels in the figure: Opcode, Coprocessor Unit Number, BC sub-opcode, Branch condition
Format:
BCzT offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the coprocessor z’s condition signal (CpCond) is true,
then the program branches to the target address, with a delay of one
instruction.
Because the condition line is sampled during the previous instruction,
there must be at least one instruction between this instruction and a
coprocessor instruction that changes the condition line.
Operation:
32    T–1:  condition ← COC[z]
      T:    target ← (offset_15)^14 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            endif
64    T–1:  condition ← COC[z]
      T:    target ← (offset_15)^46 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            endif
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
Exceptions:
Coprocessor unusable exception
Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
BC0T     0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  1
BC1T     0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  1
BC2T     0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  1
Format:
BCzTL offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the contents of coprocessor z’s condition line, as
sampled during the previous instruction, is true, the target address is
branched to with a delay of one instruction.
If the conditional branch is not taken, the instruction in the branch delay
slot is nullified.
Because the condition line is sampled during the previous instruction,
there must be at least one instruction between this instruction and a
coprocessor instruction that changes the condition line.
Operation:
32    T–1:  condition ← COC[z]
      T:    target ← (offset_15)^14 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T–1:  condition ← COC[z]
      T:    target ← (offset_15)^46 || offset || 0^2
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
BCzTL   Branch On Coprocessor z True Likely (continued)
Exceptions:
Coprocessor unusable exception
Bit #   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 ... 0
BC0TL    0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  1
BC1TL    0  1  0  0  0  1  0  1  0  0  0  0  0  0  1  1
BC2TL    0  1  0  0  1  0  0  1  0  0  0  0  0  0  1  1
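The bit tables for BCzF, BCzFL, BCzT, and BCzTL follow one pattern: bits 27...26 carry the coprocessor unit number, bit 16 distinguishes true from false, and bit 17 marks the likely variants. The Python sketch below assembles the instruction word from that reading of the tables; the function name `encode_bc` and its parameters are illustrative, not part of the manual.

```python
def encode_bc(z: int, branch_on_true: bool, likely: bool, offset: int = 0) -> int:
    """Build a BCzF/BCzT/BCzFL/BCzTL instruction word from the bit tables above."""
    if z not in (0, 1, 2):
        raise ValueError("the tables above list coprocessor units 0, 1, and 2")
    opcode = 0b010000 | z                       # bits 31..26: 0100 + unit number
    bc_sub_opcode = 0b01000                     # bits 25..21: BC
    condition = (int(likely) << 1) | int(branch_on_true)  # bits 17..16
    return ((opcode << 26) | (bc_sub_opcode << 21)
            | (condition << 16) | (offset & 0xFFFF))
```

For instance, `encode_bc(1, False, False) >> 16` reproduces the BC1F row (0x4500), and `encode_bc(2, True, True) >> 16` reproduces the BC2TL row (0x4903).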
BEQ      rs   rt   offset
000100
6        5    5    16
Format:
BEQ rs, rt, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs and the contents of
general register rt are compared. If the two registers are equal, then the
program branches to the target address, with a delay of one instruction.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs] = GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs] = GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            endif
Exceptions:
None
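The target calculation shared by the branch instructions can be sketched as follows. This Python model covers 32-bit mode only and uses hypothetical names; it takes the address of the branch itself, so the delay slot sits at `branch_pc + 4`.

```python
MASK32 = 0xFFFFFFFF

def branch_target(branch_pc: int, offset: int) -> int:
    """Delay-slot address plus the 16-bit offset, shifted left 2 and sign-extended."""
    offset &= 0xFFFF
    signed = offset - 0x10000 if offset & 0x8000 else offset
    delay_slot_pc = branch_pc + 4
    return (delay_slot_pc + (signed << 2)) & MASK32

def beq_next_pc(branch_pc: int, rs_value: int, rt_value: int, offset: int) -> int:
    """Address of the instruction executed after the delay slot."""
    if (rs_value & MASK32) == (rt_value & MASK32):
        return branch_target(branch_pc, offset)
    return (branch_pc + 8) & MASK32   # fall through past the delay slot
```

An offset of 0xFFFF (that is, -1) therefore branches back to the branch instruction itself, since the base is the delay slot rather than the branch.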
BEQL     rs   rt   offset
010100
6        5    5    16
Format:
BEQL rs, rt, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs and the contents of
general register rt are compared. If the two registers are equal, the target
address is branched to, with a delay of one instruction. If the conditional
branch is not taken, the instruction in the branch delay slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs] = GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs] = GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
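The difference between a "likely" branch and its ordinary counterpart is what happens to the delay slot when the branch is not taken. The Python sketch below models that for BEQL in 32-bit mode; the function name and the `(next_pc, delay_slot_executes)` return convention are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def beql(branch_pc: int, rs_value: int, rt_value: int, offset: int):
    """Model BEQL: return (next_pc, delay_slot_executes)."""
    offset &= 0xFFFF
    signed = offset - 0x10000 if offset & 0x8000 else offset
    if (rs_value & MASK32) == (rt_value & MASK32):
        # Taken: the delay slot executes, then control moves to the target.
        return ((branch_pc + 4 + (signed << 2)) & MASK32, True)
    # Not taken: the delay-slot instruction is nullified.
    return ((branch_pc + 8) & MASK32, False)
```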
Format:
BGEZ rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the contents of general register rs have the sign bit
cleared, then the program branches to the target address, with a delay of
one instruction.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0)
      T+1:  if condition then
                PC ← PC + target
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
      T+1:  if condition then
                PC ← PC + target
            endif
Exceptions:
None
Format:
BGEZAL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. Unconditionally, the address of the instruction after the
delay slot is placed in the link register, r31. If the contents of general
register rs have the sign bit cleared, then the program branches to the
target address, with a delay of one instruction.
General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction with
register 31 specified as rs is not trapped, however.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            endif
Exceptions:
None
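The key point of the "and link" variants is that r31 is written whether or not the branch is taken. A minimal Python sketch of BGEZAL in 32-bit mode, with hypothetical names and a `(link, next_pc)` return convention chosen for illustration:

```python
MASK32 = 0xFFFFFFFF

def bgezal(branch_pc: int, rs_value: int, offset: int):
    """Model BGEZAL: return (value written to r31, next_pc)."""
    # The link value is the address of the instruction after the delay slot,
    # and it is written unconditionally.
    link = (branch_pc + 8) & MASK32
    offset &= 0xFFFF
    signed = offset - 0x10000 if offset & 0x8000 else offset
    taken = ((rs_value >> 31) & 1) == 0          # sign bit cleared
    next_pc = (branch_pc + 4 + (signed << 2)) & MASK32 if taken else link
    return link, next_pc
```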
BGEZL   Branch On Greater Than Or Equal To Zero Likely
31 26 25 21 20 16 15 0
Format:
BGEZL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the contents of general register rs have the sign bit
cleared, then the program branches to the target address, with a delay of
one instruction. If the conditional branch is not taken, the instruction in
the branch delay slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
BGTZ     rs   0       offset
000111        00000
6        5    5       16
Format:
BGTZ rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs are compared to zero. If
the contents of general register rs have the sign bit cleared and are not
equal to zero, then the program branches to the target address, with a
delay of one instruction.
Operation:
Exceptions:
None
BGTZL   Branch On Greater Than Zero Likely
31    26 25  21 20   16 15          0
BGTZL    rs     0       offset
010111          00000
6        5      5       16
Format:
BGTZL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs are compared to zero. If
the contents of general register rs have the sign bit cleared and are not
equal to zero, then the program branches to the target address, with a
delay of one instruction. If the conditional branch is not taken, the
instruction in the branch delay slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
BLEZ     rs   0       offset
000110        00000
6        5    5       16
Format:
BLEZ rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs are compared to zero. If
the contents of general register rs have the sign bit set, or are equal to zero,
then the program branches to the target address, with a delay of one
instruction.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
      T+1:  if condition then
                PC ← PC + target
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
      T+1:  if condition then
                PC ← PC + target
            endif
Exceptions:
None
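The compare-with-zero branches test only the sign bit and, where needed, whether the register is all zeros. The following Python sketch expresses the BLEZ and BGTZ conditions for either register width; the function names and the `width` parameter are illustrative, not from the manual.

```python
def sign_bit(value: int, width: int = 32) -> int:
    """Return bit width-1 of the register value (bit 31 or bit 63)."""
    return (value >> (width - 1)) & 1

def blez_condition(value: int, width: int = 32) -> bool:
    """BLEZ branches when the sign bit is set or the register is zero."""
    value &= (1 << width) - 1
    return sign_bit(value, width) == 1 or value == 0

def bgtz_condition(value: int, width: int = 32) -> bool:
    """BGTZ branches when the sign bit is clear and the register is not zero."""
    value &= (1 << width) - 1
    return sign_bit(value, width) == 0 and value != 0
```

The two conditions are exact complements, so for any register value exactly one of BLEZ and BGTZ would branch.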
BLEZL    rs   0       offset
010110        00000
6        5    5       16
Format:
BLEZL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs are compared to zero. If
the contents of general register rs have the sign bit set, or are equal to zero,
then the program branches to the target address, with a delay of one
instruction.
If the conditional branch is not taken, the instruction in the branch delay
slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
Format:
BLTZ rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the contents of general register rs have the sign bit set,
then the program branches to the target address, with a delay of one
instruction.
Operation:
Exceptions:
None
BLTZAL   Branch On Less Than Zero And Link
31 26 25 21 20 16 15 0
Format:
BLTZAL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. Unconditionally, the address of the instruction after the
delay slot is placed in the link register, r31. If the contents of general
register rs have the sign bit set, then the program branches to the target
address, with a delay of one instruction.
General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction with
register 31 specified as rs is not trapped, however.
Operation:
Exceptions:
None
BLTZALL   Branch On Less Than Zero And Link Likely
31 26 25 21 20 16 15 0
Format:
BLTZALL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. Unconditionally, the address of the instruction after the
delay slot is placed in the link register, r31. If the contents of general
register rs have the sign bit set, then the program branches to the target
address, with a delay of one instruction.
General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction with
register 31 specified as rs is not trapped, however. If the conditional
branch is not taken, the instruction in the branch delay slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1)
            GPR[31] ← PC + 8
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
Format:
BLTZL rs, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the contents of general register rs have the sign bit set,
then the program branches to the target address, with a delay of one
instruction. If the conditional branch is not taken, the instruction in the
branch delay slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs]_31 = 1)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs]_63 = 1)
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
BNE      rs   rt   offset
000101
6        5    5    16
Format:
BNE rs, rt, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs and the contents of
general register rt are compared. If the two registers are not equal, then
the program branches to the target address, with a delay of one
instruction.
Operation:
Exceptions:
None
BNEL     rs   rt   offset
010101
6        5    5    16
Format:
BNEL rs, rt, offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. The contents of general register rs and the contents of
general register rt are compared. If the two registers are not equal, then
the program branches to the target address, with a delay of one
instruction.
If the conditional branch is not taken, the instruction in the branch delay
slot is nullified.
Operation:
32    T:    target ← (offset_15)^14 || offset || 0^2
            condition ← (GPR[rs] ≠ GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
64    T:    target ← (offset_15)^46 || offset || 0^2
            condition ← (GPR[rs] ≠ GPR[rt])
      T+1:  if condition then
                PC ← PC + target
            else
                NullifyCurrentInstruction
            endif
Exceptions:
None
Format:
BREAK
Description:
A breakpoint trap occurs, immediately and unconditionally transferring
control to the exception handler.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
32, 64 T: BreakpointException
Exceptions:
Breakpoint exception
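Because the code field is only reachable by loading the instruction word, a handler has to extract it with shifts and masks. The Python sketch below assumes the usual MIPS encoding of BREAK, which is not shown in this entry: a SPECIAL opcode of 000000 in bits 31...26, a 20-bit code field in bits 25...6, and a function field of 001101 in bits 5...0. The function name is hypothetical.

```python
BREAK_FUNCT = 0b001101   # assumed function field for BREAK
CODE_MASK = 0xFFFFF      # assumed 20-bit code field, bits 25..6

def break_code(instruction_word: int) -> int:
    """Extract the software code field from a BREAK instruction word."""
    word = instruction_word & 0xFFFFFFFF
    if (word >> 26) != 0 or (word & 0x3F) != BREAK_FUNCT:
        raise ValueError("not a BREAK instruction under the assumed encoding")
    return (word >> 6) & CODE_MASK
```

In a real handler the word would be loaded from the address of the faulting instruction; here it is simply passed in as an integer.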
Format:
CACHE op, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The virtual address is translated to
a physical address using the TLB, and the 5-bit sub-opcode specifies a
cache operation for that address.
If CP0 is not usable (the processor is in User or Supervisor mode and the CP0 enable bit in the
Status register is clear), a coprocessor unusable exception is taken. The
operation of this instruction on any operation/cache combination not
listed below, or on a secondary cache when none is present, is undefined.
The operation of this instruction on uncached addresses is also undefined.
The Index operation uses part of the virtual address to specify a cache
block.
For a primary cache of 2^CACHEBITS bytes with 2^LINEBITS bytes per tag,
vAddr_CACHEBITS...LINEBITS specifies the block.
For a secondary cache of 2^CACHEBITS bytes with 2^LINEBITS bytes per tag,
pAddr_CACHEBITS...LINEBITS specifies the block.
Index Load Tag also uses vAddr_LINEBITS...3 to select the doubleword for
reading ECC or parity. When the CE bit of the Status register is set, Hit
WriteBack, Hit WriteBack Invalidate, Index WriteBack Invalidate, and Fill
also use vAddr_LINEBITS...3 to select the doubleword that has its ECC or
parity modified. This operation is performed unconditionally.
The Hit operation accesses the specified cache as normal data references,
and performs the specified operation if the cache block contains valid data
with the specified physical address (a hit). If the cache block is invalid or
contains a different address (a miss), no operation is performed.
Cache
CACHE (continued) CACHE
Write back from a primary cache goes to the secondary cache (if there is
one), otherwise to memory. Write back from a secondary cache always
goes to memory. A secondary write back always writes the most recent
data: the data comes from the primary data cache if the block is present
there and modified (the W bit is set); otherwise the data comes from the
specified secondary cache. The address to be written is specified by the
cache tag and not the translated physical address.
TLB Refill and TLB Invalid exceptions can occur on any operation. For
Index operations (where the physical address is used to index the cache
but need not match the cache tag) unmapped addresses may be used to
avoid TLB exceptions. This operation never causes TLB Modified or
Virtual Coherency exceptions.
Bits 17...16 of the instruction specify the cache as follows:
Bits 20...18 (this value is listed under the Code column) of the instruction
specify the operation as follows:
Code   Caches   Name                       Operation

5      SD       Hit Writeback Invalidate
    If the cache block contains the specified address, write back the data
    (if dirty), and mark the secondary cache block and all matching blocks
    in both primary caches invalid. As usual with secondary writebacks,
    modified data in the primary data cache (matching block with the W bit
    set) is used during the writeback. The PIdx field of the secondary tag
    is used to determine the locations in the primaries to check for
    matching primary blocks. The CH bit in the Status register is set or
    cleared to indicate a hit or miss.

5      I        Fill
    Fill the primary instruction cache block from secondary cache or
    memory. If the CE bit of the Status register is set, the content of the
    ECC register is used instead of the computed parity bits for the
    addressed doubleword when written to the instruction cache. For the
    R4000PC, the cache is filled from memory. For the R4000SC and R4000MC,
    the cache is filled from the secondary cache whether or not the
    secondary cache block is valid or contains the specified address.

6      D        Hit Writeback
    If the cache block contains the specified address, and the W bit is
    set, write back the data. The W bit is not cleared; a subsequent miss
    to the block will write it back again. This second writeback is
    redundant, but not incorrect. When a secondary cache is present, and
    the CE bit of the Status register is set, the content of the ECC
    register is XOR'd into the computed check bits during the write to the
    secondary cache for the addressed doubleword. Note: The W bit is not
    cleared during this operation due to an artifact of the implementation;
    the W bit is implemented as part of the data side of the cache array so
    that it can be written during a data write.

6      SD       Hit Writeback
    If the cache block contains the specified address, and the cache state
    is Dirty Exclusive or Dirty Shared, data is written back to memory. The
    cache state is unchanged; a subsequent miss to the block causes it to
    be written back again. This second writeback is redundant, but not
    incorrect. The CH bit in the Status register is set or cleared to
    indicate a hit or miss. The writeback looks in the primary data cache
    for modified data, but does not invalidate or clear the Writeback bit
    in the primary data cache. Note: The state of the secondary block is
    not changed to clean during this operation because the W bit of
    matching sub-blocks cannot be cleared to put the primary block in a
    clean state.

6      I        Hit Writeback
    If the cache block contains the specified address, data is written back
    unconditionally. When a secondary cache is present, and the CE bit of
    the Status register is set, the content of the ECC register is XOR'd
    into the computed check bits during the write to the secondary cache
    for the addressed doubleword.
Operation:
32, 64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
          (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
          CacheOp (op, vAddr, pAddr)
Exceptions:
Coprocessor unusable exception
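The field positions quoted above (bits 20...18 for the operation code, bits 17...16 for the cache) and the address calculation in the Operation section can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, and the placement of base in bits 25...21 and offset in bits 15...0 is an assumption borrowed from the layout shown for the other base/offset instructions in this appendix.

```python
MASK64 = (1 << 64) - 1

def decode_cache_fields(instr):
    """Split a 32-bit CACHE instruction word into its fields (hypothetical helper)."""
    return {
        "base": (instr >> 21) & 0x1F,   # assumed position: bits 25...21
        "code": (instr >> 18) & 0x7,    # bits 20...18: operation code
        "cache": (instr >> 16) & 0x3,   # bits 17...16: cache selector
        "offset": instr & 0xFFFF,       # assumed position: bits 15...0
    }

def cache_vaddr(offset, base_value):
    """vAddr <- sign_extend(offset) + GPR[base], truncated to 64 bits."""
    if offset & 0x8000:
        offset -= 1 << 16               # replicate bit 15 into the upper bits
    return (offset + base_value) & MASK64
```

Only the numeric fields are decoded here; mapping the cache selector and code to names is left to the tables in the text.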
COPz CF rt rd 0
0100xx* 00010 000 0000 0000
6 5 5 5 11
Format:
CFCz rt, rd
Description:
The contents of coprocessor control register rd of coprocessor unit z are
loaded into general register rt.
This instruction is not valid for CP0.
Operation:
32 T:   data ← CCR[z,rd]
   T+1: GPR[rt] ← data
64 T:   data ← (CCR[z,rd]_31)^32 || CCR[z,rd]
   T+1: GPR[rt] ← data
Exceptions:
Coprocessor unusable exception
COPz CO cofun
0100xx* 1
6 1 25
Format:
COPz cofun
Description:
A coprocessor operation is performed. The operation may specify and
reference internal coprocessor registers, and may change the state of the
coprocessor condition line, but does not modify state within the processor
or the cache/memory system. Details of coprocessor operations are
contained in Appendix B.
Operation:
Exceptions:
Coprocessor unusable exception
Coprocessor interrupt or Floating-Point Exception (R4000 CP1 only)
*Opcode Bit Encoding:
COPz   Bit #  31 30 29 28 27 26 25 ... 0
COP0           0  1  0  0  0  0  1
COP1           0  1  0  0  0  1  1
COP2           0  1  0  0  1  0  1
COPz CT rt rd 0
0100xx* 00110 000 0000 0000
6 5 5 5 11
Format:
CTCz rt, rd
Description:
The contents of general register rt are loaded into control register rd of
coprocessor unit z.
This instruction is not valid for CP0.
Operation:
32,64 T: data ← GPR[rt]
T + 1: CCR[z,rd] ← data
Exceptions:
Coprocessor unusable exception
Format:
DADD rd, rs, rt
Description:
The contents of general register rs and the contents of general register rt
are added to form the result. The result is placed into general register rd.
An overflow exception occurs if the carries out of bits 62 and 63 differ (2’s
complement overflow). The destination register rd is not modified when
an integer overflow exception occurs.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
Integer overflow exception
Reserved instruction exception (R4000 in 32-bit mode)
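The overflow rule stated above (the carries out of bits 62 and 63 differ) can be checked directly. The following Python sketch models DADD on 64-bit register images; the function name and the convention of returning None for an unmodified destination are assumptions made for illustration, not part of the architecture definition.

```python
MASK64 = (1 << 64) - 1
LOW63 = MASK64 >> 1                      # bits 62...0

def dadd(rs, rt):
    """Return (result, overflow); result is None when rd would be left unmodified."""
    rs &= MASK64
    rt &= MASK64
    total = rs + rt
    carry_out_63 = (total >> 64) & 1                         # carry out of bit 63
    carry_out_62 = (((rs & LOW63) + (rt & LOW63)) >> 63) & 1  # carry out of bit 62
    overflow = carry_out_62 != carry_out_63
    return (None if overflow else total & MASK64), overflow
```

For example, adding 1 to 0x7FFFFFFFFFFFFFFF produces a carry out of bit 62 but none out of bit 63, so the sketch reports overflow, whereas adding 1 to 0xFFFFFFFFFFFFFFFF (that is, -1 + 1) produces both carries and yields 0.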
Format:
DADDI rt, rs, immediate
Description:
The 16-bit immediate is sign-extended and added to the contents of general
register rs to form the result. The result is placed into general register rt.
An overflow exception occurs if carries out of bits 62 and 63 differ (2’s
complement overflow). The destination register rt is not modified when
an integer overflow exception occurs.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
Integer overflow exception
Reserved instruction exception (R4000 in 32-bit mode)
DADDIU Doubleword Add Immediate Unsigned DADDIU
31 26 25 21 20 16 15 0
DADDIU rs rt immediate
011001
6 5 5 16
Format:
DADDIU rt, rs, immediate
Description:
The 16-bit immediate is sign-extended and added to the contents of general
register rs to form the result. The result is placed into general register rt.
No integer overflow exception occurs under any circumstances.
The only difference between this instruction and the DADDI instruction is
that DADDIU never causes an overflow exception.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt rd 0 DADDU
000000 00000 101101
6 5 5 5 5 6
Format:
DADDU rd, rs, rt
Description:
The contents of general register rs and the contents of general register rt
are added to form the result. The result is placed into general register rd.
No overflow exception occurs under any circumstances.
The only difference between this instruction and the DADD instruction is
that DADDU never causes an overflow exception.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt 0 DDIV
000000 00 0000 0000 011110
6 5 5 10 6
Format:
DDIV rs, rt
Description:
The contents of general register rs are divided by the contents of general
register rt, treating both operands as 2’s complement values. No overflow
exception occurs under any circumstances, and the result of this operation
is undefined when the divisor is zero.
This instruction is typically followed by additional instructions to check
for a zero divisor and for overflow.
When the operation completes, the quotient word of the double result is
loaded into special register LO, and the remainder word of the double
result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
those instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by two or more instructions.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: LO ← GPR[rs] div GPR[rt]
HI ← GPR[rs] mod GPR[rt]
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
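Because DDIV raises no exception for a zero divisor or for overflow, those checks are left to the instructions that follow it. The Python sketch below models the LO/HI results and reports both conditions. It assumes the quotient truncates toward zero and the remainder takes the sign of the dividend; the text above does not state the rounding rule, so treat that as an assumption. The function name and the status strings are hypothetical.

```python
MASK64 = (1 << 64) - 1
INT64_MIN = -(1 << 63)

def to_signed64(x):
    """Interpret a 64-bit register image as a 2's complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def ddiv(rs, rt):
    """Return (LO, HI, status) for a signed 64-bit divide of two register images."""
    a, b = to_signed64(rs), to_signed64(rt)
    if b == 0:
        return None, None, "zero divisor"   # result is undefined per the text
    q = abs(a) // abs(b)                    # assumed: truncate toward zero
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b                           # remainder follows the dividend's sign
    status = "overflow" if (a == INT64_MIN and b == -1) else "ok"
    return q & MASK64, r & MASK64, status
```

The only overflow case is the most negative value divided by -1, whose true quotient does not fit in 64 bits.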
SPECIAL rs rt 0 DDIVU
000000 00 0000 0000 011111
6 5 5 10 6
Format:
DDIVU rs, rt
Description:
The contents of general register rs are divided by the contents of general
register rt, treating both operands as unsigned values. No integer
overflow exception occurs under any circumstances, and the result of this
operation is undefined when the divisor is zero.
This instruction is typically followed by additional instructions to check
for a zero divisor.
When the operation completes, the quotient word of the double result is
loaded into special register LO, and the remainder word of the double
result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
those instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by two or more instructions.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: LO ← (0 || GPR[rs]) div (0 || GPR[rt])
HI ← (0 || GPR[rs]) mod (0 || GPR[rt])
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
Format:
DIV rs, rt
Description:
The contents of general register rs are divided by the contents of general
register rt, treating both operands as 2’s complement values. No overflow
exception occurs under any circumstances, and the result of this operation
is undefined when the divisor is zero.
In 64-bit mode, the operands must be valid sign-extended, 32-bit values.
This instruction is typically followed by additional instructions to check
for a zero divisor and for overflow.
When the operation completes, the quotient word of the double result is
loaded into special register LO, and the remainder word of the double
result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
those instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by two or more instructions.
Operation:
32 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: LO ← GPR[rs] div GPR[rt]
HI ← GPR[rs] mod GPR[rt]
64 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: q ← GPR[rs]_31...0 div GPR[rt]_31...0
   r ← GPR[rs]_31...0 mod GPR[rt]_31...0
   LO ← (q_31)^32 || q_31...0
   HI ← (r_31)^32 || r_31...0
Exceptions:
None
Format:
DIVU rs, rt
Description:
The contents of general register rs are divided by the contents of general
register rt, treating both operands as unsigned values. No integer
overflow exception occurs under any circumstances, and the result of this
operation is undefined when the divisor is zero.
In 64-bit mode, the operands must be valid sign-extended, 32-bit values.
This instruction is typically followed by additional instructions to check
for a zero divisor.
When the operation completes, the quotient word of the double result is
loaded into special register LO, and the remainder word of the double
result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
those instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by two or more instructions.
Operation:
32 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: LO ← (0 || GPR[rs]) div (0 || GPR[rt])
HI ← (0 || GPR[rs]) mod (0 || GPR[rt])
64 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: q ← (0 || GPR[rs]_31...0) div (0 || GPR[rt]_31...0)
   r ← (0 || GPR[rs]_31...0) mod (0 || GPR[rt]_31...0)
   LO ← (q_31)^32 || q_31...0
   HI ← (r_31)^32 || r_31...0
Exceptions:
None
COP0 DMF rt rd 0
010000 00001 000 0000 0000
6 5 5 5 11
Format:
DMFC0 rt, rd
Description:
The contents of coprocessor register rd of the CP0 are loaded into general
register rt.
This operation is defined for the R4000 operating in 64-bit mode and in
32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor
mode causes a reserved instruction exception. All 64 bits of the general
register destination are written from the coprocessor register source. The
operation of DMFC0 on a 32-bit coprocessor 0 register is undefined.
Operation:
64 T: data ← CPR[0,rd]
T+1: GPR[rt] ← data
Exceptions:
Coprocessor unusable exception
Reserved instruction exception (R4000 in 32-bit user mode or
32-bit supervisor mode)
DMTC0 Doubleword Move To System Control Coprocessor DMTC0
31 26 25 21 20 16 15 11 10 0
COP0 DMT rt rd 0
010000 00101 000 0000 0000
6 5 5 5 11
Format:
DMTC0 rt, rd
Description:
The contents of general register rt are loaded into coprocessor register rd
of the CP0.
This operation is defined for the R4000 operating in 64-bit mode or in
32-bit kernel mode. Execution of this instruction in 32-bit user or supervisor
mode causes a reserved instruction exception.
All 64 bits of the coprocessor 0 register are written from the general
register source. The operation of DMTC0 on a 32-bit coprocessor 0 register
is undefined.
Because the state of the virtual address translation system may be altered
by this instruction, the operation of load instructions, store instructions,
and TLB operations immediately prior to and after this instruction is
undefined.
Operation:
64 T: data ← GPR[rt]
T+1: CPR[0,rd] ← data
Exceptions:
Coprocessor unusable exception
Reserved instruction exception (R4000 in 32-bit user mode or
32-bit supervisor mode)
SPECIAL rs rt 0 DMULT
000000 00 0000 0000 011100
6 5 5 10 6
Format:
DMULT rs, rt
Description:
The contents of general registers rs and rt are multiplied, treating both
operands as 2’s complement values. No integer overflow exception occurs
under any circumstances.
When the operation completes, the low-order doubleword of the 128-bit
result is loaded into special register LO, and the high-order doubleword of
the result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
these instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by a minimum of two other instructions.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: t ← GPR[rs] * GPR[rt]
   LO ← t_63...0
   HI ← t_127...64
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
DMULTU Doubleword Multiply Unsigned DMULTU
31 26 25 21 20 16 15 6 5 0
SPECIAL rs rt 0 DMULTU
000000 00 0000 0000 011101
6 5 5 10 6
Format:
DMULTU rs, rt
Description:
The contents of general register rs and the contents of general register rt
are multiplied, treating both operands as unsigned values. No overflow
exception occurs under any circumstances.
When the operation completes, the low-order doubleword of the 128-bit
result is loaded into special register LO, and the high-order doubleword of
the result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
these instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by a minimum of two instructions.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T–2: LO ← undefined
HI ← undefined
T–1: LO ← undefined
HI ← undefined
T: t ← (0 || GPR[rs]) * (0 || GPR[rt])
   LO ← t_63...0
   HI ← t_127...64
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
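The difference between the signed and unsigned doubleword multiplies shows up only in the HI register. A minimal Python sketch of the two Operation sections follows; the function names are hypothetical, and Python's unbounded integers stand in for the 128-bit intermediate value t.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def to_signed64(x):
    """Interpret a 64-bit register image as a 2's complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def dmult(rs, rt):
    """Signed multiply: return (LO, HI) = (t_63...0, t_127...64)."""
    t = (to_signed64(rs) * to_signed64(rt)) & MASK128
    return t & MASK64, t >> 64

def dmultu(rs, rt):
    """Unsigned multiply: return (LO, HI) = (t_63...0, t_127...64)."""
    t = (rs & MASK64) * (rt & MASK64)
    return t & MASK64, t >> 64
```

With rs = 0xFFFFFFFFFFFFFFFF and rt = 2, both functions return the same LO, but dmult treats rs as -1 and returns an all-ones HI, while dmultu returns HI = 1.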
SPECIAL 0 rt rd sa DSLL
000000 00000 111000
6 5 5 5 5 6
Format:
DSLL rd, rt, sa
Description:
The contents of general register rt are shifted left by sa bits, inserting zeros
into the low-order bits. The result is placed in register rd.
Operation:
64 T: s ← 0 || sa
      GPR[rd] ← GPR[rt]_(63–s)...0 || 0^s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt rd 0 DSLLV
000000 00000 010100
6 5 5 5 5 6
Format:
DSLLV rd, rt, rs
Description:
The contents of general register rt are shifted left by the number of bits
specified by the low-order six bits contained in general register rs,
inserting zeros into the low-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← GPR[rs]_5...0
      GPR[rd] ← GPR[rt]_(63–s)...0 || 0^s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL 0 rt rd sa DSLL32
000000 00000 111100
6 5 5 5 5 6
Format:
DSLL32 rd, rt, sa
Description:
The contents of general register rt are shifted left by 32+sa bits, inserting
zeros into the low-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← 1 || sa
      GPR[rd] ← GPR[rt]_(63–s)...0 || 0^s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
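The three doubleword left shifts differ only in how the 6-bit shift amount s is formed: DSLL prefixes the 5-bit sa field with 0, DSLL32 prefixes it with 1, and DSLLV takes the low-order six bits of rs. The Python sketch below models that; the function names are hypothetical and the code is an illustration, not a definitive implementation.

```python
MASK64 = (1 << 64) - 1

def shift_amount(sa, high_bit):
    """Form the 6-bit amount s as high_bit || sa, where sa is a 5-bit field."""
    return (high_bit << 5) | (sa & 0x1F)

def dsll(rt, sa):
    """Shift left by sa (0 to 31), filling the low-order bits with zeros."""
    return (rt << shift_amount(sa, 0)) & MASK64

def dsll32(rt, sa):
    """Shift left by 32 + sa (32 to 63)."""
    return (rt << shift_amount(sa, 1)) & MASK64

def dsllv(rt, rs):
    """Shift left by the low-order six bits of rs (0 to 63)."""
    return (rt << (rs & 0x3F)) & MASK64
```

Splitting the 0 to 63 range across two opcodes is what lets a 5-bit sa field express a 6-bit shift distance.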
DSRA Doubleword Shift Right Arithmetic DSRA
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSRA
000000 00000 111011
6 5 5 5 5 6
Format:
DSRA rd, rt, sa
Description:
The contents of general register rt are shifted right by sa bits, sign-
extending the high-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← 0 || sa
      GPR[rd] ← (GPR[rt]_63)^s || GPR[rt]_63...s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt rd 0 DSRAV
000000 00000 010111
6 5 5 5 5 6
Format:
DSRAV rd, rt, rs
Description:
The contents of general register rt are shifted right by the number of bits
specified by the low-order six bits of general register rs, sign-extending the
high-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← GPR[rs]_5...0
      GPR[rd] ← (GPR[rt]_63)^s || GPR[rt]_63...s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL 0 rt rd sa DSRA32
000000 00000 111111
6 5 5 5 5 6
Format:
DSRA32 rd, rt, sa
Description:
The contents of general register rt are shifted right by 32+sa bits, sign-
extending the high-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← 1 || sa
      GPR[rd] ← (GPR[rt]_63)^s || GPR[rt]_63...s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
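The arithmetic right shifts replicate bit 63 of rt into the s vacated high-order positions. A minimal Python sketch of DSRA, DSRAV, and DSRA32 on 64-bit register images is given below; the helper and function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sign_fill_shift(value, s):
    """Shift a 64-bit image right by s, copying bit 63 into the top s bits."""
    value &= MASK64
    result = value >> s
    if value >> 63:
        result |= MASK64 & ~(MASK64 >> s)   # set the s high-order bits
    return result

def dsra(rt, sa):
    """s = 0 || sa: shift distance 0 to 31."""
    return sign_fill_shift(rt, sa & 0x1F)

def dsra32(rt, sa):
    """s = 1 || sa: shift distance 32 to 63."""
    return sign_fill_shift(rt, 32 | (sa & 0x1F))

def dsrav(rt, rs):
    """s = low-order six bits of rs."""
    return sign_fill_shift(rt, rs & 0x3F)
```

Shifting 0x8000000000000000 right by 63 with DSRA32 (sa = 31) therefore yields all ones, the 64-bit representation of -1.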
DSRL Doubleword Shift Right Logical DSRL
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL 0 rt rd sa DSRL
000000 00000 111010
6 5 5 5 5 6
Format:
DSRL rd, rt, sa
Description:
The contents of general register rt are shifted right by sa bits, inserting
zeros into the high-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← 0 || sa
      GPR[rd] ← 0^s || GPR[rt]_63...s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt rd 0 DSRLV
000000 00000 010110
6 5 5 5 5 6
Format:
DSRLV rd, rt, rs
Description:
The contents of general register rt are shifted right by the number of bits
specified by the low-order six bits of general register rs, inserting zeros
into the high-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← GPR[rs]_5...0
      GPR[rd] ← 0^s || GPR[rt]_63...s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL 0 rt rd sa DSRL32
000000 00000 111110
6 5 5 5 5 6
Format:
DSRL32 rd, rt, sa
Description:
The contents of general register rt are shifted right by 32+sa bits, inserting
zeros into the high-order bits. The result is placed in register rd.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: s ← 1 || sa
      GPR[rd] ← 0^s || GPR[rt]_63...s
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt rd 0 DSUB
000000 00000 101110
6 5 5 5 5 6
Format:
DSUB rd, rs, rt
Description:
The contents of general register rt are subtracted from the contents of
general register rs to form a result. The result is placed into general
register rd.
The only difference between this instruction and the DSUBU instruction is
that DSUBU never traps on overflow.
An integer overflow exception takes place if the carries out of bits 62 and
63 differ (2’s complement overflow). The destination register rd is not
modified when an integer overflow exception occurs.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
Integer overflow exception
Reserved instruction exception (R4000 in 32-bit mode)
SPECIAL rs rt rd 0 DSUBU
000000 00000 101111
6 5 5 5 5 6
Format:
DSUBU rd, rs, rt
Description:
The contents of general register rt are subtracted from the contents of
general register rs to form a result. The result is placed into general
register rd.
The only difference between this instruction and the DSUB instruction is
that DSUBU never traps on overflow. No integer overflow exception
occurs under any circumstances.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
Reserved instruction exception (R4000 in 32-bit mode)
COP0 CO 0 ERET
010000 1 000 0000 0000 0000 0000 011000
6 1 19 6
Format:
ERET
Description:
ERET is the R4000 instruction for returning from an interrupt, exception,
or error trap. Unlike a branch or jump instruction, ERET does not execute
the next instruction.
ERET must not itself be placed in a branch delay slot.
If the processor is servicing an error trap (SR_2 = 1), then load the PC from
the ErrorEPC and clear the ERL bit of the Status register (SR_2). Otherwise
(SR_2 = 0), load the PC from the EPC, and clear the EXL bit of the Status
register (SR_1).
An ERET executed between an LL and an SC also causes the SC to fail.
Operation:
32, 64 T: if SR_2 = 1 then
             PC ← ErrorEPC
             SR ← SR_31...3 || 0 || SR_1...0
          else
             PC ← EPC
             SR ← SR_31...2 || 0 || SR_0
          endif
          LLbit ← 0
Exceptions:
Coprocessor unusable exception
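The ERET state change described above can be summarized as a small function of the Status register and the two return-address registers. The Python sketch below is illustrative only: the function name and the tuple it returns are assumptions, while the bit positions (ERL in bit 2, EXL in bit 1) come from the description.

```python
ERL = 1 << 2          # Status register bit 2: error level
EXL = 1 << 1          # Status register bit 1: exception level
MASK32 = 0xFFFFFFFF   # the Status register image is 32 bits wide

def eret(sr, epc, error_epc):
    """Return (new_pc, new_sr, llbit) after an ERET."""
    if sr & ERL:
        # Returning from an error trap: resume at ErrorEPC and clear ERL.
        return error_epc, sr & ~ERL & MASK32, 0
    # Otherwise resume at EPC and clear EXL.
    return epc, sr & ~EXL & MASK32, 0
```

Note that when both ERL and EXL are set, only ERL is cleared, and LLbit is cleared in either case, which is why a pending SC fails.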
J Jump J
31 26 25 0
J target
000010
6 26
Format:
J target
Description:
The 26-bit target address is shifted left two bits and combined with the
high-order bits of the address of the delay slot. The program
unconditionally jumps to this calculated address with a delay of one
instruction.
Operation:
32 T:   temp ← target
   T+1: PC ← PC_31...28 || temp || 0^2
64 T:   temp ← target
   T+1: PC ← PC_63...28 || temp || 0^2
Exceptions:
None
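The jump address is formed by keeping the high-order bits of the delay-slot address and replacing the low-order 28 bits with the shifted target. A minimal Python sketch of that calculation follows; the function name and the bits parameter are hypothetical.

```python
def jump_target(delay_slot_pc, target26, bits=32):
    """Combine PC_(bits-1)...28 of the delay-slot address with target || 00."""
    mask = (1 << bits) - 1
    upper = (delay_slot_pc & mask) & ~0x0FFFFFFF    # keep bits above bit 27
    return upper | ((target26 & 0x03FFFFFF) << 2)   # word-aligned low 28 bits
```

Because the upper bits come from the delay slot rather than from the J instruction itself, a J in the last word of a 256 Mbyte region jumps within the following region: jump_target(0x10000000, 0) gives 0x10000000 even though the J instruction sits at 0x0FFFFFFC.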
JAL target
000011
6 26
Format:
JAL target
Description:
The 26-bit target address is shifted left two bits and combined with the
high-order bits of the address of the delay slot. The program
unconditionally jumps to this calculated address with a delay of one
instruction. The address of the instruction after the delay slot is placed in
the link register, r31.
Operation:
32 T:   temp ← target
        GPR[31] ← PC + 8
   T+1: PC ← PC_31...28 || temp || 0^2
64 T:   temp ← target
        GPR[31] ← PC + 8
   T+1: PC ← PC_63...28 || temp || 0^2
Exceptions:
None
SPECIAL rs 0 rd 0 JALR
000000 00000 00000 001001
6 5 5 5 5 6
Format:
JALR rs
JALR rd, rs
Description:
The program unconditionally jumps to the address contained in general
register rs, with a delay of one instruction. The address of the instruction
after the delay slot is placed in general register rd. The default value of rd,
if omitted in the assembly language instruction, is 31.
Register specifiers rs and rd may not be equal, because such an instruction
does not have the same effect when re-executed. However, an attempt to
execute this instruction is not trapped, and the result of executing such an
instruction is undefined.
Since instructions must be word-aligned, a Jump and Link Register
instruction must specify a target register (rs) whose two low-order bits are
zero. If these low-order bits are not zero, an address exception will occur
when the jump target instruction is subsequently fetched.
Operation:
Exceptions:
None
JR Jump Register JR
31 26 25 21 20 6 5 0
SPECIAL rs 0 JR
000000 000 0000 0000 0000 001000
6 5 15 6
Format:
JR rs
Description:
The program unconditionally jumps to the address contained in general
register rs, with a delay of one instruction.
Since instructions must be word-aligned, a Jump Register instruction
must specify a target register (rs) whose two low-order bits are zero. If
these low-order bits are not zero, an address exception will occur when the
jump target instruction is subsequently fetched.
Operation:
Exceptions:
None
LB Load Byte LB
31 26 25 21 20 16 15 0
LB base rt offset
100000
6 5 5 16
Format:
LB rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the byte at the
memory location specified by the effective address are sign-extended and
loaded into general register rt.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Format:
LBU rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the byte at the
memory location specified by the effective address are zero-extended and
loaded into general register rt.
Operation:
32 T: vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      pAddr ← pAddr_(PSIZE–1)...3 || (pAddr_2...0 xor ReverseEndian^3)
      mem ← LoadMemory (uncached, BYTE, pAddr, vAddr, DATA)
      byte ← vAddr_2...0 xor BigEndianCPU^3
      GPR[rt] ← 0^24 || mem_(7+8*byte)...(8*byte)
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
LD Load Doubleword LD
31 26 25 21 20 16 15 0
LD base rt offset
110111
6 5 5 16
Format:
LD rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the 64-bit
doubleword at the memory location specified by the effective address are
loaded into general register rt.
If any of the three least-significant bits of the effective address are non-
zero, an address error exception occurs.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64 T: vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
      (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
      mem ← LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA)
      GPR[rt] ← mem
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit user mode or
32-bit supervisor mode)
Format:
LDCz rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The processor reads a doubleword
from the addressed memory location and makes the data available to
coprocessor unit z. The manner in which each coprocessor uses the data
is defined by the individual coprocessor specifications.
If any of the three least-significant bits of the effective address are non-
zero, an address error exception takes place.
This instruction is not valid for use with CP0.
This instruction is undefined when the least-significant bit of the
rt field is non-zero.
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Coprocessor unusable exception
LDCz   Bit #  31 30 29 28 27 26 ... 0
LDC1           1  1  0  1  0  1
LDC2           1  1  0  1  1  0
Format:
LDL rt, offset(base)
Description:
This instruction can be used in combination with the LDR instruction to
load a register with eight consecutive bytes from memory, when the bytes
cross a doubleword boundary. LDL loads the left portion of the register
with the appropriate part of the high-order doubleword; LDR loads the
right portion of the register with the appropriate part of the low-order
doubleword.
The LDL instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which can specify an
arbitrary byte. It reads bytes only from the doubleword in memory which
contains the specified starting byte. From one to eight bytes will be
loaded, depending on the starting byte specified.
Conceptually, it starts at the specified byte in memory and loads that byte
into the high-order (left-most) byte of the register; then it loads bytes from
memory into the register until it reaches the low-order byte of the
doubleword in memory. The least-significant (right-most) byte(s) of the
register will not be changed.
memory (big-endian)
  address 0:   0  1  2  3  4  5  6  7
  address 8:   8  9 10 11 12 13 14 15

register $24
  before:      A  B  C  D  E  F  G  H
  LDL $24,3($0)
  after:       3  4  5  6  7  F  G  H
Operation:
LDL
Register   A B C D E F G H
Memory     I J K L M N O P

            BigEndianCPU = 0                      BigEndianCPU = 1
                                   offset                                offset
vAddr2..0   destination      type  LEM  BEM      destination      type  LEM  BEM
0           P B C D E F G H  0     0    7        I J K L M N O P  7     0    0
1           O P C D E F G H  1     0    6        J K L M N O P H  6     0    1
2           N O P D E F G H  2     0    5        K L M N O P G H  5     0    2
3           M N O P E F G H  3     0    4        L M N O P F G H  4     0    3
4           L M N O P F G H  4     0    3        M N O P E F G H  3     0    4
5           K L M N O P G H  5     0    2        N O P D E F G H  2     0    5
6           J K L M N O P H  6     0    1        O P C D E F G H  1     0    6
7           I J K L M N O P  7     0    0        P B C D E F G H  0     0    7
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
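The merge that LDL performs can be modeled as a shift of the addressed doubleword combined with the preserved low-order bytes of the register. The Python sketch below follows the destination columns of the table above; the function name is hypothetical, mem stands for the aligned doubleword containing the addressed byte (I as the most significant byte, P as the least significant), and address translation and exceptions are not modeled.

```python
MASK64 = (1 << 64) - 1

def ldl(reg, mem, vaddr, big_endian_cpu=True):
    """Return the new register value after LDL (illustrative model)."""
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    shift = 8 * (7 - byte)      # 7 - byte low-order register bytes are kept
    keep = (1 << shift) - 1     # mask selecting those preserved bytes
    return ((mem << shift) & MASK64) | (reg & keep)
```

With the big-endian example shown earlier (memory bytes 0 through 7 and LDL $24,3($0)), the sketch places bytes 3 through 7 in the five high-order register bytes and leaves the three low-order bytes unchanged.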
Format:
LDR rt, offset(base)
Description:
This instruction can be used in combination with the LDL instruction to
load a register with eight consecutive bytes from memory, when the bytes
cross a doubleword boundary. LDR loads the right portion of the register
with the appropriate part of the low-order doubleword; LDL loads the left
portion of the register with the appropriate part of the high-order
doubleword.
The LDR instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which can specify an
arbitrary byte. It reads bytes only from the doubleword in memory which
contains the specified starting byte. From one to eight bytes will be
loaded, depending on the starting byte specified.
Conceptually, it starts at the specified byte in memory and loads that byte
into the low-order (right-most) byte of the register; then it loads bytes from
memory into the register until it reaches the high-order byte of the
doubleword in memory. The most significant (left-most) byte(s) of the
register will not be changed.
memory (big-endian)
  address 0:   0  1  2  3  4  5  6  7
  address 8:   8  9 10 11 12 13 14 15

register $24
  before:      A  B  C  D  E  F  G  H
  LDR $24,4($0)
  after:       A  B  C  0  1  2  3  4
Operation:
LDR
Register   A B C D E F G H
Memory     I J K L M N O P

            BigEndianCPU = 0                      BigEndianCPU = 1
                                   offset                                offset
vAddr2..0   destination      type  LEM  BEM      destination      type  LEM  BEM
0           I J K L M N O P  7     0    0        A B C D E F G I  0     7    0
1           A I J K L M N O  6     1    0        A B C D E F I J  1     6    0
2           A B I J K L M N  5     2    0        A B C D E I J K  2     5    0
3           A B C I J K L M  4     3    0        A B C D I J K L  3     4    0
4           A B C D I J K L  3     4    0        A B C I J K L M  4     3    0
5           A B C D E I J K  2     5    0        A B I J K L M N  5     2    0
6           A B C D E F I J  1     6    0        A I J K L M N O  6     1    0
7           A B C D E F G I  0     7    0        I J K L M N O P  7     0    0
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
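LDR mirrors LDL: the addressed doubleword is shifted right and merged under the preserved high-order bytes of the register. The Python sketch below follows the destination columns of the table above; the function name is hypothetical, mem stands for the aligned doubleword containing the addressed byte (I as the most significant byte, P as the least significant), and address translation and exceptions are not modeled.

```python
MASK64 = (1 << 64) - 1

def ldr(reg, mem, vaddr, big_endian_cpu=True):
    """Return the new register value after LDR (illustrative model)."""
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    shift = 8 * byte                      # byte high-order register bytes are kept
    keep = MASK64 & ~(MASK64 >> shift)    # mask selecting those preserved bytes
    return ((mem & MASK64) >> shift) | (reg & keep)
```

With the big-endian example shown earlier (memory bytes 0 through 7 and LDR $24,4($0)), the sketch places bytes 0 through 4 in the five low-order register bytes and leaves the three high-order bytes unchanged.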
LH Load Halfword LH
31 26 25 21 20 16 15 0
LH base rt offset
100001
6 5 5 16
Format:
LH rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the halfword at the
memory location specified by the effective address are sign-extended and
loaded into general register rt.
If the least-significant bit of the effective address is non-zero, an address
error exception occurs.
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian^{2} || 0))
       mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
       byte ← vAddr_{2...0} xor (BigEndianCPU^{2} || 0)
       GPR[rt] ← (mem_{15+8*byte})^{16} || mem_{15+8*byte...8*byte}
64  T: vAddr ← ((offset_{15})^{48} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian^{2} || 0))
       mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
       byte ← vAddr_{2...0} xor (BigEndianCPU^{2} || 0)
       GPR[rt] ← (mem_{15+8*byte})^{48} || mem_{15+8*byte...8*byte}
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Format:
LHU rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the halfword at the
memory location specified by the effective address are zero-extended and
loaded into general register rt.
If the least-significant bit of the effective address is non-zero, an address
error exception occurs.
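LH and LHU differ only in how the upper bits of rt are filled. The following C sketch models the 64-bit mode result of each, together with the alignment check. The function names are illustrative, and the narrowing casts assume a two's-complement C implementation.

```c
#include <stdint.h>

/* Register value after LH in 64-bit mode: the halfword is sign-extended. */
uint64_t lh_result(uint16_t half)
{
    return (uint64_t)(int64_t)(int16_t)half;
}

/* Register value after LHU: the halfword is zero-extended. */
uint64_t lhu_result(uint16_t half)
{
    return (uint64_t)half;
}

/* A halfword access with bit 0 of the address set takes an
 * address error exception; returns 1 in that case. */
int halfword_address_error(uint64_t vaddr)
{
    return (vaddr & 1) != 0;
}
```

For a halfword of 0x8001, lh_result fills the upper 48 bits with ones while lhu_result leaves them zero.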
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian^{2} || 0))
       mem ← LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA)
       byte ← vAddr_{2...0} xor (BigEndianCPU^{2} || 0)
       GPR[rt] ← 0^{16} || mem_{15+8*byte...8*byte}
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
LL Load Linked LL
31 26 25 21 20 16 15 0
LL base rt offset
110000
6 5 5 16
Format:
LL rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the word at the
memory location specified by the effective address are loaded into general
register rt. In 64-bit mode, the loaded word is sign-extended.
The processor begins checking the accessed word for modification by
other processors and devices.
Load Linked and Store Conditional can be used to atomically update
memory locations as shown:
L1:
LL T1, (T0)
ADD T2, T1, 1
SC T2, (T0)
BEQ T2, 0, L1
NOP
This atomically increments the word addressed by T0. Changing the ADD
to an OR changes this to an atomic bit set. This instruction is available in
User mode, and it is not necessary for CP0 to be enabled.
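The same retry pattern can be expressed in C11 as a compare-and-swap loop. The sketch below is an analogy rather than R4000 code: it assumes a C11 compiler that provides <stdatomic.h>, and the names atomic_increment and atomic_bit_set are illustrative. Whether a given compiler implements these operations with an LL/SC loop on a MIPS target depends on the toolchain.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add 1 to *p and return the previous value.
 * The weak compare-exchange may fail and is retried, much as
 * the BEQ in the sequence above branches back to L1 when SC fails. */
uint32_t atomic_increment(_Atomic uint32_t *p)
{
    uint32_t old = atomic_load(p);
    while (!atomic_compare_exchange_weak(p, &old, old + 1)) {
        /* old has been refreshed with the current value; try again */
    }
    return old;
}

/* The "change ADD to OR" variant: atomically set bits, return old value. */
uint32_t atomic_bit_set(_Atomic uint32_t *p, uint32_t mask)
{
    return atomic_fetch_or(p, mask);
}
```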
The operation of LL is undefined if the addressed location is uncached.
For synchronization between multiple processors, the operation of LL is
also undefined if the addressed location is noncoherent. A cache miss that
occurs between LL and SC may cause SC to fail, so no load or store
operation should occur between LL and SC; otherwise the SC may never
be successful. Exceptions also cause SC to fail, so persistent exceptions
must be avoided. If either of the two least-significant bits of the effective
address is non-zero, an address error exception takes place.
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^{2}))
       mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
       byte ← vAddr_{2...0} xor (BigEndianCPU || 0^{2})
       GPR[rt] ← mem_{31+8*byte...8*byte}
       LLbit ← 1
64  T: vAddr ← ((offset_{15})^{48} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^{2}))
       mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
       byte ← vAddr_{2...0} xor (BigEndianCPU || 0^{2})
       GPR[rt] ← (mem_{31+8*byte})^{32} || mem_{31+8*byte...8*byte}
       LLbit ← 1
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Format:
LLD rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the doubleword at
the memory location specified by the effective address are loaded into
general register rt.
The processor begins checking the accessed doubleword for modification
by other processors and devices.
Load Linked Doubleword and Store Conditional Doubleword can be used
to atomically update memory locations:
L1:
LLD T1, (T0)
ADD T2, T1, 1
SCD T2, (T0)
BEQ T2, 0, L1
NOP
This atomically increments the doubleword addressed by T0. Changing the
ADD to an OR changes this to an atomic bit set.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
LUI 0 rt immediate
001111 00000
6 5 5 16
Format:
LUI rt, immediate
Description:
The 16-bit immediate is shifted left 16 bits and concatenated to 16 bits of
zeros. The result is placed into general register rt. In 64-bit mode, the
loaded word is sign-extended.
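As an illustration of the description above, the following C sketch computes the value LUI leaves in rt in 64-bit mode, and shows the common pairing with ORI to build a full 32-bit constant. The function names are illustrative, and the cast to int32_t assumes a two's-complement C implementation.

```c
#include <stdint.h>

/* Value placed in rt by LUI in 64-bit mode: immediate || 16 zero bits,
 * then sign-extended from bit 31. */
uint64_t lui_result(uint16_t immediate)
{
    uint32_t word = (uint32_t)immediate << 16;
    return (uint64_t)(int64_t)(int32_t)word;
}

/* Value placed in rt by ORI: rs OR zero-extended immediate. */
uint64_t ori_result(uint64_t rs, uint16_t immediate)
{
    return rs | (uint64_t)immediate;
}
```

For example, ori_result(lui_result(0x1234), 0x5678) yields 0x12345678, while lui_result(0x8000) sets bit 31 and therefore fills the upper 32 bits with ones.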
Operation:
Exceptions:
None
LW Load Word LW
31 26 25 21 20 16 15 0
LW base rt offset
100011
6 5 5 16
Format:
LW rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the word at the
memory location specified by the effective address are loaded into general
register rt. In 64-bit mode, the loaded word is sign-extended. If either of
the two least-significant bits of the effective address is non-zero, an
address error exception occurs.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Format:
LWCz rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The processor reads a word from
the addressed memory location, and makes the data available to
coprocessor unit z.
The manner in which each coprocessor uses the data is defined by the
individual coprocessor specifications.
If either of the two least-significant bits of the effective address is non-zero,
an address error exception occurs.
This instruction is not valid for use with CP0.
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^{2}))
       mem ← LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
       byte ← vAddr_{2...0} xor (BigEndianCPU || 0^{2})
       COPzLW (byte, rt, mem)
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Coprocessor unusable exception
LWCz   Bit #  31 30 29 28 27 26          0
LWC1           1  1  0  0  0  1
       Bit #  31 30 29 28 27 26          0
LWC2           1  1  0  0  1  0
Format:
LWL rt, offset(base)
Description:
This instruction can be used in combination with the LWR instruction to
load a register with four consecutive bytes from memory, when the bytes
cross a word boundary. LWL loads the left portion of the register with the
appropriate part of the high-order word; LWR loads the right portion of
the register with the appropriate part of the low-order word.
The LWL instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which can specify an
arbitrary byte. It reads bytes only from the word in memory which
contains the specified starting byte. From one to four bytes will be loaded,
depending on the starting byte specified. In 64-bit mode, the loaded word
is sign-extended.
Conceptually, it starts at the specified byte in memory and loads that byte
into the high-order (left-most) byte of the register; then it loads bytes from
memory into the register until it reaches the low-order byte of the word in
memory. The least-significant (right-most) byte(s) of the register will not
be changed.
              memory (big-endian)
address 4     4  5  6  7
address 0     0  1  2  3

              register
before        A  B  C  D    $24
              LWL $24,1($0)
after         1  2  3  D    $24
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^{3})
       if BigEndianMem = 0 then
           pAddr ← pAddr_{PSIZE-1...2} || 0^{2}
       endif
       byte ← vAddr_{1...0} xor BigEndianCPU^{2}
       word ← vAddr_{2} xor BigEndianCPU
       mem ← LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
       temp ← mem_{32*word+8*byte+7...32*word} || GPR[rt]_{23-8*byte...0}
       GPR[rt] ← temp
LWL
Register     A B C D E F G H
Memory       I J K L M N O P

                   BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_{2..0} destination      type  offset       destination      type  offset
                                    LEM  BEM                            LEM  BEM
0            S S S S P F G H   0     0    7      S S S S I J K L   3     4    0
1            S S S S O P G H   1     0    6      S S S S J K L H   2     4    1
2            S S S S N O P H   2     0    5      S S S S K L G H   1     4    2
3            S S S S M N O P   3     0    4      S S S S L F G H   0     4    3
4            S S S S L F G H   0     4    3      S S S S M N O P   3     0    4
5            S S S S K L G H   1     4    2      S S S S N O P H   2     0    5
6            S S S S J K L H   2     4    1      S S S S O P G H   1     0    6
7            S S S S I J K L   3     4    0      S S S S P F G H   0     0    7
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Format:
LWR rt, offset(base)
Description:
This instruction can be used in combination with the LWL instruction to
load a register with four consecutive bytes from memory, when the bytes
cross a word boundary. LWR loads the right portion of the register with
the appropriate part of the low-order word; LWL loads the left portion of
the register with the appropriate part of the high-order word.
The LWR instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which can specify an
arbitrary byte. It reads bytes only from the word in memory which
contains the specified starting byte. From one to four bytes will be loaded,
depending on the starting byte specified. In 64-bit mode, if bit 31 of the
destination register is loaded, then the loaded word is sign-extended.
Conceptually, it starts at the specified byte in memory and loads that byte
into the low-order (right-most) byte of the register; then it loads bytes from
memory into the register until it reaches the high-order byte of the word
in memory. The most significant (left-most) byte(s) of the register will not
be changed.
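The way LWL and LWR combine to load an unaligned word can be sketched in C for the big-endian, 32-bit register case. This is a simplified model under stated assumptions: the names lwl_be, lwr_be and load_unaligned_be are hypothetical, mem is an ordinary byte array standing in for memory, and sign extension in 64-bit mode is not modeled.

```c
#include <stdint.h>

/* LWL, big-endian: bytes from addr up to the end of its aligned word
 * are placed in the left (high-order) part of the register. */
uint32_t lwl_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    uint32_t k = addr & 3;                 /* byte position in the word */
    const uint8_t *w = mem + (addr - k);   /* aligned word */
    uint32_t loaded = 0;

    for (uint32_t i = k; i < 4; i++)
        loaded = (loaded << 8) | w[i];
    loaded <<= 8 * k;                      /* left-justify the 4-k bytes */

    uint32_t keep = (1u << (8 * k)) - 1;   /* low k bytes are unchanged */
    return loaded | (reg & keep);
}

/* LWR, big-endian: bytes from the start of the aligned word up to addr
 * are placed in the right (low-order) part of the register. */
uint32_t lwr_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    uint32_t k = addr & 3;
    const uint8_t *w = mem + (addr - k);
    uint32_t loaded = 0;

    for (uint32_t i = 0; i <= k; i++)
        loaded = (loaded << 8) | w[i];
    if (k == 3)
        return loaded;                     /* all four bytes replaced */

    uint32_t keep = ~0u << (8 * (k + 1));  /* high 3-k bytes are unchanged */
    return (reg & keep) | loaded;
}

/* Unaligned word load: LWL at addr, then LWR at addr + 3. */
uint32_t load_unaligned_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t r = 0;
    r = lwl_be(r, mem, addr);
    r = lwr_be(r, mem, addr + 3);
    return r;
}
```

With memory bytes 0 through 7, lwl_be at address 1 reproduces the LWL $24,1($0) illustration and lwr_be at address 4 reproduces the LWR $24,4($0) illustration; together they load bytes 1 through 4.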
              memory (big-endian)
address 4     4  5  6  7
address 0     0  1  2  3

              register
before        A  B  C  D    $24
              LWR $24,4($0)
after         A  B  C  4    $24
Operation:
LWR
Register     A B C D E F G H
Memory       I J K L M N O P

                   BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_{2..0} destination      type  offset       destination      type  offset
                                    LEM  BEM                            LEM  BEM
0            S S S S M N O P   0     0    4      X X X X E F G I   0     7    0
1            X X X X E M N O   1     1    4      X X X X E F I J   1     6    0
2            X X X X E F M N   2     2    4      X X X X E I J K   2     5    0
3            X X X X E F G M   3     3    4      S S S S I J K L   3     4    0
4            S S S S I J K L   0     4    0      X X X X E F G M   0     3    4
5            X X X X E I J K   1     5    0      X X X X E F M N   1     2    4
6            X X X X E F I J   2     6    0      X X X X E M N O   2     1    4
7            X X X X E F G I   3     7    0      S S S S M N O P   3     0    4
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Format:
LWU rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of the word at the
memory location specified by the effective address are loaded into general
register rt. The loaded word is zero-extended.
If either of the two least-significant bits of the effective address is non-zero,
an address error exception occurs.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
MFC0 Move From System Control Coprocessor MFC0
31 26 25 21 20 16 15 11 10 0
COP0 MF rt rd 0
010000 00000 000 0000 0000
6 5 5 5 11
Format:
MFC0 rt, rd
Description:
The contents of coprocessor register rd of the CP0 are loaded into general
register rt.
Operation:
32  T:   data ← CPR[0,rd]
    T+1: GPR[rt] ← data
64  T:   data ← CPR[0,rd]
    T+1: GPR[rt] ← (data_{31})^{32} || data_{31...0}
Exceptions:
Coprocessor unusable exception
COPz MF rt rd 0
0 1 0 0 x x* 00000 000 0000 0000
6 5 5 5 11
Format:
MFCz rt, rd
Description:
The contents of coprocessor register rd of coprocessor z are loaded into
general register rt.
Operation:
32  T:   data ← CPR[z,rd]
    T+1: GPR[rt] ← data
64  T:   if rd_{0} = 0 then
             data ← CPR[z,rd_{4...1} || 0]_{31...0}
         else
             data ← CPR[z,rd_{4...1} || 0]_{63...32}
         endif
    T+1: GPR[rt] ← (data_{31})^{32} || data
Exceptions:
Coprocessor unusable exception
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
MFCz   Bit #  31 30 29 28 27 26 25 24 23 22 21          0
MFC0           0  1  0  0  0  0  0  0  0  0  0
       Bit #  31 30 29 28 27 26 25 24 23 22 21          0
MFC1           0  1  0  0  0  1  0  0  0  0  0
       Bit #  31 30 29 28 27 26 25 24 23 22 21          0
MFC2           0  1  0  0  1  0  0  0  0  0  0
SPECIAL 0 rd 0 MFHI
000000 00 0000 0000 00000 010000
6 10 5 5 6
Format:
MFHI rd
Description:
The contents of special register HI are loaded into general register rd.
To ensure proper operation in the event of interruptions, the two
instructions which follow an MFHI instruction may not be any of the
instructions which modify the HI register: MULT, MULTU, DIV, DIVU,
MTHI, DMULT, DMULTU, DDIV, DDIVU.
Operation:
32, 64 T: GPR[rd] ← HI
Exceptions:
None
SPECIAL 0 rd 0 MFLO
000000 00 0000 0000 00000 010010
6 10 5 5 6
Format:
MFLO rd
Description:
The contents of special register LO are loaded into general register rd.
To ensure proper operation in the event of interruptions, the two
instructions which follow an MFLO instruction may not be any of the
instructions which modify the LO register: MULT, MULTU, DIV, DIVU,
MTLO, DMULT, DMULTU, DDIV, DDIVU.
Operation:
32, 64 T: GPR[rd] ← LO
Exceptions:
None
MTC0 Move To System Control Coprocessor MTC0
31 26 25 21 20 16 15 11 10 0
COP0 MT rt rd 0
010000 00100 000 0000 0000
6 5 5 5 11
Format:
MTC0 rt, rd
Description:
The contents of general register rt are loaded into coprocessor register rd
of CP0.
Because the state of the virtual address translation system may be altered
by this instruction, the operation of load instructions, store instructions,
and TLB operations immediately prior to and after this instruction is
undefined.
Operation:
Exceptions:
Coprocessor unusable exception
MTCz Move To Coprocessor MTCz
31 26 25 21 20 16 15 11 10 0
COPz MT rt rd 0
0 1 0 0 x x* 00100 000 0000 0000
6 5 5 5 11
Format:
MTCz rt, rd
Description:
The contents of general register rt are loaded into coprocessor register rd
of coprocessor z.
Operation:
32  T:   data ← GPR[rt]
    T+1: CPR[z,rd] ← data
64  T:   data ← GPR[rt]_{31...0}
    T+1: if rd_{0} = 0 then
             CPR[z,rd_{4...1} || 0] ← CPR[z,rd_{4...1} || 0]_{63...32} || data
         else
             CPR[z,rd_{4...1} || 0] ← data || CPR[z,rd_{4...1} || 0]_{31...0}
         endif
Exceptions:
Coprocessor unusable exception
COP1           0  1  0  0  0  1  0  0  1  0  0
       Bit #  31 30 29 28 27 26 25 24 23 22 21          0
COP2           0  1  0  0  1  0  0  0  1  0  0
SPECIAL rs 0 MTHI
000000 000 0000 0000 0000 010001
6 5 15 6
Format:
MTHI rs
Description:
The contents of general register rs are loaded into special register HI.
If an MTHI operation is executed following a MULT, MULTU, DIV, or
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI
instructions, the contents of special register LO are undefined.
Operation:
Exceptions:
None
SPECIAL rs 0 MTLO
000000 000 0000 0000 0000 010011
6 5 15 6
Format:
MTLO rs
Description:
The contents of general register rs are loaded into special register LO.
If an MTLO operation is executed following a MULT, MULTU, DIV, or
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI
instructions, the contents of special register HI are undefined.
Operation:
32,64 T–2: LO ← undefined
T–1: LO ← undefined
T: LO ← GPR[rs]
Exceptions:
None
Format:
MULT rs, rt
Description:
The contents of general registers rs and rt are multiplied, treating both
operands as 32-bit 2’s complement values. No integer overflow exception
occurs under any circumstances. In 64-bit mode, the operands must be
valid 32-bit, sign-extended values.
When the operation completes, the low-order word of the double result is
loaded into special register LO, and the high-order word of the double
result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
these instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by a minimum of two other instructions.
MULT Multiply (continued) MULT
Operation:
32  T-2: LO ← undefined
         HI ← undefined
    T-1: LO ← undefined
         HI ← undefined
    T:   t ← GPR[rs] * GPR[rt]
         LO ← t_{31...0}
         HI ← t_{63...32}
64  T-2: LO ← undefined
         HI ← undefined
    T-1: LO ← undefined
         HI ← undefined
    T:   t ← GPR[rs]_{31...0} * GPR[rt]_{31...0}
         LO ← (t_{31})^{32} || t_{31...0}
         HI ← (t_{63})^{32} || t_{63...32}
Exceptions:
None
SPECIAL rs rt 0 MULTU
000000 00 0000 0000 011001
6 5 5 10 6
Format:
MULTU rs, rt
Description:
The contents of general register rs and the contents of general register rt
are multiplied, treating both operands as unsigned values. No overflow
exception occurs under any circumstances. In 64-bit mode, the operands
must be valid 32-bit, sign-extended values.
When the operation completes, the low-order word of the double result is
loaded into special register LO, and the high-order word of the double
result is loaded into special register HI.
If either of the two preceding instructions is MFHI or MFLO, the results of
these instructions are undefined. Correct operation requires separating
reads of HI or LO from writes by a minimum of two instructions.
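The split of the 64-bit product into HI and LO, for both MULT and MULTU in 64-bit mode, can be modeled in C as follows. This is a sketch only: the type hilo_t and the functions mult and multu are hypothetical names, the two-instruction hazard on HI and LO is not modeled, and the narrowing casts assume a two's-complement C implementation.

```c
#include <stdint.h>

typedef struct {
    uint64_t hi;   /* special register HI */
    uint64_t lo;   /* special register LO */
} hilo_t;

/* Sign-extend a 32-bit word to 64 bits. */
static uint64_t sext32(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* MULT: low 32 bits of each operand treated as signed. */
hilo_t mult(uint64_t rs, uint64_t rt)
{
    int64_t t = (int64_t)(int32_t)(uint32_t)rs *
                (int64_t)(int32_t)(uint32_t)rt;
    hilo_t r = { sext32((uint32_t)((uint64_t)t >> 32)),
                 sext32((uint32_t)t) };
    return r;
}

/* MULTU: low 32 bits of each operand treated as unsigned. */
hilo_t multu(uint64_t rs, uint64_t rt)
{
    uint64_t t = (uint64_t)(uint32_t)rs * (uint64_t)(uint32_t)rt;
    hilo_t r = { sext32((uint32_t)(t >> 32)), sext32((uint32_t)t) };
    return r;
}
```

Multiplying a register holding all ones by 2 shows the difference: mult treats the operand as -1 and produces HI of all ones, while multu treats it as 0xFFFFFFFF and produces HI equal to 1. In both cases each half is sign-extended from its own bit 31 when written to HI or LO.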
MULTU Multiply Unsigned (continued) MULTU
Operation:
32  T-2: LO ← undefined
         HI ← undefined
    T-1: LO ← undefined
         HI ← undefined
    T:   t ← (0 || GPR[rs]) * (0 || GPR[rt])
         LO ← t_{31...0}
         HI ← t_{63...32}
64  T-2: LO ← undefined
         HI ← undefined
    T-1: LO ← undefined
         HI ← undefined
    T:   t ← (0 || GPR[rs]_{31...0}) * (0 || GPR[rt]_{31...0})
         LO ← (t_{31})^{32} || t_{31...0}
         HI ← (t_{63})^{32} || t_{63...32}
Exceptions:
None
SPECIAL rs rt rd 0 NOR
000000 00000 100111
6 5 5 5 5 6
Format:
NOR rd, rs, rt
Description:
The contents of general register rs are combined with the contents of
general register rt in a bit-wise logical NOR operation. The result is placed
into general register rd.
Operation:
Exceptions:
None
OR Or OR
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 OR
000000 00000 100101
6 5 5 5 5 6
Format:
OR rd, rs, rt
Description:
The contents of general register rs are combined with the contents of
general register rt in a bit-wise logical OR operation. The result is placed
into general register rd.
Operation:
Exceptions:
None
ORI rs rt immediate
001101
6 5 5 16
Format:
ORI rt, rs, immediate
Description:
The 16-bit immediate is zero-extended and combined with the contents of
general register rs in a bit-wise logical OR operation. The result is placed
into general register rt.
Operation:
Exceptions:
None
SB Store Byte SB
31 26 25 21 20 16 15 0
SB base rt offset
101000
6 5 5 16
Format:
SB rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The least-significant byte of register
rt is stored at the effective address.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
SC Store Conditional SC
31 26 25 21 20 16 15 0
SC base rt offset
111000
6 5 5 16
Format:
SC rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of general register rt
are conditionally stored at the memory location specified by the effective
address.
If any other processor or device has modified the physical address since
the time of the previous Load Linked instruction, or if an ERET instruction
occurs between the Load Linked instruction and this store instruction, the
store fails and is inhibited from taking place.
The success or failure of the store operation (as defined above) is indicated
by the contents of general register rt after execution of the instruction. A
successful store sets the contents of general register rt to 1; an unsuccessful
store sets it to 0.
The operation of Store Conditional is undefined when the address is
different from the address used in the last Load Linked.
This instruction is available in User mode; it is not necessary for CP0 to be
enabled.
If either of the two least-significant bits of the effective address is non-zero,
an address error exception takes place.
If this instruction should both fail and take an exception, the exception
takes precedence.
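The success and failure rule above can be illustrated with a small single-processor model in C. Everything here is a simplification for illustration: cpu_model, ll, sc and break_link are hypothetical names, the link is reduced to a single flag corresponding to LLbit, and break_link stands in for any event that causes the store to fail (a modification by another processor or device, or an ERET).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool llbit;   /* models the LLbit set by Load Linked */
} cpu_model;

/* Load Linked: return the word and set the link flag. */
uint32_t ll(cpu_model *cpu, const uint32_t *addr)
{
    cpu->llbit = true;
    return *addr;
}

/* Any event that must cause the following SC to fail. */
void break_link(cpu_model *cpu)
{
    cpu->llbit = false;
}

/* Store Conditional: store only if the link is intact.
 * Returns the value written back to rt: 1 on success, 0 on failure. */
uint32_t sc(cpu_model *cpu, uint32_t *addr, uint32_t value)
{
    if (cpu->llbit)
        *addr = value;
    return cpu->llbit ? 1u : 0u;
}
```

A caller would loop on ll followed by sc until sc returns 1, which corresponds to the BEQ retry in the LL example.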
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian || 0^{2}))
       byte ← vAddr_{2...0} xor (BigEndianCPU || 0^{2})
       data ← GPR[rt]_{63-8*byte...0} || 0^{8*byte}
       if LLbit then
           StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
       endif
       GPR[rt] ← 0^{31} || LLbit
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Format:
SCD rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of general register rt
are conditionally stored at the memory location specified by the effective
address.
If any other processor or device has modified the physical address since
the time of the previous Load Linked Doubleword instruction, or if an
ERET instruction occurs between the Load Linked Doubleword
instruction and this store instruction, the store fails and is inhibited from
taking place.
The success or failure of the store operation (as defined above) is indicated
by the contents of general register rt after execution of the instruction. A
successful store sets the contents of general register rt to 1; an unsuccessful
store sets it to 0.
The operation of Store Conditional Doubleword is undefined when the
address is different from the address used in the last Load Linked
Doubleword.
This instruction is available in User mode; it is not necessary for CP0 to be
enabled.
If any of the three least-significant bits of the effective address is
non-zero, an address error exception takes place.
If this instruction should both fail and take an exception, the exception
takes precedence.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
SD Store Doubleword SD
31 26 25 21 20 16 15 0
SD base rt offset
111111
6 5 5 16
Format:
SD rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of general register rt
are stored at the memory location specified by the effective address.
If any of the three least-significant bits of the effective address is
non-zero, an address error exception occurs.
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit user mode,
R4000 in 32-bit supervisor mode)
Format:
SDCz rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. Coprocessor unit z sources a
doubleword, which the processor writes to the addressed memory
location. The data to be stored is defined by individual coprocessor
specifications.
If any of the three least-significant bits of the effective address is
non-zero, an address error exception takes place.
This instruction is not valid for use with CP0.
This instruction is undefined when the least-significant bit of the rt field is
non-zero.
Operation:
*See the table, “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
SDCz Store Doubleword From Coprocessor (continued) SDCz
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Coprocessor unusable exception
SDCz   Bit #  31 30 29 28 27 26          0
SDC1           1  1  1  1  0  1
       Bit #  31 30 29 28 27 26          0
SDC2           1  1  1  1  1  0
Format:
SDL rt, offset(base)
Description:
This instruction can be used with the SDR instruction to store the contents
of a register into eight consecutive bytes of memory, when the bytes cross
a doubleword boundary. SDL stores the left portion of the register into the
appropriate part of the high-order doubleword of memory; SDR stores the
right portion of the register into the appropriate part of the low-order
doubleword.
The SDL instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which may specify an
arbitrary byte. It alters only the doubleword in memory which contains
that byte. From one to eight bytes will be stored, depending on the
starting byte specified.
Conceptually, it starts at the most-significant byte of the register and
copies it to the specified byte in memory; then it copies bytes from register
to memory until it reaches the low-order byte of the doubleword in memory.
No address exceptions due to alignment are possible.
              register
              A  B  C  D  E  F  G  H    $24

              memory (big-endian)
before
address 8     8  9 10 11 12 13 14 15
address 0     0  1  2  3  4  5  6  7
              SDL $24,1($0)
after
address 8     8  9 10 11 12 13 14 15
address 0     0  A  B  C  D  E  F  G
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64  T: vAddr ← ((offset_{15})^{48} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^{3})
       if BigEndianMem = 0 then
           pAddr ← pAddr_{PSIZE-1...3} || 0^{3}
       endif
       byte ← vAddr_{2...0} xor BigEndianCPU^{3}
       data ← 0^{56-8*byte} || GPR[rt]_{63...56-8*byte}
       StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
SDL
Register     A B C D E F G H
Memory       I J K L M N O P

                   BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_{2..0} destination      type  offset       destination      type  offset
                                    LEM  BEM                            LEM  BEM
0            I J K L M N O A   0     0    7      A B C D E F G H   7     0    0
1            I J K L M N A B   1     0    6      I A B C D E F G   6     0    1
2            I J K L M A B C   2     0    5      I J A B C D E F   5     0    2
3            I J K L A B C D   3     0    4      I J K A B C D E   4     0    3
4            I J K A B C D E   4     0    3      I J K L A B C D   3     0    4
5            I J A B C D E F   5     0    2      I J K L M A B C   2     0    5
6            I A B C D E F G   6     0    1      I J K L M N A B   1     0    6
7            A B C D E F G H   7     0    0      I J K L M N O A   0     0    7
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
Format:
SDR rt, offset(base)
Description:
This instruction can be used with the SDL instruction to store the contents
of a register into eight consecutive bytes of memory, when the bytes cross
a boundary between two doublewords. SDR stores the right portion of the
register into the appropriate part of the low-order doubleword; SDL stores
the left portion of the register into the appropriate part of the high-order
doubleword of memory.
The SDR instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which may specify an
arbitrary byte. It alters only the doubleword in memory which contains
that byte. From one to eight bytes will be stored, depending on the
starting byte specified.
Conceptually, it starts at the least-significant (rightmost) byte of the
register and copies it to the specified byte in memory; then it copies bytes
from register to memory until it reaches the high-order byte of the
doubleword in memory. No address exceptions due to alignment are possible.
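The pairing of SDL and SDR for an unaligned doubleword store can be sketched in C for the big-endian case. The names sdl_be, sdr_be and store_unaligned_be are hypothetical, and mem is an ordinary byte array standing in for memory, with addr used as an index into it.

```c
#include <stddef.h>
#include <stdint.h>

/* SDL, big-endian: the most-significant register byte goes to the
 * addressed byte; copying continues to the end of that doubleword. */
void sdl_be(uint64_t reg, uint8_t *mem, size_t addr)
{
    size_t k = addr & 7;               /* byte position in the doubleword */
    uint8_t *dword = mem + (addr - k); /* aligned doubleword */

    for (size_t i = k; i < 8; i++)
        dword[i] = (uint8_t)(reg >> (8 * (7 - (i - k))));
}

/* SDR, big-endian: the least-significant register byte goes to the
 * addressed byte; copying continues back to the start of that doubleword. */
void sdr_be(uint64_t reg, uint8_t *mem, size_t addr)
{
    size_t k = addr & 7;
    uint8_t *dword = mem + (addr - k);

    for (size_t i = 0; i <= k; i++)
        dword[i] = (uint8_t)(reg >> (8 * (k - i)));
}

/* Unaligned doubleword store: SDL at addr, then SDR at addr + 7. */
void store_unaligned_be(uint64_t reg, uint8_t *mem, size_t addr)
{
    sdl_be(reg, mem, addr);
    sdr_be(reg, mem, addr + 7);
}
```

Storing at address 1 writes seven bytes into the first doubleword with SDL and the remaining byte into the second doubleword with SDR; bytes outside that eight-byte span are left untouched.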
              register
              A  B  C  D  E  F  G  H    $24

              memory (big-endian)
before
address 8     8  9 10 11 12 13 14 15
address 0     0  1  2  3  4  5  6  7
This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved instruction
exception.
Operation:
64  T: vAddr ← ((offset_{15})^{48} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor ReverseEndian^{3})
       if BigEndianMem = 0 then
           pAddr ← pAddr_{PSIZE-1...3} || 0^{3}
       endif
       byte ← vAddr_{2...0} xor BigEndianCPU^{3}
       data ← GPR[rt]_{63-8*byte...0} || 0^{8*byte}
       StoreMemory (uncached, DOUBLEWORD-byte, data, pAddr, vAddr, DATA)
SDR
Register     A B C D E F G H
Memory       I J K L M N O P

                   BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_{2..0} destination      type  offset       destination      type  offset
                                    LEM  BEM                            LEM  BEM
0            A B C D E F G H   7     0    0      H J K L M N O P   0     7    0
1            B C D E F G H P   6     1    0      G H K L M N O P   1     6    0
2            C D E F G H O P   5     2    0      F G H L M N O P   2     5    0
3            D E F G H N O P   4     3    0      E F G H M N O P   3     4    0
4            E F G H M N O P   3     4    0      D E F G H N O P   4     3    0
5            F G H L M N O P   2     5    0      C D E F G H O P   5     2    0
6            G H K L M N O P   1     6    0      B C D E F G H P   6     1    0
7            H J K L M N O P   0     7    0      A B C D E F G H   7     0    0
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Reserved instruction exception (R4000 in 32-bit mode)
SH Store Halfword SH
31 26 25 21 20 16 15 0
SH base rt offset
101001
6 5 5 16
Format:
SH rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. The least-significant
halfword of register rt is stored at the effective address. If the least-
significant bit of the effective address is non-zero, an address error
exception occurs.
Operation:
32  T: vAddr ← ((offset_{15})^{16} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian^{2} || 0))
       byte ← vAddr_{2...0} xor (BigEndianCPU^{2} || 0)
       data ← GPR[rt]_{63-8*byte...0} || 0^{8*byte}
       StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)
64  T: vAddr ← ((offset_{15})^{48} || offset_{15...0}) + GPR[base]
       (pAddr, uncached) ← AddressTranslation (vAddr, DATA)
       pAddr ← pAddr_{PSIZE-1...3} || (pAddr_{2...0} xor (ReverseEndian^{2} || 0))
       byte ← vAddr_{2...0} xor (BigEndianCPU^{2} || 0)
       data ← GPR[rt]_{63-8*byte...0} || 0^{8*byte}
       StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
SPECIAL 0 rt rd sa SLL
000000 00000 000000
6 5 5 5 5 6
Format:
SLL rd, rt, sa
Description:
The contents of general register rt are shifted left by sa bits, inserting zeros
into the low-order bits.
The result is placed in register rd.
In 64-bit mode, the 32-bit result is sign extended when placed in the
destination register. It is sign extended for all shift amounts, including
zero; SLL with a zero shift amount truncates a 64-bit value to 32 bits and
then sign extends this 32-bit value. SLL, unlike nearly all other word
operations, does not require an operand to be a properly sign-extended
word value to produce a valid sign-extended word result.
NOTE: SLL with a shift amount of zero may be treated as a NOP by
some assemblers, at some optimization levels. If using SLL with a
zero shift to truncate 64-bit values, check the assembler you are using.
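The 64-bit mode behavior described above, including the zero-shift truncation, can be modeled in C. The function name sll64 is illustrative, and the cast to int32_t assumes a two's-complement C implementation.

```c
#include <stdint.h>

/* 64-bit mode SLL: shift the low 32 bits of rt left by sa,
 * then sign-extend the 32-bit result into the 64-bit destination. */
uint64_t sll64(uint64_t rt, unsigned sa)
{
    uint32_t temp = (uint32_t)rt << (sa & 31);   /* upper 32 bits of rt are ignored */
    return (uint64_t)(int64_t)(int32_t)temp;
}
```

With sa equal to 0, the upper 32 bits of the source are discarded and replaced by copies of bit 31, which is why the input does not need to be a properly sign-extended word.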
Operation:
32  T: GPR[rd] ← GPR[rt]_{31-sa...0} || 0^{sa}
64  T: s ← 0 || sa
       temp ← GPR[rt]_{31-s...0} || 0^{s}
       GPR[rd] ← (temp_{31})^{32} || temp
Exceptions:
None
SPECIAL rs rt rd 0 SLLV
000000 00000 000100
6 5 5 5 5 6
Format:
SLLV rd, rt, rs
Description:
The contents of general register rt are shifted left the number of bits
specified by the low-order five bits contained in general register rs,
inserting zeros into the low-order bits.
The result is placed in register rd.
In 64-bit mode, the 32-bit result is sign extended when placed in the
destination register. It is sign extended for all shift amounts, including
zero; SLLV with a zero shift amount truncates a 64-bit value to 32 bits and
then sign extends this 32-bit value. SLLV, unlike nearly all other word
operations, does not require an operand to be a properly sign-extended
word value to produce a valid sign-extended word result.
NOTE: SLLV with a shift amount of zero may be treated as a NOP by
some assemblers, at some optimization levels. If using SLLV with a
zero shift to truncate 64-bit values, check the assembler you are using.
Operation:
32 T: s ← GPR[rs]4...0
GPR[rd] ← GPR[rt](31–s)...0 || 0s
64 T: s ← 0 || GPR[rs]4...0
temp ← GPR[rt](31-s)...0 || 0s
GPR[rd] ← (temp31)32 || temp
Exceptions:
None
SPECIAL rs rt rd 0 SLT
000000 00000 101010
6 5 5 5 5 6
Format:
SLT rd, rs, rt
Description:
The contents of general register rt are subtracted from the contents of
general register rs. Considering both quantities as signed integers, if the
contents of general register rs are less than the contents of general register
rt, the result is set to one; otherwise the result is set to zero.
The result is placed into general register rd.
No integer overflow exception occurs under any circumstances. The
comparison is valid even if the subtraction used during the comparison
overflows.
Operation:
Exceptions:
None
SLTI rs rt immediate
001010
6 5 5 16
Format:
SLTI rt, rs, immediate
Description:
The 16-bit immediate is sign-extended and subtracted from the contents of
general register rs. Considering both quantities as signed integers, if rs is
less than the sign-extended immediate, the result is set to one; otherwise
the result is set to zero.
The result is placed into general register rt.
No integer overflow exception occurs under any circumstances. The
comparison is valid even if the subtraction used during the comparison
overflows.
Operation:
32 T: if GPR[rs] < (immediate15)16 || immediate15...0 then
GPR[rt] ← 031 || 1
else
GPR[rt] ← 032
endif
Exceptions:
None
SLTIU rs rt immediate
001011
6 5 5 16
Format:
SLTIU rt, rs, immediate
Description:
The 16-bit immediate is sign-extended and subtracted from the contents of
general register rs. Considering both quantities as unsigned integers, if rs
is less than the sign-extended immediate, the result is set to one; otherwise
the result is set to zero.
The result is placed into general register rt.
No integer overflow exception occurs under any circumstances. The
comparison is valid even if the subtraction used during the comparison
overflows.
Operation:
Exceptions:
None
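The difference between SLTI and SLTIU is easy to misread, because both sign-extend the immediate and only the comparison differs. The following Python sketch models the 32-bit case; `slti`, `sltiu`, and the two conversion helpers are hypothetical names used for illustration only.

```python
MASK32 = 0xFFFFFFFF

def sign_extend_16(imm):
    """Interpret a 16-bit immediate as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def to_signed_32(value):
    """Interpret a 32-bit register value as a signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value

def slti(rs_value, imm):
    """Signed comparison against the sign-extended immediate."""
    return 1 if to_signed_32(rs_value) < sign_extend_16(imm) else 0

def sltiu(rs_value, imm):
    """Unsigned comparison; the immediate is still sign-extended first."""
    return 1 if (rs_value & MASK32) < (sign_extend_16(imm) & MASK32) else 0
```

In this model an immediate of 0xFFFF compares as -1 under SLTI but as 0xFFFFFFFF under SLTIU.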
SPECIAL rs rt rd 0 SLTU
000000 00000 101011
6 5 5 5 5 6
Format:
SLTU rd, rs, rt
Description:
The contents of general register rt are subtracted from the contents of
general register rs. Considering both quantities as unsigned integers, if the
contents of general register rs are less than the contents of general register
rt, the result is set to one; otherwise the result is set to zero.
The result is placed into general register rd.
No integer overflow exception occurs under any circumstances. The
comparison is valid even if the subtraction used during the comparison
overflows.
Operation:
Exceptions:
None
SPECIAL 0 rt rd sa SRA
000000 00000 000011
6 5 5 5 5 6
Format:
SRA rd, rt, sa
Description:
The contents of general register rt are shifted right by sa bits, sign-
extending the high-order bits.
The result is placed in register rd.
In 64-bit mode, the operand must be a valid sign-extended, 32-bit value.
Operation:
32 T: GPR[rd] ← (GPR[rt]31)sa || GPR[rt]31...sa
64 T: s ← 0 || sa
temp ← (GPR[rt]31)s || GPR[rt] 31...s
GPR[rd] ← (temp31)32 || temp
Exceptions:
None
SRAV Shift Right Arithmetic Variable SRAV
31 26 25 21 20 16 15 11 10 6 5 0
SPECIAL rs rt rd 0 SRAV
000000 00000 000111
6 5 5 5 5 6
Format:
SRAV rd, rt, rs
Description:
The contents of general register rt are shifted right by the number of bits
specified by the low-order five bits of general register rs, sign-extending
the high-order bits.
The result is placed in register rd.
In 64-bit mode, the operand must be a valid sign-extended, 32-bit value.
Operation:
32 T: s ← GPR[rs]4...0
GPR[rd] ← (GPR[rt]31)s || GPR[rt]31...s
64 T: s ← GPR[rs]4...0
temp ← (GPR[rt]31)s || GPR[rt]31...s
GPR[rd] ← (temp31)32 || temp
Exceptions:
None
SPECIAL 0 rt rd sa SRL
000000 00000 000010
6 5 5 5 5 6
Format:
SRL rd, rt, sa
Description:
The contents of general register rt are shifted right by sa bits, inserting
zeros into the high-order bits.
The result is placed in register rd.
In 64-bit mode, the operand must be a valid sign-extended, 32-bit value.
Operation:
32 T: GPR[rd] ← 0 sa || GPR[rt]31...sa
64 T: s ← 0 || sa
temp ← 0s || GPR[rt]31...s
GPR[rd] ← (temp31)32 || temp
Exceptions:
None
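SRA and SRL differ only in what is shifted into the high-order bits. A minimal Python sketch of the 32-bit results follows; `sra32` and `srl32` are hypothetical names, and the sign extension of the result into a 64-bit register is deliberately left out.

```python
MASK32 = 0xFFFFFFFF

def sra32(rt_value, sa):
    """Arithmetic shift right: copies of bit 31 fill the vacated bits."""
    word = rt_value & MASK32
    signed = word - 0x100000000 if word & 0x80000000 else word
    # Python's >> on a negative int is already an arithmetic shift.
    return (signed >> (sa & 0x1F)) & MASK32

def srl32(rt_value, sa):
    """Logical shift right: zeros fill the vacated bits."""
    return (rt_value & MASK32) >> (sa & 0x1F)
```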
SPECIAL rs rt rd 0 SRLV
000000 00000 000110
6 5 5 5 5 6
Format:
SRLV rd, rt, rs
Description:
The contents of general register rt are shifted right by the number of bits
specified by the low-order five bits of general register rs, inserting zeros
into the high-order bits.
The result is placed in register rd.
In 64-bit mode, the operand must be a valid sign-extended, 32-bit value.
Operation:
32 T: s ← GPR[rs]4...0
GPR[rd] ← 0s || GPR[rt]31...s
64 T: s ← GPR[rs]4...0
temp ← 0s || GPR[rt]31...s
GPR[rd] ← (temp31)32 || temp
Exceptions:
None
SPECIAL rs rt rd 0 SUB
000000 00000 100010
6 5 5 5 5 6
Format:
SUB rd, rs, rt
Description:
The contents of general register rt are subtracted from the contents of
general register rs to form a result. The result is placed into general
register rd. In 64-bit mode, the operands must be valid sign-extended, 32-
bit values.
The only difference between this instruction and the SUBU instruction is
that SUBU never traps on overflow.
An integer overflow exception takes place if the carries out of bits 30 and
31 differ (2’s complement overflow). The destination register rd is not
modified when an integer overflow exception occurs.
Operation:
32 T: GPR[rd] ← GPR[rs] – GPR[rt]
Exceptions:
Integer overflow exception
SPECIAL rs rt rd 0 SUBU
000000 00000 100011
6 5 5 5 5 6
Format:
SUBU rd, rs, rt
Description:
The contents of general register rt are subtracted from the contents of
general register rs to form a result.
The result is placed into general register rd.
In 64-bit mode, the operands must be valid sign-extended, 32-bit values.
The only difference between this instruction and the SUB instruction is
that SUBU never traps on overflow. No integer overflow exception occurs
under any circumstances.
Operation:
32 T: GPR[rd] ← GPR[rs] – GPR[rt]
Exceptions:
None
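The overflow rule given for SUB (the carries out of bits 30 and 31 differ) can be checked with a short Python sketch. It assumes the subtraction is carried out as rs + not(rt) + 1; the names `sub`, `subu`, and `IntegerOverflow` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception."""

def sub(rs_value, rt_value):
    """32-bit SUB: raises on 2's complement overflow, else returns the result."""
    a = rs_value & MASK32
    b = ~rt_value & MASK32                      # rs - rt == rs + ~rt + 1
    carry_out_30 = ((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF) + 1) >> 31
    carry_out_31 = (a + b + 1) >> 32
    if carry_out_30 != carry_out_31:
        raise IntegerOverflow()                 # destination is left unmodified
    return (a + b + 1) & MASK32

def subu(rs_value, rt_value):
    """32-bit SUBU: same arithmetic, never traps."""
    return (rs_value - rt_value) & MASK32
```

For example, subtracting -1 from the largest positive word overflows under `sub`, while `subu` simply wraps.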
SW Store Word SW
31 26 25 21 20 16 15 0
SW base rt offset
101011
6 5 5 16
Format:
SW rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The contents of general register rt
are stored at the memory location specified by the effective address.
If either of the two least-significant bits of the effective address is
non-zero, an address error exception occurs.
Operation:
32 T: vAddr ← ((offset15)16 || offset15...0) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA)
pAddr ← pAddrPSIZE-1...3 || (pAddr2...0 xor (ReverseEndian || 02))
byte ← vAddr2...0 xor (BigEndianCPU || 02)
data ← GPR[rt]63-8*byte...0 || 08*byte
StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
64 T: vAddr ← ((offset15)48 || offset15...0) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA)
pAddr ← pAddrPSIZE-1...3 || (pAddr2...0 xor (ReverseEndian || 02))
byte ← vAddr2...0 xor (BigEndianCPU || 02)
data ← GPR[rt]63-8*byte...0 || 08*byte
StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
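The address formation and alignment check for SW can be sketched as follows. This Python model covers only the 32-bit virtual address calculation; translation, endian adjustment, and the store itself are omitted, and `sw_effective_address` and `AddressError` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

class AddressError(Exception):
    """Stands in for the address error exception."""

def sw_effective_address(base_value, offset):
    """Return the virtual address for SW, or raise if it is not word-aligned."""
    offset &= 0xFFFF
    if offset & 0x8000:                 # sign-extend the 16-bit offset
        offset -= 0x10000
    vaddr = (base_value + offset) & MASK32
    if vaddr & 0x3:                     # either low-order bit set
        raise AddressError(hex(vaddr))
    return vaddr
```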
Format:
SWCz rt, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. Coprocessor unit z sources a word,
which the processor writes to the addressed memory location.
The data to be stored is defined by individual coprocessor specifications.
This instruction is not valid for use with CP0.
If either of the two least-significant bits of the effective address is non-zero,
an address error exception occurs.
Operation:
32 T: vAddr ← ((offset15)16 || offset15...0) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA)
pAddr ← pAddrPSIZE-1...3 || (pAddr2...0 xor (ReverseEndian || 02))
byte ← vAddr2...0 xor (BigEndianCPU || 02)
data ← COPzSW (byte, rt)
StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
64 T: vAddr ← ((offset15)48 || offset15...0) + GPR[base]
(pAddr, uncached) ← AddressTranslation (vAddr, DATA)
pAddr ← pAddrPSIZE-1...3 || (pAddr2...0 xor (ReverseEndian || 02))
byte ← vAddr2...0 xor (BigEndianCPU || 02)
data ← COPzSW (byte, rt)
StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
*See the table “Opcode Bit Encoding” on next page, or “CPU Instruction
Opcode Bit Encoding” at the end of Appendix A.
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
Coprocessor unusable exception
SWCz Bit # 31 30 29 28 27 26 0
SWC1 1 1 1 0 0 1
Bit # 31 30 29 28 27 26 0
SWC2 1 1 1 0 1 0
Format:
SWL rt, offset(base)
Description:
This instruction can be used with the SWR instruction to store the contents
of a register into four consecutive bytes of memory, when the bytes cross
a word boundary. SWL stores the left portion of the register into the
appropriate part of the high-order word of memory; SWR stores the right
portion of the register into the appropriate part of the low-order word.
The SWL instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which may specify an
arbitrary byte. It alters only the word in memory which contains that byte.
From one to four bytes will be stored, depending on the starting byte
specified.
Conceptually, it starts at the most-significant byte of the register and
copies it to the specified byte in memory; then it copies bytes from register
to memory until it reaches the low-order byte of the word in memory.
No address exceptions due to alignment are possible.
                 memory
              (big-endian)                register
address 4     4   5   6   7
address 0     0   1   2   3     before    A   B   C   D    $24

              SWL $24,1($0)

address 4     4   5   6   7
address 0     0   A   B   C     after
Operation:
SWL         Register   A B C D E F G H
            Memory     I J K L M N O P

            BigEndianCPU = 0                     BigEndianCPU = 1
vAddr2..0   destination       type  offset       destination       type  offset
                                    LEM  BEM                             LEM  BEM
0           I J K L M N O E   0     0    7       E F G H M N O P   3     4    0
1           I J K L M N E F   1     0    6       I E F G M N O P   2     4    1
2           I J K L M E F G   2     0    5       I J E F M N O P   1     4    2
3           I J K L E F G H   3     0    4       I J K E M N O P   0     4    3
4           I J K E M N O P   0     4    3       I J K L E F G H   3     0    4
5           I J E F M N O P   1     4    2       I J K L M E F G   2     0    5
6           I E F G M N O P   2     4    1       I J K L M N E F   1     0    6
7           E F G H M N O P   3     4    0       I J K L M N O E   0     0    7
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
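The conceptual description of SWL (start at the most-significant register byte and copy toward the low-order byte of the memory word) can be modeled in Python. This is a sketch restricted to a 32-bit register value and byte-addressed memory held in a `bytearray`; the name `swl` and its `big_endian` parameter are hypothetical.

```python
def swl(memory, vaddr, reg_value, big_endian=True):
    """Store the left (most-significant) part of a word at an arbitrary byte.

    Only the aligned word containing vaddr is modified.
    """
    reg_bytes = (reg_value & 0xFFFFFFFF).to_bytes(4, "big")  # MSB first
    offset = vaddr & 3
    if big_endian:
        # Low-order byte of the word is at the highest address.
        for i in range(4 - offset):
            memory[vaddr + i] = reg_bytes[i]
    else:
        # Low-order byte of the word is at the lowest address.
        for i in range(offset + 1):
            memory[vaddr - i] = reg_bytes[i]
```

Applied to memory holding the byte values 0 through 7 and a register holding the characters A B C D, a big-endian `swl` at address 1 leaves the first word as 0 A B C, matching the figure above.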
Format:
SWR rt, offset(base)
Description:
This instruction can be used with the SWL instruction to store the contents
of a register into four consecutive bytes of memory, when the bytes cross
a boundary between two words. SWR stores the right portion of the
register into the appropriate part of the low-order word; SWL stores the
left portion of the register into the appropriate part of the high-order word
of memory.
The SWR instruction adds its sign-extended 16-bit offset to the contents of
general register base to form a virtual address which may specify an
arbitrary byte. It alters only the word in memory which contains that byte.
From one to four bytes will be stored, depending on the starting byte
specified.
Conceptually, it starts at the least-significant (rightmost) byte of the
register and copies it to the specified byte in memory; then copies bytes
from register to memory until it reaches the high-order byte of the word in
memory.
No address exceptions due to alignment are possible.
                 memory
              (big-endian)                register
address 4     4   5   6   7
address 0     0   1   2   3     before    A   B   C   D    $24

              SWR $24,4($0)

address 4     D   5   6   7
address 0     0   1   2   3     after
SWR         Register   A B C D E F G H
            Memory     I J K L M N O P

            BigEndianCPU = 0                     BigEndianCPU = 1
vAddr2..0   destination       type  offset       destination       type  offset
                                    LEM  BEM                             LEM  BEM
0           I J K L E F G H   3     0    4       H J K L M N O P   0     7    0
1           I J K L F G H P   2     1    4       G H K L M N O P   1     6    0
2           I J K L G H O P   1     2    4       F G H L M N O P   2     5    0
3           I J K L H N O P   0     3    4       E F G H M N O P   3     4    0
4           E F G H M N O P   3     4    0       I J K L H N O P   0     3    4
5           F G H L M N O P   2     5    0       I J K L G H O P   1     2    4
6           G H K L M N O P   1     6    0       I J K L F G H P   2     1    4
7           H J K L M N O P   0     7    0       I J K L E F G H   3     0    4
Exceptions:
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
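The conceptual description of SWR (start at the least-significant register byte and copy toward the high-order byte of the memory word) can be modeled the same way. This Python sketch is restricted to a 32-bit register value and a `bytearray` memory; the name `swr` and its `big_endian` parameter are hypothetical.

```python
def swr(memory, vaddr, reg_value, big_endian=True):
    """Store the right (least-significant) part of a word at an arbitrary byte.

    Only the aligned word containing vaddr is modified.
    """
    reg_bytes = (reg_value & 0xFFFFFFFF).to_bytes(4, "little")  # LSB first
    offset = vaddr & 3
    if big_endian:
        # High-order byte of the word is at the lowest address.
        for i in range(offset + 1):
            memory[vaddr - i] = reg_bytes[i]
    else:
        # High-order byte of the word is at the highest address.
        for i in range(4 - offset):
            memory[vaddr + i] = reg_bytes[i]
```

In big-endian mode a store at byte address 4 writes only the low-order register byte, so memory holding 4 5 6 7 at that word becomes D 5 6 7, as in the "after" row of the figure.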
SPECIAL 0 SYNC
000000 0000 0000 0000 0000 0000 001111
6 20 6
Format:
SYNC
Description:
The SYNC instruction ensures that any loads and stores fetched prior to the
present instruction are completed before any loads or stores after this
instruction are allowed to start. Use of the SYNC instruction to serialize
certain memory references may be required in a multiprocessor
environment for proper synchronization. For example:
Processor A                  Processor B
    SW   R1, DATA            1:  LW   R2, FLAG
    LI   R2, 1                   BEQ  R2, R0, 1B
    SYNC                         NOP
    SW   R2, FLAG                SYNC
                                 LW   R1, DATA
Exceptions:
None
Format:
SYSCALL
Description:
A system call exception occurs, immediately and unconditionally
transferring control to the exception handler.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
32, 64 T: SystemCallException
Exceptions:
System Call exception
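Because the code field is not delivered to the handler directly, a handler that wants it has to fetch the instruction word and extract the field itself. The Python sketch below shows only the extraction step. It assumes the SYSCALL encoding places the code field in bits 25..6 with a function field of 001100; that layout is not reproduced in this section, so treat it as an assumption. `encode_syscall` and `syscall_code` are hypothetical helper names.

```python
SYSCALL_FUNCT = 0b001100   # assumption: function field of SYSCALL

def encode_syscall(code):
    """Build a SYSCALL instruction word carrying a 20-bit code (assumed layout)."""
    return ((code & 0xFFFFF) << 6) | SYSCALL_FUNCT

def syscall_code(instruction_word):
    """Extract the code field from a SYSCALL instruction word loaded from memory."""
    return (instruction_word >> 6) & 0xFFFFF
```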
Format:
TEQ rs, rt
Description:
The contents of general register rt are compared to general register rs. If
the contents of general register rs are equal to the contents of general
register rt, a trap exception occurs.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
32, 64 T: if GPR[rs] = GPR[rt] then
TrapException
endif
Exceptions:
Trap exception
Format:
TEQI rs, immediate
Description:
The 16-bit immediate is sign-extended and compared to the contents of
general register rs. If the contents of general register rs are equal to the
sign-extended immediate, a trap exception occurs.
Operation:
32 T: if GPR[rs] = (immediate15)16 || immediate15...0 then
TrapException
endif
Exceptions:
Trap exception
Format:
TGE rs, rt
Description:
The contents of general register rt are compared to the contents of general
register rs. Considering both quantities as signed integers, if the contents
of general register rs are greater than or equal to the contents of general
register rt, a trap exception occurs.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
Exceptions:
Trap exception
Format:
TGEI rs, immediate
Description:
The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as signed integers, if the
contents of general register rs are greater than or equal to the sign-
extended immediate, a trap exception occurs.
Operation:
Exceptions:
Trap exception
Format:
TGEIU rs, immediate
Description:
The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as unsigned integers, if the
contents of general register rs are greater than or equal to the sign-
extended immediate, a trap exception occurs.
Operation:
Exceptions:
Trap exception
Format:
TGEU rs, rt
Description:
The contents of general register rt are compared to the contents of general
register rs. Considering both quantities as unsigned integers, if the
contents of general register rs are greater than or equal to the contents of
general register rt, a trap exception occurs.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
Exceptions:
Trap exception
COP0 CO 0 TLBP
010000 1 000 0000 0000 0000 0000 001000
6 1 19 6
Format:
TLBP
Description:
The Index register is loaded with the address of the TLB entry whose
contents match the contents of the EntryHi register. If no TLB entry
matches, the high-order bit of the Index register is set.
The architecture does not specify the operation of memory references
associated with the instruction immediately after a TLBP instruction, nor
is the operation specified if more than one TLB entry matches.
Operation:
Exceptions:
Coprocessor unusable exception
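The probe can be pictured as a search over the TLB entries. The Python sketch below is a simplified model: the match rule used here (equal VPN2, and either a set G bit or an equal ASID) and the dictionary layout of an entry are assumptions made for illustration, and page-mask handling is omitted. The name `tlbp` is hypothetical.

```python
PROBE_FAILURE = 0x80000000   # high-order bit of a 32-bit Index value

def tlbp(tlb, vpn2, asid):
    """Return the index of the matching entry, or set the high-order bit."""
    for index, entry in enumerate(tlb):
        if entry["vpn2"] == vpn2 and (entry["g"] or entry["asid"] == asid):
            return index
    return PROBE_FAILURE
```

Software would normally test the high-order bit of the returned value before using the index.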
COP0 CO 0 TLBR
010000 1 000 0000 0000 0000 0000 000001
6 1 19 6
Format:
TLBR
Description:
The G bit (which controls ASID matching) read from the TLB is written
into both of the EntryLo0 and EntryLo1 registers.
The EntryHi and EntryLo registers are loaded with the contents of the TLB
entry pointed at by the contents of the TLB Index register. The operation
is invalid (and the results are unspecified) if the contents of the TLB Index
register are greater than the number of TLB entries in the processor.
Operation:
32 T: PageMask ← TLB[Index5...0]127...96
EntryHi ← TLB[Index5...0]95...64 and not TLB[Index5...0]127...96
EntryLo1 ←TLB[Index5...0]63...32
EntryLo0 ← TLB[Index5...0]31...0
64 T: PageMask ← TLB[Index5...0]255...192
EntryHi ← TLB[Index5...0]191...128 and not TLB[Index5...0]255...192
EntryLo1 ←TLB[Index5...0]127...65 || TLB[Index5...0]140
EntryLo0 ← TLB[Index5...0]63...1 || TLB[Index5...0]140
Exceptions:
Coprocessor unusable exception
COP0 CO 0 TLBWI
010000 1 000 0000 0000 0000 0000 000010
6 1 19 6
Format:
TLBWI
Description:
The G bit of the TLB is written with the logical AND of the G bits in the
EntryLo0 and EntryLo1 registers.
The TLB entry pointed at by the contents of the TLB Index register is loaded
with the contents of the EntryHi and EntryLo registers.
The operation is invalid (and the results are unspecified) if the contents of
the TLB Index register are greater than the number of TLB entries in the
processor.
Operation:
32, 64 T: TLB[Index5...0] ←
PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0
Exceptions:
Coprocessor unusable exception
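The handling of the G bit on a TLB write, and its reappearance in both EntryLo registers on a read, can be sketched in Python. The model assumes that G occupies bit 0 of each EntryLo value and that the TLB is a 48-element list of dictionaries; both are assumptions for illustration, and `tlbwi` and `tlbr` are hypothetical function names.

```python
TLB_ENTRIES = 48   # assumption: size of the modeled TLB

def tlbwi(tlb, index, page_mask, entry_hi, entry_lo0, entry_lo1):
    """Write the indexed entry; G becomes the AND of the two EntryLo G bits."""
    tlb[index & 0x3F] = {
        "page_mask": page_mask,
        "entry_hi": entry_hi & ~page_mask & 0xFFFFFFFF,
        "entry_lo0": entry_lo0,
        "entry_lo1": entry_lo1,
        "g": entry_lo0 & entry_lo1 & 1,    # assumption: G is bit 0
    }

def tlbr(tlb, index):
    """Read the indexed entry; the stored G bit is copied into both EntryLo values."""
    entry = tlb[index & 0x3F]
    lo0 = (entry["entry_lo0"] & ~1) | entry["g"]
    lo1 = (entry["entry_lo1"] & ~1) | entry["g"]
    return entry["page_mask"], entry["entry_hi"], lo0, lo1
```

Writing with G set in only one EntryLo register therefore reads back with G clear in both.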
COP0 CO 0 TLBWR
010000 1 000 0000 0000 0000 0000 000110
6 1 19 6
Format:
TLBWR
Description:
The G bit of the TLB is written with the logical AND of the G bits in the
EntryLo0 and EntryLo1 registers.
The TLB entry pointed at by the contents of the TLB Random register is
loaded with the contents of the EntryHi and EntryLo registers.
Operation:
32, 64 T: TLB[Random5...0] ←
PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0
Exceptions:
Coprocessor unusable exception
Format:
TLT rs, rt
Description:
The contents of general register rt are compared to general register rs.
Considering both quantities as signed integers, if the contents of general
register rs are less than the contents of general register rt, a trap exception
occurs.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
Exceptions:
Trap exception
Format:
TLTI rs, immediate
Description:
The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as signed integers, if the
contents of general register rs are less than the sign-extended immediate, a
trap exception occurs.
Operation:
32 T: if GPR[rs] < (immediate15)16 || immediate15...0 then
TrapException
endif
64 T: if GPR[rs] < (immediate15)48 || immediate15...0 then
TrapException
endif
Exceptions:
Trap exception
Format:
TLTIU rs, immediate
Description:
The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as unsigned integers, if the
contents of general register rs are less than the sign-extended immediate, a
trap exception occurs.
Operation:
32 T: if (0 || GPR[rs]) < (0 || (immediate15)16 || immediate15...0) then
TrapException
endif
64 T: if (0 || GPR[rs]) < (0 || (immediate15)48 || immediate15...0) then
TrapException
endif
Exceptions:
Trap exception
Format:
TLTU rs, rt
Description:
The contents of general register rt are compared to general register rs.
Considering both quantities as unsigned integers, if the contents of
general register rs are less than the contents of general register rt, a trap
exception occurs.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
Exceptions:
Trap exception
Format:
TNE rs, rt
Description:
The contents of general register rt are compared to general register rs. If
the contents of general register rs are not equal to the contents of general
register rt, a trap exception occurs.
The code field is available for use as software parameters, but is retrieved
by the exception handler only by loading the contents of the memory word
containing the instruction.
Operation:
Exceptions:
Trap exception
Format:
TNEI rs, immediate
Description:
The 16-bit immediate is sign-extended and compared to the contents of
general register rs. If the contents of general register rs are not equal to the
sign-extended immediate, a trap exception occurs.
Operation:
Exceptions:
Trap exception
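The register forms of the conditional trap instructions differ only in the comparison applied. A compact Python sketch of the 32-bit conditions follows; `trap_taken` is a hypothetical name, and the function returns a truth value instead of raising a trap exception.

```python
MASK32 = 0xFFFFFFFF

def to_signed_32(value):
    """Interpret a 32-bit register value as a signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value

def trap_taken(mnemonic, rs_value, rt_value):
    """Return True if the named trap instruction would trap for rs and rt."""
    rs_u, rt_u = rs_value & MASK32, rt_value & MASK32
    rs_s, rt_s = to_signed_32(rs_u), to_signed_32(rt_u)
    conditions = {
        "TEQ":  rs_u == rt_u,
        "TNE":  rs_u != rt_u,
        "TGE":  rs_s >= rt_s,
        "TGEU": rs_u >= rt_u,
        "TLT":  rs_s < rt_s,
        "TLTU": rs_u < rt_u,
    }
    return conditions[mnemonic]
```

The immediate forms behave the same way in this model once the 16-bit immediate has been sign-extended and passed as the second operand.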
SPECIAL rs rt rd 0 XOR
000000 00000 100110
6 5 5 5 5 6
Format:
XOR rd, rs, rt
Description:
The contents of general register rs are combined with the contents of
general register rt in a bit-wise logical exclusive OR operation.
The result is placed into general register rd.
Operation:
Exceptions:
None
XORI rs rt immediate
001110
6 5 5 16
Format:
XORI rt, rs, immediate
Description:
The 16-bit immediate is zero-extended and combined with the contents of
general register rs in a bit-wise logical exclusive OR operation.
The result is placed into general register rt.
Operation:
Exceptions:
None
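Unlike the arithmetic and comparison immediates, the XORI immediate is zero-extended, so it can only affect the low 16 bits of the result. A one-function Python sketch of the 32-bit case (`xori` is a hypothetical name):

```python
MASK32 = 0xFFFFFFFF

def xori(rs_value, imm):
    """XOR rs with the zero-extended 16-bit immediate."""
    return (rs_value ^ (imm & 0xFFFF)) & MASK32
```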
28...26 Opcode
31...29 0 1 2 3 4 5 6 7
0 SPECIAL REGIMM J JAL BEQ BNE BLEZ BGTZ
1 ADDI ADDIU SLTI SLTIU ANDI ORI XORI LUI
2 COP0 COP1 COP2 * BEQL BNEL BLEZL BGTZL
3 DADDIε DADDIUε LDLε LDRε * * * *
4 LB LH LWL LW LBU LHU LWR LWUε
5 SB SH SWL SW SDLε SDRε SWR CACHE δ
6 LL LWC1 LWC2 * LLDε LDC1 LDC2 LDε
7 SC SWC1 SWC2 * SCDε SDC1 SDC2 SDε
18...16 REGIMM rt
20...19 0 1 2 3 4 5 6 7
0 BLTZ BGEZ BLTZL BGEZL * * * *
1 TGEI TGEIU TLTI TLTIU TEQI * TNEI *
2 BLTZAL BGEZAL BLTZALL BGEZALL * * * *
3 * * * * * * * *
23...21 COPz rs
25, 24 0 1 2 3 4 5 6 7
0 MF DMFε CF γ MT DMTε CT γ
1 BC γ γ γ γ γ γ γ
2 CO
3
18...16 COPz rt
20...19 0 1 2 3 4 5 6 7
0 BCF BCT BCFL BCTL γ γ γ γ
1 γ γ γ γ γ γ γ γ
2 γ γ γ γ γ γ γ γ
3 γ γ γ γ γ γ γ γ
CP0 Function
2 ... 0
5 ... 3 0 1 2 3 4 5 6 7
0 φ TLBR TLBWI φ φ φ TLBWR φ
1 TLBP φ φ φ φ φ φ φ
2 ξ φ φ φ φ φ φ φ
3 ERET χ φ φ φ φ φ φ φ
4 φ φ φ φ φ φ φ φ
5 φ φ φ φ φ φ φ φ
6 φ φ φ φ φ φ φ φ
7 φ φ φ φ φ φ φ φ
Key:
* Operation codes marked with an asterisk cause reserved
instruction exceptions in all current implementations and are
reserved for future versions of the architecture.
γ Operation codes marked with a gamma cause a reserved
instruction exception. They are reserved for future versions of the
architecture.
δ Operation codes marked with a delta are valid only for R4000
processors with CP0 enabled, and cause a reserved instruction
exception on other processors.
φ Operation codes marked with a phi are invalid but do not cause
reserved instruction exceptions in R4000 implementations.
ξ Operation codes marked with a xi cause a reserved instruction
exception on R4000 processors.
χ Operation codes marked with a chi are valid only on R4000.
ε Operation codes marked with epsilon are valid when the processor
is operating either in the Kernel mode or in the 64-bit non-Kernel
(User or Supervisor) mode. These instructions cause a reserved
instruction exception if 64-bit operation is not enabled in User or
Supervisor mode.
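The Opcode table above is indexed by bits 31..29 (row) and bits 28..26 (column) of the instruction word. The Python sketch below transcribes that table and performs the lookup; reserved entries marked with an asterisk are represented as `None`, the mode markers (ε, δ) are dropped, and `major_opcode` is a hypothetical name.

```python
OPCODES = [
    ["SPECIAL", "REGIMM", "J", "JAL", "BEQ", "BNE", "BLEZ", "BGTZ"],
    ["ADDI", "ADDIU", "SLTI", "SLTIU", "ANDI", "ORI", "XORI", "LUI"],
    ["COP0", "COP1", "COP2", None, "BEQL", "BNEL", "BLEZL", "BGTZL"],
    ["DADDI", "DADDIU", "LDL", "LDR", None, None, None, None],
    ["LB", "LH", "LWL", "LW", "LBU", "LHU", "LWR", "LWU"],
    ["SB", "SH", "SWL", "SW", "SDL", "SDR", "SWR", "CACHE"],
    ["LL", "LWC1", "LWC2", None, "LLD", "LDC1", "LDC2", "LD"],
    ["SC", "SWC1", "SWC2", None, "SCD", "SDC1", "SDC2", "SD"],
]

def major_opcode(instruction_word):
    """Look up the major opcode mnemonic; None means a reserved encoding."""
    row = (instruction_word >> 29) & 0x7       # bits 31..29
    column = (instruction_word >> 26) & 0x7    # bits 28..26
    return OPCODES[row][column]
```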
Operation    Source Format
             Single Double Word Longword
ADD V V R R
SUB V V R R
MUL V V R R
DIV V V R R
SQRT V V R R
ABS V V R R
MOV V V
NEG V V R R
TRUNC.L V V
ROUND.L V V
CEIL.L V V
FLOOR.L V V
TRUNC.W V V
ROUND.W V V
CEIL.W V V
FLOOR.W V V
CVT.S V V V
CVT.D V V V
CVT.W V V
CVT.L V V
C V V R R
Floating-Point Operations
The floating-point unit operation set includes:
• floating-point add
• floating-point subtract
• floating-point multiply
• floating-point divide
• floating-point square root
• convert between fixed-point and floating-point formats
• convert between floating-point formats
• floating-point compare
These operations satisfy the IEEE Standard 754
requirements for accuracy. Specifically, these operations obtain a result
which is identical to an infinite-precision result rounded to the specified
format, using the current rounding mode.
Instructions must specify the format of their operands. Except for
conversion functions, mixed-format operations are not provided.
Example #1:
Example #2:
(immediate15)16 || immediate15...0
Function             Meaning
AddressTranslation   Uses the TLB to find the physical address given the virtual
                     address. The function fails and an exception is taken if the
                     required translation is not present in the TLB.
LoadMemory           Uses the cache and main memory to find the contents of
                     the word containing the specified physical address. The
                     low-order two bits of the address and the Access Type field
                     indicate which of the four bytes within the data word
                     need to be returned. If the cache is enabled for this
                     access, the entire word is returned and loaded into the
                     cache.
StoreMemory          Uses the cache, write buffer, and main memory to store the
                     word or part of word specified as data in the word
                     containing the specified physical address. The low-order
                     two bits of the address and the Access Type field indicate
                     which of the four bytes within the data word
                     should be stored.
Figure B-1 shows the I-Type instruction format used by load and store
operations.
I-Type (Immediate)
31 26 25 21 20 16 15 0
op base ft offset
6 5 5 16
op is a 6-bit operation code
base is the 5-bit base register specifier
ft is a 5-bit source (for stores) or destination (for loads) FPA register specifier
offset is the 16-bit signed immediate offset
All coprocessor loads and stores reference aligned data items. Thus, for
word loads and stores, the access type field is always WORD, and the low-
order two bits of the address must always be zero.
For doubleword loads and stores, the access type field is always
DOUBLEWORD, and the low-order three bits of the address must always
be zero.
Regardless of byte-numbering order (endianness), the address specifies
that byte which has the smallest byte-address in the addressed field. For
a big-endian machine, this is the leftmost byte; for a little-endian machine,
this is the rightmost byte.
R-Type (Register)
31 26 25 21 20 16 15 11 10 6 5 0
6 5 5 5 5 6
COP1 is a 6-bit operation code
fmt is a 5-bit format specifier
fs is a 5-bit source1 register
ft is a 5-bit source2 register
fd is a 5-bit destination register
function is a 6-bit function field
Figure B-2 Computational Instruction Format
Code (5:0) Mnemonic Operation
0 ADD Add
1 SUB Subtract
2 MUL Multiply
3 DIV Divide
4 SQRT Square root
5 ABS Absolute value
6 MOV Move
7 NEG Negate
8 ROUND.L Convert to 64-bit (long) fixed-point, rounded to nearest/even
9 TRUNC.L Convert to 64-bit (long) fixed-point, rounded toward zero
10 CEIL.L Convert to 64-bit (long) fixed-point, rounded to +∞
11 FLOOR.L Convert to 64-bit (long) fixed-point, rounded to -∞
12 ROUND.W Convert to single fixed-point, rounded to nearest/even
13 TRUNC.W Convert to single fixed-point, rounded toward zero
14 CEIL.W Convert to single fixed-point, rounded to + ∞
15 FLOOR.W Convert to single fixed-point, rounded to – ∞
16–31 – Reserved
32 CVT.S Convert to single floating-point
33 CVT.D Convert to double floating-point
34 – Reserved
35 – Reserved
36 CVT.W Convert to 32-bit binary fixed-point
37 CVT.L Convert to 64-bit (long) binary fixed-point
38–47 – Reserved
48–63 C Floating-point compare
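The four directed conversions in the table differ only in rounding direction. The Python sketch below shows the intended rounding on ordinary values; it does not model result width, overflow, or NaN handling, and the function names are hypothetical stand-ins for the ROUND, TRUNC, CEIL, and FLOOR operations.

```python
import math

def round_nearest_even(x):
    """ROUND: nearest integer, ties go to the even neighbor."""
    return round(x)          # Python rounds halves to even for floats

def trunc_toward_zero(x):
    """TRUNC: discard the fraction."""
    return math.trunc(x)

def ceil_toward_plus_inf(x):
    """CEIL: smallest integer not less than x."""
    return math.ceil(x)

def floor_toward_minus_inf(x):
    """FLOOR: largest integer not greater than x."""
    return math.floor(x)
```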
In the following pages, the notation FGR refers to the 32 General Purpose
registers FGR0 through FGR31 of the FPU, and FPR refers to the floating-
point registers of the FPU.
• When the FR bit in the Status register (SR(26)) equals zero, only
the even floating-point registers are valid and the 32 General
Purpose registers of the FPU are 32 bits wide.
• When the FR bit in the Status register (SR(26)) equals one, both
odd and even floating-point registers may be used and the 32
General Purpose registers of the FPU are 64 bits wide.
The following routines are used in the description of the floating-point
operations to retrieve the value of an FPR or to change the value of an FGR:
value ← ValueFPR(fpr,fmt)
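The body of ValueFPR is not reproduced in this section, so the following Python sketch is only one plausible reading of the FR-bit rules stated above. It assumes that with FR equal to zero a double is formed from an even/odd pair of 32-bit FGRs, with the odd register supplying the high-order word; the list-of-integers register file and the name `value_fpr` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def value_fpr(fgr, fpr, fmt, fr):
    """Return the raw bits of floating-point register fpr in format fmt.

    fgr is a list of 32 register images, fmt is "S" or "D", fr is the FR bit.
    """
    if fmt == "S":
        return fgr[fpr] & MASK32
    if fmt == "D":
        if fr == 0:
            if fpr & 1:
                raise ValueError("odd register number is undefined when FR = 0")
            # Assumption: odd FGR holds the high word, even FGR the low word.
            return ((fgr[fpr + 1] & MASK32) << 32) | (fgr[fpr] & MASK32)
        return fgr[fpr] & MASK64
    raise ValueError("unsupported format: " + fmt)
```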
ABS.fmt Floating-Point Absolute Value ABS.fmt
31 26 25 21 20 16 15 11 10 6 5 0
Format:
ABS.fmt fd, fs
Description:
The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic absolute value is taken. The result is
placed in the floating-point register specified by fd.
The absolute value operation is arithmetic; a NaN operand signals invalid
operation.
This instruction is valid only for single- and double-precision floating-
point formats. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Coprocessor exception trap
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
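For ordinary operands, the absolute value amounts to clearing the sign bit of the encoded value. The Python sketch below works on raw bit patterns and uses `struct` only to obtain them; it does not model the invalid operation exception signaled for NaN operands, and all function names are hypothetical.

```python
import struct

def single_bits(x):
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def double_bits(x):
    """Return the 64-bit IEEE 754 double-precision encoding of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def abs_s(bits):
    """ABS.S on a raw single-precision bit pattern: clear bit 31."""
    return bits & 0x7FFFFFFF

def abs_d(bits):
    """ABS.D on a raw double-precision bit pattern: clear bit 63."""
    return bits & 0x7FFFFFFFFFFFFFFF
```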
Format:
ADD.fmt fd, fs, ft
Description:
The contents of the FPU registers specified by fs and ft are interpreted in
the specified format and arithmetically added. The result is calculated as if
to infinite precision and then rounded to the specified format
(fmt), according to the current rounding mode. The result is placed in the
floating-point register (FPR) specified by fd.
This instruction is valid only for single- and double-precision floating-
point formats. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Inexact exception
Overflow exception
Underflow exception
Format:
BC1F offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is false
(zero), the program branches to the target address, with a delay of one
instruction.
There must be at least one instruction between C.cond.fmt and BC1F.
Operation:
Exceptions:
Coprocessor unusable exception
Format:
BC1FL offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is false
(zero), the program branches to the target address, with a delay of one
instruction. If the conditional branch is not taken, the instruction in the
branch delay slot is nullified.
There must be at least one instruction between C.cond.fmt and BC1FL.
Operation:
32   T–1:  condition ← not COC[1]
     T:    target ← (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC ← PC + target
           else
               NullifyCurrentInstruction
           endif
64   T–1:  condition ← not COC[1]
     T:    target ← (offset_15)^46 || offset || 0^2
     T+1:  if condition then
               PC ← PC + target
           else
               NullifyCurrentInstruction
           endif
Exceptions:
Coprocessor unusable exception
Format:
BC1T offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is true (one),
the program branches to the target address, with a delay of one
instruction.
There must be at least one instruction between C.cond.fmt and BC1T.
Operation:
32   T–1:  condition ← COC[1]
     T:    target ← (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC ← PC + target
           endif
Exceptions:
Coprocessor unusable exception
Format:
BC1TL offset
Description:
A branch target address is computed from the sum of the address of the
instruction in the delay slot and the 16-bit offset, shifted left two bits and
sign-extended. If the result of the last floating-point compare is true (one),
the program branches to the target address, with a delay of one
instruction. If the conditional branch is not taken, the instruction in the
branch delay slot is nullified.
There must be at least one instruction between C.cond.fmt and BC1TL.
Operation:
32   T–1:  condition ← COC[1]
     T:    target ← (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC ← PC + target
           else
               NullifyCurrentInstruction
           endif
64   T–1:  condition ← COC[1]
     T:    target ← (offset_15)^46 || offset || 0^2
     T+1:  if condition then
               PC ← PC + target
           else
               NullifyCurrentInstruction
           endif
Exceptions:
Coprocessor unusable exception
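The difference between the ordinary and the "likely" forms of these branches can be summarized in a small C model. It is a sketch under stated assumptions: the br values 0 to 3 follow the BCF, BCT, BCFL, BCTL order of the br encoding table in this chapter, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool taken;                 /* PC is redirected to the target */
    bool delay_slot_executes;   /* false when the delay slot is nullified */
} bc1_outcome;

/* br: 0 = BC1F, 1 = BC1T, 2 = BC1FL, 3 = BC1TL.
 * coc1: result of the last floating-point compare (COC[1]). */
bc1_outcome bc1_resolve(unsigned br, bool coc1)
{
    assert(br < 4);
    bool branch_on_true = (br & 1u) != 0;
    bool likely         = (br & 2u) != 0;

    bc1_outcome r;
    r.taken = (coc1 == branch_on_true);
    /* Only a "likely" branch that is not taken nullifies its delay slot. */
    r.delay_slot_executes = r.taken || !likely;
    return r;
}
```

For example, BC1TL (br = 3) with a false condition is not taken and its delay slot is nullified, whereas BC1T (br = 1) with a false condition still executes the delay slot.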
C.cond.fmt    Floating-Point Compare    C.cond.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..4, 3..0]
Format:
C.cond.fmt fs, ft
Description:
The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format, fmt, and arithmetically compared.
A result is determined based on the comparison and the conditions
specified in the cond field. If either value is Not a Number (NaN)
and the high-order bit of the cond field is set, an invalid operation
exception is taken. After a one-instruction delay, the condition is available
for testing with branch on floating-point coprocessor condition
instructions. There must be at least one instruction between the compare
and the branch.
Comparisons are exact and can neither overflow nor underflow. Four
mutually-exclusive relations are possible results: less than, equal, greater
than, and unordered. The last case arises when one or both of the
operands are NaN; every NaN compares unordered with everything,
including itself.
Comparisons ignore the sign of zero, so +0 = –0.
This instruction is valid only for single- and double-precision floating-
point formats. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
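The four mutually exclusive relations and the role of the cond field can be illustrated with the following C sketch for double-precision operands. The assignment of the low three cond bits (bit 2 = less than, bit 1 = equal, bit 0 = unordered) is an assumption not stated in this section; only the high-order bit (invalid on unordered) is taken from the text above. The type and function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

typedef struct {
    bool condition;   /* value a following BC1T/BC1F would test */
    bool invalid;     /* Invalid operation would be signaled */
} fp_compare_result;

/* Assumed cond layout: bit 3 = signal Invalid when unordered,
 * bit 2 = less than, bit 1 = equal, bit 0 = unordered. */
fp_compare_result fp_compare(double fs, double ft, unsigned cond)
{
    assert(cond < 16);
    bool unordered = isnan(fs) || isnan(ft);
    bool less      = !unordered && fs < ft;
    bool equal     = !unordered && fs == ft;   /* +0 == -0 holds here too */

    fp_compare_result r;
    r.condition = ((cond & 4u) && less)
               || ((cond & 2u) && equal)
               || ((cond & 1u) && unordered);
    r.invalid = unordered && (cond & 8u) != 0;
    return r;
}
```

Because a NaN is unordered with everything, comparing a NaN with itself under an "equal" predicate (cond = 2) gives a false condition.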
CEIL.L.fmt    Floating-Point Ceiling to Long Fixed-Point Format    CEIL.L.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
CEIL.L.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to +∞ (RM = 2).
This instruction is valid only for conversion from single- or double-
precision floating-point formats. When the FR bit in the Status register
equals one, both even and odd register numbers are valid.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^63 to 2^63–1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled, no
exception is taken and 2^63–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
CEIL.W.fmt    Floating-Point Ceiling to Single Fixed-Point Format    CEIL.W.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
CEIL.W.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to +∞ (RM = 2).
This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^31 to 2^31–1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled, no
exception is taken and 2^31–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
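As an illustration of the CEIL.W behavior described above, the following C sketch converts a double-precision value to a 32-bit integer by rounding toward +∞, and returns 2^31–1 when the result cannot be represented. It models only the case where the Invalid operation exception is disabled; the function name and the invalid out-parameter are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sets *invalid when the source is NaN, an infinity, or rounds to a value
 * outside -2^31 .. 2^31-1; in that case 2^31-1 is returned. */
int32_t ceil_w_d(double fs, bool *invalid)
{
    assert(invalid != NULL);
    /* ceil(fs) is in range exactly when -2^31 - 1 < fs <= 2^31 - 1. */
    if (isnan(fs) || fs <= -2147483649.0 || fs > 2147483647.0) {
        *invalid = true;
        return INT32_MAX;
    }
    *invalid = false;
    int64_t t = (int64_t)fs;        /* truncates toward zero */
    if ((double)t < fs)             /* a positive fraction was dropped */
        t += 1;                     /* so round up toward +infinity */
    return (int32_t)t;
}
```

For instance, 2.1 converts to 3 and –2.1 converts to –2, while 3e9 or a NaN sets the invalid indication and yields 2^31–1.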
Bits     Width  Field  Value
31..26   6      COP1   010001
25..21   5      CF     00010
20..16   5      rt
15..11   5      fs
10..0    11     0      000 0000 0000
Format:
CFC1 rt, fs
Description:
The contents of the FPU control register fs are loaded into general register
rt.
This operation is only defined when fs equals 0 or 31.
The contents of general register rt are undefined for the instruction
immediately following CFC1.
Operation:
32   T:    temp ← FCR[fs]
     T+1:  GPR[rt] ← temp
64   T:    temp ← FCR[fs]
     T+1:  GPR[rt] ← (temp_31)^32 || temp
Exceptions:
Coprocessor unusable exception
Bits     Width  Field  Value
31..26   6      COP1   010001
25..21   5      CT     00110
20..16   5      rt
15..11   5      fs
10..0    11     0      000 0000 0000
Format:
CTC1 rt, fs
Description:
The contents of general register rt are loaded into FPU control register fs.
This operation is only defined when fs equals 0 or 31.
Writing to Control Register 31, the floating-point Control/Status register,
causes an interrupt or exception if any cause bit and its corresponding
enable bit are both set. The register will be written before the exception
occurs. The contents of floating-point control register fs are undefined for
the instruction immediately following CTC1.
Operation:
32   T:    temp ← GPR[rt]
     T+1:  FCR[fs] ← temp
           COC[1] ← FCR[31]_23
64   T:    temp ← GPR[rt]_31...0
     T+1:  FCR[fs] ← temp
           COC[1] ← FCR[31]_23
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Division by zero exception
Inexact exception
Overflow exception
Underflow exception
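The rule that a CTC1 write to Control Register 31 traps when a cause bit and its enable bit are both set can be sketched as follows. The field positions used here (Cause in bits 17..12 ordered E, V, Z, O, U, I, and Enables in bits 11..7 ordered V, Z, O, U, I) are assumptions, since the register layout is defined elsewhere in the manual and not in this section; treating the unimplemented-operation cause (E) as always trapping is likewise an assumption. Bit 23 as the condition bit follows the Operation text above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed FCR31 field positions (not taken from this section). */
#define FCR31_CAUSE_SHIFT   12
#define FCR31_ENABLE_SHIFT  7
#define FCR31_CAUSE_E       (1u << 5)   /* unimplemented operation, no enable bit */

static_assert(FCR31_CAUSE_SHIFT - FCR31_ENABLE_SHIFT == 5,
              "five enable bits sit directly below the cause field");

/* Returns true if writing new_fcr31 would raise a floating-point exception. */
bool ctc1_write_traps(uint32_t new_fcr31)
{
    uint32_t cause  = (new_fcr31 >> FCR31_CAUSE_SHIFT) & 0x3Fu;
    uint32_t enable = (new_fcr31 >> FCR31_ENABLE_SHIFT) & 0x1Fu;
    /* V, Z, O, U, I trap only when the matching enable bit is set. */
    return (cause & FCR31_CAUSE_E) != 0 || (cause & enable) != 0;
}

/* COC[1] mirrors bit 23 of FCR31 after the write. */
bool fcr31_condition(uint32_t fcr31)
{
    return ((fcr31 >> 23) & 1u) != 0;
}
```

Under these assumptions, a value with the V cause bit (bit 16) and the V enable bit (bit 11) both set would trap, while the same cause bit with all enables clear would not.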
CVT.D.fmt    Floating-Point Convert to Double Floating-Point Format    CVT.D.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
CVT.D.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the
double binary floating-point format. The result is placed in the floating-
point register specified by fd.
This instruction is valid only for conversions from the single floating-point
format, or from the 32-bit or 64-bit fixed-point format.
If the single floating-point or single fixed-point format is specified, the
operation is exact. The operation is not defined if bit 0 of any register
specification is set and the FR bit in the Status register equals zero, since
the register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one, both
even and odd register numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
Underflow exception
CVT.L.fmt    Floating-Point Convert to Long Fixed-Point Format    CVT.L.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
CVT.L.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd. This instruction is valid only for conversions from single-
or double-precision floating-point formats. The operation is not defined if
bit 0 of any register specification is set and the FR bit in the Status register
equals zero.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^63 to 2^63–1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled, no
exception is taken and 2^63–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
CVT.S.fmt    Floating-Point Convert to Single Floating-Point Format    CVT.S.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
CVT.S.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
binary floating-point format. The result is placed in the floating-point
register specified by fd. Rounding occurs according to the currently
specified rounding mode.
This instruction is valid only for conversions from double floating-point
format, or from 32-bit or 64-bit fixed-point format. The operation is not
defined if bit 0 of any register specification is set and the FR bit in the Status
register equals zero, since the register numbers specify an even-odd pair
of adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.
Operation:
T: StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S))
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
Underflow exception
CVT.W.fmt    Floating-Point Convert to Fixed-Point Format    CVT.W.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
CVT.W.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd. This instruction is valid only for conversion from single-
or double-precision floating-point formats. The operation is not defined if
bit 0 of any register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^31 to 2^31–1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
Format:
DIV.fmt fd, fs, ft
Description:
The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and the value in the fs field is divided by
the value in the ft field. The result is rounded as if calculated to infinite
precision and then rounded to the specified format, according to the
current rounding mode. The result is placed in the floating-point register
specified by fd.
This instruction is valid only for single- or double-precision floating-point
formats.
The operation is not defined if bit 0 of any register specification is set and
the FR bit in the Status register equals zero, since the register numbers
specify an even-odd pair of adjacent coprocessor general registers. When
the FR bit in the Status register equals one, both even and odd register
numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Division-by-zero exception
Inexact exception
Overflow exception
Underflow exception
Bits     Width  Field  Value
31..26   6      COP1   010001
25..21   5      DMF    00001
20..16   5      rt
15..11   5      fs
10..0    11     0      000 0000 0000
Format:
DMFC1 rt, fs
Description:
The contents of register fs from the floating-point coprocessor are stored
into processor register rt.
The contents of general register rt are undefined for the instruction
immediately following DMFC1.
The FR bit in the Status register specifies whether all 32 registers of the
R4000 are addressable. When FR equals zero, this instruction is not
defined when the least significant bit of fs is non-zero. When FR is set, fs
may specify either odd or even registers.
Operation:
64   T:    if SR_26 = 1 then         /* 64-bit wide FGRs */
               data ← FGR[fs]
           elseif fs_0 = 0 then      /* valid specifier, 32-bit wide FGRs */
               data ← FGR[fs+1] || FGR[fs]
           else                      /* undefined for odd 32-bit reg #s */
               data ← undefined^64
           endif
     T+1:  GPR[rt] ← data
Exceptions:
Coprocessor unusable exception
Coprocessor Exceptions:
Unimplemented operation exception
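The FR = 0 case of the Operation above, where a doubleword is assembled from an even-odd pair of 32-bit registers, can be sketched in C. The array model of the register file and the function name are hypothetical; the odd register supplies the high word, as in FGR[fs+1] || FGR[fs].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models DMFC1 with FR = 0: fgr[] holds 32 registers of 32 bits each.
 * Returns false for an odd specifier, where the result is undefined. */
bool dmfc1_fr0(const uint32_t fgr[32], unsigned fs, uint64_t *data)
{
    assert(fs < 32);
    if (fs & 1u)
        return false;                       /* odd 32-bit register number */
    *data = ((uint64_t)fgr[fs + 1] << 32) | fgr[fs];
    return true;
}
```

With fgr[3] = 0x01234567 and fgr[2] = 0x89ABCDEF, reading register 2 gives 0x0123456789ABCDEF, and reading register 3 is rejected.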
DMTC1    Doubleword Move To Floating-Point Coprocessor    DMTC1
Bits     Width  Field  Value
31..26   6      COP1   010001
25..21   5      DMT    00101
20..16   5      rt
15..11   5      fs
10..0    11     0      000 0000 0000
Format:
DMTC1 rt, fs
Description:
The contents of general register rt are loaded into coprocessor register fs of
the CP1.
The contents of floating-point register fs are undefined for the instruction
immediately following DMTC1.
The FR bit in the Status register specifies whether all 32 registers of the
R4000 are addressable. When FR equals zero, this instruction is not
defined when the least significant bit of fs is non-zero. When FR equals
one, fs may specify either odd or even registers.
Operation:
64   T:    data ← GPR[rt]
     T+1:  if SR_26 = 1 then         /* 64-bit wide FGRs */
               FGR[fs] ← data
           elseif fs_0 = 0 then      /* valid specifier, 32-bit wide FGRs */
               FGR[fs+1] ← data_63...32
               FGR[fs] ← data_31...0
           else                      /* undefined result for odd 32-bit reg #s */
               undefined_result
           endif
Exceptions:
Coprocessor unusable exception
Coprocessor Exceptions:
Unimplemented operation exception
FLOOR.L.fmt    Floating-Point Floor to Long Fixed-Point Format    FLOOR.L.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
FLOOR.L.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to –∞ (RM = 3).
This instruction is valid only for conversion from single- or double-
precision floating-point formats.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^63 to 2^63–1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled, no
exception is taken and 2^63–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
FLOOR.W.fmt    Floating-Point Floor to Single Fixed-Point Format    FLOOR.W.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
FLOOR.W.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to –∞ (RM = 3).
This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^31 to 2^31–1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
Format:
LDC1 ft, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address.
In 32-bit mode, the contents of the doubleword at the memory location
specified by the effective address are loaded into registers ft and ft+1 of the
floating-point coprocessor. This instruction is not valid, and is undefined,
when the least significant bit of ft is non-zero.
In 64-bit mode, the contents of the doubleword at the memory location
specified by the effective address are loaded into the 64-bit register ft of the
floating point coprocessor.
The FR bit of the Status register (SR26) specifies whether all 32 registers of
the R4000 are addressable. If FR equals zero, this instruction is not defined
when the least significant bit of ft is non-zero. If FR equals one, ft may
specify either odd or even registers.
If any of the three least-significant bits of the effective address are non-
zero, an address error exception takes place.
Operation:
32   T:    vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
64   T:    vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
Exceptions:
Coprocessor unusable exception
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
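The effective-address calculation and the alignment rule for the floating-point loads can be sketched in C. The function name and parameters are hypothetical; size is 8 for a doubleword access (three low address bits must be zero) and 4 for a word access (two low address bits must be zero).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes vAddr = sign_extend(offset) + base and reports whether the
 * address is suitably aligned. A false return corresponds to the
 * address error exception described in the text. */
bool fp_mem_address(uint64_t base, uint16_t offset_field,
                    unsigned size, uint64_t *vaddr)
{
    assert(size == 4 || size == 8);
    assert(vaddr != NULL);
    int64_t disp = offset_field;
    if (disp & 0x8000)
        disp -= 0x10000;                    /* sign-extend the 16-bit offset */
    *vaddr = base + (uint64_t)disp;
    return (*vaddr & (uint64_t)(size - 1)) == 0;
}
```

A base of 0x1000 with offset 8 is a valid doubleword address (0x1008); the same base with offset 4 is valid only for a word access.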
Format:
LWC1 ft, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. The contents of the
word at the memory location specified by the effective address are loaded
into register ft of the floating-point coprocessor.
The FR bit of the Status register specifies whether all 64-bit Floating-Point
registers are addressable. If FR equals zero, LWC1 loads either the high or
low half of the 16 even Floating-Point registers. If FR equals one, LWC1
loads the low 32-bits of both even and odd Floating-Point registers.
If either of the two least-significant bits of the effective address is non-zero,
an address error exception occurs.
Operation:
Exceptions:
Coprocessor unusable exception
TLB refill exception
TLB invalid exception
Bus error exception
Address error exception
Bits     Width  Field  Value
31..26   6      COP1   010001
25..21   5      MF     00000
20..16   5      rt
15..11   5      fs
10..0    11     0      000 0000 0000
Format:
MFC1 rt, fs
Description:
The contents of register fs from the floating-point coprocessor are stored
into processor register rt.
The contents of register rt are undefined for the instruction immediately
following MFC1.
The FR bit of the Status register specifies whether all 32 registers of the
R4000 are addressable. If FR equals zero, MFC1 stores either the high or
low half of the 16 even Floating-Point registers. If FR equals one, MFC1
stores the low 32-bits of both even and odd Floating-Point registers.
Operation:
32   T:    data ← FGR[fs]_31...0
     T+1:  GPR[rt] ← data
64   T:    data ← FGR[fs]_31...0
     T+1:  GPR[rt] ← (data_31)^32 || data
Exceptions:
Coprocessor unusable exception
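The 64-bit mode Operation above takes the low 32 bits of the floating-point register and sign-extends them into the general register. A minimal C sketch, assuming FR = 1 and a hypothetical array model of 64-bit wide registers:

```c
#include <assert.h>
#include <stdint.h>

/* Models MFC1 in 64-bit mode with FR = 1: the low word of FGR[fs] is
 * copied to the general register and bit 31 is replicated upward. */
uint64_t mfc1_64(const uint64_t fgr[32], unsigned fs)
{
    assert(fs < 32);
    uint32_t data = (uint32_t)fgr[fs];          /* FGR[fs] bits 31..0 */
    if (data & 0x80000000u)
        return 0xFFFFFFFF00000000ull | data;    /* (data_31)^32 || data */
    return data;
}
```

The upper half of the floating-point register never reaches the general register: a register holding 0x1234567880000000 reads back as 0xFFFFFFFF80000000.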
Format:
MOV.fmt fd, fs
Description:
The contents of the FPU register specified by fs are interpreted in the
specified format and are copied into the FPU register specified by fd.
The move operation is non-arithmetic; no IEEE 754 exceptions occur as a
result of the instruction.
This instruction is valid only for single- or double-precision floating-point
formats.
The operation is not defined if bit 0 of any register specification is set and
the FR bit in the Status register equals zero, since the register numbers
specify an even-odd pair of adjacent coprocessor general registers. When
the FR bit in the Status register equals one, both even and odd register
numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
MTC1    Move To FPU (Coprocessor 1)    MTC1
Bits     Width  Field  Value
31..26   6      COP1   010001
25..21   5      MT     00100
20..16   5      rt
15..11   5      fs
10..0    11     0      000 0000 0000
Format:
MTC1 rt, fs
Description:
The contents of register rt are loaded into the FPU general register at
location fs.
The contents of floating-point register fs are undefined for the instruction
immediately following MTC1.
The FR bit of the Status register specifies whether all 32 registers of the
R4000 are addressable. If FR equals zero, MTC1 loads either the high or
low half of the 16 even Floating-Point registers. If FR equals one, MTC1
loads the low 32-bits of both even and odd Floating-Point registers.
Operation:
Exceptions:
Coprocessor unusable exception
Format:
MUL.fmt fd, fs, ft
Description:
The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and arithmetically multiplied. The
result is rounded as if calculated to infinite precision and then rounded to
the specified format, according to the current rounding mode. The result
is placed in the floating-point register specified by fd.
This instruction is valid only for single- or double-precision floating-point
formats.
The operation is not defined if bit 0 of any register specification is set and
the FR bit in the Status register equals zero, since the register numbers
specify an even-odd pair of adjacent coprocessor general registers. When
the FR bit in the Status register equals one, both even and odd register
numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Inexact exception
Overflow exception
Underflow exception
Format:
NEG.fmt fd, fs
Description:
The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic negation is taken (polarity of the sign-
bit is changed). The result is placed in the FPU register specified by fd.
The negate operation is arithmetic; a NaN operand signals invalid
operation.
This instruction is valid only for single- or double-precision floating-point
formats. The operation is not defined if bit 0 of any register specification
is set and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even and
odd register numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
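The phrase "polarity of the sign-bit is changed" can be made concrete with a short C sketch for the double-precision case. It assumes the host double is the IEEE 754 64-bit format and deliberately does not model the Invalid operation signaling for NaN operands described above; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit double");

/* Negates by inverting bit 63, the sign bit, and nothing else. */
double neg_d(double fs)
{
    uint64_t bits;
    memcpy(&bits, &fs, sizeof bits);        /* reinterpret without aliasing issues */
    bits ^= 0x8000000000000000ull;          /* flip the sign bit */
    memcpy(&fs, &bits, sizeof fs);
    return fs;
}
```

Because only the sign bit changes, negating +0 produces –0 rather than leaving the value untouched.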
ROUND.L.fmt    Floating-Point Round to Long Fixed-Point Format    ROUND.L.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
ROUND.L.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to nearest/even (RM = 0).
This instruction is valid only for conversion from single- or double-
precision floating-point formats.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^63 to 2^63–1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled, no
exception is taken and 2^63–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
ROUND.W.fmt    Floating-Point Round to Single Fixed-Point Format    ROUND.W.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
ROUND.W.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the single
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round to the nearest/even
(RM = 0).
This instruction is valid only for conversion from single- or double-
precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^31 to 2^31–1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no exception
is taken and 2^31–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
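Round to nearest/even differs from the other fixed rounding modes only at exact halfway cases, which go to the even neighbor. The following C sketch of ROUND.W for a double-precision source shows this, again modeling only the case where the Invalid operation exception is disabled. The function name and the invalid out-parameter are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds to the nearest integer, ties to even. Returns 2^31-1 and sets
 * *invalid when the source is NaN, infinite, or rounds out of range. */
int32_t round_w_d(double fs, bool *invalid)
{
    assert(invalid != NULL);
    /* 2147483647.5 ties up to 2^31 (out of range);
     * -2147483648.5 ties to -2^31 (still in range). */
    if (isnan(fs) || fs < -2147483648.5 || fs >= 2147483647.5) {
        *invalid = true;
        return INT32_MAX;
    }
    *invalid = false;
    int64_t t = (int64_t)fs;                /* truncate toward zero */
    double frac = fs - (double)t;           /* exact, in (-1, 1) */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;
    return (int32_t)t;
}
```

Here 2.5 rounds to 2 and 3.5 rounds to 4, since in each case the even neighbor is chosen.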
Format:
SDC1 ft, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address.
In 32-bit mode, the contents of registers ft and ft+1 from the floating-point
coprocessor are stored at the memory location specified by the effective
address. This instruction is not valid, and is undefined, when the least
significant bit of ft is non-zero.
In 64-bit mode, the contents of the 64-bit register ft are stored in the
doubleword at the memory location specified by the effective address.
The FR bit of the Status register (SR26) specifies whether all 32 registers of
the R4000 are addressable. When FR equals zero, this instruction is not
defined if the least significant bit of ft is non-zero. If FR equals one, ft may
specify either odd or even registers.
If any of the three least-significant bits of the effective address are non-
zero, an address error exception takes place.
Operation:
32   T:    vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
64   T:    vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
Exceptions:
Coprocessor unusable exception
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
SQRT.fmt    Floating-Point Square Root    SQRT.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
SQRT.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified format and the positive arithmetic square root is taken. The
result is rounded as if calculated to infinite precision and then rounded to
the specified format, according to the current rounding mode. If the value
of fs corresponds to –0, the result will be –0. The result is placed in the
floating-point register specified by fd.
This instruction is valid only for single- or double-precision floating-point
formats.
The operation is not defined if bit 0 of any register specification is set and
the FR bit in the Status register equals zero, since the register numbers
specify an even-odd pair of adjacent coprocessor general registers. When
the FR bit in the Status register equals one, both even and odd register
numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Inexact exception
Format:
SUB.fmt fd, fs, ft
Description:
The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and the value in the ft field is subtracted
from the value in the fs field. The result is rounded as if calculated to
infinite precision and then rounded to the specified format, according to
the current rounding mode. The result is placed in the floating-point
register specified by fd. This instruction is valid only for single- or double-
precision floating-point formats.
The operation is not defined if bit 0 of any register specification is set and
the FR bit in the Status register equals zero, since the register numbers
specify an even-odd pair of adjacent coprocessor general registers. When
the FR bit in the Status register equals one, both even and odd register
numbers are valid.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Unimplemented operation exception
Invalid operation exception
Inexact exception
Overflow exception
Underflow exception
Format:
SWC1 ft, offset(base)
Description:
The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. The contents of
register ft from the floating-point coprocessor are stored at the memory
location specified by the effective address.
The FR bit of the Status register specifies whether all 64-bit floating-point
registers are addressable.
If FR equals zero, SWC1 stores either the high or low half of the 16 even
floating-point registers.
If FR equals one, SWC1 stores the low 32-bits of both even and odd
floating-point registers.
If either of the two least-significant bits of the effective address is non-
zero, an address error exception occurs.
Operation:
32   T:    vAddr ← ((offset_15)^16 || offset_15...0) + GPR[base]
64   T:    vAddr ← ((offset_15)^48 || offset_15...0) + GPR[base]
Exceptions:
Coprocessor unusable exception
TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
TRUNC.L.fmt    Floating-Point Truncate to Long Fixed-Point Format    TRUNC.L.fmt
[Instruction format diagram: bit fields 31..26, 25..21, 20..16, 15..11, 10..6, 5..0]
Format:
TRUNC.L.fmt fd, fs
Description:
The contents of the floating-point register specified by fs are interpreted in
the specified source format, fmt, and arithmetically converted to the long
fixed-point format. The result is placed in the floating-point register
specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round toward zero (RM = 1).
This instruction is valid only for conversion from single- or double-
precision floating-point formats.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside the range –2^63 to 2^63–1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled, no
exception is taken and 2^63–1 is returned.
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
TRUNC.W.fmt: Floating-Point Truncate to Single Fixed-Point Format
Instruction field boundaries (bit positions): 31-26, 25-21, 20-16, 15-11, 10-6, 5-0
Format:
TRUNC.W.fmt fd, fs
Description:
The contents of the FPU register specified by fs are interpreted in the
specified source format fmt and arithmetically converted to the single
fixed-point format. The result is placed in the FPU register specified by fd.
Regardless of the setting of the current rounding mode, the conversion is
rounded as if the current rounding mode is round toward zero (RM = 1).
This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined if bit 0 of
any register specification is set and the FR bit in the Status register equals
zero, since the register numbers specify an even-odd pair of adjacent
coprocessor general registers. When the FR bit in the Status register equals
one, both even and odd register numbers are valid.
When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of –2^31 to 2^31–1, an Invalid operation exception is
raised. If Invalid operation is not enabled, then no exception is taken and
–2^31 is returned.
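The register-number rule stated in the description can be expressed as a small check. This is only an illustrative sketch: `fpu_register_defined` is a hypothetical helper name, and it models nothing beyond the FR-bit rule given in the text.

```python
def fpu_register_defined(reg: int, fr: int) -> bool:
    """Return True if the register number is defined for the given FR setting."""
    if not 0 <= reg <= 31:
        raise ValueError("FPU register numbers run from 0 to 31")
    if fr == 1:
        # FR = 1: both even and odd register numbers are valid.
        return True
    # FR = 0: registers form even-odd pairs, so bit 0 must be clear.
    return (reg & 1) == 0
```

Under this rule, register 2 is usable with either FR setting, while register 3 is usable only when FR equals one.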
Operation:
Exceptions:
Coprocessor unusable exception
Floating-Point exception
Coprocessor Exceptions:
Invalid operation exception
Unimplemented operation exception
Inexact exception
Overflow exception
Opcode
                                        28...26
31...29   0         1         2         3         4         5         6         7
   0
   1
   2                COP1
   3
   4
   5
   6                LWC1                                    LDC1
   7                SWC1                                    SDC1

sub
                                        23...21
25...24   0         1         2         3         4         5         6         7
   0      MF        DMFη      CF        δ         MT        DMTη      CT        δ
   1      BC        δ         δ         δ         δ         δ         δ         δ
   2      S         D         δ         δ         W         Lη        δ         δ
   3      δ         δ         δ         δ         δ         δ         δ         δ

br
                                        18...16
20...19   0         1         2         3         4         5         6         7
   0      BCF       BCT       BCFL      BCTL      γ         γ         γ         γ
   1      γ         γ         γ         γ         γ         γ         γ         γ
   2      γ         γ         γ         γ         γ         γ         γ         γ
   3      γ         γ         γ         γ         γ         γ         γ         γ

function
                                        2...0
5...3     0         1         2         3         4         5         6         7
   0      ADD       SUB       MUL       DIV       SQRT      ABS       MOV       NEG
   1      ROUND.Lη  TRUNC.Lη  CEIL.Lη   FLOOR.Lη  ROUND.W   TRUNC.W   CEIL.W    FLOOR.W
   2      δ         δ         δ         δ         δ         δ         δ         δ
   3      δ         δ         δ         δ         δ         δ         δ         δ
   4      CVT.S     CVT.D     δ         δ         CVT.W     CVT.Lη    δ         δ
   5      δ         δ         δ         δ         δ         δ         δ         δ
   6      C.F       C.UN      C.EQ      C.UEQ     C.OLT     C.ULT     C.OLE     C.ULE
   7      C.SF      C.NGLE    C.SEQ     C.NGL     C.LT      C.NGE     C.LE      C.NGT
Key:
γ Operation codes marked with a gamma cause a reserved
instruction exception. They are reserved for future versions of the
architecture.
δ Operation codes marked with a delta cause unimplemented
operation exceptions in all current implementations and are
reserved for future versions of the architecture.
η Operation codes marked with an eta are valid only when MIPS III
instructions are enabled. Any attempt to execute these without
MIPS III instructions enabled causes an unimplemented operation
exception.
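To show how the function table is indexed, here is a minimal Python sketch that looks up a mnemonic from the low six bits of an instruction word: bits 5...3 select the row and bits 2...0 select the column. The names `FUNCTION_TABLE`, `MIPS3_ONLY`, and `decode_function` are hypothetical, and `None` stands for the δ entries; the sketch does not decode the opcode, sub, or br fields.

```python
from typing import Optional

_D = None  # delta entry: unimplemented operation

# Rows are indexed by bits 5...3, columns by bits 2...0.
FUNCTION_TABLE = [
    ["ADD", "SUB", "MUL", "DIV", "SQRT", "ABS", "MOV", "NEG"],
    ["ROUND.L", "TRUNC.L", "CEIL.L", "FLOOR.L",
     "ROUND.W", "TRUNC.W", "CEIL.W", "FLOOR.W"],
    [_D] * 8,
    [_D] * 8,
    ["CVT.S", "CVT.D", _D, _D, "CVT.W", "CVT.L", _D, _D],
    [_D] * 8,
    ["C.F", "C.UN", "C.EQ", "C.UEQ", "C.OLT", "C.ULT", "C.OLE", "C.ULE"],
    ["C.SF", "C.NGLE", "C.SEQ", "C.NGL", "C.LT", "C.NGE", "C.LE", "C.NGT"],
]

# Entries marked with an eta: valid only when MIPS III instructions are enabled.
MIPS3_ONLY = {"ROUND.L", "TRUNC.L", "CEIL.L", "FLOOR.L", "CVT.L"}


def decode_function(instruction: int) -> Optional[str]:
    """Return the mnemonic for the function field, or None for a delta entry."""
    function = instruction & 0x3F
    row = function >> 3      # bits 5...3
    column = function & 0x7  # bits 2...0
    return FUNCTION_TABLE[row][column]
```

For instance, a function field of 001101 (row 1, column 5) maps to TRUNC.W, and 001001 maps to TRUNC.L, which appears in `MIPS3_ONLY`.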
[Figure: sequential ordering. Byte 0 is taken first, Byte 1 second, Byte 2 third, Byte 3 fourth, Byte 4 fifth, Byte 5 sixth, Byte 6 seventh, and Byte 7 last.]
[Figure C-2: subblock ordering of the doublewords in a hexword (block), which is made up of octalwords and quadwords. Order of retrieval: DW2 first, DW3 second, DW0 third, DW1 fourth, DW6 fifth, DW7 sixth, DW4 seventh, DW5 eighth.]
Using the subblock ordering shown in Figure C-2, the doubleword at the
target address is retrieved first (DW2), followed by the remaining
doubleword (DW3) in this quadword.
Next, the quadword that fills out the octalword is retrieved in the same
order as the prior quadword (in this case DW0 is followed by DW1). This
is followed by the remaining octalword (DW6, DW7, DW4, DW5), which fills
out the hexword.
It may be easier to understand subblock ordering by looking at
the method used for generating the address of each doubleword as it is
retrieved. The subblock ordering logic generates this address by
executing a bit-wise exclusive-OR (XOR) of the starting block address with
the output of a binary counter that increments with each doubleword,
starting at doubleword zero (000₂).
Using this scheme, Table C-1 through Table C-3 list the subblock ordering
of doublewords for a 32-word block, based on three different
starting-block addresses: 0010₂, 1011₂, and 0101₂. The subblock ordering is
generated by an XOR of the subblock address (0010₂, 1011₂, or
0101₂) with the binary count of the doubleword (0000₂ through 1111₂).
Thus, the eighth doubleword retrieved from a block of data with a starting
address of 0010₂ is found by taking the XOR of address 0010₂ with the
binary count of the eighth doubleword, 0111₂. The result is 0101₂, or DW5
(shown in Table C-1).
The remaining tables illustrate this method of subblock ordering, using
various address permutations.
Table C-1 Sequence of Doublewords Transferred Using Subblock Ordering: Address 0010₂
Table C-2 Sequence of Doublewords Transferred Using Subblock Ordering: Address 1011₂
Table C-3 Sequence of Doublewords Transferred Using Subblock Ordering: Address 0101₂
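The XOR scheme described above is short enough to sketch directly. In the Python below, `subblock_order` is a hypothetical helper name; it simply XORs the starting doubleword address with a counter, which is the method the text describes, and makes no claim about how the hardware logic is built.

```python
from typing import List


def subblock_order(start: int, count: int) -> List[int]:
    """Return doubleword numbers in the order they are retrieved.

    start: doubleword address at which the transfer begins.
    count: number of doublewords in the block (a power of two).
    """
    # The counter runs 0, 1, 2, ...; each value is XORed with the start address.
    return [start ^ step for step in range(count)]


# Sequences for a 16-doubleword (32-word) block at the three starting addresses.
for start in (0b0010, 0b1011, 0b0101):
    sequence = subblock_order(start, 16)
    print(format(start, "04b"), ["DW%d" % dw for dw in sequence])
```

For an 8-doubleword block starting at DW2 this yields DW2, DW3, DW0, DW1, DW6, DW7, DW4, DW5, matching the retrieval order given for Figure C-2, and the eighth entry for a starting address of 0010₂ in a 16-doubleword block is DW5.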
For situations where the jitter associated with the operation of the ∆i/∆t
control mechanism cannot be tolerated and where the variation in
temperature and supply voltage after ColdReset* is expected to be small,
the ∆i/∆t control mechanism can be instructed to lock during ColdReset*
and thereafter retain its control values. The EnblDPLLR mode bit is set
and EnblDPLL is cleared for this mode of operation.
In addition, if both the EnblDPLL and EnblDPLLR mode bits are cleared,
the speed of the output buffers is set by the InitP(3:0) and InitN(3:0)
mode bits.
[Figure D-1 IO_In/IO_Out Board Trace. The figure shows a CPU board on which the longest trace from an R4000 output driver to a receiving device is made up of segments a, b, c, and d. The “Incident Wave” trace from IO_Out to IO_In has Length = L/2 and C Load = C, where L = a + b + c + d and C is the total capacitance loading of the worst-case trace.]
The Phase Locked Loop circuit requires several passive components for
proper operation, which are connected to PLLCap0, PLLCap1, VccP, and
VssP, as illustrated in Figure E-1.
In addition, the capacitors for PLLCap0 (Cp) and PLLCap1 (Cp) can be
connected to either VssP (as shown), VccP, or one to VssP and one to
VccP. Note that C2 and the Cp capacitors are incorporated into both the
179PGA and 447PGA package designs as surface-mounted chip
capacitors.
[Figure E-1: PLL passive components. The figure shows the R4000 pins PLLCap1, VccP, VssP, and PLLCap0, the supplies Vcc and Vss, and the components R (or L), C1, C2, C3, and Cp.]
Figure E-2 shows a top view of the 179-pin package with capacitors.
Figure E-3 shows a top view of the 447-pin package with chip capacitors.
It is essential to isolate the analog power and ground for the PLL circuit
(VccP/VssP) from the regular power and ground (Vcc/Vss). Initial
evaluations have yielded good results with the following values:
R = 5 ohms C1 = 1 nF C2 = 82 nF
C3 = 10 µF Cp = 470 pF
Since the optimum values for the filter components depend upon the
application and the system noise environment, these values should be
considered as starting points for further experimentation within your
specific application. In addition, chokes (inductors, L) can be used as an
alternative to the resistors (R) for filtering the power supply.
The contents of the System Coprocessor registers and the TLB affect the
operation of the processor in many ways. For instance, an instruction that
changes CP0 data also affects subsequent instructions that use the data.
In the CPU, general registers are interlocked and the result of an
instruction can generally be used by the next instruction; if the result is not
available right away, the processor stalls until it is available. CP0 registers
and the TLB are not interlocked, however; there may be some delay before
a value written by one instruction is available to following instructions.
There is a required-data dependence between an instruction that changes a
register or TLB entry (a writer) and the next instruction that uses it (a user).
(A writer can write multiple data items, forming multiple writer/user
pairs.) The writer/user instruction pair places a hazard on the data if there
must be a delay between the time the writer instruction writes the data
and the time the user instruction can use it.
In addition to instructions, events can be writers and users of CP0
information. For instance, an exception writes information to CP0
registers and events that occur for every instruction, like an instruction
Writer → User                                               Hazard On    Instructions Between   Calculation
TLBWR/TLBWI → TLBP                                          TLB entry    3                      8-(4+1)
TLBWR/TLBWI → load/store using new TLB entry                TLB entry    3                      8-(4+1)
TLBWR/TLBWI → I-fetch using new TLB entry                   TLB entry    5                      8-(2+1)
MTC0 Status[CU] → Coprocessor instruction needs CU set      Status[CU]   4                      7-(2+1)
†. You cannot depend on a delay in effect if the instruction execution order is changed by exceptions.
In this case, for example, the minimum delay for IE to be effective is the maximum delay before a
pending, enabled interrupt can occur.
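The calculation entries in the table all follow the pattern a-(b+1). The sketch below reproduces that arithmetic in Python; reading a as the pipeline stage in which the writer makes its result available and b as the stage in which the user needs it is an assumption of this sketch, and `instructions_between` is a hypothetical helper name.

```python
def instructions_between(writer_stage: int, user_stage: int) -> int:
    """Apply the a-(b+1) pattern from the table's calculation entries.

    writer_stage: stage number a (assumed: when the written value is available).
    user_stage:   stage number b (assumed: when the user reads the value).
    """
    return writer_stage - (user_stage + 1)


# The four rows of the table, as (writer, user, a, b).
rows = [
    ("TLBWR/TLBWI", "TLBP", 8, 4),
    ("TLBWR/TLBWI", "load/store using new TLB entry", 8, 4),
    ("TLBWR/TLBWI", "I-fetch using new TLB entry", 8, 2),
    ("MTC0 Status[CU]", "coprocessor instruction needs CU set", 7, 2),
]
for writer, user, a, b in rows:
    print(writer, "->", user, ":", instructions_between(a, b))
```

Evaluated on the table's own numbers, this gives 3, 3, 5, and 4 instructions, agreeing with the listed counts.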
[Figure: R4000 PC Pinout, bottom view. Pin grid with rows A through V (A, B, C, D, E, F, G, H, J, K, L, M, N, P, R, T, U, V) and columns 1 through 18.]
[Figure: R4000 MC/SC 447 Pinout, bottom view. Pin grid with rows 1 through 39 and columns A through AW.]
Numerics
32-bit
  addressing 109
  applications 9
  data format 24
  instructions 36
  operands, in 64-bit mode 39
  operations 6, 67
  single-precision FP format 164
  virtual-to-physical-address translation 65
32-bit mode
  address space 31
  address translation 65, 95
  addresses 63
  FPU operations 153
  TLB entry format 81
4th Floor B-dorm. See Alco Hall
64-bit
  addressing 109
  ALU 9
  bus, address and data 201
  data format 24
  double-precision FP format 164
  floating-point registers 156
  FPU 9
  internal data path widths 381
  operations 6, 39, 67
  System interface 11
  virtual-to-physical-address translation 66
A
address acceleration 58
Address Error exception 127
address prediction 58
address space identifier (ASID) 64
address spaces
  32-bit translation of 65
  64-bit translation of 66
  address space identifier (ASID) 64
  physical 64
  virtual 63
  virtual-to-physical translation of 64
addressing
  and data formats 24
  big-endian 24
  Kernel mode 73
  little-endian 24
  misaligned data 26
  Supervisor mode 69
  User mode 67
  virtual address translation 95
  See also address spaces
Alco Hall vs. Acid. See game, softball
application software, compatibility with MIPS R2000, R3000, and R6000 processors 6
architecture
  64-bit 9
  superpipeline 11
array, page table entry (PTE) 102
ASID. See address space identifier
J
Joint Test Action Group (JTAG) interface
  boundary scanning, explanation of 390
  operation 400
  registers
    Boundary-scan 394
    Bypass 393
    Instruction 392
L
language suite approach, benefits of 5
latency
  determining 363
  external read response 363
  external response 361, 363
  fault detection 435
  FPU instruction 181
  FPU operation 173
  intervention response 363
  release 361, 362
  snoop response 363
line ownership, cache 258
line size
  primary data cache 250
  primary instruction cache 249
  secondary cache 252
little-endian, byte addressing 24, 170
load delay 48, 169
load delay slot 37
load instructions, CPU
  defining access types 37
  delayed load instruction 37
  overview 15
  scheduling a load delay slot 37
load instructions, FPU 169
Load Linked Address (LLAddr) register 93
M
Master/Checker mode, of R4400 430
MC68000, compatibility with 24
memory management
  address spaces 63
  addressing 31
  memory management unit (MMU) 61
  register numbers 84
  registers. See registers, CPU, memory management
  System Control Coprocessor (CP0) 80
memory organization, hierarchy 244
MIPS RISCompilers, language suite 5
MIPS R-Series processors, instructions common to 16–23
move instructions, FPU 169
multiply registers, CPU 13
N
No Operation (NOP) instructions 59
Nonmaskable Interrupt (NMI) 402
Nonmaskable Interrupt (NMI) exception
  handling 150
  process 121
O
on-chip primary caches 33, 246
operating modes 32
  Kernel mode 73
  Supervisor mode 69
  User mode 67
Overflow exception 194
P
page table entry (PTE) array 102
PageMask register 81, 87
parameters, system timing 233
parity check matrix 425
parity error checking 408
performance
  address acceleration 58
  address prediction 58
  of uncached stores 59
physical address space 64
pipeline, CPU
  back-up 54
  branch delay 48
  correctness considerations 58
  decision whether to advance 57
  exception conditions 52
  external stalls 53
  load delay 48
  operation 44
  overrun 53
  performance considerations 58
  slip conditions 53
V
virtual address space 63
Virtual Coherency exception 133
virtual memory
  and the TLB 62
  hits and misses 62
  mapping 31
  multiple matches 62
  virtual address translation 95
W
warm reset 208, 214, 217
Watch exception 142
WatchHi register 113
WatchLo register 113
Wired register 86, 88
Write Back (WB) 47