Updates To E500mc Core Reference Manual, Rev. 3, As of 2015-07-09
This section provides updates to the e500mc Core Reference Manual, Rev 3. We are providing known
corrections, but do not guarantee that the list is exhaustive. For convenience, the section number and page
number of the item in the reference manual are provided.
Note: This PDF file contains updates embedded as inline sticky notes; use the provided links and scroll to the inline location.
Future versions of the document will incorporate the sticky notes into the source text of the document. Freescale
recommends viewing this file with Adobe Acrobat Reader.
Supports
e500mc (all revisions)
e500mcRM
Rev. 3
03/2013
How to Reach Us:
Home Page: freescale.com
Information in this document is provided solely to enable system and software implementers to use
Freescale products. There are no express or implied copyright licenses granted hereunder to design
or fabricate any integrated circuits based on the
Freescale, the Freescale logo, AltiVec, C-5, CodeTest, CodeWarrior, ColdFire, C-Ware,
Energy Efficient Solutions logo, Kinetis, mobileGT, PowerQUICC, Processor Expert,
QorIQ, Qorivva, StarCore, Symphony, and VortiQa are trademarks of Freescale
Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. Airfast, BeeKit, BeeStack, ColdFire+,
CoreNet, Flexis, MagniV, MXC, Platform in a Package, QorIQ Qonverge, QUICC
Engine, Ready Play, SafeAssure, SMARTMOS, TurboLink, Vybrid, and Xtrinsic are
trademarks of Freescale Semiconductor, Inc. All other product or service names are
the property of their respective owners. The Power Architecture and Power.org word
marks and the Power and Power.org logos and related marks are trademarks and
service marks licensed by Power.org.
© 2010-2013 Freescale Semiconductor, Inc.
Chapter 1  e500mc Overview
Chapter 2  Register Model
Chapter 3  Instruction Model
Chapter 5  Core Caches and Memory Subsystem
Chapter 6  Memory Management Units (MMUs)
Chapter 8  Power Management
Chapter 9  Debug and Performance Monitor Facilities
Chapter 10  Execution Timing
Chapter 11  Core Software Initialization Requirements
    11.1 Core State and Suggested Software Initialization After Reset
    11.2 MMU State
    11.3 Register State
        11.3.1 GPRs
        11.3.2 FPRs
        11.3.3 SPRs
        11.3.4 MSR and FPSCR
    11.4 Timer State
    11.5 L1 Cache State
    11.6 L2 Cache State
    11.7 Branch Target Buffer State
Appendix A  Revision History
Appendix B  Simplified Mnemonics
    B.1 Overview
Audience
It is assumed that the reader understands operating systems, microprocessor system design, and the basic
principles of RISC processing and has access to the EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors and Power ISA™ Version 2.06.
Suggested Reading
This section lists additional reading that provides background for the information in this manual as well as
general information about the architecture.
General Information
The following documentation is available from Power.org at http://www.power.org:
• Power ISA™ Version 2.06B, July 2010
The following documentation, published by Morgan-Kaufmann Publishers, 340 Pine Street, Sixth Floor,
San Francisco, CA, provides useful information about computer architecture in general:
• Computer Architecture: A Quantitative Approach, Third Edition, by John L. Hennessy and David
A. Patterson
• Computer Organization and Design: The Hardware/Software Interface, Second Edition, David A.
Patterson and John L. Hennessy
Related Documentation
Freescale documentation is available from the sources listed on the back cover of this manual. The
document order numbers are included in parentheses for ease in ordering:
• EREF: A Programmer’s Reference Manual for Freescale Power Architecture® Processors: A
Programmer’s Reference Manual for Freescale Embedded Processors
Conventions
This document uses the following notational conventions:
cleared/set     When a bit takes the value zero, it is said to be cleared; when it takes a value of
                one, it is said to be set.
mnemonics       Instruction mnemonics are shown in lowercase bold.
italics         Italics indicate variable command parameters, for example, bcctrx.
                Book titles in text are set in italics.
                Internal signals are set in italics, for example, qual BG.
0x0             Prefix to denote hexadecimal number
0b0             Prefix to denote binary number
rA, rB, rS      Instruction syntax used to identify a source GPR
rD              Instruction syntax used to identify a destination GPR
frA, frB, frC   Instruction syntax used to identify a source FPR
frD             Instruction syntax used to identify a destination FPR
REG[FIELD]      Abbreviations for registers are shown in uppercase text. Specific bits, fields, or
                ranges appear in brackets. For example, MSR[PR] refers to the privilege mode bit
                in the machine state register.
x:y             A bit range from bit x to bit y inclusive.
Terminology Conventions
Table i lists certain terms used in this manual that differ from the architecture terminology conventions.
Table i. Terminology Conventions
1.1 Overview
The e500mc core is a low-power implementation of the resources for embedded processors defined by the
Power ISA™. The core is a 32-bit implementation and implements 32 32-bit general-purpose registers;
however, it supports accesses to 36-bit physical addresses. The block diagram in Figure 1-1 shows how the
e500mc functional units operate independently and in parallel. Note that this conceptual diagram does not
attempt to show how these features are implemented physically.
The e500mc is a superscalar processor that can issue two instructions and complete two instructions per
clock cycle. Instructions complete in order, but can execute out of order. Execution results are available to
subsequent instructions through the rename buffers, but those results are recorded into architected registers
in program order, maintaining a precise exception model.
The processor core integrates two simple instruction units (SFX0, SFX1), a multiple-cycle instruction unit
(MU), a branch unit (BU), a floating-point unit (FPU), and a load/store unit (LSU).
The LSU supports 32-bit integer and 64-bit floating-point operands.
The ability to execute six instructions in parallel and the use of simple instructions with short execution
times yield high efficiency and throughput. Most integer instructions execute in one clock cycle.
The core includes on-chip first-level instruction and data memory management units (MMUs) and an
on-chip second-level unified MMU.
• The first-level MMUs for both instruction and data translation are each composed of two
subarrays: an 8-entry fully-associative array of translation look-aside buffer (TLB) entries for
variable-sized pages, and a 64-entry 4-way set-associative array of TLB entries for fixed-size
pages. These provide virtual-to-physical memory address translation for variable-sized pages and
demand-paged fixed pages, respectively. These arrays are maintained entirely by the hardware with
a true least-recently-used (LRU) algorithm, and act as a cache of the second-level MMU.
• The second-level MMU contains a 64-entry, fully-associative unified (instruction and data) TLB
array that provides support for variable-sized pages. It also contains a 512-entry, 4-way
set-associative unified TLB for 4-Kbyte page size support. These second-level TLBs are
maintained completely by the software.
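The array sizes above determine how many sets each TLB array has. The following is a minimal sketch of that arithmetic only; the function names are illustrative and are not part of any Freescale software, and the sketch makes no claim about which address bits the hardware uses to select a set.

```python
def tlb_sets(entries: int, ways: int) -> int:
    """Number of sets in an array with the given entry count and associativity.

    A fully-associative array is modeled as ways == entries, giving one set.
    """
    if entries <= 0 or ways <= 0 or entries % ways != 0:
        raise ValueError("entries must be a positive multiple of ways")
    return entries // ways


def index_bits(sets: int) -> int:
    """Bits needed to select one set (0 for a fully-associative array)."""
    return (sets - 1).bit_length()


# (entries, ways) pairs taken from the description above.
ARRAYS = {
    "L1 variable-size (fully associative)": (8, 8),
    "L1 fixed-size (4-way)": (64, 4),
    "L2 variable-size (fully associative)": (64, 64),
    "L2 4-Kbyte (4-way)": (512, 4),
}

for name, (entries, ways) in ARRAYS.items():
    sets = tlb_sets(entries, ways)
    print(f"{name}: {sets} set(s), {index_bits(sets)} index bit(s)")
```

Under these figures, the 64-entry 4-way first-level array has 16 sets and the 512-entry 4-way second-level array has 128 sets.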
The e500mc includes independent on-chip, 32-Kbyte, eight-way set-associative, physically addressed L1
caches for instructions and data and a unified 128-KB, eight-way set-associative, physically addressed,
backside L2 cache.
[Figure 1-1, e500mc block diagram, is not reproduced here. Recoverable labels: load/store unit,
completion bus, unified backside L2 cache, completion queue (14 entry, maximum two instructions
retire per cycle), core interface unit, CoreNet interface.]
Cache lines on the e500mc are 16 words (64 bytes) wide. The core allows cache-line-based user-mode
locks on cache contents. This provides embedded applications with the capability for locking interrupt
routines or other important (time-sensitive) instruction sequences into the instruction cache. It also allows
data to be locked into the data cache, which supports deterministic execution time.
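The cache sizes, associativity, and 64-byte line size given above fix the number of sets in each cache. The sketch below works through that arithmetic. The address split in split_address assumes the conventional layout in which the set index is taken from the address bits immediately above the line offset; the text does not state this, so treat it as an illustration rather than a description of the hardware.

```python
LINE_BYTES = 64  # 16 words per cache line, as stated above


def cache_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (ways * line size)."""
    return size_bytes // (ways * line_bytes)


def split_address(addr: int, sets: int, line_bytes: int = LINE_BYTES):
    """Split a physical address into (tag, set index, line offset).

    ASSUMPTION: the index uses the low-order bits just above the offset.
    """
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


L1_SETS = cache_sets(32 * 1024, ways=8)    # 32-Kbyte, eight-way L1
L2_SETS = cache_sets(128 * 1024, ways=8)   # 128-KB, eight-way backside L2

print(L1_SETS, L2_SETS)
print(split_address(0x12345, L1_SETS))
```

With these parameters each L1 cache has 64 sets and the L2 cache has 256 sets.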
The e500mc, shown as “Core” in Figure 1-2, is designed to be implemented in multicore integrated
devices, and many of the features are defined to support multicore implementations, in particular to
partition the cores in such a way that multiple operating systems can be run on the integrated device.
[Figure 1-2 is not reproduced here. Recoverable labels: cores grouped into control plane, data
plane, and other services partitions, each core with instruction (I) and data (D) caches;
peripheral devices 1 through n; front-side platform cache; DDR2/3 memory controller.]
The architecture defines the resources required to allow orderly and secure interactions between the cores,
memory, peripheral devices, and virtual machines. These include hypervisor and guest supervisor
privilege levels, which determine whether certain activities, such as memory accesses and management,
cache management, and interrupt handling, are carried out at a system-wide level (hypervisor level)
or by the operating system within a partition (guest supervisor level).
In particular, e500mc implements the following categories as defined by Power ISA 2.06:
• Base
• Embedded
• Alternate Time Base
• Cache Specification
• Decorated Storage
• Embedded.Enhanced Debug
• Embedded.External PID
• Embedded.Hypervisor
• Embedded.Little-Endian
• Embedded.Performance Monitor
• Embedded.Processor Control
• Embedded.Cache Locking
• External Proxy
• Floating Point and Floating Point.Record
• Memory Coherence
• Store Conditional Page Mobility
• Wait
The above categories define instructions, registers, and processor behavior associated with a given
category. For a more complete and canonical definition of the e500mc register and instruction set, see
Chapter 2, “Register Model,” and Chapter 3, “Instruction Model,” respectively.
The CoreNet interface provides the primary on-chip interface between the cores and the rest of the SoC.
CoreNet is a tag-based interface fabric that provides interconnections among the cores, peripheral devices,
and system memory in a multicore implementation.
results available to a subsequent instruction, but cannot update the architected GPR specified as its target
operand ahead of the multiple-cycle divide instruction.
The common pipeline stages are as follows:
• Instruction fetch—Includes the clock cycles necessary to request an instruction and the time the
memory system takes to respond to the request. Instructions retrieved are latched into the
instruction queue (IQ) for subsequent consideration by the dispatcher.
Instruction fetch timing depends on many variables, such as whether an instruction is in the on-chip
instruction cache or the L2 cache. Those factors increase when it is necessary to fetch instructions
from system memory and include the processor-to-bus clock ratio, the amount of bus traffic, and
whether any cache coherency operations are required.
Because there are so many variables, unless otherwise specified, the instruction timing examples
in this chapter assume optimal performance and show the portion of the fetch stage in which the
instruction is in the instruction queue. The fetch1 and fetch2 stages are primarily involved in
retrieving instructions.
• The decode/dispatch stage fully decodes each instruction; most instructions are dispatched to the
issue queues (however, isync, rfi, sc, nops, and some other instructions do not go to issue queues).
• The issue queues, BIQ, GIQ, and FIQ, can accept as many as one, two, and two instructions,
respectively, in a cycle. The following simplification covers most cases:
— Instructions dispatch only from the two lowest IQ entries—IQ0 and IQ1.
— A total of two instructions can be dispatched to the issue queues per clock cycle.
Dispatch is treated as an event at the end of the decode stage. The issue stage reads source operands
from rename registers and register files and determines when instructions are latched into the
execution unit reservation stations. Note that the e500mc has 14 rename registers, one for each
completion queue entry, so instructions cannot stall because of a shortage of rename registers.
— Space must be available in the CQ for an instruction to decode and dispatch (this includes
instructions that are assigned a space in the CQ but not in an issue queue).
The general behavior of the issue queues is described as follows:
— The GIQ accepts as many as two instructions from the dispatch unit per cycle. SFX0, SFX1,
CFX, and all LSU instructions (including 64-bit loads and stores) are dispatched to the GIQ,
shown in Figure 1-3.
[Figure 1-3 is not reproduced here. Recoverable labels: GIQ entries GIQ3, GIQ2, and GIQ1, filled
from IQ0/IQ1; GIQ1 issues to SFX1, CFX, or LSU.]
— Instructions can be issued out-of-order from the bottom two GIQ entries (GIQ1–GIQ0). GIQ0
can issue to SFX0, CFX, and LSU. GIQ1 can issue to SFX1, CFX, and LSU.
Note that SFX1 executes a subset of the instructions that can be executed in SFX0. The ability
to identify and dispatch instructions to SFX1 increases the availability of SFX0 to execute more
computationally intensive instructions.
An instruction in GIQ1 destined for SFX1 or the LSU need not wait for a CFX instruction in
GIQ0 that is stalled behind a long-latency divide.
— FIQ and BIQ only issue one instruction per cycle each to their respective reservation stations.
• The execute stage accepts instructions from its issue queue when the appropriate reservation
stations are not busy. In this stage, the operands assigned to the execution stage from the issue stage
are latched.
The execution unit executes the instruction (perhaps over multiple cycles), writes results on its
result bus, and notifies the CQ when the instruction finishes. The execution unit reports any
exceptions to the completion stage. Instruction-generated exceptions are not taken until the
excepting instruction is next to retire.
— Branch unit—The branch unit (BU) executes (resolves) all branch and CR logical instructions.
Branches resolve in the execute stage. If a branch is mispredicted, it takes five cycles for the next
instruction to reach the execute stage.
— Integer units—Two simple units (SFX0 and SFX1) handle add, subtract, shift, rotate, and logical
operations. The complex integer unit (CFX) executes multiply and divide instructions.
Most integer instructions have a one-cycle latency, so results of these instructions are available
one clock cycle after an instruction enters the execution unit.
Integer multiply and divide instructions have longer latency, and the multiply and divide can
overlap execution in most cases. Multiply operations are also pipelined.
— The load/store unit (LSU), shown in Figure 1-4, has the following features:
– Three-cycle load latency
– Fully pipelined
– Load miss queue
– Load hits can continue to be serviced when the load miss queue is full.
– As many as nine load misses to five distinct cache lines can be pipelined in parallel while
L1 cache hits continue to be serviced.
[Figure 1-4 is not reproduced here. Recoverable labels: reservation station; load/store unit
three-stage pipeline; outputs to the completion queue, to the GPR/FPR operand buses, and to the
GPR/FPRs.]
• The complete and write-back stages maintain the correct architectural machine state and commit
results to the architecture-defined registers in order. If completion logic detects a mispredicted
branch or an instruction containing an exception status, subsequent instructions are cancelled, their
execution results in rename registers are discarded, and the correct instruction stream is fetched.
The complete stage ends when the instruction is retired. Two instructions can be retired per clock
cycle. If no dependencies exist, as many as two instructions are retired in program order.
Section 10.3.2, “Dispatch, Issue, and Completion Considerations,” describes completion
dependencies.
The write-back stage occurs in the clock cycle after the instruction is retired.
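The GIQ issue rules described in the issue-stage bullets above can be sketched as a small model. This is an illustration only: the instruction classes ("simple", "complex", "loadstore") and the function name are hypothetical, a simple instruction in GIQ1 is assumed to belong to the subset that SFX1 can execute, and the model assumes that two GIQ entries cannot issue to the same unit in one cycle.

```python
# Unit reachable from each of the bottom two GIQ entries, per the text:
# GIQ0 issues to SFX0, CFX, and LSU; GIQ1 issues to SFX1, CFX, and LSU.
UNIT_FOR = {
    "GIQ0": {"simple": "SFX0", "complex": "CFX", "loadstore": "LSU"},
    "GIQ1": {"simple": "SFX1", "complex": "CFX", "loadstore": "LSU"},
}


def issue_from_giq(giq0, giq1, busy=frozenset()):
    """Return the (entry, unit) pairs that issue this cycle.

    giq0, giq1: instruction class held in that entry, or None if empty.
    busy: units whose reservation station cannot accept an instruction.
    Entries are considered independently, so GIQ1 may issue while GIQ0
    waits (out-of-order issue from the bottom two entries).
    """
    issued = []
    claimed = set(busy)
    for entry, kind in (("GIQ0", giq0), ("GIQ1", giq1)):
        if kind is None:
            continue
        unit = UNIT_FOR[entry][kind]
        if unit not in claimed:
            issued.append((entry, unit))
            claimed.add(unit)  # ASSUMPTION: one instruction per unit per cycle
    return issued


# A CFX instruction in GIQ0 is stalled behind a divide; the load in GIQ1
# still issues to the LSU.
print(issue_from_giq("complex", "loadstore", busy={"CFX"}))
```

The example call reproduces the case described in the text, where an instruction in GIQ1 destined for the LSU does not wait for a stalled CFX instruction in GIQ0.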
Each entry below lists a feature, its status in previous e500 cores, its status in the e500mc, and notes.

Backside L2 cache
    Previous e500 cores: not present
    e500mc: present
    Notes: An integrated backside L2 cache is present in e500mc. The backside L2 cache is described
    throughout this document.

SPE and embedded floating point
    Previous e500 cores: present
    e500mc: not present
    Notes: SPE and embedded floating point (floating point done in the GPRs) are not present in
    e500mc. This makes the GPRs 32 bits in size as opposed to 64 bits.

FPR based floating-point
    Previous e500 cores: not present
    e500mc: present
    Notes: FPR based floating-point (category Floating-Point) is present in e500mc. The floating
    point is binary compatible with e300 and e600. See Section 3.4.4.1, “Floating-Point
    Instructions.”

Embedded hypervisor
    Previous e500 cores: not present
    e500mc: present
    Notes: A new privilege level and associated instructions and registers are provided in e500mc
    to support partitioning and virtualization. Changes appear throughout the document.

Power management
    Previous e500 cores: uses MSR[WE] and HID0[DOZE,NAP,SLEEP] to enter power management states
    e500mc: uses SoC programming model to control power management and removes MSR[WE],
    HID0[DOZE,NAP,SLEEP]. Also adds the wait instruction.
    Notes: How power management functions are invoked is now mostly controlled by writing SoC
    registers. See Chapter 8, “Power Management.”

External proxy
    Previous e500 cores: not present
    e500mc: present
    Notes: External proxy is a mechanism that allows the core to acknowledge an external input
    interrupt from the PIC when the interrupt is taken and provide the interrupt vector in a core
    register. See Section 4.9.6.3, “External Proxy.”

Additional interrupt level for Debug interrupts
    Previous e500 cores: not present
    e500mc: present
    Notes: A separate interrupt level for debug interrupts is provided, along with the associated
    save/restore registers DSRR0/DSRR1. See Section 4.9.16, “Debug Interrupt—IVOR15.”

Processor signaling
    Previous e500 cores: not present
    e500mc: present
    Notes: The msgsnd and msgclr instructions are provided to perform topology-independent
    core-to-core doorbell interrupts. See Section 3.4.11.4, “Message Clear and Message Send
    Instructions.”

External PID load/store
    Previous e500 cores: not present
    e500mc: present
    Notes: Instructions are provided for supervisor/hypervisor level software to perform load and
    store operations using a different address space context. See Section 3.4.11.2, “External PID
    Load Store Instructions.”

Decorated storage
    Previous e500 cores: not present
    e500mc: present
    Notes: Instructions are provided for performing load and store operations to devices that
    include meta data that is interpreted by the target address. Devices in some SoCs utilize this
    facility for performing atomic memory updates like increments and decrements. See
    Section 3.4.3.2.8, “Decorated Load and Store Instructions.”

Lightweight synchronization
    Previous e500 cores: not present
    e500mc: Adds the lwsync instruction.
    Notes: The lwsync instruction is provided for a faster form of memory barrier for load/store
    ordering to memory that is cached and coherent. See Section 3.4.10.1, “User-Level Cache
    Instructions” and Section 5.5.5, “Load/Store Operation Ordering.”

CoreNet
    Previous e500 cores: uses Core Complex Bus (CCB) as interconnect
    e500mc: uses CoreNet as an interconnect
    Notes: CoreNet is a scalable non-retry based fabric used as an interconnect between cores and
    other devices in the SoC.

Cache stashing
    Previous e500 cores: not present
    e500mc: present
    Notes: The capability to have certain SoC devices “stash” or pre-load data into a designated
    core L1 or L2 data cache is provided. The core is a passive recipient of such requests. See
    Section 5.2.2, “Cache Stashing.”

Machine check
    Previous e500 cores: provides machine check interrupt and HID0[RFXE] to control how the core
    treats machine check interrupts
    e500mc: provides error report, asynchronous machine check, and NMI interrupts. HID0[RFXE] is
    removed.
    Notes: Machine check interrupts are divided into synchronous error reports, asynchronous
    machine checks, and NMI. The way errors are reported is more conducive to a multicore
    environment. See Section 4.9.3, “Machine Check Interrupt—IVOR1.”

Write shadow
    Previous e500 cores: not present
    e500mc: present
    Notes: The capability to have all data written to the L1 data cache be “written through” to the
    L2 cache (or to memory) is provided. This provides a method of ensuring that any L1 cache error
    can be recovered from without loss of data. See Section 5.4.2, “Write Shadow Mode.”

Cache block size
    Previous e500 cores: 32 bytes
    e500mc: 64 bytes
    Notes: e500mc contains a larger cache block/line/coherence granule size.

Number of variable size TLB entries
    Previous e500 cores: 16
    e500mc: 64
    Notes: e500mc contains a larger number of variable size TLB entries and a larger number of
    available page sizes. See Section 6.3.2, “L2 TLB Arrays.”
– A specified CR field can be set by a move to the CR from another CR field (mcrf), or from
the XER (mcrxr).
– CR0 can be set as the implicit result of an integer instruction.
– CR1 can be set as the implicit result of a floating-point instruction.
– A specified CR field can be set as the result of an integer or floating-point compare
instruction.
See Section 2.6.1, “Condition Register (CR).”
• Performance monitor registers (PMRs). Similar to SPRs, PMRs are accessed by using dedicated
move to/move from instructions (mtpmr and mfpmr). See Section 2.18, “Performance Monitor
Registers (PMRs).”
Guest supervisor
    Both mfspr and mtspr when operating in supervisor mode (MSR[PR] = 0), regardless of the state
    of the MSR[GS] bit (that is, it is available in hypervisor state as well).
    For details, see Section 2.7.1, “Machine State Register (MSR).”
Guest supervisor RO
    Only mfspr when operating in supervisor mode (MSR[PR] = 0), regardless of the state of the
    MSR[GS] bit (that is, it is available in hypervisor state as well).
Hypervisor
    Both mfspr and mtspr when operating in hypervisor mode (MSR[GS,PR] = 00).
Hypervisor R/Clear
    Both mfspr and mtspr when operating in hypervisor mode (MSR[GS,PR] = 00); however, an mtspr
    only clears bit positions in the SPR that correspond to the bits set in the source GPR.
An mtspr or mfspr instruction that specifies an unsupported SPR number is considered an invalid
instruction. The e500mc takes an illegal-operation program exception on all accesses to undefined SPRs
(or read accesses to SPRs that are write-only and write accesses to SPRs that are read-only), regardless of
MSR[GS,PR] and SPRN[5] values. For supported SPR numbers that are privileged, an mfspr or mtspr
while in user mode (MSR[PR] = 1) causes a privilege operation program exception.
NOTE
On the e500mc, an attempt in user mode to access an unsupported
privileged SPR number causes an illegal-operation program exception,
not a privilege operation program exception as specified by the
architecture.
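The access rules in this section can be summarized as a decision function. The sketch below is a hedged illustration, not a model of the hardware: the function and parameter names are hypothetical, only the "guest supervisor" and "hypervisor" access levels described above are modeled, and the case of a hypervisor-level SPR accessed from guest supervisor state is reported as not covered because this section does not describe it.

```python
ILLEGAL = "illegal-operation program exception"
PRIVILEGE = "privilege operation program exception"
ALLOWED = "allowed"
NOT_COVERED = "not described in this section"


def spr_access_outcome(supported, level, readable, writable, is_write, gs, pr):
    """Outcome of an mfspr (is_write=False) or mtspr (is_write=True).

    supported: the SPR number is implemented on the core.
    level: "guest supervisor" or "hypervisor" (privileged SPRs only).
    gs, pr: current MSR[GS] and MSR[PR] values (0 or 1).
    """
    # Unsupported numbers and wrong-direction accesses are illegal
    # regardless of MSR[GS,PR]; this check therefore comes first.
    if not supported:
        return ILLEGAL
    if (is_write and not writable) or (not is_write and not readable):
        return ILLEGAL
    if pr == 1:                      # user mode, supported privileged SPR
        return PRIVILEGE
    if level == "guest supervisor":  # MSR[PR] = 0, any MSR[GS]
        return ALLOWED
    if level == "hypervisor":
        return ALLOWED if gs == 0 else NOT_COVERED
    raise ValueError(f"unknown access level: {level}")


# User-mode access to an unsupported SPR number: illegal, not privilege.
print(spr_access_outcome(False, "hypervisor", True, True, False, gs=0, pr=1))
```

Placing the unsupported-SPR check ahead of the privilege check mirrors the NOTE above, where the illegal-operation exception takes precedence in user mode.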
Defined SPR Number    Name    SPR Abbreviation    Access    Section/Page
4 Certain fields in the register are only writeable when in hypervisor state.
5 This register is only writeable in hypervisor state, but can be read in guest supervisor state.
6 On cores that do not provide an L2 cache, these registers still exist, but always read as zero.
7 NPIDR contents are transferred to the Nexus port whenever it is written.
8 USPRG0 is a separate physical register from SPRG0.
fetched and executed from any context from any permutation of these bits. Software should guarantee that
a translation exists for each of the permutations of these address space bits and that translation has the same
characteristics, including permissions and RPN fields. For this reason, it is unwise to use mtmsr to change
these bits and such changes should only be done through return from interrupt type instructions, which
provide the context synchronization atomically with instruction execution.
Access: Guest supervisor
Bits     Field
32–34    —
35       GS
36       —
37       UCLE
38–45    —
46       CE
47       —
48       EE
49       PR
50       FP
51       ME
52       FE0
53       —
54       DE
55       FE1
56–57    —
58       IS
59       DS
60       —
61       PMM
62       RI
63       —
Reset: All zeros
Figure 2-1. Machine State Register (MSR)
When an interrupt occurs, MSR contents of the interrupted process are automatically saved to the
save/restore register 1 (xSRR1) appropriate to the interrupt, and the MSR is altered to values
predetermined for the interrupt taken. At the end of the interrupt handler, the appropriate return from
interrupt instruction restores the values in the save/restore register 1 (xSRR1) to the MSR.
MSR contents are read into a GPR using mfmsr. The contents of a GPR can be written to MSR using
mtmsr. The write MSR external enable instructions (wrtee and wrteei) can be used to set or clear
MSR[EE] without affecting other MSR bits.
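The MSR field positions in Figure 2-1 can be turned into masks for decoding a value read with mfmsr. The following sketch assumes the numbering used in the figure, in which bit 63 is the least-significant bit of the 32-bit register; the helper names are illustrative only. The wrtee helper models only the stated effect of wrtee/wrteei on MSR[EE].

```python
# Field name -> bit number, from Figure 2-1 (bit 63 is the LSB).
MSR_BITS = {
    "GS": 35, "UCLE": 37, "CE": 46, "EE": 48, "PR": 49, "FP": 50,
    "ME": 51, "FE0": 52, "DE": 54, "FE1": 55, "IS": 58, "DS": 59,
    "PMM": 61, "RI": 62,
}


def msr_mask(name: str) -> int:
    """Mask for a single MSR field."""
    return 1 << (63 - MSR_BITS[name])


def decode_msr(value: int) -> set:
    """Names of the MSR fields that are set in value."""
    return {name for name in MSR_BITS if value & msr_mask(name)}


def privilege_state(value: int) -> str:
    """Classify the state from MSR[GS,PR] as described in this chapter."""
    if value & msr_mask("PR"):
        return "user"
    return "guest supervisor" if value & msr_mask("GS") else "hypervisor"


def wrtee(msr: int, ee: bool) -> int:
    """Set or clear MSR[EE] without affecting other MSR bits."""
    mask = msr_mask("EE")
    return (msr | mask) if ee else (msr & ~mask)


msr = wrtee(msr_mask("ME"), True)
print(hex(msr), sorted(decode_msr(msr)), privilege_state(msr))
```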
The e500mc does not implement the WE bit found in some previous e500 cores. Power management
operations on SoCs using the e500mc are handled through an SoC programming model. Refer to the
reference manual for the integrated device.
32–47 Version A 16-bit number that identifies the version of the processor. Different version numbers indicate major
differences between processors, such as which optional facilities and instructions are supported.
48–63 Revision A 16-bit number that distinguishes between implementations of the version. Different revision numbers
indicate minor differences between processors having the same version number, such as clock rate and
engineering change level.
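The two 16-bit fields described above can be separated with a shift and a mask. This is a minimal sketch; the function name is illustrative, and the value used in the example is arbitrary rather than a real processor version number.

```python
def split_pvr(pvr: int):
    """Return (version, revision) from a 32-bit value.

    Version occupies bits 32-47 (the upper halfword) and revision
    occupies bits 48-63 (the lower halfword).
    """
    pvr &= 0xFFFFFFFF
    return pvr >> 16, pvr & 0xFFFF


version, revision = split_pvr(0x12345678)  # arbitrary example value
print(hex(version), hex(revision))
```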
[Figure not reproduced here. Recoverable labels: DEC and DECAR (bits 32–63); decrementer
event = 1/0 detect; auto-reload.]
The architecture definition for timer control register fields is described in the EREF: A Programmer’s
Reference Manual for Freescale Power Architecture® Processors.
This change prevents software from clearing a watchdog time-out that should result in the action defined
in TCR[WRC], in which these bits are reflected into the TSR[WRS] when the watchdog times out. Without
this change, it is theoretically possible that these bits could be cleared prior to the SoC seeing the bits
change, causing the watchdog action to fail.
The (G)ESR provides a way to differentiate among exceptions that can generate an interrupt type. When
an interrupt is generated, bits corresponding to the specific exception that generated the interrupt are set
and all other (G)ESR bits are cleared. Other interrupt types do not affect (G)ESR contents. The (G)ESR
does not need to be cleared by software. Table 2-6 shows (G)ESR bit definitions. For machine check
exceptions, the e500mc uses the MCSR, described in Section 2.9.9, “Machine Check Syndrome Register
(MCSR).”
The (G)ESR implementation differs from the architecture in the following respects:
• The e500mc does not implement AP, PUO, SPV, VLEMI, MIF, or XTE.
• The e500mc implements the EPID field.
SPR 62 (ESR); 383 (GESR)    Access: Guest supervisor
Bits     Field
32–35    —
36       PIL
37       PPR
38       PTR
39       FP
40       ST
41       —
42       DLK
43       ILK
44–45    —
46       BO
47       PIE
48–56    —
57       EPID
58–63    —
Reset: All zeros
Figure 2-6. (Guest) Exception Syndrome Register (ESR/GESR)
32–35  —     Reserved
36     PIL   Illegal instruction exception
             Interrupt types: Program
37     PPR   Privileged instruction exception
             Interrupt types: Program
38     PTR   Trap exception
             Interrupt types: Program
39     FP    Floating-point operations
             Interrupt types: Alignment, data storage, data TLB, program
40     ST    Store operation
             Interrupt types: Alignment, DSI, DTLB error
41     —     Reserved
42     DLK   Data cache locking. Set when a DSI occurs because dcbtls, dcbtstls, or dcblc is
             executed in user mode while MSR[UCLE] = 0.
             Interrupt types: DSI
43     ILK   Instruction cache locking. Set when a DSI occurs because icbtls or icblc is executed
             in user mode while MSR[UCLE] = 0.
             Interrupt types: DSI
44     —     Not supported on the e500mc. Defined by the architecture as AP (auxiliary processor
             operation).
45     —     Not supported on the e500mc. Unimplemented operation exception. On the e500mc,
             unimplemented instructions are handled as illegal instructions.
             Interrupt types: Program
46     BO    Byte-ordering exception
             Interrupt types: DSI, ISI
47     PIE   Imprecise exception
             Interrupt types: Program
48–56  —     Reserved
57     EPID  Indicates whether translation was performed using context from EPLC or EPSC. Set when
             a DSI, DTLB, or Alignment error occurs during execution of an external PID instruction.
             Interrupt types: Data storage, data TLB error, alignment
58–63  —     Reserved
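An interrupt handler can use the bit assignments above to decode an (G)ESR value. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the numbering used in Figure 2-6, in which bit 63 is the least-significant bit of the register.

```python
# Implemented (G)ESR fields -> bit number, from the table above.
ESR_BITS = {
    "PIL": 36, "PPR": 37, "PTR": 38, "FP": 39, "ST": 40,
    "DLK": 42, "ILK": 43, "BO": 46, "PIE": 47, "EPID": 57,
}


def esr_mask(name: str) -> int:
    """Mask for a single (G)ESR field."""
    return 1 << (63 - ESR_BITS[name])


def decode_esr(value: int) -> list:
    """Names of the set fields, ordered by ascending bit number."""
    ordered = sorted(ESR_BITS, key=ESR_BITS.get)
    return [name for name in ordered if value & esr_mask(name)]


# A store through an external PID instruction that takes a data
# storage interrupt would report ST together with EPID.
value = esr_mask("ST") | esr_mask("EPID")
print(hex(value), decode_esr(value))
```

Because all other (G)ESR bits are cleared when an interrupt is generated, the decoded list reflects only the exception that caused the current interrupt.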
This table shows the MCAR address and MCSR[MAV,MEA] at error time.
Table 2-7. MCAR Address and MCSR[MAV,MEA] at Error Time
MCSR[MAV] state, current: 0
MCSR[MAV] state, next: 1
MCSR[MEA] next state: 1
MCAR/MCARU: MCAR[0–63]
Comment: Updated with the EA associated with the error. If the detected error is a multiway hit in
the L2MMU (MCSR[L2MMU_MHIT]), the lower 12 bits of the EA are cleared, providing an EPN for the
translation.
Bits     Field      Write
40–42    —
43       NMI        w1c
44       MAV        w1c
45       MEA        w1c
46       —
47       IF         w1c
48       LD         w1c
49       ST         w1c
50       LDG        w1c
51–55    —
56–61    —
62       TLBSYNC    w1c
63       BSL2_ERR   w1c
Reset: All zeros
Figure 2-7. Machine Check Syndrome Register (MCSR)
32     MCP    Machine check input signal asserted. Set immediately on recognition of assertion of
              the MCP input. This input comes from the SoC and is a level sensitive signal. This
              usually occurs as the result of an error detected by the SoC.
              Exception type: Async. Additional gating condition: HID0[EMCP]
33     ICERR (ICPERR)
              Instruction cache tag or data array parity error.
              Exception type: Async. Additional gating condition: L1CSR1[ICECE] and L1CSR1[ICE]
34     DCERR (DCPERR)
              Uncorrectable L1 data cache data or tag error.
              Exception type: Async. Additional gating condition: L1CSR0[CECE] and L1CSR0[CE]
35     —      Reserved
37–42  —      Reserved
44     MAV    MCAR address valid. The address contained in the MCAR was updated by the processor
              and corresponds to the first detected error condition that contained an associated
              address. Subsequent machine check errors that have associated addresses are not
              placed in MCAR unless MAV is 0 at the time the error is logged.
              0 The address contained in MCAR is not valid.
              1 The address contained in MCAR is valid.
              Note: Software should first read MCAR before clearing MAV. MAV should be cleared
              before MSR[ME] is set.
              Exception type: Status. Additional gating condition: —
46     —      Reserved
47     IF     Instruction fetch error report. An error occurred during the attempt to fetch the
              instruction corresponding to the address in MCSRR0 or during an attempted fetch of a
              younger instruction than that pointed to by MCSRR0.
              Exception type: Error report. Additional gating condition: None
48     LD     Load instruction error report. An error occurred during the attempt to execute the
              load instruction at the address contained in MCSRR0.
              Exception type: Error report. Additional gating condition: None
50     LDG    Guarded load instruction error report. Set along with LD if the load encountering the
              error was a guarded load (WIMGE = xxx1x) and that guarded load did not encounter one
              of the data cache errors. Set only if the error encountered by the load was an L2 or
              CoreNet error.
              Exception type: Error report. Additional gating condition: None
51     —      Reserved
52–61  —      Reserved
1
“Exception Type” indicates which exception type caused the update of a given MCSR bit:
— Error report. Indicates that this bit is set only for error report exceptions that cause machine check interrupts. These bits are
only updated when the machine check interrupt is taken. Error report exceptions are not gated by MSR[ME]. These are
synchronous exceptions.
— NMI. Indicates that this bit is only set for the nonmaskable interrupt type exceptions which cause machine check interrupts.
This bit is only updated when the machine check interrupt is taken. NMI exceptions are not gated by MSR[ME]. This is an
asynchronous exception.
— Async. Indicates that this bit is set for an asynchronous machine check exception. These bits are set in the MCSR
immediately upon detection of the error. Once a bit is set in the MCSR, a machine check interrupt occurs if MSR[ME]=1. If
MSR[ME]=0, the MCSR bits remain set unless cleared by software, and a machine check occurs when MSR[ME] is set.
— “Status” indicates that this bit provides additional status about the logging of an asynchronous machine check exception.
2
“Additional Gating Condition” indicates any other state that, if not enabled, inhibits the recognition of this particular error condition.
3
For description of L2ERRDIS, see Section 2.15.4, “L2 Error Registers.”
The setting of MCSR[LD] and MCSR[ST] identifying the type of instruction is implementation
dependent. For e500mc, LD is set by instructions that load data into a register and complete when the load
data is committed to the architected register. ST is set by instructions that perform store operations and
instructions that are processed through the store queue in the LSU. The treatment of an instruction as a
load or store for the purposes of permission checking and debug events may differ from whether the LD
or ST bit is set for an error report.
The following instructions set MCSR[LD] if an error report occurs:
dcbt, dcbtst, icbt, lbz, lbzu, lbzx, lbzux, lha, lhau, lhax, lhaux, lhz, lhzu, lhzx, lhzux, lhbrx,
lmw, lwarx, lwz, lwzu, lwzx, lwzux, lwbrx, lbepx, lhepx, lwepx, dcbtep, dcbtstep, lbdx, lhdx,
lwdx, lfddx, lfd, lfdu, lfdux, lfdx, lfdepx, lfs, lfsu, lfsux, lfsx
The following instructions set MCSR[ST] if an error report occurs:
dcba, dcbal, dcbf, dcbi, dcblc, dcbst, dcbtls, dcbtstls, dcbz, dcbzl, dsn, icbi, icblc, icbtls, stb,
stbu, stbx, stbux, sth, sthu, sthx, sthux, sthbrx, stmw, stw, stwu, stwx, stwux, stwbrx, stwcx.,
stbepx, sthepx, stwepx, dcbfep, dcbstep, icbiep, dcbzep, dcbzlep, stbdx, sthdx, stwdx, stfddx,
stfd, stfdu, stfdux, stfdx, stfdepx, stfiwx, stfs, stfsu, stfsux, stfsx
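The MCSR bits described above can be tested with simple masks. The following is a minimal sketch, not code from the manual: the macro and function names are hypothetical, and it assumes the Power ISA convention that a 32-bit SPR is numbered from bit 32 (most significant) to bit 63 (least significant). MCSR[ST] is omitted because its bit position is not given in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering: bit 32 is the MSB and bit 63 the LSB of a 32-bit SPR. */
#define MCSR_BIT(n)  (UINT32_C(1) << (63 - (n)))

#define MCSR_ICERR   MCSR_BIT(33)  /* async: instruction cache error        */
#define MCSR_DCERR   MCSR_BIT(34)  /* async: L1 data cache error            */
#define MCSR_MAV     MCSR_BIT(44)  /* status: MCAR address valid            */
#define MCSR_IF      MCSR_BIT(47)  /* error report: instruction fetch       */
#define MCSR_LD      MCSR_BIT(48)  /* error report: load                    */
#define MCSR_LDG     MCSR_BIT(50)  /* error report: guarded load (with LD)  */

/* True if MCAR holds the address of the first logged error. */
static bool mcsr_mcar_valid(uint32_t mcsr)
{
    return (mcsr & MCSR_MAV) != 0;
}

/* True if any of the error report bits covered by this sketch is set. */
static bool mcsr_has_error_report(uint32_t mcsr)
{
    return (mcsr & (MCSR_IF | MCSR_LD | MCSR_LDG)) != 0;
}

/* LDG is only meaningful together with LD, so require both. */
static bool mcsr_guarded_load_error(uint32_t mcsr)
{
    return (mcsr & (MCSR_LD | MCSR_LDG)) == (MCSR_LD | MCSR_LDG);
}
```

A handler built on these helpers would read MCAR while `mcsr_mcar_valid()` is true and only then clear MAV, in line with the note in the MAV description.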
NOTE
Operating system software should always use SPRG0, SPRG1, SPRG2, and
SPRG3 when accessing GSPRG0, GSPRG1, GSPRG2, and GSPRG3
because in guest-supervisor state, these accesses are mapped to their
equivalent guest registers. This allows the programming model for the
operating system software to be the same regardless of whether the
operating system is operating in guest state under a hypervisor or is
executing directly on the bare metal.
SPRGs and GSPRGs are 32 bits for 32-bit implementations and 64 bits for
64-bit implementations. For e500mc, these registers are 32 bits. USPRG0
(VRSAVE) is a 32-bit register regardless of whether the processor is a 32-bit
or 64-bit implementation.
32–53 — Reserved
54 BBFI Branch buffer flash invalidate. Setting BBFI flash clears the valid bit of all entries in the branch prediction
mechanisms; clearing occurs independently from the value of the enable bit (BPEN). BBFI is cleared by
hardware and always reads as 0.
55–62 — Reserved
32 EMCP Enable machine check signal. Used to mask out further machine check exceptions caused by
asserting the internal machine check signal from the integrated device.
0 Machine check signalling is disabled.
1 Machine check signalling is enabled. If HID0[EMCP] = 1, asserting the machine check signal
from the integrated device causes MCSR[MCP] to be set to 1. If MSR[ME] = 1 or
MSR[GS] = 1, a machine check exception and subsequent interrupt occurs.
33 EN_L2MMU_MHD Enable L2MMU multiple-hit detection. An L2MMU multiple hit occurs when more than one entry
matches a given translation. This most likely occurs when software mistakenly loads the TLB with
more than one entry that matches the same translation, but can also occur if a soft error occurs in
a TLB entry.
0 Machine check signalling is disabled.
1 A multiple L2 MMU hit sets MCSR[L2MMU_MHIT] to 1. If MSR[ME] = 1 or MSR[GS] = 1, a
machine check exception and subsequent interrupt occurs.
34–55 — Reserved
56 EN_MAS7_UPDATE Enable MAS7 update. Enables updating MAS7 by tlbre and tlbsx.
0 MAS7 is not updated by a tlbre or tlbsx.
1 MAS7 is updated by a tlbre or tlbsx.
57 DCFA Data cache flush assist. Force data cache to ignore invalid sets on miss replacement selection.
0 The data cache flush assist facility is disabled
1 The miss replacement algorithm ignores invalid entries and follows the replacement sequence
defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz
instructions to eight per set. The bit should be set just before beginning a cache flush routine
and should be cleared when the series of instructions is complete.
58 — Reserved
59 CIGLSO Cache-inhibited guarded load/store ordering.
0 Loads and stores to storage that are marked as cache inhibited and guarded have no ordering
implied except what is defined in the rest of the architecture.
1 Loads and stores to storage that are marked as cache inhibited and guarded are ordered.
60–62 — Reserved
63 NOPTI NOP the data and instruction cache touch instructions. Note that “cache and lock set” and “cache
and lock clear” instructions are not affected by the setting of this bit.
0 dcbt, dcbtep, dcbtst, dcbtstep, and icbt are enabled, and operate as defined by the
architecture and the rest of this document.
1 dcbt, dcbtep, dcbtst, dcbtstep, and icbt are treated as NOPs.
When touch instructions are treated as NOPs because HID0[NOPTI] is set, they do not cause
DAC debug events. That is, if a DAC comparison would have caused a debug event, the debug
event is also NOPed and does not occur.
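As an illustration of the HID0 fields above, the sketch below defines masks for the bits listed in this table and shows the DCFA usage pattern (set just before a cache flush routine, clear afterwards). The names are hypothetical, the helpers only compute register values, and the actual mfspr/mtspr access and any required synchronization are left out.

```c
#include <stdint.h>

/* Assumed numbering: bit 32 is the MSB and bit 63 the LSB of a 32-bit SPR. */
#define HID0_BIT(n)          (UINT32_C(1) << (63 - (n)))

#define HID0_EMCP            HID0_BIT(32)  /* enable machine check signal    */
#define HID0_EN_L2MMU_MHD    HID0_BIT(33)  /* L2MMU multiple-hit detection   */
#define HID0_EN_MAS7_UPDATE  HID0_BIT(56)  /* tlbre/tlbsx update MAS7        */
#define HID0_DCFA            HID0_BIT(57)  /* data cache flush assist        */
#define HID0_CIGLSO          HID0_BIT(59)  /* CI guarded load/store ordering */
#define HID0_NOPTI           HID0_BIT(63)  /* treat touch instructions as NOP */

/* Value to write to HID0 just before starting a cache flush routine. */
static uint32_t hid0_flush_begin(uint32_t hid0)
{
    return hid0 | HID0_DCFA;
}

/* Value to write to HID0 once the flush sequence is complete. */
static uint32_t hid0_flush_end(uint32_t hid0)
{
    return hid0 & ~HID0_DCFA;
}
```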
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
R
CEI — CEDT CSLC CUL CLO CLFC — CEIT — DCBZ32 — CFI CE
W
Reset All zeros
Figure 2-11. L1 Cache Control and Status Register 0 (L1CSR0) Fields Implemented on e500mc
32–43 — Reserved
The setting of CEA has no effect if L1CSR0[CECE] = 0. In some implementations, reading CEA is not guaranteed to
reflect the last written value; however, it returns either the last written value or 0.
e500mc only supports the value 0b00 for ICEA.
46 — Reserved
48 CEI (Data) Cache error injection enable. See Section 5.4.5, “Cache Error Injection.”
CPI 0 Error injection disabled.
DCPI 1 Error injection enabled.
Cache error checking must also be enabled (CECE = 1) when this bit is set.
Note that if the programmer attempts to set L1CSR0[CEI] (using mtspr) without setting L1CSR0[CECE],
L1CSR0[CEI] is not set (enforced by hardware).
49–51 — Reserved
56 — Reserved
57–58 CEIT Cache error injection type. Controls the type of error injection to be performed.
00 Inject single-bit data error and inject single bit tag error
01 reserved
10 reserved
11 reserved
61 — Reserved
62 CFI (Data) Cache flash invalidate. Invalidation occurs regardless of the enable (L1CSR0[CE]) value.
DCFI 0 No cache invalidate.
1 Cache flash invalidate operation. A cache invalidation operation is initiated by hardware. Once complete,
this bit is cleared.
Note: During an invalidation operation, writing a 1 causes undefined results; writing a 0 has no effect.
Note: CE should not be set when the cache is disabled until after the cache has been properly initialized by
flash invalidating the cache. This applies both to the first time the cache is enabled as well as
sequences that want to re-enable the cache after software has disabled it.
Note: If the cache is enabled and software wishes to disable it by writing a 0 to CE, software should first flush
the cache to ensure that any modified data resident in the cache is pushed to memory. If the cache is
not flushed, coherency is lost and any lines in the cache may provide stale data when the cache is
re-enabled.
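The enable ordering in the notes above (flash invalidate first, wait for hardware to clear CFI, then set CE) can be sketched as follows. This is an illustrative model only: the `spr_ops` accessors, the polling bound, and the `fake_l1csr0` register model are assumptions introduced so the sequence can be exercised off-target. On real hardware the accessors would wrap mfspr/mtspr with the required synchronization.

```c
#include <stdint.h>

/* Assumed numbering: bit 32 is the MSB and bit 63 the LSB of a 32-bit SPR. */
#define L1CSR0_CFI  (UINT32_C(1) << (63 - 62))  /* cache flash invalidate */
#define L1CSR0_CE   (UINT32_C(1) << (63 - 63))  /* cache enable           */

/* Hypothetical register accessors. */
typedef struct {
    uint32_t (*read)(void *ctx);
    void (*write)(void *ctx, uint32_t value);
    void *ctx;
} spr_ops;

/* Returns 0 on success, -1 if CFI never self-clears within the bound. */
static int l1_dcache_enable(const spr_ops *ops)
{
    uint32_t v = ops->read(ops->ctx);
    if (v & L1CSR0_CE)
        return 0;                              /* already enabled */

    ops->write(ops->ctx, v | L1CSR0_CFI);      /* start flash invalidate */
    for (int i = 0; i < 1000; i++) {           /* arbitrary polling bound */
        if (!(ops->read(ops->ctx) & L1CSR0_CFI)) {
            /* Invalidation finished: only now is it safe to set CE. */
            ops->write(ops->ctx, ops->read(ops->ctx) | L1CSR0_CE);
            return 0;
        }
    }
    return -1;
}

/* Toy register model: CFI clears itself after a few reads. */
typedef struct { uint32_t value; int busy; } fake_l1csr0;

static uint32_t fake_read(void *ctx)
{
    fake_l1csr0 *r = ctx;
    if (r->busy > 0 && --r->busy == 0)
        r->value &= ~L1CSR0_CFI;
    return r->value;
}

static void fake_write(void *ctx, uint32_t value)
{
    fake_l1csr0 *r = ctx;
    if ((value & L1CSR0_CFI) && !(r->value & L1CSR0_CFI))
        r->busy = 3;
    r->value = value;
}
```

The loop never writes CFI a second time while an invalidation is in progress, which the CFI description flags as causing undefined results.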
32–43 — Reserved
49 — Reserved
52 ICSLC Instruction cache snoop lock clear. Sticky bit set by hardware if a cache line lock was cleared by a snoop
operation which caused an invalidation. Note that the lock for that line is cleared whenever the line is
invalidated. This bit can be cleared only by software.
0 The cache has not encountered a snoop that invalidated a locked line.
1 The cache has encountered a snoop that invalidated a locked line.
53 ICUL Instruction cache unable to lock. Sticky bit set by hardware. This bit can be cleared only by software.
0 Indicates a lock set instruction was effective in the cache
1 Indicates a lock set instruction was not effective in the cache
54 ICLO Instruction cache lock overflow. Sticky bit set by hardware. This bit can be cleared only by software.
0 Indicates a lock overflow condition was not encountered in the cache
1 Indicates a lock overflow condition was encountered in the cache
55 ICLFC Instruction cache lock bits flash clear. Clearing occurs regardless of the enable (L1CSR1[ICE]) value.
0 Default.
1 Hardware initiates a cache lock bits flash clear operation. This bit is cleared when the operation is complete.
Note: Writing a 1 while a flash clear operation is in progress causes undefined results. Writing a 0 during a
flash clear operation is ignored.
62 ICFI Instruction cache flash invalidate. Invalidation occurs regardless of the enable (L1CSR1[ICE]) value.
0 No cache invalidate.
1 Cache flash invalidate operation. A cache invalidation operation is initiated by hardware. Once complete,
this bit is cleared.
Note: Writing a 1 during an invalidation operation causes undefined results. Writing a 0 during an invalidation
operation is ignored.
63 ICE Instruction cache enable.
0 The cache is not enabled. (not accessed or updated)
1 Enables cache operation.
Note: ICE should not be set when the cache is disabled until after the cache has been properly initialized by
flash invalidating the cache. This applies both to the first time the cache is enabled as well as
sequences that want to re-enable the cache after software has disabled it.
• Setting L1CSR2[DCWS] automatically sets L1CSR0[CFI] to flash invalidate the data cache when
turning on write shadow mode to purge the cache of any modified data. Software should perform
a flush operation on the data cache prior to setting L1CSR2[DCWS].
• While write shadow mode is active (L1CSR2[DCWS] = 1), the L2 cache is required to be enabled
and in general be able to allocate lines when store or store type operations are performed. See
Table 5-1 for supported write shadow configurations.
• Although the architecture defines DCSTASHID as L1CSR2[54–63], the e500mc implements only
8 bits (L1CSR2[56–63]) and supports only stash ID values of 8 to 255.
Writing to L1CSR2 requires synchronization, as described in Section 3.3.3, “Synchronization
Requirements.”
SPR 606 Hypervisor
32 33 34 55 56 63
R
— DCWS — DCSTASHID
W
Reset All zeros
Figure 2-13. L1 Cache Control and Status Register 2 (L1CSR2) Fields Implemented on the e500mc
This table describes how L1CSR2 fields are implemented on the e500mc.
Table 2-14. L1CSR2 Field Descriptions
32 — Implementation dependent.
33 DCWS Data cache write shadow. Note that on the e500mc, changing L1CSR2[DCWS] automatically sets
L1CSR0[CFI].
Set by software to place the primary data cache into write shadow mode. When write shadow mode is
enabled, data that is written to the primary data cache is also written through to the backside L2 (or other
parts of the memory hierarchy) so that any subsequent failures in the primary data cache can be recovered
from by invalidating the data cache.
0 The primary data cache is not in write shadow mode.
1 The primary data cache is in write shadow mode.
Note: Software should flush and invalidate the primary data cache before setting DCWS to ensure that no
modified data exists in the primary data cache.
Note: Only certain cache configurations are supported when write shadow mode is enabled. See
Table 5-1.
34–55 — Reserved
56–63 DCSTASHID Data cache stash ID. Contains the cache target identifier for external stash operations directed to this
processor’s data cache. Clearing DCSTASHID prevents the primary cache from accepting external stash
operations. Note that the e500mc supports only stash ID values of 8 and larger (that is, values between 8
and 255); values from 1 to 7 are illegal.
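The stash ID constraint above lends itself to a small validity check. The sketch below is illustrative and uses hypothetical names: it treats 0 as "stashing disabled", rejects 1 to 7, accepts 8 to 255, and composes an L1CSR2 value from DCWS (bit 33) and DCSTASHID (bits 56–63).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering: bit 32 is the MSB and bit 63 the LSB of a 32-bit SPR. */
#define L1CSR2_DCWS            (UINT32_C(1) << (63 - 33))
#define L1CSR2_DCSTASHID_MASK  UINT32_C(0xFF)

/* 0 disables external stashing; 1..7 are illegal; 8..255 are usable IDs. */
static bool e500mc_stash_id_ok(unsigned id)
{
    return id == 0 || (id >= 8 && id <= 255);
}

/* Builds an L1CSR2 value; returns -1 and leaves *out untouched on a bad ID. */
static int l1csr2_compose(bool write_shadow, unsigned stash_id, uint32_t *out)
{
    if (!e500mc_stash_id_ok(stash_id))
        return -1;
    *out = (write_shadow ? L1CSR2_DCWS : 0) |
           ((uint32_t)stash_id & L1CSR2_DCSTASHID_MASK);
    return 0;
}
```

The same range check applies to L2CSR1[L2STASHID], which has the same 8-to-255 restriction.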
The EREF: A Programmer’s Reference Manual for Freescale Power Architecture® Processors describes
these fields as they are defined in the Power ISA. This table describes how they are implemented on the
e500mc.
Table 2-15. L1CFG0 Field Descriptions
32–33 CARCH Cache architecture. 0 indicates Harvard (split instruction and data).
37–38 — Reserved
32–35 — Reserved
37–38 — Reserved
32 — Reserved
33–34 L2CTEHA L2 cache tags error handling available. 1 indicates parity detection.
35–36 L2CDEHA L2 cache data error handling available. 0b11 indicates both parity and ECC correction available.
41–42 L2CREPL Cache default replacement policy. This is the default line replacement policy at power-on-reset. If an
implementation allows software to change the replacement policy it is not reflected here. 1 indicates
pseudo-LRU.
44 — Reserved
48 49 50 51 52 53 54 55 56 57 58 59 63
R
— L2REP L2FL L2LFC — L2LOA — L2LO —
W
Reset All zeros
Figure 2-17. L2 Cache Control and Status Register (L2CSR0)
32 L2E L2 cache enable. Implemented as defined in EREF: A Programmer’s Reference Manual for Freescale
Power Architecture® Processors. The e500mc requires software to continue to read this bit after setting it
to ensure the desired value has been set before continuing on.
Note: L2E should not be set when the L2 cache is disabled until after the L2 cache has been properly
initialized by flash invalidating the cache and locks. This applies both to the first time the L2 cache is
enabled as well as sequences that want to re-enable the cache after software has disabled it.
Note: If the L2 cache is enabled and software wishes to disable it by writing a 0 to L2E, software should
first flush the L2 cache to ensure that any modified data resident in the L2 cache is pushed to
memory. If the L2 cache is not flushed, coherency is lost and any lines in the cache may provide stale
data when the L2 cache is re-enabled.
33 L2PE L2 cache parity/ECC error checking enable. Implemented as defined in EREF: A Programmer’s Reference
Manual for Freescale Power Architecture® Processors.
Note: L2PE should not be set until after the L2 cache has been properly initialized out of reset by flash
invalidation. Doing so can cause erroneous detection of errors because the state of the error
detection bits is random out of reset. See Section 11.5, “L1 Cache State,” for more details on L1
cache initialization.
Note: When error injection is being performed, the value of L2PE and individual error disables are ignored
and errors are always detected. Software should ensure that L2PE is set when performing error
injection.
34 — Reserved
Performance note: If the number of ways available for instruction or data allocation is not a power of two,
allocations are not evenly distributed across those ways, even over a very long period of time. For
instance, if three ways (say way A, way B, and way C) are available for data allocation, the long-term
share of allocations for A, B, and C is not 33%, 33%, 33%, respectively. Instead, one of the three ways
receives closer to 50% of the allocations, and the other two ways receive closer to 25% each
(50%, 25%, 25%).
Instruction and Data way partitioning has no effect on cache locking. Cache lines which are locked due to
cache locking instructions are still honored in the presence of way partitioning. If locked lines exist in the
L2 cache prior to enabling L2 way partitioning, those locked lines can exist in the “opposite” partition. For
instance, a line locked by an icbtls instruction can exist in a way which is part of the data partition. To avoid
this condition, locks must be flash invalidated prior to enabling way partitioning.
Because L2WP only controls how new lines are allocated, L2WP can be changed at any time without
affecting the functionality of the L2 Cache.
38–39 (L2CM) L2 cache coherency mode. This field is not implemented in e500mc, and always reads as 0.
40–41 — Reserved
42 L2FI L2 cache flash invalidate. Implemented as defined in EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors. Note that lock bits are not cleared by an L2 cache flash
invalidate. Lock bits should be cleared by software at boot time to ensure that random states of the lock
bits for each line do not limit allocation of those lines. See L2CSR0[L2LFC].
Note: When a flash invalidation operation is being performed (i.e. L2FI has been set to 1 by software),
software should not attempt to write 1 to this field again until after hardware has reset this bit to 0 to
indicate that the invalidate operation is complete. Writing a 1 during an invalidation operation causes
undefined results. Writing a 0 during an invalidation operation is ignored.
43 L2IO L2 cache instruction only. Implemented as defined in EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors except that if L2IO is set and L2DO is not set, storage accesses
which are data references (i.e. from load/store instructions) are not serviced from the L2 cache even if the
cache had previously allocated and still contains lines from data references that were allocated prior to
setting L2IO. In addition, when L2IO is set, the L2 cache does not participate in the coherence protocol
(that is, it does not respond to snoops) except that it processes instruction cache invalidations (icbi) from
any processor. When L2IO is set and the L2 cache contains modified data, that data becomes incoherent.
To avoid this situation, if software wishes to set L2IO (and not L2DO), it should first set both L2IO and L2DO
to prevent further allocations, then flush any modified data from the L2 cache, then clear L2DO.
The e500mc requires software to continue to read this bit after setting it to ensure the desired value has
been set before continuing on.
44–46 — Reserved
47 L2DO L2 cache data only. Implemented as defined in EREF: A Programmer’s Reference Manual for Freescale
Power Architecture® Processors. The e500mc requires software to continue to read this bit after setting it
to ensure the desired value has been set before continuing on.
48–49 — Reserved
52 L2FL L2 cache flush. Implemented as defined in EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors. On e500mc, L2FL should not be set when the L2 cache is not currently enabled
(L2E should already be 1). If L2FL is set and the L2 cache is not enabled, the flush does not occur and the
L2FL bit remains set.
53 L2LFC L2 cache lock flash clear. On boot, the processor should set this bit to clear any lock state bits which may
be randomly set out of reset, prior to enabling the L2 cache.
54–55 — Reserved
56 L2LOA L2 cache lock overflow allocate. Implemented as defined in EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors. Note that cache line locking in e500mc L2 is persistent.
57 — Reserved
58 L2LO L2 cache lock overflow. Implemented as defined in EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors.
59–63 — Reserved
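Two of the L2CSR0 rules above can be checked before a write is issued: L2FL should only be set while the L2 cache is enabled, and a 1 should not be written to L2FI while a flash invalidation is still in progress. The following is a hedged sketch with hypothetical names. It only compares the current and proposed register values and does not access hardware.

```c
#include <stdint.h>

/* Assumed numbering: bit 32 is the MSB and bit 63 the LSB of a 32-bit SPR. */
#define L2CSR0_BIT(n)  (UINT32_C(1) << (63 - (n)))

#define L2CSR0_L2E     L2CSR0_BIT(32)  /* L2 cache enable           */
#define L2CSR0_L2PE    L2CSR0_BIT(33)  /* parity/ECC checking       */
#define L2CSR0_L2FI    L2CSR0_BIT(42)  /* flash invalidate          */
#define L2CSR0_L2IO    L2CSR0_BIT(43)  /* instruction only          */
#define L2CSR0_L2DO    L2CSR0_BIT(47)  /* data only                 */
#define L2CSR0_L2FL    L2CSR0_BIT(52)  /* flush                     */
#define L2CSR0_L2LFC   L2CSR0_BIT(53)  /* lock flash clear          */

/* Boot-time value: invalidate the cache and clear random lock bits. */
#define L2CSR0_BOOT_INIT  (L2CSR0_L2FI | L2CSR0_L2LFC)

typedef enum {
    L2W_OK = 0,
    L2W_FLUSH_WHILE_DISABLED,  /* L2FL requested but L2E is 0          */
    L2W_FI_BUSY                /* L2FI written while still in progress */
} l2_write_check;

static l2_write_check l2csr0_check_write(uint32_t current, uint32_t next)
{
    if ((next & L2CSR0_L2FL) && !(current & L2CSR0_L2E))
        return L2W_FLUSH_WHILE_DISABLED;
    if ((next & L2CSR0_L2FI) && (current & L2CSR0_L2FI))
        return L2W_FI_BUSY;
    return L2W_OK;
}
```

After a write that passes this check, software would still read L2E, L2IO, or L2DO back until the desired value is observed, as the field descriptions require.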
48 55 56 63
R
— L2STASHID
W
Reset All zeros
Figure 2-18. L2 Cache Control and Status Register 1 (L2CSR1)
33 — Reserved
34–35 L2INSTLOSSLIMIT Some units of the core can lose arbitration for the backside L2 for multiple cycles. This field
specifies how many consecutive cycles instruction fetches can lose backside L2 arbitration
before raising its priority.
00 Raise priority after 8 losses (default)
01 Raise priority after 4 losses
10 Raise priority after 8 losses
11 Raise priority after 16 losses
37–39 L2SNPWINLIMIT Snoops receive the highest priority when arbitrating for the backside L2. In a system with very
active snooping, this can starve other units from winning access to the backside L2. This field
specifies how many consecutive snoops can win arbitration before allowing another unit to
win.
000 Limit to 8 consecutive snoops (default)
001 Limit to 2 consecutive snoops
010 Limit to 4 consecutive snoops
011 Limit to 8 consecutive snoops
100 Limit to 16 consecutive snoops
101 Limit to 32 consecutive snoops
110 Limit to 64 consecutive snoops
111 Limit to 128 consecutive snoops
40–55 — Reserved
56–63 L2STASHID L2 cache stash ID. Contains the cache target identifier to be used for external stash
operations directed to this processor’s L2 cache. A value of 0 for L2STASHID prevents the L2
cache from accepting external stash operations. Note that the e500mc supports only stash
ID values of 8 and larger (that is, values between 8 and 255); values from 1 to 7 are illegal.
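The L2CSR1 encodings above map to limits as in the following sketch. The function names are hypothetical. The field positions follow the bit ranges in the table (L2INSTLOSSLIMIT in bits 34–35, L2SNPWINLIMIT in bits 37–39, L2STASHID in bits 56–63), assuming bit 63 is the least significant bit.

```c
#include <stdint.h>

/* L2INSTLOSSLIMIT (bits 34-35): consecutive arbitration losses tolerated. */
static unsigned l2csr1_inst_loss_limit(uint32_t l2csr1)
{
    static const unsigned limit[4] = { 8, 4, 8, 16 };  /* 00 is the default */
    return limit[(l2csr1 >> (63 - 35)) & 0x3];
}

/* L2SNPWINLIMIT (bits 37-39): consecutive snoop wins allowed. */
static unsigned l2csr1_snoop_win_limit(uint32_t l2csr1)
{
    unsigned field = (l2csr1 >> (63 - 39)) & 0x7;
    /* 000 is the default (8); otherwise the limit is 2 to the power of the field. */
    return field == 0 ? 8u : 1u << field;
}

/* L2STASHID (bits 56-63). */
static unsigned l2csr1_stash_id(uint32_t l2csr1)
{
    return l2csr1 & 0xFF;
}
```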
32–56 — Reserved
57–58 — Reserved
63 — Reserved
that it does not implement the TMBECCERR, TSBECCERR, and L2CFGERR fields, and implements the
implementation specific fields MULL2ERR and TMHIT.
SPR 991 Hypervisor
32 55 56 57 58 59 61 62 63
R MULL2ERR TMHIT TPARERR MBECCERR SBECCERR PARERR
— —
W w1c w1c w1c w1c w1c w1c
Reset All Zeros
Figure 2-20. L2 Cache Error Detect Register (L2ERRDET)
32 MULL2ERR Multiple L2 errors. Writing a 1 to this bit location resets the bit.
0 Multiple L2 errors of the same type were not detected.
1 Multiple L2 errors of the same type were detected.
Note: This field is not part of EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors.
33–55 — Reserved
56 TMHIT Tag multi-way hit detected. Writing a 1 to this bit location resets the bit.
0 Tag multi-way hit not detected.
1 Tag multi-way hit detected.
Note: This field is not part of EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors.
57–58 — Reserved
59 TPARERR Tag parity error detected. Writing a 1 to this bit location resets the bit.
0 Tag parity error not detected.
1 Tag parity error detected.
60 MBECCERR Data multiple-bit ECC error detected. Writing a 1 to this bit location resets the bit.
0 Data multiple-bit ECC error not detected.
1 Data multiple-bit ECC error detected.
61 SBECCERR Data single-bit ECC error detected. Writing a 1 to this bit location resets the bit.
0 Data single-bit ECC error not detected.
1 Data single-bit ECC error detected.
62 PARERR Data parity error detected. Writing a 1 to this bit location resets the bit.
0 Data parity error not detected.
1 Data parity error detected.
63 — Reserved
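Because every implemented L2ERRDET bit is write-1-to-clear, acknowledging one error must not disturb the others. The sketch below models that behavior in plain C. The names are hypothetical, and the function models the register's response to a write rather than performing one.

```c
#include <stdint.h>

/* Assumed numbering: bit 32 is the MSB and bit 63 the LSB of a 32-bit SPR. */
#define L2ERRDET_BIT(n)     (UINT32_C(1) << (63 - (n)))

#define L2ERRDET_MULL2ERR   L2ERRDET_BIT(32)
#define L2ERRDET_TMHIT      L2ERRDET_BIT(56)
#define L2ERRDET_TPARERR    L2ERRDET_BIT(59)
#define L2ERRDET_MBECCERR   L2ERRDET_BIT(60)
#define L2ERRDET_SBECCERR   L2ERRDET_BIT(61)
#define L2ERRDET_PARERR     L2ERRDET_BIT(62)

#define L2ERRDET_ALL  (L2ERRDET_MULL2ERR | L2ERRDET_TMHIT | L2ERRDET_TPARERR | \
                       L2ERRDET_MBECCERR | L2ERRDET_SBECCERR | L2ERRDET_PARERR)

/* Models w1c semantics: each 1 written clears that bit; 0s leave bits alone. */
static uint32_t l2errdet_after_w1c(uint32_t reg, uint32_t written)
{
    return reg & ~(written & L2ERRDET_ALL);
}
```

To acknowledge a single error, software would write only that bit's mask. Writing back the full value read from the register would clear every pending error at once.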
described in the EREF: A Programmer’s Reference Manual for Freescale Power Architecture®
Processors with the following exception:
• It does not implement the TMBECCINTEN, TSBECCINTEN, and L2CFGINTEN fields.
• It does implement the implementation-specific field TMHITINTEN.
SPR 720 Hypervisor
32 55 56 57 58 59 60 61 62 63
R
— TMHITINTEN — TPARINTEN MBECCINTEN SBECCINTEN PARINTEN —
W
Reset All Zeros
Figure 2-21. L2 Cache Error Interrupt Enable Register (L2ERRINTEN)
32–55 — Reserved
57–58 — Reserved
63 — Reserved
EREF: A Programmer’s Reference Manual for Freescale Power Architecture® Processors with the
following exception: it does not implement the L2TCCOUNT field.
SPR 724 Hypervisor
32 39 40 47 48 55 56 63
R
— L2CTHRESH — L2CCOUNT
W
Reset All Zeros
Figure 2-22. L2 Cache Error Control Register (L2ERRCTL)
32–39 — Reserved
40–47 L2CTHRESH L2 cache threshold. Threshold value for the number of ECC single-bit errors that are detected
before reporting an error condition. L2CTHRESH is compared to L2CCOUNT each time a single-bit
ECC error is detected.
48–55 — Reserved
56–63 L2CCOUNT L2 data ECC single-bit error count. Counts ECC single-bit errors in the L2 data detected. If
L2CCOUNT equals the ECC single-bit error trigger threshold (L2CTHRESH), an error is reported
if single-bit error reporting for data is enabled. Software should clear this value when such an error
is reported to reset the count.
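The threshold comparison described for L2ERRCTL can be sketched as below. The helper names are hypothetical. The sketch assumes L2CTHRESH occupies bits 40–47 and L2CCOUNT bits 56–63 as listed, with bit 63 as the least significant bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2ERRCTL_THRESH_SHIFT  (63 - 47)          /* L2CTHRESH, bits 40-47 */
#define L2ERRCTL_COUNT_MASK    UINT32_C(0xFF)     /* L2CCOUNT, bits 56-63  */

static unsigned l2errctl_threshold(uint32_t r)
{
    return (r >> L2ERRCTL_THRESH_SHIFT) & 0xFF;
}

static unsigned l2errctl_count(uint32_t r)
{
    return r & L2ERRCTL_COUNT_MASK;
}

/* An error is reported when the count equals the threshold. */
static bool l2errctl_threshold_hit(uint32_t r)
{
    return l2errctl_count(r) == l2errctl_threshold(r);
}

/* Value to write back after handling the report: same threshold, count 0. */
static uint32_t l2errctl_clear_count(uint32_t r)
{
    return r & ~L2ERRCTL_COUNT_MASK;
}
```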
32 — Reserved
33–35 DWNUM Doubleword number of the detected error (data ECC errors only).
Note: This field is not part of EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors.
36–42 — Reserved
48–49 — Reserved
52–62 — Reserved
56–63 ECCERRIM Error injection mask for the ECC/parity bits. A set bit causes the corresponding ECC/parity bit
to be inverted on writes if DERRIEN = 1.
Note: This field is not part of EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors.
hypervisor state results in a hypervisor privilege exception when MSR[PR] = 0 and a privilege exception
when MSR[PR] = 1.
Only the low-order 6 bits of LPIDR are implemented on e500mc.
When LPIDR is written the results of the change to LPIDR are not guaranteed to be seen until a context
synchronizing event occurs.
32–60 — Reserved
61 L2TLB0_FI TLB0 flash invalidate (write 1 to invalidate)
0 No flash invalidate. Writing a 0 to this bit during an invalidation operation is ignored.
1 TLB0 invalidation operation. Hardware initiates a TLB0 invalidation operation. When this operation is
complete, this bit is cleared. Writing a 1 during an invalidation operation causes an undefined operation.
This invalidation typically takes 1 cycle.
32–35 — Reserved
36–39 LPIDSIZE LPID size. The number of LPID bits implemented. The processor implements only the least significant
LPIDR bits. (0b0110 indicates LPIDR is 6 bits, LPIDR[58–63])
40–46 RASIZE Real address size supported by the implementation. (0b0100100 indicates 36 physical address bits)
47–48 — Reserved
49–52 NPIDS Number of PID registers. Indicates the number of PID registers provided by the processor. (0b0001
indicates one PID register implemented)
53–57 PIDSIZE PID register size. PIDSIZE is one less than the number of bits in each of the PID registers implemented by
the processor. The processor implements only the least significant PIDSIZE+1 bits in the PID. (0b00111
indicates PID is 8 bits. PID[56–63])
58–59 — Reserved
60–61 NTLBS Number of TLBs. The value of NTLBS is one less than the number of software-accessible TLB structures
that are implemented by the processor. NTLBS is set to one less than the number of TLB structures so that
its value matches the maximum value of MAS0[TLBSEL]. (0b01 indicates two TLBs.)
62–63 MAVN MMU architecture version number. Indicates the version number of the architecture of the MMU
implemented by the processor. (0b00 indicates Version 1.0.)
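The MMUCFG encodings above, several of which are stored as "one less than" the real quantity, can be decoded with a generic field extractor. This is a minimal sketch with hypothetical names. The sample value in the usage note is assembled from the field values quoted in the table, not read from hardware.

```c
#include <stdint.h>

/* Extracts bits first..last of a 32-bit SPR, where bit 63 is the LSB. */
static uint32_t ppc_field32(uint32_t reg, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask = (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (reg >> (63 - last)) & mask;
}

typedef struct {
    unsigned lpid_bits;       /* LPIDSIZE                              */
    unsigned real_addr_bits;  /* RASIZE                                */
    unsigned pid_regs;        /* NPIDS                                 */
    unsigned pid_bits;        /* PIDSIZE + 1                           */
    unsigned tlbs;            /* NTLBS + 1                             */
    unsigned mmu_version;     /* MAVN (0 means version 1.0)            */
} mmucfg_info;

static mmucfg_info mmucfg_decode(uint32_t mmucfg)
{
    mmucfg_info info;
    info.lpid_bits      = ppc_field32(mmucfg, 36, 39);
    info.real_addr_bits = ppc_field32(mmucfg, 40, 46);
    info.pid_regs       = ppc_field32(mmucfg, 49, 52);
    info.pid_bits       = ppc_field32(mmucfg, 53, 57) + 1;
    info.tlbs           = ppc_field32(mmucfg, 60, 61) + 1;
    info.mmu_version    = ppc_field32(mmucfg, 62, 63);
    return info;
}
```

With the e500mc field values from the table (LPIDSIZE 0b0110, RASIZE 0b0100100, NPIDS 0b0001, PIDSIZE 0b00111, NTLBS 0b01, MAVN 0b00), the decoder reports a 6-bit LPID, 36 physical address bits, one 8-bit PID register, and two TLBs.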
This table describes the TLBnCFG fields and shows the values for the e500mc.
Table 2-28. TLBnCFG Field Descriptions
50–51 — Reserved
TLB read (tlbre) and TLB write (tlbwe) instructions use MAS0[TLBSEL], MAS0[ESEL], and
MAS2[EPN] to select which TLB entry to read from or write to. On e500mc, these fields are used as
described by this table.
Table 2-29. TLB Selection Fields
TLB Array (MAS0[TLBSEL])   MAS0[ESEL]   MAS2[EPN]   MAS0[NV]
1   MAS0[42:47] selects entry (low-order 6 bits of ESEL)   Not used, as TLB1 is fully associative   NV field not defined for this TLB array
32–34 — Reserved
36–41 — Reserved
42–47 ESEL Entry select. Number of the entry in the selected array to be used for tlbwe. Updated on TLB error exceptions
(misses) and tlbsx hit and miss cases. Only certain bits are valid, depending on the array selected in TLBSEL.
Other bits should be 0.
Table 2-30. MAS0 Field Descriptions—MMU Read/Write and Replacement Control (continued)
48–61 — Reserved
62–63 NV Next victim. Can be used to identify the next victim to be targeted for a TLB miss replacement operation for
those TLBs that support the NV field.
For the e500mc, NV is the next victim value to be written to TLB0[NV] on execution of tlbwe. This field is also
updated on TLB error exceptions (misses), tlbsx hit and miss cases, and on execution of tlbre.
This field is updated based on the calculated next victim value for TLB0 (based on the round-robin replacement
algorithm, described in Section 6.3.2.2, “Replacement Algorithms for L2 MMU Entries”). Note that this field is
not defined for operations that specify TLB1 (when TLBSEL = 1).
33 IPROT Invalidate protect. Set to protect this TLB entry from invalidate operations from tlbivax, tlbilx, or MMUCSR0 TLB
flash invalidates. Note that not all TLB arrays are necessarily protected from invalidation with IPROT. Arrays that
support invalidate protection are denoted as such in the TLB configuration registers.
0 Entry is not protected from invalidation.
1 Entry is protected from invalidation.
34–39 — Reserved
40–47 TID Translation identity. Defines the process ID for this TLB entry. TID is compared to the process ID in the PID
register during translation. A TID value of 0 defines an entry as global and matches with all process IDs.
48–50 — Reserved
51 TS Translation space. Compared with MSR[IS] (instruction fetch) or MSR[DS] (memory reference) to determine if
this TLB entry may be used for translation.
Table 2-31. MAS1 Field Descriptions—Descriptor Context and Configuration Control (continued)
52–55 TSIZE Translation size. Defines the page size of the TLB entry. For TLB arrays with fixed-size TLB entries, TSIZE is
ignored. For variable-size arrays, the page size is 4^TSIZE Kbytes. The e500mc supports the following sizes:
0001 4 Kbyte 0111 16 Mbyte
0010 16 Kbyte 1000 64 Mbyte
0011 64 Kbyte 1001 256 Mbyte
0100 256 Kbyte 1010 1 Gbyte
0101 1 Mbyte 1011 4 Gbyte
0110 4 Mbyte
56–63 — Reserved
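The page-size rule above (4^TSIZE Kbytes) reduces to a shift, since each TSIZE step multiplies the size by four. The helper below is a sketch with a hypothetical name. It returns 0 for encodings outside the 0001 to 1011 range listed for the e500mc.

```c
#include <stdint.h>

/* Page size in bytes for an e500mc TSIZE encoding, or 0 if unsupported. */
static uint64_t e500mc_page_bytes(unsigned tsize)
{
    if (tsize < 1 || tsize > 11)
        return 0;
    /* 4^TSIZE Kbytes == 1024 * 2^(2*TSIZE) bytes */
    return UINT64_C(1024) << (2 * tsize);
}
```

For example, TSIZE = 0101 gives 1024 << 10 bytes, which is the 1 Mbyte entry in the list.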
Bits Name Description
32–51 EPN Effective page number. Depending on page size, only the bits associated with a page boundary are valid. Bits
that represent offsets within a page are ignored and should be zero.
52–56 — Reserved
59 W Write-through
0 This page is considered write-back with respect to the caches in the system.
1 All stores performed to this page are written through the caches to main memory.
60 I Caching-inhibited
0 Accesses to this page are considered cacheable.
1 The page is considered caching-inhibited. All loads and stores to the page bypass the caches and are
performed directly to main memory. A read or write to a caching-inhibited page affects only the memory
element specified by the operation.
Note: Cache-inhibited loads may hit in the L1 or L2 cache, but the transaction is always performed over CoreNet,
ignoring the hit (although the hit may have other unarchitected side effects). Cache-inhibited loads that hit
in the Data Line Fill Buffer (DLFB) are serviced out of the DLFB and are not performed over CoreNet.
Note: Cache-inhibited (non-decorated, and non-guarded) loads execute speculatively on e500mc.
63 E Endianness. Determines endianness for the corresponding page. Little-endian operation is true little endian,
which differs from the modified little-endian byte ordering model available in some previous devices.
0 The page is accessed in big-endian byte order.
1 The page is accessed in true little-endian byte order.
32–51 RPN Real page number. Depending on page size, only the bits associated with a page boundary are valid. Bits that
represent offsets within a page are ignored and should be zero. MAS3[RPN] contains only the low-order bits
of the real page number. The high order bits of the real page number are located in MAS7. See
Section 2.16.6.8, “MAS Register 7 (MAS7),” for more information.
52–53 — Reserved
54–57 U0–U3 User attribute bits. These bits are associated with a TLB entry and can be used by system software. For
example, these bits may be used to hold information useful to a page scanning algorithm or be used to mark
more abstract page attributes.
58–63 UX, SX, UW, SW, UR, SR Permission bits (UX, SX, UW, SW, UR, SR). User and supervisor read, write, and execute permission bits.
See the EREF: A Programmer’s Reference Manual for Freescale Power Architecture® Processors for more
information on the page permission bits as they are defined by the architecture.
32–34 — Reserved
35 TLBSELD TLBSEL default value. Specifies the default value to be loaded in MAS0[TLBSEL] on a TLB miss exception.
36–51 — Reserved
52–55 TSIZED Default TSIZE value. Specifies the default value to be loaded into MAS1[TSIZE] on a TLB miss exception.
56 — Reserved
57 X0D Default X0 value. Specifies the default value to be loaded into MAS2[X0] on a TLB miss exception.
58 X1D Default X1 value. Specifies the default value to be loaded into MAS2[X1] on a TLB miss exception.
59 WD Default W value. Specifies the default value to be loaded into MAS2[W] on a TLB miss exception.
60 ID Default I value. Specifies the default value to be loaded into MAS2[I] on a TLB miss exception.
61 MD Default M value. Specifies the default value to be loaded into MAS2[M] on a TLB miss exception.
62 GD Default G value. Specifies the default value to be loaded into MAS2[G] on a TLB miss exception.
63 ED Default E value. Specifies the default value to be loaded into MAS2[E] on a TLB miss exception.
32 SGS Search GS. Specifies the GS value used when searching the TLB during execution of tlbsx. The SGS field is
compared with the Translated (TGS) field of each TLB entry to find a matching entry.
33–57 — Reserved
58–63 SLPID Search LPID. Specifies the LPID value used when searching the TLB during execution of tlbsx. The SLPID
field is compared with the TLPID field of each TLB entry to find a matching entry.
32–39 — Reserved
40–47 SPID Search PID. Specifies the PID value used when searching the TLB during execution of tlbsx.
48–62 — Reserved
63 SAS Address space (AS) value for searches. Specifies the value of AS used when searching the TLB during
execution of tlbsx.
Bits Name Description
32–59 — Reserved
60–63 RPN Real page number, 4 high-order bits. MAS3 holds the remainder of the RPN. The byte offset within the page is
provided by the EA and is not present in MAS3 or MAS7.
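The split of the real page number across MAS3 and MAS7 can be illustrated with a short C sketch. This is a minimal model, assuming the two MAS values have already been read into 32-bit variables; the function name `real_address` and the `page_shift` parameter (log2 of the page size in bytes) are hypothetical and not part of the architecture.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 36-bit real address from MAS3, MAS7, and
 * the effective address. MAS3 bits 32-51 hold the low-order RPN bits (the
 * upper 20 bits of the 32-bit register); MAS7 bits 60-63 hold the 4
 * high-order RPN bits. page_shift is assumed to be between 12 and 32. */
uint64_t real_address(uint32_t mas3, uint32_t mas7, uint32_t ea,
                      unsigned page_shift)
{
    uint64_t rpn_high = (uint64_t)(mas7 & 0xFu) << 32;   /* real address bits above bit 31 */
    uint64_t rpn_low = mas3 & 0xFFFFF000u;               /* drop U0-U3 and permission bits */
    uint64_t offset_mask = ((uint64_t)1 << page_shift) - 1;

    /* RPN bits that fall inside the page offset are ignored; the byte
     * offset within the page comes from the EA. */
    return ((rpn_high | rpn_low) & ~offset_mask) | (ea & offset_mask);
}
```

For a 4-Kbyte page (`page_shift` = 12), only the low 12 bits of the EA contribute to the result; larger pages take correspondingly more offset bits from the EA.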
Bits Name Description
32 TGS Translation guest space. During translation, TGS is compared with MSR[GS] to select a TLB entry.
33 VF Virtualization fault. Controls whether a DSI occurs on data accesses to the page, regardless of permission bit
settings.
0 Data accesses translated by this TLB entry occur normally.
1 Data accesses translated by this TLB entry always cause a data storage interrupt directed to the hypervisor.
34–57 — Reserved
58–63 TLPID Translation logical partition ID. During translation, TLPID is compared with the LPIDR to select a TLB entry. A
TLPID value of 0 defines an entry as global and matches all values of LPIDR.
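The TGS and TLPID comparisons described above can be sketched as a small predicate. This is only a partial model covering the two MAS8-derived fields; a full TLB match also involves the address, address space, and PID comparisons. The function name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when a TLB entry's TGS and TLPID
 * fields are compatible with the current MSR[GS] and LPIDR values. */
bool mas8_fields_match(unsigned tgs, unsigned tlpid,
                       unsigned msr_gs, unsigned lpidr)
{
    bool gs_ok = (tgs & 1u) == (msr_gs & 1u);
    /* TLPID = 0 marks the entry as global: it matches every LPIDR value. */
    bool lpid_ok = (tlpid == 0u) || (tlpid == (lpidr & 0x3Fu));
    return gs_ok && lpid_ok;
}
```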
0–31 — Reserved
32 EPR External load context PR bit. Used in place of MSR[PR] for load permission checking when an external PID
load instruction is executed.
0 Supervisor mode.
1 User mode.
33 EAS External load context AS bit. Used in place of MSR[DS] for load translation when an External PID Load
instruction is executed. Compared with TLB[TS] during translation.
0 Address space 0.
1 Address space 1.
34 EGS External load context GS bit. Used in place of MSR[GS] for load translation when an External PID Load
instruction is executed. Compared with TLB[TGS] during translation. This field is only writable in hypervisor
state (MSR[PR] = 0 and MSR[GS] = 0).
0 Hypervisor address space.
1 Guest address space.
35–41 — Reserved
42–47 ELPID External load context LPID value. Used in place of LPIDR value for load translation when an external PID Load
instruction is executed. Compared with TLB[TLPID] during translation. This field is only writable in hypervisor
state (MSR[PR] = 0 and MSR[GS] = 0).
48–55 — Reserved
56–63 EPID External load context PID value. Used in place of all PID register values for load translation when an external
PID Load instruction is executed. Compared with TLB[TID] during translation.
0–31 — Reserved
32 EPR External store context PR bit. Used in place of MSR[PR] for store permission checking when an External PID
Store instruction is executed.
0 Supervisor mode
1 User mode.
33 EAS External store context AS bit. Used in place of MSR[DS] for store translation when an External PID Store
instruction is executed. Compared with TLB[TS] during translation.
0 Address space 0
1 Address space 1
34 EGS External store context GS bit. Used in place of MSR[GS] for store translation when an External PID Store
instruction is executed. Compared with TLB[TGS] during translation. This field is only writable in hypervisor
state (MSR[PR] = 0 and MSR[GS] = 0).
0 Hypervisor address space.
1 Guest address space.
35–41 — Reserved
42–47 ELPID External store context LPID value. Used in place of LPIDR value for store translation when an external PID
Store instruction is executed. Compared with TLB[TLPID] during translation. This field is only writable in
hypervisor state (MSR[PR] = 0 and MSR[GS] = 0).
48–55 — Reserved
56–63 EPID External store context PID value. Used in place of all PID register values for store translation when an external
PID Store instruction is executed. Compared with TLB[TID] during translation.
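Because the external PID load and store context registers share the same field layout, one helper can compose a value for either of them. The following is a minimal sketch that only converts the documented bit positions (numbered 32–63) into shifts within a 32-bit value; the function name is hypothetical, and writing the result to the register (with the hypervisor-state restriction on EGS and ELPID) is left to the caller.

```c
#include <stdint.h>

/* Hypothetical helper: pack the external PID context fields.
 * Manual bit n (32..63) corresponds to a left shift of (63 - n). */
uint32_t ep_context_pack(unsigned epr, unsigned eas, unsigned egs,
                         unsigned elpid, unsigned epid)
{
    return ((uint32_t)(epr & 1u) << 31)        /* bit 32: EPR */
         | ((uint32_t)(eas & 1u) << 30)        /* bit 33: EAS */
         | ((uint32_t)(egs & 1u) << 29)        /* bit 34: EGS */
         | ((uint32_t)(elpid & 0x3Fu) << 16)   /* bits 42-47: ELPID */
         | (uint32_t)(epid & 0xFFu);           /* bits 56-63: EPID */
}
```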
The descriptions of the debug registers listed here generally cover only the registers and facilities that are used by software
in the internal debug mode (when DBCR0[IDM] = 1). A more detailed description of the debug facilities is
provided in Chapter 9, “Debug and Performance Monitor Facilities.”
32 EDM External Debug Mode. This bit is read only by software. It reflects the status of EDBCR0[EDM].
0 Indicates the processor is not in external debug mode. External debug events are disabled.
1 Indicates the processor is in external debug mode. A qualified debug condition generates an external
debug event by updating the corresponding bit in EDBSR0 and causing the processor to halt.
34–35 RST Reset. The architecture defines this field such that 00 is always no action and all other settings are
implementation specific. e500mc implements these bits as follows:
0x Default (No action)
1x Core reset. Requests a core hard reset if MSR[DE] and DBCR0[IDM] are set. Always cleared on
subsequent cycle.
38 IRPT Interrupt Taken Debug Condition Enable. This bit affects only non-critical, non-debug, and non-machine
check interrupts.
0 IRPT debug conditions are disabled
1 IRPT debug conditions are enabled
42–43 — Reserved
49–56 — Reserved
59–62 — Reserved
e500mc sets both DBSR[IAC1] and DBSR[IAC2] bits if IAC12M is set to anything other than 0b00 and an
instruction address compare 1 or 2 event occurs.
42–63 — Reserved
1 If IAC1 > IAC2 or IAC1 = IAC2, a valid condition never occurs.
2 If IAC1 > IAC2 or IAC1 = IAC2, a valid condition may occur on every instruction fetch.
e500mc sets both DBSR[DAC1] and DBSR[DAC2] bits if DAC12M is set to anything other than 0b00 and
a data address compare 1 or 2 event occurs.
44–63 — Reserved
1 See DBCR4 for extensions to the exact address match (range defined).
2 If DAC1 > DAC2 or DAC1 = DAC2, a valid condition never occurs.
3 If DAC1 > DAC2 or DAC1 = DAC2, a valid condition may occur on every data storage address.
1 When in EDM (DBCR0[EDM] = 1), software writes to this register are ignored while the e500mc is not halted.
32–47 — Reserved
56–63 — Reserved
Reset 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0..0
This table provides the bit definitions for DBSR and DBSRWR.
Table 2-45. DBSR/DBSRWR Field Descriptions
42–43 — Reserved
49–56 — Reserved
59–63 — Reserved
1 When in EDM (DBCR0[EDM] = 1), software writes to this register are ignored while the e500mc is not halted.
This table provides the bit definitions for NSPC. See Table 9-23 for the list of the Nexus registers that can
be accessed.
Table 2-46. NSPC Field Descriptions
32–51 — Reserved
40–55 — Reserved
NOTE
OS accesses to NPIDR must be performed in addition to writes to the PID
register used to create translated addresses in the MMU for Nexus
messaging.
Architecture® Processors. The performance monitor also defines IVOR35 (see Section 2.9.4, “(Guest)
Interrupt Vector Offset Registers (IVORs/GIVORs)”) for providing the address of the performance
monitor interrupt vector. IVOR35 is described in the interrupt model chapter of the EREF: A
Programmer’s Reference Manual for Freescale Power Architecture® Processors.
PMRs are similar to the SPRs and are accessed by mtpmr and mfpmr. As shown in Table 2-49, the
contents of the PMRs are reflected to a read-only user-level equivalent.
Table 2-49. Performance Monitor Registers
Supervisor User
Name Section/Page
Abbreviation PMRn Abbreviation PMRn
Attempting to access a supervisor PMR from user mode (MSR[PR] = 1) results in a privileged instruction
exception. Attempting to access a non-existent PMR in any privilege mode results in an illegal instruction
exception.
If MSRP[PMMP] = 1, access to PMRs can cause embedded hypervisor privilege exceptions, or return a
value of 0 in the target register. The behavior is described in EREF: A Programmer’s Reference Manual
for Freescale Power Architecture® Processors.
PMGC0 is cleared by a hard reset. Reading this register does not change its contents. This table describes
the e500mc specific PMGC0 fields.
Table 2-50. PMGC0/UPMGC0 Implementation-Specific Field Descriptions
51–52 TBSEL Time base selector. Selects the time base bit that can cause a time base transition event (the event occurs when
the selected bit changes from 0 to 1).
00 TB[63] (TBL[63])
01 TB[55] (TBL[55])
10 TB[51] (TBL[51])
11 TB[47] (TBL[47])
Time base transition events can be used to periodically collect information about processor activity. In
multiprocessor systems in which TB registers are synchronized among processors, time base transition events
can be used to correlate the performance monitor data obtained by the several processors. For this use,
software must specify the same TBSEL value for all processors in the system. Because the time-base frequency
is implementation-dependent, software should invoke a system service program to obtain the frequency before
choosing a value for TBSEL.
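The interval between time base transition events follows directly from the selected bit position. The sketch below assumes the time base is a 64-bit counter that increments by one per tick, with bit 63 as the least-significant bit, so that bit n changes from 0 to 1 once every 2^(64 − n) ticks. The function name is hypothetical; converting ticks to time still requires the implementation-dependent time-base frequency.

```c
#include <stdint.h>

/* Hypothetical helper: number of time base ticks between successive
 * 0-to-1 transitions of the bit selected by PMGC0[TBSEL]. */
uint64_t tbsel_period_ticks(unsigned tbsel)
{
    /* TBSEL encodings 00, 01, 10, 11 select TB[63], TB[55], TB[51], TB[47]. */
    static const unsigned tb_bit[4] = { 63, 55, 51, 47 };
    return (uint64_t)1 << (64 - tb_bit[tbsel & 3u]);
}
```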
• The EVENT field only implements the low order 8 bits of the EREF: A Programmer’s Reference
Manual for Freescale Power Architecture® Processors defined field.
• The FCGS0 and FCGS1 fields are not implemented on e500mc Rev 1.x or Rev 2.x.
PMLCa0 (PMR144) UPMLCa0 (PMR128) PMLCa0–PMLCa3: Guest supervisor
PMLCa1 (PMR145) UPMLCa1 (PMR129) UPMLCa0–UPMLCa3: User RO
PMLCa2 (PMR146) UPMLCa2 (PMR130)
PMLCa3 (PMR147) UPMLCa3 (PMR131)
Bits 32 FC | 33 FCS | 34 FCU | 35 FCM1 | 36 FCM0 | 37 CE | 38–39 Reserved | 40–47 EVENT | 48–61 Reserved | 62 FCGS1 | 63 FCGS0
Access R/W
Reset All zeros
Figure 2-53. Local Control A Registers (PMLCa0–PMLCa3)/
User Local Control A Registers (UPMLCa0–UPMLCa3)
32 FC Freeze counter
0 The PMC is incremented (if permitted by other PM control bits).
1 The PMC is not incremented.
37 CE Condition enable
0 PMCx overflow conditions cannot occur. (PMCx cannot cause interrupts, cannot freeze counters.)
1 Overflow conditions occur when the most-significant-bit of PMCx is equal to one.
It is recommended that CE be cleared when counter PMCx is selected for chaining.
38–39 — Reserved
48–61 — Reserved
• If TRIGONCTL = 0b0000, the trigger state is always set to ON when the counter is not frozen.
This setting is used to essentially make triggers inactive and all other performance monitor controls
determine whether events are counted.
• If a condition occurs that is programmed via TRIGONCTL and the counter is not frozen, the trigger
state is set to ON.
• If a condition occurs that is programmed via TRIGOFFCTL and the counter is not frozen, the
trigger state is set to OFF.
• Methods of freezing PMCn other than PMLCan[FC] or PMGC0[FAC] have
no effect on the trigger state, although such methods can prevent the counter from counting. That
is, the trigger state may be ON while PMCn is not counting events because it is frozen by some
other method.
41–47 — Reserved
51–52 — Reserved
56–57 — Reserved
58–63 THRESHOLD Threshold. Only events that exceed this value are counted. Events to which a threshold value applies
are implementation-dependent as are the dimension (for example duration in cycles) and the
granularity with which the threshold value is interpreted.
By varying the threshold value, software can profile event characteristics. For example, if PMC1 is
configured to count cache misses that last longer than the threshold value, software can obtain the
distribution of cache miss durations for a given program by monitoring the program repeatedly using a
different threshold value each time.
1 Performance Monitor Counter overflow generates a watchpoint (PMWn) that can be used for triggering or to generate
Watchpoint Messages (if enabled).
32 OV Overflow. When set, this bit indicates that the counter has reached its maximum value.
33–63 Counter Value Indicates the number of occurrences of the specified event.
The minimum counter value is 0x0000_0000; 4,294,967,295 (0xFFFF_FFFF) is the maximum. A counter
can increment by 0, 1, 2, 3, or 4 up to the maximum value and then wrap to the minimum value.
A counter enters the overflow state when its high-order bit is set. Because the overflow state is entered at the
halfway point between the minimum and maximum values, a performance monitor interrupt handler can easily
identify overflowed counters, even if the interrupt is masked for many cycles (during which the counters
may continue incrementing). The high-order bit is normally set only when the counter increments from a
value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648
(0x8000_0000).
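The overflow test and a common way of choosing a starting value can be sketched in C. This is an illustrative model rather than a required method: the function names are hypothetical, and preloading a counter so that it overflows after a chosen number of events is an assumed usage pattern. Because a counter can increment by more than 1 at a time, the overflow may be detected slightly after the chosen count.

```c
#include <stdbool.h>
#include <stdint.h>

/* A counter is in the overflow state when its high-order bit is set. */
bool pmc_overflowed(uint32_t pmc)
{
    return (pmc & 0x80000000u) != 0u;
}

/* Hypothetical helper: non-overflowed starting value that reaches the
 * overflow state after `events` increments of 1. Returns 0 for requests
 * that cannot be represented (0 events, or more than 0x8000_0000). */
uint32_t pmc_preload_for(uint32_t events)
{
    if (events == 0u || events > 0x80000000u)
        return 0u;
    return 0x80000000u - events;
}
```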
NOTE
Initializing PMCs to overflowed values is strongly discouraged. If an
overflowed value is loaded into a PMCn that held a non-overflowed value
(and PMGC0[PMIE], PMLCan[CE], and (MSR[EE] or MSR[GS]) are set),
an interrupt is generated before any events are counted.
The response to an overflow depends on the configuration, as follows:
• If PMLCan[CE] is clear, no special actions occur on overflow: the counter continues incrementing,
and no exception is signaled.
• If PMLCan[CE] and PMGC0[FCECE] are set, all counters are frozen when PMCn overflows.
• If PMLCan[CE] and PMGC0[PMIE] are set, an exception is signaled when PMCn reaches
overflow. Interrupts are masked when MSR[EE] and MSR[GS] are both 0. An exception may
be signaled while the interrupt is masked by MSR[EE] and MSR[GS], but the interrupt is not taken
until it is fully enabled and only if the overflow condition is still present and the configuration has
not been changed in the meantime to disable the exception.
However, if MSR[EE] and MSR[GS] remain 0 until after the counter leaves the overflow state
(msb becomes 0), or if MSR[EE] and MSR[GS] remain 0 until after PMLCan[CE] or
PMGC0[PMIE] are cleared, the exception is not signaled.
The following sequence is recommended for setting counter values and configurations:
1. Set PMGC0[FAC] to freeze the counters.
2. Using mtpmr instructions, initialize counters and configure control registers.
3. Release the counters by clearing PMGC0[FAC] with a final mtpmr.
Software is expected to use mtpmr to explicitly set PMCs to non-overflowed values. Setting an
overflowed value may cause an erroneous exception. For example, if both PMGC0[PMIE] and
PMLCan[CE] are set and the mtpmr loads an overflowed value into PMCn, an interrupt may be generated
without an event counting having taken place.
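The recommended freeze, initialize, and release sequence can be sketched as follows. To keep the example runnable on any host, PMR accesses are modeled with an array; on hardware, `mtpmr_model` and `mfpmr_model` would be replaced by the mtpmr and mfpmr instructions. The PMR numbers for PMC0 (16) and PMGC0 (400) and the position of FAC as the most-significant bit of PMGC0 are assumptions taken from the architecture definition, not from this section, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed PMR numbers; PMLCa0 = PMR144 is listed in this chapter. */
enum { PMR_PMC0 = 16, PMR_PMLCA0 = 144, PMR_PMGC0 = 400 };
#define PMGC0_FAC 0x80000000u   /* assumed: freeze all counters, bit 32 */

uint32_t pmr_model[512];        /* host-side stand-in for the PMR file */

void mtpmr_model(unsigned pmrn, uint32_t value) { pmr_model[pmrn] = value; }
uint32_t mfpmr_model(unsigned pmrn) { return pmr_model[pmrn]; }

void pm_setup_counter0(uint32_t pmlca0, uint32_t start)
{
    /* 1. Freeze the counters. */
    mtpmr_model(PMR_PMGC0, mfpmr_model(PMR_PMGC0) | PMGC0_FAC);
    /* 2. Initialize the counter (forced to a non-overflowed value)
     *    and configure its control register. */
    mtpmr_model(PMR_PMC0, start & 0x7FFFFFFFu);
    mtpmr_model(PMR_PMLCA0, pmlca0);
    /* 3. Release the counters with a final write that clears FAC. */
    mtpmr_model(PMR_PMGC0, mfpmr_model(PMR_PMGC0) & ~PMGC0_FAC);
}
```

Masking the starting value with 0x7FFF_FFFF is one way to guarantee that software never loads an overflowed value.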
Table 3-1 lists Power ISA 2.06 instructions defined in the above categories which are not supported on the
e500mc. Attempting to execute unsupported instructions results in an illegal instruction exception-type
program exception.
Table 3-1. Unsupported Power ISA 2.06 Instructions (by category)
Table 3-1. Unsupported Power ISA 2.06 Instructions (by category) (continued)
instructions that affect the instruction flow. See Section 3.4.5, “Branch and Flow Control
Instructions.”
• Processor control instructions
These instructions are used for performing various tasks associated with moving data to and from
special registers, system linkage instructions, etc. See Section 3.4.6, “Processor Control
Instructions.”
• Memory synchronization instructions
These instructions are used for memory synchronizing. See Section 3.4.8, “Memory
Synchronization Instructions.”
• Memory control instructions
These instructions provide control of caches and TLBs. See Section 3.4.10, “Memory Control
Instructions,” and Section 3.4.11.3, “Supervisor-Level Memory Control Instructions.”
Note that instruction groupings used here do not indicate the execution unit that processes a particular
instruction or group of instructions. This information, which is useful for scheduling instructions most
effectively, is provided in Chapter 10, “Execution Timing.”
Instructions are four bytes long and are word-aligned. Byte, halfword, and word loads and stores occur
between memory and a set of thirty-two 32-bit general-purpose registers (GPRs).
Integer instructions operate on word operands that specify GPRs as source and destination registers.
Floating-point instructions operate on doubleword operands, which may contain single- or
double-precision values, and use thirty-two 64-bit floating-point registers (FPRs) as source and destination
registers.
Arithmetic and logical instructions do not read or modify memory. To use the contents of a memory
location in a computation and then modify the same or another location, the memory contents must be
loaded into a register, modified, and then written to the target location using load and store instructions.
The description of each instruction includes the mnemonic and a formatted list of operands. To simplify
assembly language programming, a set of simplified mnemonics and symbols is provided for some of the
frequently used instructions (see Appendix B, “Simplified Mnemonics,” for a complete list). Programs
written to be portable across the various assemblers for the Power ISA should not assume the existence of
mnemonics not described in that document.
undefined results for a given instruction can vary between implementations and between execution
attempts in the same implementation.
If a sequence of instructions contains context-altering instructions and contains no instructions that are
affected by any of the context alterations, no software synchronization is required within the sequence.
Sometimes advantage can be taken of the fact that certain instructions that occur naturally in the program,
such as the rfi at the end of an interrupt handler, provide the required synchronization.
No software synchronization is required before altering the MSR because mtmsr is execution
synchronizing. No software synchronization is required before most other alterations shown in Table 3-2,
because all instructions before the context-altering instruction are fetched and decoded before the
context-altering instruction is executed. (The processor must determine whether any of the preceding
instructions are context-synchronizing.)
Table 3-2 identifies the software synchronization requirements for data access for context-altering
instructions that require synchronization.
Table 3-2. Data Access Synchronization Requirements
tlbilx CSI CSI 4,5
tlbwe CSI CSI 4,5
1 A sync prior to reading L1CSR0 or L1CSR1 is required to examine any cache locking status from prior cache locking
operations. The sync ensures that any previous cache locking operations have completed prior to reading the status.
2 A context-synchronizing instruction is required after altering MSR[ME] to ensure that the alteration takes effect for subsequent
machine check interrupts, which may not be recoverable and therefore may not be context-synchronizing.
3 The additional sync after the mtspr is done is required if software is turning off stashing by setting the stash ID field of the
register to zero. The sync ensures that any pending stash operations have finished.
4 For data accesses, the context-synchronizing instruction before tlbwe, tlbilx, or tlbivax ensures that all memory accesses
due to preceding instructions have completed to a point at which they have reported all exceptions they cause.
5 The context-synchronizing instruction after tlbwe, tlbilx, or tlbivax ensures that subsequent accesses (data and instruction)
use the updated value in any TLB entries affected. It does not ensure that all accesses previously translated by TLB entries
being updated have completed with respect to memory; if these completions must be ensured, tlbwe, tlbilx, or tlbivax must
be followed by a sync and by a context-synchronizing instruction.
6 To ensure that all TLB invalidations are completed and seen in all processors in the coherence domain, the global invalidation
requires that a tlbsync be executed after the tlbivax as follows: tlbivax; sync; tlbsync; sync; isync. For the e500mc, this
code should be protected by a mutual exclusion lock such that only one processor at a time is executing this sequence, as
multiple tlbsync operations on the CoreNet interface may cause the integrated device to hang.
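The locked global invalidation sequence can be sketched in C with inline assembly. This is a minimal illustration under several assumptions: the lock is a C11 `atomic_flag` located in memory shared by all cores, the code runs with sufficient privilege, and `ea` already carries the address and selection bits expected by tlbivax. All names are hypothetical. On non-Power hosts the instruction sequence compiles to nothing, so only the locking structure is exercised.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical system-wide lock: only one core at a time may execute
 * the tlbivax/tlbsync sequence. */
atomic_flag tlbsync_lock = ATOMIC_FLAG_INIT;

void tlb_global_invalidate_sequence(uintptr_t ea)
{
#if defined(__powerpc__) || defined(__PPC__)
    /* tlbivax; sync; tlbsync; sync; isync, as required by the footnote. */
    __asm__ volatile("tlbivax 0,%0\n\t"
                     "sync\n\t"
                     "tlbsync\n\t"
                     "sync\n\t"
                     "isync"
                     : : "r"(ea) : "memory");
#else
    (void)ea;   /* host build: no TLB to invalidate */
#endif
}

void tlb_invalidate_global(uintptr_t ea)
{
    /* Spin until the lock is acquired. */
    while (atomic_flag_test_and_set_explicit(&tlbsync_lock,
                                             memory_order_acquire)) {
    }
    tlb_global_invalidate_sequence(ea);
    atomic_flag_clear_explicit(&tlbsync_lock, memory_order_release);
}
```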
Table 3-3 identifies the software synchronization requirements for instruction fetch and/or execution for
context-altering instructions which require synchronization.
Table 3-3. Instruction Fetch and/or Execution Synchronization Requirements
1 Architecturally, MAS register changes require an isync before subsequent instructions that use those updated values, such
as tlbwe, tlbre, tlbilx, tlbsx, and tlbivax. Typically software does several MAS updates and then performs a single isync
prior to executing the TLB management instruction. Currently, e500mc does not require such synchronization because the
mtspr and the TLB management instructions both internally use the same synchronization method. If software chooses not to
execute the isync, it should be aware that the internal synchronization may change in future cores or even in a future revision
of e500mc.
2 The context-synchronizing instruction after tlbwe, tlbilx, or tlbivax ensures that subsequent accesses (data and instruction)
use the updated value in any TLB entries affected. It does not ensure that all accesses previously translated by TLB entries
being updated have completed with respect to memory; if these completions must be ensured, tlbwe, tlbilx, or tlbivax must
be followed by a sync and by a context-synchronizing instruction.
3 To ensure that all TLB invalidations are completed and seen in all processors in the coherence domain, the global invalidation
requires that a tlbsync be executed after the tlbivax as follows: tlbivax; sync; tlbsync; sync; isync. For the e500mc, this
code should be protected by a mutual exclusion lock such that only one processor at a time is executing this sequence as
multiple tlbsync operations on the CoreNet interface may cause the integrated device to hang.
Table 3-4 identifies the software synchronization requirements for non-context-altering instructions that
require synchronization.
Table 3-4. Special Synchronization Requirements
Context Altering Instruction or Event Required Before Required Immediately After Notes
MSR[DE] from 0 to 1
mtspr (L2ERR*) msync followed by isync isync —
mtspr (MMUCSR0) None isync —
mtspr (NSPD) None isync —
1 Synchronization requirements for changing any debug facility registers require that the changes be followed by an isync and a
transition of MSR[DE] from 0 to 1 before the results of the changes are guaranteed to be seen. Normally, changes to the debug
registers occur in the debug interrupt routine when MSR[DE] is 0, and the subsequent return from the debug routine is likely to
set MSR[DE] back to 1, which accomplishes the required synchronization. Software should only make changes to the debug
facility registers when MSR[DE] = 0.
2 Note that the special synchronization requirement applies only to changes to EPCR[DUVD]. If this bit is not changed, the
synchronization requirements for EPCR are as described in the data or instruction execution tables above.
Subtract from Minus One Extended subfme (subfme. subfmeo subfmeo.) rD,rA
Subtract from Zero Extended subfze (subfze. subfzeo subfzeo.) rD,rA
Although there is no subtract immediate instruction, its effect is achieved by negating the immediate
operand of an addi instruction. Simplified mnemonics include this negation. Subtract instructions subtract
the second operand (rA) from the third (rB). Simplified mnemonics are provided in which the third is
subtracted from the second. See Appendix B, “Simplified Mnemonics.”
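The operand order of the subtract forms is easy to get wrong, so a small C model may help. It is only a sketch of the 32-bit result arithmetic (no carry, overflow, or record-form behavior), and the function names are hypothetical.

```c
#include <stdint.h>

/* subf rD,rA,rB: the second operand (rA) is subtracted from the third (rB). */
uint32_t model_subf(uint32_t ra, uint32_t rb)
{
    return rb - ra;
}

/* Simplified mnemonic sub rD,rA,rB is equivalent to subf rD,rB,rA. */
uint32_t model_sub(uint32_t ra, uint32_t rb)
{
    return model_subf(rb, ra);
}

/* Simplified mnemonic subi rD,rA,value is equivalent to addi rD,rA,-value.
 * The negated value must still fit the signed 16-bit immediate. */
uint32_t model_subi(uint32_t ra, int16_t value)
{
    int32_t negated = -(int32_t)value;
    return ra + (uint32_t)negated;
}
```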
An implementation that executes instructions with the overflow exception enable bit (OE) set or that sets
the carry bit (CA) can either execute these instructions slowly or prevent execution of the next instruction
until the operation completes. Chapter 10, “Execution Timing,” describes how the e500mc handles such
CR dependencies. The summary overflow and overflow bits XER[SO,OV] are set to reflect an overflow
condition of a 32-bit result only if the instruction’s OE bit is set.
The crD operand can be omitted if the result of the comparison is to be placed in CR0. Otherwise the target
CR field must be specified in crD by using an explicit field number.
For information on simplified mnemonics, see Appendix B, “Simplified Mnemonics.”
Rotate Left Word then AND with Mask rlwnm (rlwnm.) rA,rS,rB,MB,ME
Rotate Left Word Immediate then Mask Insert rlwimi (rlwimi.) rA,rS,SH,MB,ME
Rotate Left Word Immediate then AND with Mask rlwinm (rlwinm.) rA,rS,SH,MB,ME
Integer shift instructions, listed in Table 3-9, perform left and right shifts. Immediate-form logical
(unsigned) shift operations are obtained by specifying masks and shift values for certain rotate
instructions. Appendix B, “Simplified Mnemonics,” shows how to simplify coding of such shifts.
Multiple-precision shifts can be programmed as described in the EREF: A Programmer’s Reference
Manual for Freescale Power Architecture® Processors.
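How immediate-form logical shifts map onto rlwinm can be shown with a small C model. The sketch below models only the 32-bit result of rlwinm (rotate left, then AND with a mask covering bits MB through ME, where bit 0 is the most-significant bit), and then expresses the simplified mnemonics slwi and srwi in terms of it. The function names are hypothetical, and shift amounts are assumed to be less than 32.

```c
#include <stdint.h>

/* Mask with ones in bits mb..me (bit 0 = msb); wraps around when mb > me. */
uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* ones in bits mb..31 */
    uint32_t to_me = 0xFFFFFFFFu << (31u - me);    /* ones in bits 0..me */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? ((x << n) | (x >> (32u - n))) : x;
}

/* rlwinm rA,rS,SH,MB,ME */
uint32_t model_rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* slwi rA,rS,n is equivalent to rlwinm rA,rS,n,0,31-n (n < 32). */
uint32_t model_slwi(uint32_t rs, unsigned n)
{
    return model_rlwinm(rs, n, 0u, 31u - n);
}

/* srwi rA,rS,n is equivalent to rlwinm rA,rS,32-n,n,31 (n < 32). */
uint32_t model_srwi(uint32_t rs, unsigned n)
{
    return model_rlwinm(rs, 32u - n, n, 31u);
}
```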
Some implementations execute the load algebraic (lha, lhax, lhau, lhaux) instructions with greater latency
than other types of load instructions. The e500mc executes these instructions with the same latency as
other load instructions.
The e500mc also contains load and store instructions for atomic memory accesses. These are described in
Section 3.4.8, “Memory Synchronization Instructions.”
Some implementations run the load/store byte-reverse instructions with greater latency than other types of
load/store instructions. The e500mc executes these instructions with the same latency as other load/store
instructions.
If rA is in the range of registers to be loaded, what gets loaded into any register depends on whether an
interrupt occurs (and at what point the interrupt occurs) requiring the instruction to be restarted. If rA is
loaded with a new value from memory and an interrupt and subsequent return to re-execute the lmw
instruction occurs, rA has a different value and forms a completely different EA, which causes the registers
to be reloaded from a storage location not intended by the program.
If an interrupt does not occur, the registers to be loaded, starting at rA + 1 (for example, if rA is r10, then
rA + 1 is r11), are loaded from the new address calculated from the updated value of rA and the current
running displacement.
Table 3-13. Integer Load and Store Multiple Instructions
Load Byte with Decoration Indexed lbdx rD,rA,rB
Load Halfword with Decoration Indexed lhdx rD,rA,rB
Load Word with Decoration Indexed lwdx rD,rA,rB
The byte, halfword, word, or floating-point doubleword addressed by EA (in rB) using the decoration supplied by
rA is loaded into target GPR rD.
Store Byte with Decoration Indexed stbdx rS,rA,rB
Store Halfword with Decoration Indexed sthdx rS,rA,rB
Store Word with Decoration Indexed stwdx rS,rA,rB
The contents of rS and the decoration supplied by GPR(rA) are stored into byte, halfword, word, or
floating-point doubleword in storage addressed by EA (rB).
Decorated Storage Notify dsn rA,rB Address-only operation that sends a decoration without
any associated load or store semantics.
Decorated load and store instructions are treated as normal cacheable loads and stores when they are to
addresses that are not cache inhibited. dsn is treated as a 0 byte store. Decorated load and store instructions
to addresses that are caching inhibited are always treated as guarded, regardless of the setting of the G bit
in the associated TLB entry. This prevents speculative decorated loads from executing, which potentially
produces side effects other than the normal load semantics.
Implementation Notes:
The number of bits of decoration that are delivered along with the address for decorated load, store and
notify operations is implementation dependent based on how many bits of decoration the interconnect
supports. For e500mc, only the low-order 4 bits of the decoration in rA are implemented.
Implementation Notes
Single-precision multiply-type instructions operate faster than their double-precision equivalents. See
Chapter 10, “Execution Timing,” for more information.
NOTE
The architecture notes that, in some implementations, the Move to FPSCR
Fields (mtfsfx) instruction may perform more slowly when only a portion
of the fields are updated as opposed to all of the fields. This is not the case
in the e500mc.
BO Bits Description
BO Description
0000z Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is FALSE.
0001z Decrement the CTR, then branch if the decremented CTR = 0 and the condition is FALSE.
001at Branch if the condition is FALSE.
0100z Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is TRUE.
0101z Decrement the CTR, then branch if the decremented CTR = 0 and the condition is TRUE.
011at Branch if the condition is TRUE.
1a00t Decrement the CTR, then branch if the decremented CTR ≠ 0.
1a01t Decrement the CTR, then branch if the decremented CTR = 0.
1z1zz Branch always.
Note:
1. In this table, z indicates a bit that is ignored. Note that the z bits should be cleared, as they may be assigned a meaning in
some future version of the architecture.
2. The a and t bits provide a hint about whether a conditional branch is likely to be taken and may be used by some
implementations to improve performance. e500mc always uses dynamic prediction and ignores these bits.
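The BO encodings in the table can be read as four independent control bits plus a hint bit. The following C sketch evaluates the branch decision from those bits; it is a behavioral model only, the function name is hypothetical, and the z, a, and t bits are ignored as described in the notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the branch decision for a 5-bit BO value.
 * ctr is updated in place when the encoding calls for a decrement;
 * cond is the value of the CR bit selected by BI. */
bool bo_branch_taken(unsigned bo, uint32_t *ctr, bool cond)
{
    bool ignore_cond = (bo & 0x10u) != 0u;  /* 1xxxx: condition not tested */
    bool want_cond = (bo & 0x08u) != 0u;    /* x1xxx: branch if TRUE */
    bool no_ctr = (bo & 0x04u) != 0u;       /* xx1xx: CTR not decremented */
    bool want_zero = (bo & 0x02u) != 0u;    /* xxx1x: branch if CTR = 0 */

    bool ctr_ok = true;
    if (!no_ctr) {
        *ctr -= 1u;                          /* decrement first, then test */
        ctr_ok = ((*ctr == 0u) == want_zero);
    }
    bool cond_ok = ignore_cond || (cond == want_cond);
    return ctr_ok && cond_ok;
}
```

For example, BO = 0b10000 behaves as a decrement-and-branch-if-nonzero loop branch, and BO = 0b10100 always branches without touching CTR.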
The 5-bit BI operand in branch conditional instructions specifies which CR bit represents the condition to
test. The CR bit selected is BI + 32.
If branch instructions use immediate addressing operands, target addresses can be computed ahead of the
branch instruction so instructions can be fetched along the target path. If the branch instructions use LR or
CTR, instructions along the path can be fetched if the LR or CTR is loaded sufficiently ahead of the branch
instruction.
Branching can be conditional or unconditional, and optionally a branch return address is created by storing
the EA of the instruction following the branch instruction in the LR after the branch target address has been
computed. This is done regardless of whether the branch is taken.
Any of these instructions for which the LR update option is enabled are considered invalid.
Executing sc invokes the system call interrupt handler or the hypervisor system call interrupt handler
depending on the value of the LEV field; see Section 4.9.10, “System Call/Hypervisor System Call
Interrupt—IVOR8/GIVOR8/IVOR40.”
An sc instruction without the level field is treated by the assembler as an sc with LEV = 0.
Move to Condition Register Fields mtcrf CRM,rS On some implementations, mtcrf may perform more slowly if
only a portion of the fields are updated. This is not so for the
e500mc.
Move to Condition Register from XER mcrxr crD —
Move from Condition Register mfcr rD —
Move from One Condition Register Field mfocrf rD,FXM See the EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors for a full
description of this instruction.
Move to One Condition Register Field mtocrf FXM,rS See the EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors for a full
description of this instruction.
Implementation Notes
e500mc implements mfocrf the same as mfcr and all the contents of the CR are moved to the destination
register.
e500mc implements mtocrf the same as mtcrf and all the fields of the CR specified by FXM are moved
to the CR fields specified by FXM.
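Since mtocrf is handled like mtcrf, the effect of the field mask can be modeled the same way for both. The sketch below is a behavioral model, assuming the usual interpretation in which each of the 8 mask bits selects one 4-bit CR field and the corresponding bits of rS are copied into it; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of mtcrf CRM,rS (and, on e500mc, mtocrf FXM,rS):
 * returns the new CR value. Mask bit 0 (0x80) selects CR field 0, the
 * most-significant 4 bits of the CR. */
uint32_t model_mtcrf(uint32_t cr, uint8_t field_mask, uint32_t rs)
{
    uint32_t bit_mask = 0u;
    for (unsigned field = 0; field < 8u; field++) {
        if (field_mask & (0x80u >> field))
            bit_mask |= 0xF0000000u >> (4u * field);
    }
    /* Selected fields come from rS; all other fields keep their value. */
    return (rs & bit_mask) | (cr & ~bit_mask);
}
```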
The e500mc does not implement the WC field of the wait instruction as defined in Power ISA 2.06. The
WC field is ignored.
The user-level PMRs listed in Table 2-49, “Performance Monitor Registers,” are accessed with
mfpmr. Attempting to write user-level PMRs in either mode causes an illegal instruction exception.
memory, preventing the reordering of memory accesses across the barrier. No ordering is
performed for dcbz if the instruction causes the system alignment error handler to be invoked.
All accesses in this set are ordered as one set; there is not one order for guarded, caching-inhibited
loads and stores and another for write-through-required stores.
• Stores to memory that are caching-allowed, write-through not required, and memory-coherency
required. mbar 1 controls the order in which accesses are performed with respect to coherent
memory. It ensures that, with respect to coherent memory, applicable stores caused by instructions
before the mbar 1 complete before any applicable stores caused by instructions after it.
Except for dcbz and dcba, mbar 1 does not affect the order of cache operations (whether caused explicitly
by a cache management instruction or implicitly by the cache coherency mechanism). Also, mbar 1 does
not affect the order of accesses in one set with respect to accesses in the other.
mbar 1 may complete before memory accesses caused by instructions preceding it have been performed
with respect to main memory or coherent memory as appropriate. mbar 1 is intended for use in managing
shared data structures, in accessing memory-mapped I/O, and in preventing load/store combining
operations in main memory. For the first use, the shared data structure and the lock that protects it must be
altered only by stores in the same set (for both cases described above). For the second use, mbar 1 can be
thought of as placing a barrier into the stream of memory accesses issued by a core, such that any given
access appears to be on the same side of the barrier to both the core and the I/O device.
Like mbar 0, mbar 1 is broadcast on the CoreNet interface; however, unlike mbar 0, mbar 1 does not
wait for its address tenure before completing execution.
Because the core performs store operations in order to memory that is designated as both caching-inhibited
and guarded, mbar 1 is needed for such memory only when loads must be ordered with respect to stores
or with respect to other loads.
The section, “Lock Acquisition and Import Barriers,” in the EREF: A Programmer’s Reference Manual
for Freescale Power Architecture® Processors describes how sync and mbar control memory access
ordering when programs share memory.
3.4.9 Reservations
The ability to emulate an atomic operation using load with reservation and store conditional instructions
is based on the conditional behavior of stwcx., the reservation set by lwarx, and the clearing of that
reservation if the target location is modified by another processor or mechanism before the store
conditional instruction performs its store. Behavior of these instructions is described in the EREF: A
Programmer’s Reference Manual for Freescale Power Architecture® Processors. On the e500mc, a
reservation may be lost for any of the following reasons:
• Execution of a stwcx. by the processor
• Some other processor successfully modifies a location in the reservation granule and the address
containing the reservation is marked as Memory Coherence Required (M = 1)
• Execution of another load with reservation instruction, which removes the old reservation and
establishes a reservation at the address specified in the load with reservation instruction
• Some other processor successfully executes a dcbtst, dcbtstep, dcbtstls, dcbal, or dcba to a
location in the reservation granule and the address containing the reservation is marked as Memory
Coherence Required (M = 1)
On the e500mc, a reservation also may be lost for any of the following reasons:
• Any asynchronous interrupt is taken on the processor holding the reservation
Software should be written to not assume that the reservation is lost as the result of any interrupt. System
software should always perform a store conditional instruction to a scratch location when performing a
context switch or a partition switch to ensure that any held reservation is lost prior to initiating the new
context.
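The recommendation to perform a store conditional to a scratch location during a context switch can be illustrated with a toy model. This is a minimal sketch, not a description of the hardware: the class, the SCRATCH address, and the rule that a store conditional to a non-reserved address simply fails are all simplifying assumptions.

```python
class Core:
    """Toy model of one core's reservation; not coherence-accurate."""

    def __init__(self):
        self.reservation = None  # reserved address, or None

    def lwarx(self, memory, addr):
        self.reservation = addr  # replaces any older reservation
        return memory.get(addr, 0)

    def stwcx(self, memory, addr, value):
        # Simplification: the store succeeds only if a reservation is held
        # for this address. Any stwcx. clears the reservation.
        ok = self.reservation == addr
        self.reservation = None
        if ok:
            memory[addr] = value
        return ok


SCRATCH = 0x1000  # hypothetical scratch location owned by system software


def context_switch(core, memory):
    """Dummy store conditional so the new context inherits no reservation."""
    core.stwcx(memory, SCRATCH, 0)
```

In this model, a thread that is switched out between its lwarx and stwcx. sees the stwcx. fail when it resumes and must retry the whole sequence, which is the behavior that correct lock code already has to handle.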
Data Cache Block Allocate dcba rA,rB
If L1CSR0[DCBZ32] = 0, dcba operates on all bytes in the cache line (cache-line operation).
If L1CSR0[DCBZ32] = 1, dcba operates on 32 bytes (32-byte operation).
dcba performs the same address translation and protection as a store and is treated as
a store for debug events. The dcba instruction is treated as a store of zeros to either
32 bytes or a full cache line. The store is always size-aligned, to a 32-byte granule
for a 32-byte operation and to a cache-line granule for a cache-line operation, by
truncating the EA as necessary. Using dcba with 32-byte operation may perform worse
than using cache-line operation and should be avoided when possible.
Data Cache Block Allocate by Line dcbal rA,rB
This instruction behaves the same as dcba except it always operates on all bytes in the
cache line regardless of the setting of L1CSR0[DCBZ32].
Data Cache Block Flush dcbf rA,rB
The EA is computed, translated, and checked for protection violations:
• For cache hits with the tag marked modified, the cache block is written back to memory
and the cache entry is invalidated.
• For cache hits with the tag marked not modified, the entry is invalidated.
• For cache misses, no further action is taken.
A dcbf is broadcast if WIMGE = xx1xx (coherency enforced). dcbf acts like a load with
respect to address translation and memory protection. It executes in the LSU regardless
of whether the cache is disabled or locked.
Data Cache Block Set to Zero dcbz rA,rB
If L1CSR0[DCBZ32] = 0, dcbz operates on all bytes in the cache line (cache-line operation).
If L1CSR0[DCBZ32] = 1, dcbz operates on 32 bytes (32-byte operation).
dcbz performs the same address translation and protection as a store and is treated as
a store for debug events. The dcbz instruction is treated as a store of zeros to either
32 bytes or a full cache line. The store is always size-aligned, to a 32-byte granule
for a 32-byte operation and to a cache-line granule for a cache-line operation, by
truncating the EA as necessary. Using dcbz with 32-byte operation may perform worse
than using cache-line operation and should be avoided when possible.
dcbz takes an alignment exception if any of the following occur:
• The page is marked write-through.
• The page is marked caching inhibited.
When using dcbz in 32-byte operation on e500mc, if the line is not already valid in the
cache, the line is read from main storage prior to performing the dcbz operation.
Data Cache Block Set to Zero by Line dcbzl rA,rB
This instruction behaves the same as dcbz except it always operates on all bytes in the
cache line regardless of the setting of L1CSR0[DCBZ32].
Data Cache Block Touch dcbt TH,rA,rB (footnote 1)
When dcbt executes, the e500mc checks for protection violations (as for a load
instruction). dcbt is treated as a NOP in the following cases on e500mc:
• The access would cause a DSI or DTLB Miss exception.
• The page is marked Caching Inhibited.
• The page is marked Guarded.
• The targeted cache is disabled.
• An L2 MMU multi-way hit is detected.
• A dcbf (or dcbst, dcbstep, dcbfep) was previously executed and has not yet
performed its flush and the dcbt and dcbf (or dcbst, dcbstep, dcbfep) specify the
same cache line, but specify a different byte address within the cache line.
• HID0[NOPTI] = 1
Otherwise, if no data is in the cache location, then a cache line fill is requested.
When dcbt is treated as a NOP, executing the dcbt can result in IAC debug events, but
does not cause DAC debug events.
Data Cache Block Touch for Store dcbtst TH,rA,rB (footnote 1)
dcbtst is treated as a dcbt except that the line is allocated and an attempt is made to
mark it as exclusive in the specified cache.
Instruction Cache Block Invalidate icbi rA,rB
icbi is broadcast on the CoreNet interface. It should always be followed by a sync and
an isync to make sure its effects are seen by instruction fetches and instruction execution
following the icbi itself.
Instruction Cache Block Touch icbt CT,rA,rB
When icbt executes, the e500mc checks for protection violations (as for a load
instruction). icbt is treated as a NOP in the following cases on e500mc:
• The access would cause a DSI or TLB Miss exception.
• The page is marked Caching Inhibited.
• The page is marked Guarded.
• The targeted cache is disabled.
• An L2 MMU multi-way hit is detected.
• HID0[NOPTI] = 1
Otherwise, if no data is in the cache location, then a cache line fill is requested.
When icbt is treated as a NOP, executing the icbt can result in IAC debug events, but
does not cause DAC debug events.
Note that the primary instruction cache (CT=0) on e500mc does not perform icbt
instructions and they are treated as a NOP.
1 These instructions formerly used CT as the first operand; however, Power ISA has renamed the field as TH to accommodate
the capability of performing streaming prefetches. For e500mc, the TH field can be treated as a CT value.
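The EA truncation described for dcbz and dcba in the table above can be sketched as follows. The helper names are hypothetical, and the 64-byte cache-line size is an assumption made for this example; the line size is not stated in this section.

```python
CACHE_LINE_BYTES = 64  # assumption: L1 cache-line size used for this sketch


def dcbz_target(ea: int, dcbz32: int, line_bytes: int = CACHE_LINE_BYTES):
    """Return (start, length) of the region affected by dcbz or dcba.

    dcbz32 models L1CSR0[DCBZ32]: 0 selects a cache-line operation,
    1 selects a 32-byte operation. The EA is truncated to the granule.
    """
    size = 32 if dcbz32 else line_bytes
    return ea & ~(size - 1), size


def dcbzl_target(ea: int, line_bytes: int = CACHE_LINE_BYTES):
    """dcbzl and dcbal always use the full cache line, ignoring DCBZ32."""
    return ea & ~(line_bytes - 1), line_bytes
```

For example, with these assumptions an EA of 0x1234 maps to the 64-byte region starting at 0x1200 for a cache-line operation and to the 32-byte region starting at 0x1220 for a 32-byte operation.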
Full descriptions of these instructions are in the “Instruction Set” chapter of the EREF: A Programmer’s
Reference Manual for Freescale Power Architecture® Processors. Note the following behavior for
e500mc:
• Unable to lock conditions occur if the locking instruction has no exceptions and the line cannot be
locked when CT = 0 or CT = 2. When an unable to lock condition occurs the line is not loaded or
locked and L1CSR0[CUL] (or L1CSR1[ICUL] if an icbtls executed) is set to 1 regardless of
whether the L2 cache or the primary cache was specified. An unable to lock condition occurs when:
— The targeted cache is not enabled.
— The target address is marked Caching Inhibited (WIMGE = 0bx1xxx)
— The instruction is an icbtls, the L2 cache is specified, and L2CSR0[L2DO] = 1.
— The instruction is a dcbtls or dcbtstls, the L2 cache is specified, and L2CSR0[L2IO] = 1.
— An error loading the line occurred either on the CoreNet interface or from the L2 cache.
• An overlocking condition occurs if the locking instruction has no exceptions and if all available
ways in the specified cache are locked.
— If an overlocking condition occurs in the primary cache (CT=0), the line is not loaded or locked
and L1CSR0[CLO] (L1CSR1[ICLO] if an icbtls executed) is set to 1. L1CSR0[CUL] and
L1CSR1[ICUL] are not set.
— If an overlocking condition occurs in the L2 cache (CT=2) L2CSR0[L2LO] is set to 1.
L1CSR0[CUL] and L1CSR1[ICUL] are not set. If L2CSR0[L2CLOA] = 1, the line is loaded
and locked replacing and unlocking a line in the set that would have normally been selected for
replacement if no lines in the set were locked. If L2CSR0[L2CLOA] = 0, the line is not loaded
or locked.
• Note that setting L1CSR0[CLFC] flash invalidates all primary data cache lock bits and setting
L1CSR0[ICLFC] flash invalidates all primary instruction cache lock bits, allowing system
software to clear all cache locking in the L1 cache without knowing the addresses of the lines
locked. Setting L2CSR0[L2LFC] flash invalidates all L2 cache lock bits allowing system software
to clear all cache locking in the L2 cache without knowing the addresses of the lines locked.
Because L1 cache locking is not persistent, setting L1CSR0[CFI] or L1CSR1[ICFI] clears the
locks in the respective caches because the lines containing the locks are invalidated.
• Touch and lock set instructions (icbtls, dcbtls, and dcbtstls) are always executed and are not
treated as hints.
• Since e500mc implements cache locking for the L1 cache as non-persistent, when combining
CT=2 cache operations with CT=0 data cache locking operations to the same line without any
synchronization, the final state of the CT=0 lock operations is unknown (that is, the line may or
may not be locked into the L1 data cache).
Cache locking clear instructions (dcblc and icblc) are NOPed if the specified cache is the L1 or L2 cache
and the cache is not enabled.
Consult the SoC documentation to determine behavior for the platform cache (CT = 1).
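The unable-to-lock and overlocking rules listed above can be summarized as a decision function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the instruction is assumed to have raised no exception, and checking the unable-to-lock condition before the overlock condition is an ordering chosen for the example rather than something this section specifies.

```python
def lock_outcome(ct, is_icbtls, unable_to_lock, all_ways_locked, l2cloa=0):
    """Return (line_locked, status_bits_set) for icbtls, dcbtls, or dcbtstls
    with CT = 0 (primary cache) or CT = 2 (L2 cache)."""
    if ct not in (0, 2):
        raise ValueError("CT = 1 (platform cache) behavior is SoC-defined")
    if unable_to_lock:
        # Reported in the L1 status register even when the L2 was targeted.
        return False, {"L1CSR1[ICUL]" if is_icbtls else "L1CSR0[CUL]"}
    if all_ways_locked:
        if ct == 0:
            return False, {"L1CSR1[ICLO]" if is_icbtls else "L1CSR0[CLO]"}
        # L2 overlock: the line is still locked only if L2CSR0[L2CLOA] = 1.
        return bool(l2cloa), {"L2CSR0[L2LO]"}
    return True, set()
```

A caller can use the returned set to decide which status register to read and which bit to clear after the locking instruction has completed.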
To precisely detect an overlock or unable to lock condition in the primary data cache, system software must
perform the following code sequence:
dcbtls
sync
mfspr (L1CSR0)
(check L1CSR0[CUL] for data cache index unable-to-lock condition)
(check L1CSR0[CLO] for data cache index overlock condition)
The following code sequence precisely detects an overlock or unable-to-lock condition in the primary
instruction cache:
icbtls
sync
mfspr (L1CSR1)
(check L1CSR1[ICUL] for instruction cache index unable-to-lock condition)
(check L1CSR1[ICLO] for instruction cache index overlock condition)
involve the hypervisor to perform a service and causes the processor to take an embedded hypervisor
system call interrupt. The supervisor-level rfi and rfgi instructions are used for returning from an interrupt
handler. The hypervisor level rfci instruction is used for critical interrupts; rfdi is used for debug
interrupts; rfmci is used for machine check interrupts.
Guest supervisor software should use rfi, rfci, rfdi, and rfmci when returning from their associated
interrupts. When a guest operating system executes rfi, the processor maps the instruction to rfgi assuring
that the appropriate guest save/restore registers are used for the return. For rfci, rfdi, and rfmci, the
hypervisor should emulate these instructions as it emulates the taking of these interrupts in guest
supervisor state.
Privileges are as follows:
• sc is user privileged.
• rfi (rfgi), mfmsr, mtmsr, wrtee, wrteei are guest–supervisor privileged.
• rfci, rfdi, rfmci are hypervisor privileged.
Table 3-39. System Linkage Instructions—Supervisor-Level
Return from Interrupt rfi —
Return from Guest Interrupt rfgi —
Return from Critical Interrupt rfci —
Return from Debug Interrupt rfdi —
Return from Machine Check Interrupt rfmci —
Notes for rfi, rfgi, rfci, rfdi, and rfmci: These instructions are context-synchronizing, which for the
e500mc means each works its way to the final execute stage, updates architected registers, and redirects
instruction flow. In guest supervisor state (MSR[GS,PR]=0b10), rfi (rfgi) cannot alter MSR[GS] or any
bits protected by MSRP. Guest supervisor state maps rfi to rfgi. Guest supervisor state cannot execute
rfci, rfdi, or rfmci as they are hypervisor privileged and are emulated by the hypervisor.
System Call sc LEV
Move to Machine State Register mtmsr rS
Notes for mtmsr: In guest supervisor state (MSR[GS,PR]=0b10) mtmsr cannot alter MSR[GS] or any bits
protected by MSRP.
Certain encodings of the SPR field of mtspr and mfspr instructions (shown in Table 3-32) provide access
to supervisor-level SPRs. Encodings for SPRs are listed in Table 2-2. Simplified mnemonics are provided
for mtspr and mfspr. See Section 3.3.3, “Synchronization Requirements,” and the EREF: A
Programmer’s Reference Manual for Freescale Power Architecture® Processors for more information on
context synchronization requirements when altering certain SPRs.
Instruction Mnemonic Syntax Non-External PID Analogous Instruction
Data Cache Block Flush by External PID Indexed dcbfep rA,rB dcbf
Data Cache Block Store by External PID Indexed dcbstep rA,rB dcbst
Data Cache Block Touch by External PID Indexed dcbtep TH,rA,rB dcbt
Data Cache Block Touch for Store by External PID Indexed dcbtstep TH,rA,rB dcbtst
Data Cache Block Zero by External PID Indexed dcbzep rA,rB dcbz
Data Cache Block Zero Long by External PID Indexed dcbzlep rA,rB dcbzl
Instruction Cache Block Invalidate by External PID Indexed icbiep rA,rB icbi
Data Cache Block Invalidate dcbi rA,rB
dcbi executes as defined in the Power ISA but has implementation-dependent behaviors.
When the address to be invalidated is marked Memory Coherence Required (WIMGE
= 0bx01xx), a dcbf is performed, which first flushes the line if modified prior to invalidation. If
the address is not marked as Memory Coherence Required (WIMGE = 0bx00xx), the line is
not flushed and is invalidated. In this case, if the line was modified, the modified data is lost.
In the e500mc, dcbi cannot generate a cache-locking exception.
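The two dcbi cases can be captured in a few lines. This is an illustrative sketch only; the function name and the tuple it returns are assumptions introduced for the example, and it models a cache hit.

```python
def dcbi_effect(coherence_required: bool, line_modified: bool):
    """Return (written_back, invalidated, data_lost) for dcbi on a cache hit."""
    if coherence_required:
        # M = 1: performed as a dcbf, so a modified line is flushed first.
        return line_modified, True, False
    # M = 0: the line is invalidated without a flush; modified data is lost.
    return False, True, line_modified
```

The third element is the case software must guard against: using dcbi on a modified line in memory that is not marked Memory Coherence Required discards the modified data.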
This table summarizes the operation of the TLB instructions in the e500mc.
Table 3-43. TLB Management Instructions
TLB Invalidate Local tlbilx T,rA,rB
Invalidates TLB entries in the processor that executes the tlbilx instruction. TLB entries
that are protected by the IPROT attribute (IPROT = 1) are not invalidated. tlbilx can be
used to invalidate all entries corresponding to an LPID value, all entries corresponding to a
PID value, or a single entry.
tlbilx is guest supervisor privileged; however, it causes an embedded hypervisor privilege
exception if EPCR[DGTMI] is set.
Note: tlbilx is the preferred way of performing TLB invalidations, especially for operating
systems running as a guest to the hypervisor because invalidations are partitioned
and do not require hypervisor privilege.
Note: tlbilx requires the same local-processor synchronization as tlbivax, but not the
cross-processor synchronization (that is, it does not require tlbsync).
TLB Invalidate Virtual Address Indexed tlbivax rA,rB
A TLB invalidate operation is performed whenever tlbivax is executed. tlbivax invalidates
any TLB entry in the targeted TLB array that corresponds to the virtual address calculated
by this instruction as long as IPROT is not set; this includes invalidating TLB entries
contained in TLBs on other processors and devices in addition to the processor executing
tlbivax. Thus, an invalidate operation is broadcast throughout the coherent domain of the
processor executing tlbivax. For more information see Section 6.3, “Translation Lookaside
Buffers (TLBs).”
tlbivax is hypervisor privileged.
TLB Read Entry tlbre —
tlbre causes the contents of a single TLB entry to be extracted from the MMU and placed
in the corresponding fields of the MAS registers. The entry extracted is specified by the
TLBSEL, ESEL, and EPN fields of MAS0 and MAS2. The contents extracted from the MMU
are placed in MAS0–MAS3, MAS7, and MAS8. If HID0[EN_MAS7_UPDATE] = 1, MAS7 is
updated with the four highest-order bits of physical address for the TLB entry. See
Section 6.3, “Translation Lookaside Buffers (TLBs).”
tlbre is hypervisor privileged.
TLB Search Indexed tlbsx rA,rB
tlbsx searches the MMU for a particular entry based on the computed EA and the search
values in MAS5 and MAS6. If a match is found, MAS1[V] is set and the found entry is read
into MAS0–MAS3, MAS7, and MAS8. If HID0[EN_MAS7_UPDATE] = 1, MAS7 is
updated with the four highest-order bits of physical address for the TLB entry. If the entry is
not found, MAS1[V] is set to 0. See Section 6.3, “Translation Lookaside Buffers (TLBs).”
tlbsx is hypervisor privileged.
Note that rA=0 is a preferred form for tlbsx and that some Freescale implementations,
including the e500mc, take an illegal instruction exception if rA != 0.
TLB Synchronize tlbsync —
Causes a TLBSYNC transaction on the CoreNet interface. See Section 6.3, “Translation
Lookaside Buffers (TLBs).”
tlbsync is hypervisor privileged.
Note that only one tlbsync can be in progress at any given time on all processors of a
coherence domain. The hypervisor or operating system should ensure this by doing the
appropriate mutual exclusion. If e500mc detects multiple tlbsync operations at the same
time, a machine check can occur.
TLB Write Entry tlbwe —
tlbwe causes the contents of certain fields of MAS0–MAS3, MAS7, and MAS8 to be written
into a single TLB entry in the MMU. The entry written is specified by the TLBSEL, ESEL,
and EPN fields of MAS0 and MAS2. Execution of tlbwe on the e500mc also causes the
upper 4 bits of the RPN that reside in MAS7 to be written to the selected TLB entry. See
Section 6.3, “Translation Lookaside Buffers (TLBs).”
tlbwe is hypervisor privileged regardless of the setting of EPCR[DGTMI] as e500mc does
not provide a Logical to Real Translation (LRAT) mechanism.
Implementation Notes
If an attempt is made to write a TLB1 entry and MAS1[TSIZE] specifies an invalid size (that is, 0 or 11 to
15), the entry is treated as if it is 4KB.
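The fallback for invalid MAS1[TSIZE] values can be sketched as below. The function name is hypothetical, and the assumption that a valid TSIZE value selects a page of 4^TSIZE KB is taken from the usual MAS1[TSIZE] encoding rather than from this section.

```python
def tlb1_page_size_bytes(tsize: int) -> int:
    """Effective TLB1 page size for a 4-bit MAS1[TSIZE] value."""
    if not 0 <= tsize <= 15:
        raise ValueError("TSIZE is a 4-bit field")
    if not 1 <= tsize <= 10:
        # Invalid sizes (0 or 11 to 15) are treated as 4KB.
        tsize = 1
    # Assumption: valid sizes are 4**TSIZE KB (TSIZE = 1 is 4KB).
    return (4 ** tsize) * 1024
```

Software that reads entries back should therefore not rely on the stored TSIZE value alone when it may have written an unsupported size.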
The TLB management instructions from Power ISA 2.06 contain a significant amount of optional
capabilities. Although these capabilities are described in configuration registers, Freescale
implementations only utilize a portion of these capabilities. To minimize compatibility problems,
system software should incorporate uses of these instructions into subroutines.
Executing tlbsx with rA != 0 causes an illegal instruction exception on e500mc. Software should always
use tlbsx with rA = 0.
Writing to a performance monitor register (mtpmr) requires guest supervisor privilege when
MSRP[PMMP] = 0. If MSRP[PMMP] = 1, performance monitor registers are only accessible to the
hypervisor. User level access is limited to read only access to certain registers through aliases designed to
be accessed by user level software. Supervisor software can access these as well as all other defined
performance monitor registers. Attempting to access an undefined performance monitor register causes an
illegal instruction exception. PMRs are listed in Section 2.18, “Performance Monitor Registers (PMRs).”
Guest interrupts are standard interrupts that are handled by guest-supervisor software. They use guest save
and restore registers (GSRR0/GSRR1) to save state when they are taken and they use the rfgi instruction
to restore state. Guest interrupts are listed in Table 2-5.
Section 2.3, “Register Mapping in Guest–Supervisor State,” describes how accesses to non-guest registers
are changed by the processor to their guest register equivalents when MSR[PR] = 0 and MSR[GS] = 1.
All asynchronous interrupts except the NMI interrupt are ordered because each type of interrupt has its
own set of save/restore registers. Only one interrupt of each category is reported (standard, critical, debug,
machine check, and guest), and when it is processed (taken) no program state is lost. Program state may
be lost if synchronous exceptions occur within the interrupt handler for those same synchronous
exceptions before software has successfully saved the state of the save/restore registers. For example,
executing an illegal instruction as the first instruction of the program interrupt handler causes another
program interrupt changing the state of the SRR0/SRR1 registers before software can save them thus
destroying the return path to the original interrupt. (See Section 4.6.1, “Interrupt Ordering and Masking.”)
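The loss of the return path in the example above can be made concrete with a toy model. Everything here is illustrative: the class, the addresses, and the MSR value are hypothetical, and the MSR changes that a real interrupt performs are omitted.

```python
class Cpu:
    """Toy model: only the state needed to show SRR0/SRR1 reuse."""

    def __init__(self, pc, msr):
        self.pc, self.msr = pc, msr
        self.srr0 = self.srr1 = 0

    def program_interrupt(self, handler):
        # SRR0/SRR1 are overwritten on every interrupt of this category.
        self.srr0, self.srr1 = self.pc, self.msr
        self.pc = handler  # MSR changes on interrupt entry are omitted


HANDLER = 0x700  # hypothetical program interrupt handler address

cpu = Cpu(pc=0x1000, msr=0x8000)
cpu.program_interrupt(HANDLER)       # original exception
first_return = cpu.srr0              # the original return address
cpu.program_interrupt(HANDLER)       # illegal instruction at the handler's first instruction
lost = cpu.srr0 != first_return      # SRR0 now points into the handler itself
```

After the second interrupt, SRR0 holds the handler address, so the original return address survives only if software had already copied it elsewhere.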
All interrupts except the machine check interrupt are context synchronizing, as defined in the instruction
model chapter of the EREF: A Programmer’s Reference Manual for Freescale Power Architecture®
Processors. A machine check interrupt acts like a context-synchronizing operation with respect to
subsequent instructions.
— Processor doorbell
— Processor doorbell critical
• The e500mc implements the following interrupts defined by the embedded hypervisor category:
— Hypervisor privilege
— Hypervisor system call
— Guest processor doorbell interrupt
— Guest processor doorbell critical interrupt
— Guest processor doorbell machine check interrupt
• The e500mc does not implement the unimplemented operation exception of the program interrupt.
All unimplemented instructions take an illegal instruction exception.
• Interrupt priorities differ from those specified in the architecture as described in Section 4.11,
“Interrupt Priorities.”
In no case is an interrupt directed to the guest when the processor is executing in the hypervisor state.
For more specific information about how interrupts are directed, see EREF: A Programmer’s Reference
Manual for Freescale Power Architecture® Processors or Power ISA 2.06.
Register Description
Save/restore register 0 (SRR0, CSRR0, DSRR0, GSRR0, MCSRR0)
On an interrupt, xSRR0 holds the EA at which execution continues when the corresponding return from
interrupt instruction executes. Typically, this is the EA of the instruction that caused the interrupt or the
subsequent instruction.
Save/restore register 1 (SRR1, CSRR1, DSRR1, GSRR1, MCSRR1)
When an interrupt is taken, MSR contents are placed into xSRR1. When the return from interrupt (rfi, rfgi,
rfci, rfdi, rfmci) instruction executes, the values are restored to the MSR from xSRR1. xSRR1 bits that
correspond to reserved MSR bits are also reserved. Note that an MSR bit that is reserved may be altered
by a return from interrupt instruction.
Register Description
Data exception address register (DEAR/GDEAR)
Contains the address referenced by a load, store, or cache management instruction that caused an
alignment, data TLB miss, or data storage interrupt. When executing in the guest state (MSR[GS] = 1),
accesses to the DEAR are mapped to GDEAR upon executing mtspr or mfspr.
DEAR and GDEAR are described in Section 2.9.2, “(Guest) Data Exception Address Register
(DEAR/GDEAR).”
External proxy register (EPR/GEPR)
Defined by the external interrupt proxy facility, which is described in Section 4.9.6.3, “External Proxy.” EPR
is used to convey the peripheral-specific interrupt vector associated with the external input interrupt
triggered by the programmable interrupt controller (PIC) in the integrated device. When executing in the
guest state (MSR[GS] = 1), accesses to the EPR are mapped to GEPR upon executing mfspr.
EPR and GEPR are described in Section 2.9.5, “(Guest) External Proxy Register (EPR/GEPR).”
Interrupt vector prefix register (IVPR/GIVPR)
(G)IVPR[32-47] provides the high-order 16 bits of the address of the interrupt handling routine for each
interrupt type. The 16-bit vector offsets are concatenated to the right of (G)IVPR to form the address of
the interrupt handling routine.
Exception syndrome register (ESR/GESR)
Identifies a syndrome for differentiating exception conditions that can generate the same interrupt. When
such an exception occurs, corresponding (G)ESR bits are set and all others are cleared. Other interrupt
types do not affect the (G)ESR. (G)ESR does not need to be cleared by software. When executing in the
guest state (MSR[GS] = 1), accesses to the ESR are mapped to GESR upon executing mtspr or mfspr.
See Section 2.9.6, “(Guest) Exception Syndrome Register (ESR/GESR).”
Interrupt vector offset registers (IVORs/GIVORs)
Holds the quad-word index from the base address provided by the (G)IVPR for each interrupt type.
Table 4-2 lists the supported (G)IVORs and their associated interrupts for the e500mc core.
Machine check address register (MCAR/MCARU)
On a machine check interrupt, MCAR/MCARU is updated with the address of the data associated with the
machine check if applicable. See Section 2.9.8, “Machine Check Address Register (MCAR/MCARU).”
Machine check syndrome register (MCSR)
On a machine check interrupt, MCSR is updated with a syndrome to indicate exceptions, listed in
Table 2-8 and fully described in the EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors. Section 2.9.9, “Machine Check Syndrome Register (MCSR),” describes MCSR
bit fields as they are defined for the e500mc.
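The way (G)IVPR and a (G)IVOR combine into a handler address can be sketched as follows. The function name is hypothetical, and masking the low 4 bits of the offset is an assumption that follows from the offset being described as a quad-word index.

```python
def handler_address(ivpr: int, ivor: int) -> int:
    """Form a 32-bit interrupt handler address from (G)IVPR and a (G)IVOR."""
    base = ivpr & 0xFFFF0000    # (G)IVPR[32-47]: high-order 16 bits
    offset = ivor & 0x0000FFF0  # assumption: 16-bit offset, low 4 bits zero
    return base | offset
```

For example, with (G)IVPR = 0xFFF80000 and a vector offset of 0x0120, this sketch yields a handler address of 0xFFF80120.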
NOTE
System software may need to identify the type of instruction that caused the
interrupt and examine the TLB entry and ESR to fully identify the exception
or exceptions. For example, because both protection violation and
byte-ordering exception conditions may be present, and either causes a data
storage interrupt, system software would have to look beyond ESR[BO],
such as the state of MSR[PR] in SRR1 and the TLB entry page protection
bits, to determine if a protection violation also occurred.
4.6 Exceptions
Exceptions are caused directly by instruction execution or by an asynchronous event. In either case, the
exception may cause one of several types of interrupts to be invoked.
caused the exception. The EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors describes asynchronous and synchronous interrupts.
Columns: IVOR; Interrupt; Exception; Directing State at Exception; (G)ESR (footnote 1); Enabled by; Type (footnote 2); Save and Restore Registers; Page
Privileged PPR — SP
Trap PTR — SP
Unimplemented opcode PUO (footnote 4) — SP
IVOR7 Floating-point unavailable — — — SP SRRs 4-29
IVOR13 Data TLB error Data TLB miss MSR[GS] = 0 or EPCR[DTLBGS] = 0 [ST],[FP,AP],[EPID] — SP SRRs 4-32
GIVOR13 Data TLB error Data TLB miss MSR[GS] = 1 and EPCR[DTLBGS] = 1 [ST],[FP,AP],[EPID] — SP GSRRs 4-32
Critical interrupt taken
Unconditional debug event A
1 In general, when an interrupt affects a (G)ESR as indicated in the table, it also causes all other (G)ESR bits to be cleared.
Special rules may apply for implementation-specific (G)ESR bits.
Legend:
xxx (no brackets) means (G)ESR[xxx] is set.
[xxx] means (G)ESR[xxx] could be set.
[xxx,yyy] means either (G)ESR[xxx] or (G)ESR[yyy] may be set, but not both.
{xxx,yyy} means either (G)ESR[xxx] or (G)ESR[yyy] and possibly both may be set.
2 Interrupt types:
SP = synchronous and precise
SI = synchronous and imprecise
A = asynchronous
* = post completion interrupt. xSRR0 registers point after the instruction causing the exception.
3 Section 4.9.6.3, “External Proxy,” describes how the e500mc interacts with a programmable interrupt controller (PIC) defined by
the integrated device.
4 PUO does not occur on e500mc.
5 This debug interrupt may be made pending if MSR[DE] = 0 at the time of the exception.
— Alignment
— Data storage (if the access crosses a page boundary and protections on the 2 pages differ)
— Data TLB (if the access crosses a page boundary and one of the pages is not in the TLB)
— Debug (data address compare)
Register Setting
• An address related to the machine check may be stored in MCAR (and MCARU), according to
Table 4-4.
• The machine check syndrome register, MCSR, is used to log information about the error condition.
The MCSR is described in Section 2.9.9, “Machine Check Syndrome Register (MCSR).”
• At the end of the machine check interrupt software handler, a Return from Machine Check Interrupt
(rfmci) may be used to return to the state saved in MCSRR0 and MCSRR1.
Machine check exceptions are typically caused by a hardware failure or by software performing actions
that the hardware has not been designed to handle or for which it cannot provide a suitable result. They may be
caused indirectly by execution of an instruction, but may not be recognized or reported until long after the
processor has executed that instruction.
• It is also possible that an error report machine check interrupt occurs without an associated
asynchronous machine check error bit being set in the MCSR. This can occur when the processor
is the consumer of some data for which the error was detected by some agent other than the
processor. For example, an error in DRAM may occur and if the processor executed a load
instruction which accessed that DRAM where the error occurred, the load instruction would take
an error report machine check interrupt if it attempted to complete execution.
• A non-maskable interrupt (NMI) occurs when the integrated device asserts the NMI signal to the
e500mc. The MCSR[NMI] bit is set when the interrupt occurs. The NMI signal is non-maskable
and occurs regardless of the state of MSR[ME] or MSR[GS].
NOTE
The taking of an asynchronous machine check interrupt always occurs when
any of the asynchronous machine check error bits is not zero and the
asynchronous machine check interrupt is enabled (MSR[ME] = 1 or
MSR[GS] = 1). The condition persists until software clears the
asynchronous machine check error bits in MCSR.
To avoid multiple asynchronous machine check interrupts, software should
always read the contents of the MCSR within the asynchronous machine
check interrupt handler and clear any set bits in the MCSR prior to
re-enabling machine check interrupts by setting MSR[ME] or MSR[GS].
Note that the processor may set asynchronous machine check error bits in
MCSR at any time as errors are detected, including when the processor is in
the asynchronous machine check interrupt handler and MSR[ME] = 0.
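The handler discipline described in this note can be sketched in C. This is a minimal model, not e500mc code: the MCSR is simulated as a plain variable, and the names sim_mcsr, mcsr_read, mcsr_write, MCSR_ASYNC_MASK, and async_mcheck_handler are hypothetical. The sketch assumes write-1-to-clear behavior for MCSR and uses a placeholder mask for the asynchronous error bits; consult Section 2.9.9 for the real bit definitions. On hardware, the accessors would be mfspr and mtspr.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated MCSR; on hardware this is read and written with mfspr/mtspr. */
static uint32_t sim_mcsr;

/* ASSUMPTION: placeholder mask, not the real set of asynchronous error bits. */
#define MCSR_ASYNC_MASK 0x7FFF0000u

static uint32_t mcsr_read(void) { return sim_mcsr; }

/* ASSUMPTION: writing 1 to a bit clears it; writing 0 leaves it unchanged. */
static void mcsr_write(uint32_t mask) { sim_mcsr &= ~mask; }

/*
 * Read MCSR and clear exactly the asynchronous bits that were observed.
 * The loop re-reads MCSR because new bits may be set while the handler runs.
 * Returns every bit that was seen so the caller can log or act on it.
 */
static uint32_t async_mcheck_handler(void)
{
    uint32_t seen = 0;
    uint32_t pending;

    while ((pending = mcsr_read() & MCSR_ASYNC_MASK) != 0) {
        seen |= pending;
        mcsr_write(pending);
    }
    assert((mcsr_read() & MCSR_ASYNC_MASK) == 0);
    return seen; /* only now is it safe to re-enable MSR[ME] */
}
```

Clearing only the bits that were read avoids discarding an error that is logged between the read and the write.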
An asynchronous machine check, error report, or NMI interrupt occurs when no higher priority interrupt
exists and an asynchronous machine check, error report, or NMI exception is presented to the interrupt
mechanism.
The following general rules apply:
• The instruction whose address is recorded in MCSRR0 has not completed, but may have attempted
to execute.
• No instruction after the one whose address is recorded in MCSRR0 has completed execution.
• Instructions in the architected instruction stream prior to this instruction have all completed
successfully.
Register Setting
MCSRR0 The core sets this to an EA of an instruction executing or about to execute when the exception occurred.
MCSRR1 Set to the contents of the MSR at the time of the exception.
MSR • RI is cleared.
• All other defined MSR bits are cleared.
Register Setting
MCAR (MCARU) MCAR is updated with the address of the data associated with the machine check. See Section 2.9.8,
“Machine Check Address Register (MCAR/MCARU).”
MCSR Set according to the machine check condition. See Table 2-8.
Machine check input signal asserted. Set immediately on recognition of assertion of the mcp input. This
input comes from the SoC and is a level-sensitive signal. This usually occurs as the result of an error
detected by the SoC. Additional enable bits: HID0[EMCP]
Data cache data parity or tag parity error due to a load or store. Additional enable bits: L1CSR0[CECE]
and L1CSR0[CE]
If an L2 MMU simultaneous hit occurs during the execution of dcba, dcbal, dcbt, dcbtep,
dcbtst, dcbtstep, or icbt, no error report machine check occurs on the instruction and no
access off the core is performed.
Simultaneous tlbsync operations detected. The system should never have two outstanding tlbsync
operations on CoreNet. Additional enable bits: None
1 “Additional Enable Bits” indicates any other state that, if not enabled, inhibits the recognition of this particular error condition.
2 For a description of L2ERRDIS, see Section 2.15.4.1, “L2 Cache Error Disable Register (L2ERRDIS).”
Instruction fetch error report (MCSR[IF]): An error occurred while attempting to fetch the instruction
corresponding to the address contained in MCSRR0.
Load instruction error report (MCSR[LD]): An error occurred while attempting to execute the load
instruction corresponding to the address contained in MCSRR0.
Guarded load instruction error report (MCSR[LDG]): If LD is set and the load was a guarded load (that is,
has the guarded storage attribute), this bit may be set. Note that some implementations may have specific
conditions that govern when this bit is set.
Store instruction error report (MCSR[ST]): An error occurred while attempting to perform address
translation on the instruction corresponding to the address contained in MCSRR0. Since stores may
complete with respect to the processor pipeline before their effects are seen in all memory subsystem
areas, only translation errors are reported as error reports with stores.
Note that some instructions which are considered load instructions with respect to permission checking
and debug events are reported as store error reports (MCSR[ST] is set). See Section 2.9.9, “Machine
Check Syndrome Register (MCSR)” for which instructions set MCSR[LD] or MCSR[ST].
Table 4-7 describes which error sources generate which error report status bits in the MCSR.
Note that there is no MCSR error status bit for CoreNet data errors. If a CoreNet data error occurs on a
load or instruction fetch and the instruction reaches the bottom of the completion buffer, an error report
occurs. But, because there is no MCSR error status bit for data errors, the core does not generate an
asynchronous machine check. The device that detects the error is expected to report it. For example,
assume that the core attempts to perform a load from a PCI device that encounters an error. The PCI device
would signal a “PCI Master Abort” and would signal the error to the programmable interrupt controller
(PIC).
The core's memory transaction should be completed with a data error so that the core is not hung awaiting
the transaction. Eventually, the PIC should interrupt the core (the PIC should be programmed to direct such
an error to take a machine check interrupt).
Error reports are intended to be a mechanism to stop the propagation of bad data; the asynchronous
machine check is intended to allow software to attempt to recover from errors gracefully.
In a multicore system, the PIC is likely to steer all PCI error interrupts to one processor. For the PCI Master
Abort example, assume that Processor B performs a load that gets a PCI Master Abort, and the PIC steers
the PCI's error signal to Processor A’s machine check input signal. Here, the error report in Processor B
prevents the propagation of bad data; Processor A gets the task of attempting a graceful recovery. Some
interprocessor communication is likely necessary.
Table 4-7. Synchronous Machine Check Error Reports
Instruction fetch Instruction cache data array parity error IF Within fetch group3
L2 cache error
Load (or touch) instruction Data cache tag parity error LD, [LDG]4 Yes
An error report occurs only if the instruction that encountered the error reaches the bottom of the
completion buffer (that is, it becomes the oldest instruction currently in execution) and the instruction
would have completed otherwise. If the instruction is flushed (possibly due to a mispredicted branch or
asynchronous interrupt, including an asynchronous machine check) before reaching the bottom of the
completion buffer, the error report does not occur.
Error Source Error Type Transaction Source MCSR Update1 MCAR Update2
Data cache Tag parity error load, touch, stores, cache MAV DCERR RA
operations, or snoops
L2 MMU Multi-way hit tlbsx, instruction fetch, load, MAV L2MMU_MHIT EA5
touch, store, cache op (all types)
1 The MCSR update column indicates which MCSR bits are updated when the exception is logged.
2 The MCAR update column indicates whether the error type provides either a real or effective address (RA or EA), or no address
that is associated with the error.
3 The machine check input pin is used by the SoC to indicate all types of machine check errors that are detected by the
SoC. Software must query error logging information within the SoC to determine the specific error condition and source.
4 The L2 cache has a separate set of error reporting and capture registers.
5 The lower 12 bits of the EA are cleared.
IVOR2 Data Access or virtualization fault MSR[GS] = 0 or [ST], [FP,AP], [EPID] SRRs
storage EPCR[DSIGS] = 0 or
(DSI) Load reserve or store conditional to TLB[VF] = 1 [ST]
write-through required location (W = 1)
This table describes exceptions as defined by the architecture, noting any e500mc-specific behavior.
Table 4-10. Data Storage Interrupt Exception Conditions
Exception Cause
Regardless of the EA, icbt, dcbt, dcbtst, dcba and dcbal cannot cause a data storage interrupt.
NOTE
icbi, icbt, icblc, and icbtls are treated as loads from the addressed byte with
respect to translation and protection. All use MSR[DS], not MSR[IS], to
determine translation for their operands. Instruction storage and TLB error
interrupts are associated with instruction fetching and not execution. Data
storage and TLB error interrupts are associated with execution of instruction
cache management instructions.
When the interrupt occurs, the processor suppresses execution of the instruction that caused it. Registers
are updated as follows:
Table 4-11. Data Storage Interrupt Register Settings
Register Setting
Exception Cause
Execute access In user mode, an instruction fetch attempts to access memory that is not user mode execute enabled (page
control exception access control bit UX = 0).
In supervisor mode, an instruction fetch attempts to access memory that is not supervisor mode execute
enabled (page access control bit SX = 0).
When an ISI occurs, the processor suppresses execution of the instruction causing the interrupt.
Registers are updated as shown in this table.
Table 4-13. Instruction Storage Interrupt Register Settings
Register Setting
The e500mc provides two methods of receiving external input interrupts; the method is selected through a
register field in the PIC:
• In one method, the legacy method, the core takes an external input interrupt when the int signal
from the PIC is asserted and the external input interrupt is enabled. The input is level sensitive and
if int is deasserted before the interrupt is enabled, no interrupt occurs. If the interrupt is enabled and
occurs, software reads the memory-mapped Interrupt Acknowledge (IACK) register which
contains the specific vector of the interrupt. This causes the PIC to deassert int until another
interrupt is requested and management of the interrupt is software’s responsibility (it is in-service)
until it performs an associated End of Interrupt (EOI) memory-mapped register write to the PIC.
• In the alternate method known as External Proxy, a signaling protocol occurs between the core and
the PIC. Instead of just signaling int, the PIC also provides the specific vector for the interrupt.
When the interrupt is enabled and the PIC is asserting int, the interrupt occurs and the core
communicates to the PIC that the interrupt has been taken and provides the vector from the PIC in
the (G)EPR register which software then can read. As part of the communication with the PIC, the
PIC puts the specific interrupt in-service as if software had read the IACK register in the legacy
method. This method is further described in Section 4.9.6.3, “External Proxy.”
Register Setting
(G)EPR If external proxy is used, (G)EPR holds the vector offset that identifies the source that generated the interrupt triggered
from the PIC. For external interrupts not generated using interrupt proxy, (G)EPR is updated to all zeros.
of the integrated device. This functionality is enabled through a register field defined by the PIC and
documented in the reference manual for the integrated device.
Using this interface reduces the latency required to read and acknowledge the interrupt that normally
requires a cache-inhibited guarded load to the memory controller.
In previous integrated devices, when the core received a signal from the PIC indicating that the external
interrupt was necessary to handle a condition typically presented by an integrated peripheral device, the
interrupt handler responded by reading a memory-mapped register (interrupt acknowledge, or IACK)
defined by the Open PIC standard. In addition to providing an additional vector offset specific to the
peripheral device, this read negated the internal signal and changed the status of the interrupt request from
pending to in-service in which state it would remain until the completion of the interrupt handling.
The external proxy eliminates the need to read the IACK register by presenting the vector to the external
proxy register (EPR), or guest external proxy register (GEPR), described in Section 2.9.5, “(Guest)
External Proxy Register (EPR/GEPR).”
Instead of just signaling int, the PIC also provides the specific vector for the interrupt. When the interrupt
is enabled and the PIC is asserting int, the interrupt occurs and the core communicates to the PIC that the
interrupt has been taken and provides the vector from the PIC in the (G)EPR register which software then
can read. As part of the communication with the PIC, the PIC puts the specific interrupt in-service as if
software had read the IACK register in the legacy method. The PIC always asserts the highest priority
pending interrupt to the core and the interrupt that is put in-service is determined by when the core takes
the interrupt based on the appropriate enabling conditions. From a system software perspective, the core
does not acknowledge the interrupt until the external input interrupt is taken.
Software in the external input interrupt handler would then read (G)EPR to determine the vector for the
interrupt. The value of the vector in (G)EPR does not change until the next external input interrupt occurs
and therefore software must read (G)EPR before re-enabling the interrupt.
When using external proxy (and even with the legacy method), software must ensure that end-of-interrupt
(EOI) processing is synchronized with taking of external input interrupts such that the EOI indicator is
received so that the interrupt controller can properly pair it with the source. For example, writing the EOI
register for the PIC would require that the following sequence occur:
block interrupts; // turn EE off for external interrupts
write EOI register; // signal end of interrupt
read EOI register; // ensure write has completed
unblock interrupts; // allow interrupts
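The sequence above can be sketched in C against a simulated PIC. Everything named here (sim_msr_ee, sim_pic_eoi, sim_eoi_writes, and the helper functions) is hypothetical and exists only so the example runs on a host; on hardware, blocking and unblocking would typically clear and set MSR[EE] (for example with wrteei), and the EOI register would be a memory-mapped PIC register whose address is defined by the integrated device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated state standing in for MSR[EE] and the PIC EOI register. */
static bool sim_msr_ee = true;
static volatile uint32_t sim_pic_eoi;
static unsigned sim_eoi_writes;

static void block_interrupts(void)   { sim_msr_ee = false; } /* turn EE off */
static void unblock_interrupts(void) { sim_msr_ee = true; }  /* turn EE on  */

static void pic_write_eoi(uint32_t value)
{
    assert(!sim_msr_ee); /* the EOI write must happen with EE off */
    sim_pic_eoi = value;
    sim_eoi_writes++;
}

static uint32_t pic_read_eoi(void) { return sim_pic_eoi; }

/* Perform the four-step EOI sequence in the order the text requires. */
static void signal_end_of_interrupt(void)
{
    block_interrupts();   /* no external input interrupt can be taken here */
    pic_write_eoi(0);     /* signal end of interrupt */
    (void)pic_read_eoi(); /* read back so the write is known to have completed */
    unblock_interrupts(); /* allow interrupts again */
}
```

The read-back matters because a posted write could otherwise still be in flight when interrupts are re-enabled, which would let a new interrupt be taken before the PIC has seen the EOI.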
NOTE
The architecture does not support use of a misaligned EA by load and
reserve or store conditional instructions. If a misaligned EA is specified, the
alignment interrupt handler must treat the instruction as a programming
error and not attempt to emulate the instruction.
• A dcbz, dcbzep, dcbzepl, or dcbzl is attempted to a page marked write-through or cache-inhibited.
For other accesses, the e500mc performs misaligned accesses in hardware within a single cycle if the
misaligned operand lies within a doubleword boundary. Accesses that cross a doubleword boundary
degrade performance. Although many misaligned memory accesses are supported in hardware, their
frequent use is discouraged because they can compromise overall performance. Only one outstanding
misalignment at a time is supported, which means it is nonpipelined. A misaligned access that crosses a
page boundary completely restarts if the second portion of the access causes a TLB miss or a DSI after the
associated interrupt has been serviced and the TLB miss or DSI handler has returned to re-execute the
instruction. This can cause the first access to be repeated.
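The boundary conditions mentioned above reduce to simple address arithmetic. The following C sketch shows one way to test whether an access crosses a doubleword or page boundary. The function names are hypothetical, the 4-KB page size is an assumption (the actual page size comes from the TLB entry), and the sketch ignores accesses that wrap around the top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the bytes [ea, ea + size) fall in more than one aligned block.
 * `block` must be a power of two. */
static bool crosses_boundary(uint32_t ea, uint32_t size, uint32_t block)
{
    assert(size > 0 && block > 0 && (block & (block - 1)) == 0);
    uint32_t first = ea & ~(block - 1);
    uint32_t last  = (ea + size - 1) & ~(block - 1);
    return first != last;
}

/* Accesses that cross a doubleword (8-byte) boundary degrade performance. */
static bool crosses_doubleword(uint32_t ea, uint32_t size)
{
    return crosses_boundary(ea, size, 8);
}

/* ASSUMPTION: 4-KB pages. A crossing access may restart completely. */
static bool crosses_page_4k(uint32_t ea, uint32_t size)
{
    return crosses_boundary(ea, size, 4096);
}
```

For example, a 4-byte access at 0x1001 stays inside one doubleword, while the same access at 0x1006 spans two.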
When an alignment interrupt occurs, the processor suppresses execution of the instruction causing the
alignment interrupt. Registers are updated as shown in Table 4-15.
Table 4-15. Alignment Interrupt Register Settings
Register Setting
DEAR Set to the EA of a byte in the range of bytes being accessed and on the page whose access caused the exception
Floating-point enabled (ESR: FP)
A floating-point enabled exception is caused when FPSCR[FEX] is set to 1 by the execution of a
floating-point instruction that causes an enabled exception, including the case of a Move to FPSCR
instruction that causes an exception bit and the corresponding enable bit both to be 1. Note that in this
context, the term ‘enabled exception’ refers to the enabling provided by control bits in the FPSCR.
Illegal instruction (ESR: PIL)
Attempted execution of any of the following causes an illegal instruction exception.
• A reserved-illegal instruction or an undefined instruction encoding.
• A mtspr or mfspr that specifies a SPRN value that is not implemented.
• A mtspr that specifies a read-only SPRN.
• A mfspr that specifies a write-only SPRN.
• A defined, unimplemented instruction.
On e500mc an instruction in an invalid form causes boundedly undefined results.
Privileged instruction (ESR: PPR)
MSR[PR] = 1 and execution is attempted of any of the following:
• A privileged instruction or a hypervisor privileged instruction.
• mtspr or mfspr that specifies a privileged SPR.
• mtpmr or mfpmr that specifies a privileged PMR.
Trap (ESR: PTR)
When any of the conditions specified in a trap instruction are met and the exception is not also enabled
as a debug interrupt. If enabled as a debug interrupt (that is, (DBCR0[TRAP] = 1 & DBCR0[IDM] = 1 &
MSR[DE] = 1) & (MSR[GS] | ~EPCR[DUVD])), then a debug interrupt is taken instead of the program
interrupt.
Unimplemented operation (ESR: none)
e500mc does not take unimplemented operation exceptions. All defined, but unimplemented
instructions take an illegal instruction exception.
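The condition that redirects a trap from the program interrupt to a debug interrupt is a pure Boolean expression, which can be written out in C as a sketch. The struct and function names are hypothetical; each field simply holds the value of the named register bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Snapshot of the register bits that the trap condition depends on. */
struct trap_ctx {
    bool dbcr0_trap; /* DBCR0[TRAP] */
    bool dbcr0_idm;  /* DBCR0[IDM]  */
    bool msr_de;     /* MSR[DE]     */
    bool msr_gs;     /* MSR[GS]     */
    bool epcr_duvd;  /* EPCR[DUVD]  */
};

/* (DBCR0[TRAP] & DBCR0[IDM] & MSR[DE]) & (MSR[GS] | ~EPCR[DUVD]) */
static bool trap_takes_debug_interrupt(const struct trap_ctx *c)
{
    assert(c != NULL);
    return (c->dbcr0_trap && c->dbcr0_idm && c->msr_de) &&
           (c->msr_gs || !c->epcr_duvd);
}
```

When the function returns false and a trap condition is met, the trap is reported as a program interrupt instead.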
Register Description
SRR0 Set to the EA of the instruction causing the floating-point unavailable interrupt.
SRR1 Set to the MSR contents at the time of the interrupt.
MSR • ME, CE, and DE are unchanged.
• RI is not cleared.
• All other defined MSR bits are cleared.
>1 — Undefined1
1 — IVOR40
0 0 IVOR8
1 GIVOR8
1 For e500mc, only the low-order bit of the LEV field is
used and the (G)IVOR is used accordingly; however,
software should not depend on this behavior.
Register Description
For a system call interrupt, instruction execution resumes at address (G)IVPR[32–47] || (G)IVOR8[48–59]
|| 0b0000.
For a hypervisor system call interrupt, instruction execution resumes at address IVPR[32–47] ||
IVOR40[48–59] || 0b0000.
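The concatenation used for these vector addresses can be expressed with masks. In the sketch below, bits 32–47 and 48–59 of the manual's numbering are taken to be the upper 16 bits and bits 4–15 (counting from the least significant bit) of a 32-bit value, which is how the 32–63 numbering maps onto a 32-bit word. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Form (G)IVPR[32-47] || (G)IVORn[48-59] || 0b0000 as a 32-bit address. */
static uint32_t vector_address(uint32_t ivpr, uint32_t ivor)
{
    uint32_t addr = (ivpr & 0xFFFF0000u)  /* bits 32-47: vector prefix  */
                  | (ivor & 0x0000FFF0u); /* bits 48-59: vector offset  */
    assert((addr & 0xFu) == 0);           /* low four bits are always 0 */
    return addr;
}
```

Because the low four bits are forced to zero, every interrupt vector is 16-byte aligned regardless of what software writes to the low bits of the IVOR.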
Hypervisor system call interrupts are provided as a way to communicate with the hypervisor software.
NOTE
The hypervisor should check SRR1[PR,GS] to determine the privilege level
of the software making a hypervisor system call to determine what action,
if any, should be taken as a result of the hypervisor system call.
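As an illustration of this check, the following C sketch classifies the caller from a saved SRR1 value. The mask values assume that MSR[GS] is bit 35 and MSR[PR] is bit 49 in the manual's 32–63 numbering; verify them against the MSR description before relying on them. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bit positions taken from the MSR layout (GS = bit 35, PR = bit 49). */
#define MSR_GS 0x10000000u
#define MSR_PR 0x00004000u

enum caller_level {
    CALLER_HYPERVISOR,       /* GS = 0, PR = 0 */
    CALLER_USER,             /* GS = 0, PR = 1 */
    CALLER_GUEST_SUPERVISOR, /* GS = 1, PR = 0 */
    CALLER_GUEST_USER        /* GS = 1, PR = 1 */
};

/* Decode the privilege level that was in effect when the call was made. */
static enum caller_level classify_caller(uint32_t srr1)
{
    bool pr = (srr1 & MSR_PR) != 0;
    bool gs = (srr1 & MSR_GS) != 0;

    if (gs)
        return pr ? CALLER_GUEST_USER : CALLER_GUEST_SUPERVISOR;
    return pr ? CALLER_USER : CALLER_HYPERVISOR;
}
```

A hypervisor might, for example, honor requests only from CALLER_GUEST_SUPERVISOR and reject or reflect the others; that policy is a software decision and is not defined by the core.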
Register Setting
NOTE
To avoid a subsequent redundant decrementer interrupt, software is
responsible for clearing the decrementer exception status prior to
re-enabling MSR[EE] or MSR[GS]. To clear the decrementer exception, the
interrupt handling routine must clear TSR[DIS] by writing a word to TSR
using mtspr with a 1 in any bit position that is to be cleared and 0 in all other
positions. The write-data to the TSR is not direct data, but a mask: A 1
causes the bit to be cleared, and a 0 has no effect.
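The mask-style write described in this note can be modeled as follows. The TSR is simulated as a variable so the sketch runs on a host; tsr_write stands in for mtspr to TSR. The TSR_DIS value assumes that DIS is bit 36 in the manual's 32–63 numbering, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: TSR[DIS] is bit 36 (32-63 numbering), that is, 0x08000000. */
#define TSR_DIS 0x08000000u

/* Simulated TSR; on hardware this is an SPR written with mtspr. */
static uint32_t sim_tsr;

/* Models a write to TSR: each 1 in the mask clears that bit, each 0 has no effect. */
static void tsr_write(uint32_t mask) { sim_tsr &= ~mask; }

/* Clear only the decrementer exception status, leaving other status bits intact. */
static void clear_decrementer_exception(void)
{
    tsr_write(TSR_DIS);
    assert((sim_tsr & TSR_DIS) == 0);
}
```

Writing only the DIS mask, rather than writing back a value read from TSR, avoids clearing other status bits that software has not yet handled.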
Register Setting
Exception Description
Data TLB miss exception Virtual addresses associated with a data access do not match any valid TLB entry.
When the interrupt occurs, the processor suppresses execution of the excepting instruction. Registers are
updated as shown in this table.
Table 4-25. Data TLB Error Interrupt Register Settings
Register Setting
(G)SRR0 Set to the EA of the instruction causing the data TLB error interrupt.
(G)DEAR Set to the EA of a byte that is both within the range of the bytes being accessed by the memory access or cache
management instruction and within the page whose access caused the exception.
(G)ESR [ST] Set if the instruction causing the interrupt is a store, dcbi, dcbz, or dcbzl; otherwise cleared
[FP] Set if the instruction causing the interrupt is a floating-point load or store.
[EPID] Set if the instruction causing the interrupt is an external pid instruction.
All other defined ESR bits are cleared
MASn If EPCR[DMIUH] = 1, and an Instruction or Data TLB Error, ISI, or DSI is directed to the hypervisor, MAS registers
are not changed.
See Table 6-6.
Exception Description
Instruction TLB miss exception Virtual addresses associated with an instruction fetch do not match any valid TLB entry.
When an instruction TLB error interrupt occurs, the processor suppresses execution of the instruction
causing the exception.
Register Setting
(G)SRR0 Set to the EA of the instruction causing the instruction TLB error interrupt.
MASn If EPCR[DMIUH] = 1, and an Instruction or Data TLB Error, ISI, or DSI is directed to the hypervisor, MAS registers
are not changed.
See Table 6-6.
Register Description
DSRR0 For exceptions occurring while debug interrupts are enabled (DBCR0[IDM] and MSR[DE] = 1), DSRR0 is set as
follows:
• For instruction address compare (IAC registers), data address compare (DAC1R, DAC1W, DAC2R, and DAC2W),
trap (TRAP), or branch taken (BRT) debug exceptions, set to the EA of the instruction causing the interrupt.
• For interrupt taken (IRPT) debug exceptions (CIRPT for critical interrupts), set to the EA of the first instruction of the
interrupt that caused the event.
• For instruction complete (ICMP) debug exceptions, set to the EA of the instruction that would have executed after
the one that caused the interrupt.
• For return from interrupt (RET) debug exceptions, set to the EA of the instruction (rfi, rfci, or rfgi) that caused the
interrupt.
• For unconditional debug event (UDE) debug exceptions, set to the EA of the instruction that would have executed
next had the interrupt not occurred.
For exceptions occurring while debug interrupts are disabled (DBCR0[IDM] = 0 or MSR[DE] = 0), the interrupt occurs
at the next synchronizing event if DBCR0[IDM] and MSR[DE] are modified such that they are both set and if the DBSR
still indicates status. When this occurs, DSRR0 holds the EA of the instruction that would have executed next, not the
address of the instruction that modified DBCR0 or MSR and caused the interrupt.
DSRR1 Set to the MSR contents at the time of the interrupt.
MSR • ME is unchanged. RI is not cleared.
• All other defined MSR bits are cleared.
DBSR Set to indicate type of debug event. See Section 2.17.6, “Debug Status Register (DBSR/DBSRWR).”
Note that on the e500mc, if DBCR0[IDM] is cleared, no debug events occur. That is, regardless of MSR,
DBCR0, DBCR1, and DBCR2 settings, no debug events are logged in DBSR and no debug interrupts are
taken.
The e500mc complies with the architecture debug definition, except as follows:
• Data address compare is only supported for effective addresses.
• Instruction address compares IAC3 and IAC4 are not supported.
• Instruction address compare is only supported for effective addresses.
• Data value compare is not supported.
Instruction execution resumes at address IVPR[32–47] || IVOR15[48–59] || 0b0000.
Register Setting
Value Description
0 Doorbell interrupt (DBELL). Causes a processor doorbell exception on a processor that receives and accepts the
message.
1 Doorbell critical interrupt (DBELL_CRIT). Causes a processor doorbell critical exception on a processor that receives
and accepts the message.
2 Guest processor doorbell interrupt (G_DBELL). Causes a guest processor doorbell exception on a processor that
receives and accepts the message.
3 Guest processor doorbell critical interrupt (G_DBELL_CRIT). Causes a guest processor doorbell critical exception on a
processor that receives and accepts the message.
4 Guest processor doorbell machine check interrupt (G_DBELL_MC). Causes a guest processor doorbell machine check
exception on a processor that receives and accepts the message.
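The message type values in this table map naturally onto an enumeration. The sketch below is illustrative only: the enumerator and function names are hypothetical, and the encoding of the type into the msgsnd operand register is deliberately not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Message type values as listed in the table above. */
enum dbell_type {
    DBELL        = 0,
    DBELL_CRIT   = 1,
    G_DBELL      = 2,
    G_DBELL_CRIT = 3,
    G_DBELL_MC   = 4
};

/* Name of the exception raised on a processor that accepts the message,
 * or NULL if the type value is not one of the five listed. */
static const char *dbell_exception(unsigned type)
{
    switch (type) {
    case DBELL:        return "processor doorbell";
    case DBELL_CRIT:   return "processor doorbell critical";
    case G_DBELL:      return "guest processor doorbell";
    case G_DBELL_CRIT: return "guest processor doorbell critical";
    case G_DBELL_MC:   return "guest processor doorbell machine check";
    default:           return NULL;
    }
}
```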
Register Setting
processor doorbell machine check exceptions are generated when guest doorbell machine check type
messages are received and accepted by the processor.
Registers are updated as shown in this table.
Table 4-35. Guest Processor Doorbell Machine Check Interrupt Register Settings
Register Setting
Register Setting
Hypervisor privilege interrupts are provided as a means for restricting the guest supervisor state from
performing operations allowed only in the hypervisor state. Table 4-37 lists the resources that cause a
hypervisor privilege exception when accessed in guest supervisor state.
Table 4-37. Hypervisor Privilege Exceptions from Guest Supervisor State
Instructions
ehpriv — — Yes —
msgclr — — Yes —
msgsnd — — Yes —
rfci — — Yes —
rfdi — — Yes —
tlbivax — — Yes —
tlbre — — Yes —
tlbsx — — Yes —
tlbsync — — Yes —
tlbwe — — Yes —
SPRs
DECAR Yes Yes — e500mc allows reading of DECAR although Power ISA does not
define it.
EPCR Yes Yes — New register, allows hypervisor to direct certain interrupts and mask
hypervisor debug events.
GPIR No Yes — —
PIR No Yes — Guest supervisor state access to PIR maps to GPIR for reads.
PMRs
SRR0/1, MSR[EE] is set to 0 preventing external input, decrementer, fixed interval timer, and processor
doorbell interrupts from occurring). Software must ensure that synchronous exceptions do not occur prior
to saving the save/restore registers.
This table lists actions system software must avoid before saving save/restore register contents.
Table 4-38. Operations to Avoid Before Save/Restore Registers Are Saved to Memory
Operation Reason
Reenabling MSR[EE], MSR[CE], MSR[DE], or Prevents any asynchronous interrupts, as well as (in the case of MSR[DE])
MSR[ME] in interrupt handlers any debug interrupts, including synchronous and asynchronous types
Branching (or sequential execution) to addresses Prevents instruction storage and instruction TLB error interrupts
not mapped by the TLB or mapped without SX set.
Load, store, or cache management instructions to Prevents data storage and data TLB error interrupts
addresses not mapped or without permissions.
Execution of System Call (sc), trap (tw, twi, td, tdi), Prevents system call and trap exception-type program interrupts. Note that
or ehpriv instructions ehpriv instructions can be executed in guest supervisor state.
Execution of any illegal instructions Prevents illegal instruction exception-type program interrupts
Execution of any instruction that could cause an Prevents alignment interrupts, as described in Section 4.9.7, “Alignment
alignment interrupt Interrupt—IVOR5.”
that other exception, independent of whether that other exception’s corresponding interrupt type is enabled
or disabled.
Except as specifically noted, only one of the exception types listed for a given instruction type is permitted
to be generated at any given time.
NOTE
Mutually exclusive exception types otherwise with the same priority are
listed in the order suggested by the sequential execution model.
0 Machine Check Machine Check Asynch N/A Asynchronous exceptions may come
from the processor or from an external
source.
Debug- IDE Debug Asynch N/A Imprecise debug event usually taken
after MSR[DE] goes from 0 to 1 via rfdi or
mtmsr.
Debug - Interrupt Debug Asynch N/A Debug interrupt taken after original
Taken interrupt has changed NIA (Next
Instruction Address) and MSR.
Debug - Critical Debug Asynch N/A Debug interrupt taken after original
Interrupt Taken critical interrupt has changed NIA and
MSR.
17 System Call Base Synch post System Call Interrupt has SRR0 pointing
to instruction after sc (that is, post
completion).
21 Debug - Instruction Debug Synch post Debug - Instruction Complete Interrupt has
Complete DSRR0 pointing to next instruction (that is,
post completion).
1 The interrupt level defines which set of save/restore registers are used when the interrupt is taken. They are: Base: SRR0/1,
Critical: CSRR0/1, Debug: DSRR0/1, and Machine Check: MCSRR0/1.
2 Pre or Post Completion refers to whether the exception occurs before an instruction completes (pre) and the corresponding
interrupt points to the instruction causing the exception, or if the instruction completes (post) and the corresponding interrupt
points to the next instruction to be executed.
5.1 Overview
This section lists features of the LSU, the Fetch Unit, the L1 cache, the L2 cache, and the CoreNet interface.
The LSU has the following features:
• System memory accesses critical quad-word first. For data accesses, the LSU receives the critical
quad word as soon as it is available; it does not wait for all 64 bytes. That data is forwarded to the
requesting unit before being written to the cache, minimizing stalls due to cache fill latency.
• Store queueing. Stores cannot execute speculatively and remain queued until completion logic
indicates that the store is to be committed. When the L1 cache is accessed, stores are deallocated
from the queue (regardless of whether the cache is updated). If the address is caching-inhibited, the
store passes from the queue to the BIU and into the memory subsystem.
• L1 load miss queueing. On a load miss, the LSU allocates buffers and then queues a bus transaction
to read the line. The LSU processes load hits and load misses until one of the following conditions
occurs:
— There are more than nine outstanding load misses.
— The LSU tries to perform a load miss and there is no place to buffer a new cache line.
• Store miss merging. When a caching-allowed store misses in the data cache, the store data is
written to a cache line–wide buffer. The bytes in the cache line not specified by the store are
allocated when the cache line is eventually fetched from memory. When all 64 bytes are valid, the
cache line is reloaded into the data cache. This behavior is known as store miss merging.
If a subsequent store miss hits in the buffered data, the new data is buffered along with the original
store. Any number of subsequent stores intended for that cache line can be buffered before the
corresponding data cache line is allocated.
• Data line fill buffering extends the cache for loads and caching-allowed stores. Accesses to pages
marked as cacheable may keep copies of data. Therefore, cache management instructions, such as
dcbf, are required even if the L1 data cache is disabled.
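Store miss merging, as described in the list above, can be illustrated with a small software model of one line-fill buffer. This is a conceptual sketch, not a description of the hardware implementation: the struct, the per-byte valid mask, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64

/* One cache line-wide buffer; bit i of `valid` is set when data[i] is valid. */
struct fill_buffer {
    uint8_t  data[LINE_BYTES];
    uint64_t valid;
};

/* Buffer store data for a line that missed. Later stores to the same line
 * are merged into the same buffer. */
static void buffer_store(struct fill_buffer *fb, unsigned off,
                         const uint8_t *src, unsigned len)
{
    assert(off + len <= LINE_BYTES);
    for (unsigned i = 0; i < len; i++) {
        fb->data[off + i] = src[i];
        fb->valid |= 1ull << (off + i);
    }
}

/* When the line arrives from memory, fill only the bytes no store has written. */
static void merge_fill(struct fill_buffer *fb, const uint8_t *line)
{
    for (unsigned i = 0; i < LINE_BYTES; i++) {
        if (!(fb->valid & (1ull << i)))
            fb->data[i] = line[i];
    }
    fb->valid = ~0ull;
}

/* All 64 bytes valid: the line can now be reloaded into the data cache. */
static bool line_complete(const struct fill_buffer *fb)
{
    return fb->valid == ~0ull;
}
```

The key property is that store data always takes precedence over the bytes fetched from memory, so the merged line reflects every buffered store.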
The L1 cache implementation has the following features:
• Separate 32-KB instruction and data caches (Harvard architecture)
• Eight-way set associative, nonblocking caches
• Physically addressed cache directories. The physical (real) address tag is stored in the cache
directory.
• 2-cycle access time provides 3-cycle read latency for instruction and data cache accesses;
pipelined accesses provide single-cycle throughput from caches. For details about latency issues,
see Chapter 10, “Execution Timing.”
• Instruction and data caches have 64-byte cache blocks. A cache block is the block of memory that
a coherency state describes, also referred to as a cache line.
• Four-state modified/exclusive/shared/invalid (MESI) protocol supported for the data cache. See
Section 5.5.1, “Data Cache Coherency Model.”
• Both L1 caches support error detection (enabled through L1CSR0 and L1CSR1 bits), as follows:
— Instruction cache: 1 parity bit per word of instruction, 1 bit of parity per tag
— Data cache: 1 parity bit per byte of data, 1 bit of parity per tag
See Section 5.4.4, “L1 Cache Error Detection and Correction.”
• Both caches also support error injection, which provides a way to test error recovery software by
intentionally injecting errors into the instruction and data caches. See Section 5.4.5, “Cache Error
Injection.”
• The L1 instruction cache supports automatic error correction by invalidation when an access
detects a parity error. The subsequent reporting and taking of a machine check or error report
interrupt causes the instruction to be refetched after invalidation thus correcting the error.
• The L1 data cache supports automatic error correction by invalidation when operating in write
shadow mode. In write shadow mode, all writes to the L1 data cache are written through to the L2
cache. When an access detects an uncorrectable error, the cache is invalidated, and the subsequent
reporting and taking of a machine check or error report interrupt causes the instruction to be
re-executed after invalidation thus correcting the error. See Section 5.4.2, “Write Shadow Mode”.
• Each cache can be independently invalidated through cache flash invalidate (CFI) control bits
located in L1CSR1 and L1CSR0. See Section 5.6.3, “L1 Cache Flash Invalidation.”
• Pseudo–least-recently-used (PLRU) replacement algorithm. See Section 5.8.2.1, “PLRU
Replacement.”
• Support for individual line locking. See Section 5.6.4, “Instruction and Data Cache Line
Locking/Unlocking.”
• Support for cache stashing to the L1 data cache from other devices in the integrated device.
• Both instruction and data cache lines are filled in a single-cycle, 64-byte write from line fill buffers
as described in Section 5.3.1, “Load/Store Unit (LSU).” Cache line fills write all 64 bytes at once,
and therefore do not occur until all data has been buffered from the CoreNet interface.
The L2 write-back, backside cache has the following features:
• Dynamic Harvard architecture, merged instruction and data cache
• 128-KB array organized as 256 eight-way sets of 64-byte cache lines
• 36-bit physical address
• Exclusive, modified, shared, invalid, incoherent, locked, and stale states
• 8-way set associativity with a streaming, 7-bit, pseudo-LRU (PLRU) algorithm with aging
replacement
• Supports data- and instruction-only and way partitioned cache operation. See Section 5.9.3, “L2
Configuration and Partitioning.”
• 64-byte (16-word) cache-line, coherency-granule size
• Support for individual line locking. See Section 5.9.2, “L2 Line Locking.”
• The L2 is a victim cache for data lines and generally inclusive for instruction lines. The L2 contains
only those cache entries that have been cast out from the L1 data cache (the L2 is not reloaded when
the data is reloaded in the L1 data cache). The L1 and L2 caches may or may not have valid copies
of the same line at the same time.
• The L2 is reloaded whenever the L1 instruction cache is reloaded, but L1 instruction cache entries
remain even if they are evicted from the L2 (there is no back invalidation).
• An instruction fetch does not cause eviction of modified lines if they hit in L2. Both the instruction
cache and L2 may have a copy of the line.
• For a transaction with L2 cache, CT = 2, a hit in L1 remains in the L1 unless the transaction is
dcbtls or dcbtstls, which cause the line to be cast out of the L1 cache. See section
• Locked L2 cache lines are not reloaded with a lock in L1 or vice versa.
• L2 cache lookup happens only if L1 cache lookup misses in L1 for the load- or store-type
instructions. Snoop starts in L1 and L2 caches in parallel.
• Two-cycle, nonpipelined data array access
• Latency of 9 cycles after L1 access with one access every two cycles
• Configurable ECC or parity protection for data array
• Parity protection for tag array
• Support for cache stashing to the L2 data cache from other devices in the integrated device.
• ABIST support
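As a quick consistency check on the L2 geometry listed above, the following sketch recomputes the array
size from the number of sets, ways, and line size. The name cache_size_bytes is illustrative only. The
index and tag widths are plain arithmetic on the listed figures and assume a conventional offset/index/tag
split of the 36-bit physical address; they are not values quoted from this manual.

```python
def cache_size_bytes(sets: int, ways: int, line_bytes: int) -> int:
    """Total data capacity of a set-associative cache."""
    return sets * ways * line_bytes


L2_SETS, L2_WAYS, LINE_BYTES, PA_BITS = 256, 8, 64, 36

l2_bytes = cache_size_bytes(L2_SETS, L2_WAYS, LINE_BYTES)
offset_bits = LINE_BYTES.bit_length() - 1   # bits that select a byte in a line
index_bits = L2_SETS.bit_length() - 1       # bits that select a set
tag_bits = PA_BITS - index_bits - offset_bits  # assumed conventional split

print(l2_bytes // 1024, "KB,", index_bits, "index bits,", tag_bits, "tag bits")
```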
The BIU is the core’s interface manager to CoreNet and the rest of the system. The BIU sends and receives
transactions from CoreNet and routes them to the appropriate other units in the core that require them.
The BIU is connected to the CoreNet interface which provides the interprocessor and inter-device
connection for address based transactions. CoreNet itself is not described in this document, but has the
following features:
• The CoreNet interface fabric provides interconnections among the cores, peripheral devices, and
system memory in a multicore implementation. Along with handling basic storage accesses, it
manages cache coherency and consistency. CoreNet interfaces run synchronously or
asynchronously to the processor core frequency. When asynchronous, it allows arbitrary frequency
ratios between the core and the rest of the system. The synchronous or asynchronous nature of the
CoreNet interface is a function of the design of the integrated device.
• Power Architecture® ordering semantics
• Power Architecture coherency support
• Supports intervention (where a cache line is supplied directly from another cache without having
to first be written to memory)
• Non-retry based protocol
• Supports stashing to core caches from certain devices
Cache identifiers (stash IDs) within the entire system should be set to unique values. That is, cache IDs
should not be set such that more than one cache in the system has the same ID (other than 0, which disables
stashing for that cache). Doing so is considered a programming error and may cause a core or the system
to hang.
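The uniqueness rule above lends itself to a simple configuration check before stashing is enabled. The
following is a minimal sketch; the function name duplicate_stash_ids and the cache names in the example
are hypothetical and do not correspond to any register or tool described in this manual.

```python
from collections import Counter


def duplicate_stash_ids(cache_ids):
    """Return the sorted nonzero stash IDs assigned to more than one cache.

    `cache_ids` maps a (hypothetical) cache name to its stash ID.
    ID 0 disables stashing for that cache, so it may repeat freely.
    """
    counts = Counter(i for i in cache_ids.values() if i != 0)
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical system configuration: two caches have stashing disabled.
config = {"core0.L1D": 1, "core0.L2": 2, "core1.L1D": 3, "core1.L2": 0, "core2.L2": 0}
print(duplicate_stash_ids(config))
```

An empty result means the configuration satisfies the rule; any ID in the result is shared by at least
two caches and should be treated as a programming error.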
Like a prefetch or “touch” operation, stashing to a cache is a performance hint. The stash operation
initiated by a device can improve performance if the stashed data is prefetched into the targeted cache prior
to when the data is accessed. This avoids the latency of bringing the data into the cache at the time it is
needed by the processor. However, since stash operations are hints, depending on conditions within the
memory hierarchy and the core, stashes may not always be performed when requested. An integrated
device that initiates stashing operations to the core can optimize its usage of stashing if it is configured to
understand the amount of buffering dedicated to incoming stashing operations.
The e500mc reserves two Data Line Fill Buffers (holding a cacheline of storage each) to perform incoming
stashing operations. If both the L1 and L2 cache have stashing disabled, the Data Line Fill Buffers reserved
for stashing are freed to be used for other core linefill operations. See the reference manual for the
integrated device for information on configuring devices that perform stashes to optimize use of stashing
based on the core's resources reserved for handling stashes.
[Figure: L1 cache organization block diagram, showing the I-cache and D-cache with their tags and status
arrays, instruction and data queueing and buffering, 4-instruction and 8-byte datapaths, a 16-instruction
cache block, a 64-byte (16-word) line, and the bus interface unit.]
The following sections briefly describe the LSU, the instruction unit, the BIU, and the CoreNet interface.
[Figure 5-2: L1 data cache organization, 64 sets with 16 words per block (line).]
Each block (line) consists of 64 bytes of data, 3 status bits (M, V, and S), 1 lock bit, 1 cast-out bit and an
address tag. For the L1 data cache, a cache block is the 64-byte cache line. Also, although it is not shown
in Figure 5-2, the data cache has 1 parity bit/byte and 1 parity bit/tag.
Each cache block contains 16 contiguous words from memory that are loaded from a 16-word boundary
(that is, physical address bits 30–35 are zero). Cache blocks are also aligned on page boundaries.
Physical address bits PA[24:29] provide the index to select a cache set. The tags consist of physical address
bits PA[0:23]. Address translation occurs in parallel with set selection (from PA[24:29]). Lower address
bits PA[30:35] locate a byte within the selected block.
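The field boundaries above can be sketched as a small helper that splits a 36-bit physical address into
its tag, set index, and byte offset. This is an illustration only; the name split_l1_address is not from
the manual. Because bit 0 is the most significant bit, PA[30:35] are the six least significant bits.

```python
def split_l1_address(pa: int):
    """Split a 36-bit physical address into (tag, set_index, byte_offset)."""
    if not 0 <= pa < (1 << 36):
        raise ValueError("physical address must fit in 36 bits")
    byte_offset = pa & 0x3F          # PA[30:35]: byte within the 64-byte block
    set_index = (pa >> 6) & 0x3F     # PA[24:29]: one of 64 sets
    tag = pa >> 12                   # PA[0:23]: 24-bit tag
    return tag, set_index, byte_offset


tag, set_index, byte_offset = split_l1_address(0x1_2345_6789)
print(hex(tag), set_index, byte_offset)
```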
The data cache can be accessed internally while a fill for a miss is pending (allowing hits under misses)
and the data from a hit can be used as soon as it is available. The LSU forwards the critical doubleword to
any pending load misses and allows them to finish. Later, when all the data for the miss has arrived, the
entire cache line is reloaded. In addition, subsequent misses can also be sent to the memory subsystem
before the original miss is serviced (allowing misses under misses). Up to nine misses can be pending;
however, those nine misses can target at most five different cache lines.
A cast-out bit indicates whether a cache line chosen for eviction should be cast out to the L2 cache. In
general a line is cast out of the L1 cache to the L2 cache when it is victimized for replacement.
[Figure 5-3: L1 instruction cache organization, 64 sets with 16 instructions per block.]
Each block consists of 16 instructions, 1 status bit (V), 1 lock bit, and an address tag. Also, although it is
not shown in Figure 5-3, the instruction cache has 1 parity bit/word (8 parity bits for each line) and one
parity bit/tag.
As with the data cache, each block is loaded from a 16-word boundary (that is, bits 30–35 of the physical
address are zero). Instruction cache blocks are also aligned on page boundaries. Also, PA[24:29]
provides the index to select a set and PA[30:33] selects an instruction within a block. The tags consist of
physical address bits PA[0:23]. Address translation occurs in parallel with set selection.
The instruction cache can be accessed internally while a fill for a miss is pending (allowing hits under
misses). Although the data cannot be used, the hit information stops a subsequent miss from requesting a
fill. In addition, subsequent misses can also be sent to the memory subsystem before the original miss is
serviced (allowing misses under misses). When a miss is actually updating the cache, subsequent accesses
are blocked for 1 cycle. (But up to four instructions being loaded into the instruction cache can be
forwarded simultaneously to the instruction unit.)
The instruction cache does not implement a full coherence protocol; a single status bit indicates whether
a cache block is valid. Each line has a single bit for locking. Victimized lines from the L1 instruction cache
are not cast-out to the L2 cache.
— 0b10: The value is reserved. This may cause boundedly undefined behavior.
— 0b11: The value is reserved. This may cause boundedly undefined behavior.
Line fill operations to the L1 instruction cache can be created by invalidating addresses in the
cache using icbi, then causing those instructions to be fetched. Line fill operations to the L1
data cache can be created by invalidating addresses using dcbf then performing load operations
to those addresses. Store operations can be created by writing data to cacheable memory using
store (or store class) instructions.
Single-bit errors injected into the data array are accomplished by inverting the parity bit for
each byte.
NOTE
Error checking for the L1 instruction cache must be enabled
(L1CSR1[ICECE] = 1) when L1CSR1[ICEI] is set. Similarly for the data
cache, L1CSR0[CECE] must be set if L1CSR0[CEI] is set. L1CSR0[CEI]
cannot be set (using mtspr) without setting L1CSR0[CECE].
L1CSR1[ICEI] cannot be set without setting L1CSR1[ICECE].
As described above, if a cache error is detected, a machine check interrupt occurs. Sources for cache errors
are described in Section 4.9.3, “Machine Check Interrupt—IVOR1.”
Name            Description
Modified (M)    The line in the cache is modified with respect to main memory. It does not reside in any other
                coherent cache.
Exclusive (E)   The line is in the cache, and this cache has exclusive ownership of it. It is in no other coherent
                cache and it is the same as main memory. This processor may subsequently modify this line without
                notifying other bus masters.
Shared (S)      The addressed line is in the cache, it may be in another coherent cache, and it is the same as main
                memory. It cannot be modified by any processor.
Invalid (I)     The cache location does not contain valid data.
Every data cache block state is defined by its status. Note that in a multiprocessor system, a cache line can
exist in the exclusive state in at most one L1 data cache at a time.
The core provides full hardware support for cache coherency and ordering instructions and for TLB
management instructions.
The core broadcasts cache management instructions (dcbst, dcbstep, dcbf, dcbi (M=1), icbi, icbiep),
synchronization instructions (mbar - all forms, sync 0), TLB management instructions (tlbsync, tlbivax),
and cache touch or locking instructions with CT=1.
A write-back store that hits a line that is already in exclusive state is immediately stored to the line; the
state is changed to modified. If a write-back store hits a line that is already in the modified state, it is
immediately stored to the line, and the line stays as modified.
If a write-back store operation (that is, caching-allowed and not write-through) hits a line in the shared
state, the cache line is first invalidated and a read-with-intent-to-modify is issued to the BIU and CoreNet.
The line is received through the BIU and the written data is merged into the line in the DLFB. The line is
then written to the cache marked as modified. If a write-back store misses in the cache, the action is the
same as the shared case, except the line is not first invalidated (as it is not present).
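The write-back store cases described above can be summarized as a small state function. This is a sketch
for illustration; the function name and the action strings are not taken from the manual, and a miss is
modeled as the invalid state.

```python
def write_back_store(state: str):
    """Return (bus_actions, final_state) for a write-back store.

    `state` is the L1 data cache line state before the store:
    "M", "E", "S", or "I" (a miss is treated as "I").
    """
    if state in ("M", "E"):
        # Hit with ownership: the store is performed immediately.
        return [], "M"
    if state == "S":
        # Invalidate the shared copy, then fetch with intent to modify.
        return ["invalidate", "read-with-intent-to-modify"], "M"
    if state == "I":
        # Miss: same as the shared case, but there is nothing to invalidate.
        return ["read-with-intent-to-modify"], "M"
    raise ValueError(f"unknown state {state!r}")


for s in "MESI":
    print(s, write_back_store(s))
```

In every case the line ends in the modified state; the cases differ only in whether a transaction must be
issued to the BIU and CoreNet first.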
— Software is designed, in general, with some sort of mutual exclusion or locking mechanism
regardless of the architecture (because software running on one processor must make several
updates to a data structure atomically).
• The device driver case
— Code is running that controls a device through memory-mapped addresses.
— Accesses to these memory-mapped registers usually need to occur in a specific order because
the accesses have side effects (for example a store to an address causes the device to perform
some action and the order these actions are performed must be explicitly controlled in order for
the device to perform correctly).
— Addresses are usually marked as caching-inhibited and guarded because the memory is not
“well behaved.”
• The processor synchronization case.
— Some registers within the processor, such as the MSR, have special synchronization
requirements associated with them to guarantee when changes that may affect memory
accesses occur (see Section 3.3.3, “Synchronization Requirements,” for the specific registers
and their synchronization requirements).
— Only system programmers modifying these special registers need be aware of these cases.
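lwz r3,0(r4)    # load a pointer from the address held in r4
lwz r5,0(r3)    # load through that pointer; the address depends on the first load's data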
Because the second load’s address depends on the first load being performed and providing data,
the processor must ensure that the first load occurs before the second is attempted (and in fact must
be sure the first load has returned data before even attempting translation).
3. Guarded caching-inhibited stores must be performed in order with respect to other guarded
caching-inhibited stores and guarded caching-inhibited loads must be performed in order with
respect to other guarded caching-inhibited loads. This generally only applies to writing device
drivers that control memory mapped devices with side effects through store operations.
4. A store operation cannot be performed before a previous load operation regardless of the addresses.
That is, if a load is followed by a store, the load is always performed before the store. This is
a requirement of Freescale processors, as specified in EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors. It is unlikely, but possible, that other Power Architecture
cores may not require this.
— Loads and stores that are both caching-inhibited and guarded (WIMGE = 0b01x1x) as well as
stores that are write-through required (WIMGE = 0b10xxx). This is useful for the device driver
case which would be doing loads and stores to caching-inhibited memory.
— Stores that have the following attributes: not caching-inhibited, not write-through required, and
memory coherence required (WIMGE = 0b001xx). These are stores to normal cacheable
coherent memory.
mbar 1 is a better performing memory barrier than sync or mbar. For more details, refer to Chapter 5,
“Instruction Set,” in EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors.
• lwsync (or sync 1)—lwsync (lightweight sync) creates a barrier for normal cacheable memory
accesses (WIMGE = 0b001xx). It orders all combinations of the loads and stores except for a store
followed by a load. This is the most efficient barrier for normal SMP programming when dealing
with multiprocessor locks and critical regions.
lwsync is a better performing memory barrier than sync, mbar, or mbar 1.
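The ordering guarantee of lwsync described above reduces to a single rule, sketched below. The function
name lwsync_orders is illustrative only, and both accesses are assumed to target normal cacheable
memory (WIMGE = 0b001xx).

```python
def lwsync_orders(first: str, second: str) -> bool:
    """True if lwsync orders `first` (before the barrier) ahead of `second`.

    Both arguments are "load" or "store".
    """
    for access in (first, second):
        if access not in ("load", "store"):
            raise ValueError(f"unknown access type {access!r}")
    # Every pairing is ordered except a store followed by a load.
    return not (first == "store" and second == "load")


for a in ("load", "store"):
    for b in ("load", "store"):
        print(f"{a:5} -> {b:5}: ordered by lwsync = {lwsync_orders(a, b)}")
```

Code that needs a store to be performed before a later load cannot rely on lwsync for that pairing and
needs a stronger barrier such as sync.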
Another method also exists for ordering all caching-inhibited loads and stores which are guarded. The
HID0[CIGLSO] bit can be set to force all caching-inhibited loads and stores which are guarded to be
performed in order. This is not a barrier, per se, but a system attribute that causes the core to always order
these accesses. Setting this bit is a good way to deal with the device driver case over a broad range of code
if the memory accesses to the device are caching-inhibited and guarded which is normally the case. This
is likely to perform better than inserting mbar in specific places since the implementation of the e500mc
already orders all of these except for a guarded caching-inhibited store followed by a guarded
caching-inhibited load. In this case, the e500mc simply ensures that the store is performed on CoreNet
prior to attempting the load.
Locking software and multiprocessing software may have various other types of mutual exclusion
and those should be examined with ordering semantics in mind. Power ISA 2.06 Book II Appendix
B gives programming examples for various types of shared storage accesses.
As specified in the architecture, the core requires that, for a store conditional instruction to succeed, its real
address must be to the same reservation granule as the real address of a preceding load and reserve
instruction that established the reservation. The e500mc makes reservations on behalf of aligned 64-byte
blocks of the memory address space.
If the reservation has been canceled for any reason (or the reservation does not match the real address
specified by the store conditional instruction), then the store conditional instruction fails and clears
CR0[EQ]. The reservation may be invalidated by several events. They are described in Section 3.4.9,
“Reservations.”
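The granule rule above can be checked with simple address arithmetic. The sketch below is illustrative
only; the function name is not from the manual. Note that a matching granule is necessary but not
sufficient for a store conditional to succeed, because the reservation may have been canceled for other
reasons.

```python
RESERVATION_GRANULE = 64  # bytes; reservations cover aligned 64-byte blocks


def same_reservation_granule(load_ra: int, store_ra: int) -> bool:
    """True if both real addresses fall in the same aligned 64-byte block."""
    return load_ra // RESERVATION_GRANULE == store_ra // RESERVATION_GRANULE


print(same_reservation_granule(0x1000, 0x103C))  # same 64-byte block
print(same_reservation_granule(0x103C, 0x1040))  # adjacent blocks
```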
• Cacheable data accesses bypass the data cache, are forwarded to the memory subsystem as
caching-allowed, and proceed to the CoreNet interface. Returned data is forwarded to the
requesting execution unit, but is not loaded into any of the caches.
• Other cache management instructions do not affect the disabled cache.
NOTE
Data line fill buffering, which extends the cache for loads and
caching-allowed stores, remains enabled. Pages marked as cacheable are
accessed and may keep copies of data. Therefore, cache management
instructions, such as dcbf, may be required even if the L1 data cache is
disabled.
When the instruction cache is disabled (L1CSR1[ICE] = 0), instruction accesses bypass the instruction
cache. These accesses are forwarded to the memory subsystem as caching-allowed and proceed to the
CoreNet interface. When the instructions are returned, they are forwarded to the instruction unit but are
not loaded into the instruction cache.
NOTE
Instruction line fill buffering, which extends the cache for fetches, remains
enabled. Pages marked as cacheable are accessed by performing a
cache-line burst transaction even when the cache is disabled and may keep
copies of instructions in line fill buffers. Therefore, cache management
instructions, such as icbi, may be required even if the L1 instruction cache
is disabled.
When an L1 cache is enabled, software must first properly flash invalidate it to prevent stale data (in the
case where it has been disabled for some period of time during operation) or unknown state (in the case of
power on reset). Software should perform the invalidation by setting the flash invalidation bit (CFI or
ICFI) in the appropriate L1 cache control and status register, and then continue to read CFI (or ICFI) until
the bit is cleared. Software should then perform an isync to ensure that instructions that may have been
prefetched prior to the cache invalidation are discarded. The setting of L1CSR0[CE] or L1CSR1[ICE]
must be preceded by a sync and isync instruction, to prevent a cache from being disabled or enabled in the
middle of a data or instruction access. See Section 3.3.3, “Synchronization Requirements,” for more
information on synchronization requirements.
Because the instruction cache never contains modified data, there is no need to flush the instruction cache
before it is invalidated.
The instruction cache can be invalidated by setting L1CSR1[ICFI]. The L1 caches can be flash invalidated
independently. The setting of L1CSR0[CFI] and L1CSR1[ICFI] must be preceded by an msync and isync,
respectively.
The instruction cache is automatically flash invalidated if any parity error (tag or data) occurs.
Valid bits in both caches are cleared automatically upon reset. If software desires to clear all valid bits in
the caches during operation, software must use the Flash Invalidation bits in L1CSR0 and L1CSR1 (CFI
bits). This causes a flash invalidation, after which the CFI bits are cleared automatically (CFI bits are not
sticky). Flash invalidate operations are local to the processor that performs them; other processors’
L1 caches are not affected. Software should always poll the CFI bits after setting them to determine when
the invalidation has been completed and then perform an isync. Software must use care when invalidating
the entire data cache to ensure that no modified data exists in the cache by first flushing the cache unless
software does not care about the state that any previous memory operations may have attained.
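The set, poll, and isync sequence described above can be sketched against a toy register model. This is
a simulation for illustration only, not driver code: MockL1CSR0 is hypothetical, the CFI bit mask 0x2 is
an assumption that should be checked against the L1CSR0 register description, and the msync and isync
instructions are represented by log entries.

```python
L1CSR0_CFI = 0x2  # assumed bit mask for the flash-invalidate bit


class MockL1CSR0:
    """Toy register: CFI reads back as set for a few polls, then clears."""

    def __init__(self, busy_polls=3):
        self.value = 0
        self._busy = 0
        self._busy_polls = max(1, busy_polls)

    def write(self, value):
        self.value = value
        if value & L1CSR0_CFI:
            self._busy = self._busy_polls

    def read(self):
        if self._busy:
            self._busy -= 1
            if self._busy == 0:
                self.value &= ~L1CSR0_CFI   # model hardware clearing CFI
        return self.value


def flash_invalidate(reg, log):
    """Request a flash invalidate and wait for it to complete."""
    log.append("msync")                     # synchronize before setting CFI
    reg.write(reg.read() | L1CSR0_CFI)      # request the flash invalidate
    polls = 0
    while reg.read() & L1CSR0_CFI:          # CFI is not sticky; wait for clear
        polls += 1
    log.append("isync")                     # discard prefetched instructions
    return polls


reg, log = MockL1CSR0(), []
polls = flash_invalidate(reg, log)
print(log, "polls:", polls)
```

The sketch does not flush modified data first; as noted above, software that cares about modified lines
must flush the data cache before invalidating it.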
Individual instruction or data cache blocks can be invalidated by using icbi and dcbi. Note that
invalidating the caches resets lock bits (causing the locks to be lost) in the L1 caches. Also note that with
dcbi, the e500mc invalidates the cache block without pushing it out to memory if WIMGE=0bx00xx. If
WIMGE=0bx01xx, the e500mc performs a dcbf and pushes any modified state to memory before
invalidating the cache block. See Section 3.4.11.3.1, “Supervisor-Level Cache Instruction.”
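The two dcbi cases above depend only on the I and M page attributes. The following sketch captures that
decision; the function name and the returned strings are illustrative, and pages with I = 1 are rejected
because the text above covers caching-allowed pages only.

```python
def dcbi_action(cache_inhibited: bool, coherence_required: bool) -> str:
    """Describe what dcbi does for a page with the given I and M attributes."""
    if cache_inhibited:
        raise ValueError("only caching-allowed (I = 0) pages are covered here")
    if coherence_required:
        # WIMGE = 0bx01xx: handled as dcbf, so modified data reaches memory first.
        return "flush then invalidate"
    # WIMGE = 0bx00xx: the block is invalidated without being pushed to memory.
    return "invalidate without write-back"


print(dcbi_action(cache_inhibited=False, coherence_required=True))
print(dcbi_action(cache_inhibited=False, coherence_required=False))
```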
Exceptions and other events that can access the L1 cache should be disabled during this time so that the
PLRU algorithm can function undisturbed.
Table 5-4 shows how cache locking operations are affected by MSR[GS,PR,UCLE] and MSRP[UCLEP]
which determine whether the core is operating in hypervisor, guest-supervisor, or user (problem state)
mode.
Table 5-4. Cache Locking Based on MSR[GS,PR,UCLE] and MSRP[UCLEP]
MSR[GS]  MSR[PR]  MSR[UCLE]  MSRP[UCLEP]  Result
0        0        —          —            Execute
x        1        1          —            Execute
-        0        —          0            Execute
If all of the ways are locked in a cache set, an attempt to lock another line in that set results in an
overlocking situation. The new line is not placed in the cache, and either the data cache overlock bit
L1CSR0[CLO] or instruction cache overlock bit L1CSR1[ICLO] is set. This does not cause an exception
condition. See Section 3.4.10.2, “Cache Locking Instructions” for a description of what conditions set
these bits.
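The overlocking behavior above can be modeled with a set that tracks only its locked lines. This is a
minimal sketch: the class and function names are hypothetical, the status dictionary stands in for
L1CSR0[CLO] (or L1CSR1[ICLO]), and treating a repeated lock of an already locked line as a no-op is an
assumption, not a statement from this manual.

```python
WAYS = 8  # ways per L1 cache set


class CacheSet:
    """One cache set that tracks only which line addresses are locked."""

    def __init__(self):
        self.locked = set()


def lock_line(cache_set, line_addr, status):
    """Try to lock `line_addr`; set status["CLO"] on an overlock.

    Returns True if the line is locked in the set afterward.
    """
    if line_addr in cache_set.locked:
        return True                  # assumption: re-locking is a no-op
    if len(cache_set.locked) == WAYS:
        status["CLO"] = 1            # overlock: line not placed, no exception
        return False
    cache_set.locked.add(line_addr)
    return True


status = {"CLO": 0}
s = CacheSet()
results = [lock_line(s, 0x1000 * i, status) for i in range(9)]
print(results, status)
```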
It is acceptable to lock all ways of a cache set. A nonlocking line fill for modified data to a new address in
a completely locked cache set is not put into the cache. However it is loaded into a DWB and creates the
appropriate normal burst write transfer.
The cache-locking DSI handler must decide whether to lock a given cache line based on available cache
resources.
invalidation of the lock bits performed in a single CPU cycle, after which the CLFC bits are automatically
cleared (CLFC bits are not sticky).
— Set HID0[DCFA], perform isync, then perform a series of loads which access each cache line
once within a contiguous real 36-KB region. Software must ensure that no cache line within
the 52-KB region is in the L1 data cache in the modified state prior to performing the loads.
Clear HID0[DCFA], perform isync.
— Set HID0[DCFA], perform isync, then perform a series of loads or dcbz instructions which
access each cache line once within a contiguous real 32-KB scratch region. Software must
ensure that no cache line within the 32-KB region is in the L1 data cache in any state prior to
performing the loads or dcbz instructions. This can be ensured by only mapping the real pages
in the region in the MMU when the cache flushing routine is performed. The pages must be
marked as guarded and cacheable. Clear HID0[DCFA], perform isync.
• Ensure that all the replaced lines have been written to the memory subsystem by executing sync.
• Flash invalidate the cache by writing a 1 to L1CSR0[CFI], performing the required
synchronization, then polling until the bit is cleared. This ensures that the memory region that was
used to cause line replacement in the cache is not present in the cache should the cache flush routine
get called again before the lines get naturally evicted.
• Re-enable any interrupts that were disabled at the beginning of the cache flush routine.
NOTE
Because the hypervisor can interrupt the guest in the middle of the cache flush
routine, the PLRU bits can change and perturb the flush algorithm, possibly
leaving modified lines in the L1 data cache that are not flushed. This can be
handled either by having the hypervisor treat the setting of L1CSR0[CFI] to 1
by the guest as a flush and invalidate request, or by providing an hcall
service to perform the flush.
Each of the eight ways of each set in the data cache can be locked (by locking all of the cache lines in the
way with the dcbtls or dcbtstls instruction). When at least one way is unlocked, misses are treated
normally and are allocated to one of the unlocked ways on a reload. If all eight ways are locked, store/load
misses proceed to the memory subsystem as normal caching-allowed accesses. In this case, the data is
forwarded to the requesting execution unit when it returns, but it is not loaded into the data cache. If the
data is modified, it creates the appropriate normal burst write transfer.
Note that caching-inhibited stores should not access any of the caches (see Section 5.5.4.3,
“Caching-Inhibited Loads and Stores,” for more information).
This algorithm prioritizes the replacement of invalid entries over valid ones (starting with way 0).
Otherwise, if all ways are valid, one is selected for replacement according to the PLRU bit encodings
shown in Table 5-5.
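Table 5-5. L1 PLRU Replacement Way Selection
PLRU Bit Values              Way Selected for Replacement
B0 = 0   B1 = 0   B3 = 0     L0
B0 = 0   B1 = 0   B3 = 1     L1
B0 = 0   B1 = 1   B4 = 0     L2
B0 = 0   B1 = 1   B4 = 1     L3
B0 = 1   B2 = 0   B5 = 0     L4
B0 = 1   B2 = 0   B5 = 1     L5
B0 = 1   B2 = 1   B6 = 0     L6
B0 = 1   B2 = 1   B6 = 1     L7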
Figure 5-4 shows the decision tree used to generate the victim line in the PLRU algorithm.
[Figure 5-4: PLRU decision tree. The first level tests B0; the second level tests B1 (when B0 = 0) or B2
(when B0 = 1); the third level tests B3, B4, B5, or B6 to select one of the eight ways.]
During reset, the PLRU and valid bits of the L1 caches are automatically cleared to point to way L0 of
each set.
Note that only three PLRU bits are updated for any access.
The core does not replace locked lines. Lock bits are used at reload time to steer the PLRU algorithm away
from selecting locked cache lines.
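The way selection in Table 5-5 can be written as a short tree walk. In the sketch below, plru_victim
follows the table directly. plru_touch is an assumption: the manual states only that three PLRU bits are
updated per access, so the update shown is the textbook tree-PLRU rule (point each bit on the path away
from the accessed way). Invalid-way priority and lock steering are not modeled.

```python
def plru_victim(b):
    """Pick the victim way (0-7) from PLRU bits b[0]..b[6] per Table 5-5."""
    if b[0] == 0:
        if b[1] == 0:
            return 0 + b[3]
        return 2 + b[4]
    if b[2] == 0:
        return 4 + b[5]
    return 6 + b[6]


def plru_touch(b, way):
    """Return new PLRU bits after an access to `way` (assumed update rule)."""
    b = list(b)
    upper = way >= 4
    b[0] = 0 if upper else 1                 # point away from the accessed half
    mid = 2 if upper else 1
    b[mid] = 0 if (way % 4) >= 2 else 1      # point away from the accessed pair
    b[3 + way // 2] = 0 if way % 2 else 1    # point away from the accessed way
    return b


bits = [0] * 7          # reset state points to way L0
order = []
for _ in range(8):
    victim = plru_victim(bits)
    order.append(victim)
    bits = plru_touch(bits, victim)
print(order)
```

Under the assumed update rule, eight consecutive replacements starting from the reset state visit each of
the eight ways exactly once.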
properties that the instruction side and the data side both allocate out of the same pool of available lines
(that is, the cache is physically unified).
This dynamic Harvard implementation allows fetches to be treated as non global and reduces the overall
snoop overhead that otherwise might be required by the system, while still allowing instruction and data
lines to allocate from the same pool of available lines in the L2 cache. This means that the number of lines
in use by instructions or data varies according to how the processor is executing.
When N is set for any line (when it is allocated as the result of an instruction fetch), the transaction to read
that line is sent to CoreNet and marked as non global. A later data transaction does not hit to that line, and
any data transaction that targets a line with the N bit set is sent out to CoreNet to acquire coherent data.
When the data line is received by the L2 cache, if a line with the same tag exists which is valid and has the
N bit set, the line is replaced in the L2 cache by the data line and the N bit status is cleared.
To implement dynamic Harvard, the L2 cache snoops icbi operations that are performed, regardless of the
core that performs them. Also, operations on the processor that can potentially fill the L2 cache from the
fetch path must be propagated to the L2 cache. icbi operations do not hit to lines that are marked as
coherent (N is not set), since the operation affects the instruction cache only. Similarly, snoops for data
operations from data cache block operations or from stores do not hit to lines that are marked as incoherent
(N is set), since the operation affects the data cache only.
Software must deal with the incoherence of instruction lines in the L2 cache in the same manner that it
does with the Harvard L1 instruction cache. To perform instruction modification, data must first be pushed
from the L2 cache, and when that operation is complete, the instruction side must be invalidated using icbi.
Power Architecture already requires software to perform this operation, so no additional software is
required. If software had previously depended on the flash invalidation of the L1 instruction cache to clear
any cached instructions, this method does not work when the L2 cache is enabled and caching
instruction fetches. For this reason, software is strongly encouraged to perform the architectural method
of modifying instructions using dcbf and icbi.
Software can clear all the locks in the L2 cache by setting L2CSR0[L2LFC], as described in Section 2.15, “L2
Cache Registers.” Note that this operation takes many cycles.
5.9.5 Errors
Table 5-7 describes when L2ERRDET is updated based on error type.
Table 5-7. Errors in Different Arrays
Error                L2PE TMHITDIS TPARDIS MBECCDIS SBECCDIS PARDIS TMHITINTEN TPARINTEN MBECCINTEN SBECCINTEN PARINTEN TMHIT TPARERR MBECCERR SBECCERR PARERR
Tag multi-way hit    0    x        x       x        x        x      x          x         x          x          x        0     0       0        0        0
                     1    0        x       x        x        x      0          x         x          x          x        1     0       0        0        0
                     1    0        x       x        x        x      1          x         x          x          x        1     0       0        0        0
                     1    1        x       x        x        x      0          x         x          x          x        0     0       0        0        0
                     1    1        x       x        x        x      1          x         x          x          x        0     0       0        0        0
Tag parity error     0    x        x       x        x        x      x          x         x          x          x        0     0       0        0        0
                     1    x        0       x        x        x      x          0         x          x          x        0     1       0        0        0
                     1    x        0       x        x        x      x          1         x          x          x        0     1       0        0        0
                     1    x        1       x        x        x      x          0         x          x          x        0     0       0        0        0
                     1    x        1       x        x        x      x          1         x          x          x        0     0       0        0        0
Data parity error    0    x        x       x        x        x      x          x         x          x          x        0     0       0        0        0
                     1    x        x       1        1        0      x          x         x          x          0        0     0       0        0        1
                     1    x        x       1        1        0      x          x         x          x          1        0     0       0        0        1
                     1    x        x       1        1        1      x          x         x          x          x        0     0       0        0        0
Single bit ECC error 0    x        x       x        x        x      x          x         x          x          x        0     0       0        0        0
                     1    x        x       x        0        x      x          x         x          0          x        0     0       0        1        0
                     1    x        x       x        0        x      x          x         x          1          x        0     0       0        1        0
                     1    x        x       x        1        x      x          x         x          x          x        0     0       0        0        0
Multi bit ECC error  0    x        x       x        x        x      x          x         x          x          x        0     0       0        0        0
                     1    x        x       0        x        x      x          x         0          x          x        0     0       1        0        0
                     1    x        x       0        x        x      x          x         1          x          x        0     0       1        0        0
                     1    x        x       1        x        x      x          x         x          x          x        0     0       0        0        0
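The rows of Table 5-7 follow a simple pattern: a detect bit is set only when L2PE is 1 and the matching
disable bit is 0, and the interrupt enable bits do not affect which detect bit is set. The sketch below
encodes that reading of the table. The function name and the error-type strings are hypothetical, the
field names are as read from the table header, and combinations that the table does not list raise an
error instead of guessing.

```python
def l2errdet_bit(error, l2pe, dis):
    """Return the name of the L2ERRDET bit set for `error`, or None.

    `dis` maps disable-bit names (for example "TMHITDIS") to 0 or 1.
    """
    if not l2pe:
        return None                          # no detect bit is set when L2PE = 0
    if error == "tag multi-way hit":
        return None if dis["TMHITDIS"] else "TMHIT"
    if error == "tag parity":
        return None if dis["TPARDIS"] else "TPARERR"
    if error == "single-bit ECC":
        return None if dis["SBECCDIS"] else "SBECCERR"
    if error == "multi-bit ECC":
        return None if dis["MBECCDIS"] else "MBECCERR"
    if error == "data parity":
        # The table lists data parity rows only with both ECC checks disabled.
        if not (dis["MBECCDIS"] and dis["SBECCDIS"]):
            raise ValueError("combination not listed in Table 5-7")
        return None if dis["PARDIS"] else "PARERR"
    raise ValueError(f"unknown error type {error!r}")


parity_mode = {"TMHITDIS": 0, "TPARDIS": 0, "MBECCDIS": 1, "SBECCDIS": 1, "PARDIS": 0}
print(l2errdet_bit("data parity", 1, parity_mode))
```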
misses in the L1 MMU which hit in the L2 MMU, the L1 TLB entries are replaced from their L2 TLB
counterparts using a true LRU algorithm.
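[Figure: Virtual address formation. The virtual address is formed from GS (MSR[GS]; 0 = hypervisor access,
1 = guest access), LPID (LPIDR, the logical partition ID matched against TLB[TLPID]), AS (MSR[IS] for
instruction accesses, MSR[DS] for data accesses), PID (8 bits), and the 32-bit EA, which is divided into
the effective page number (0–20 bits) and the byte address (12–32 bits).]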
• The following fields replace the standard values shown in Figure 6-1 to form a virtual address (as
shown in Figure 6-2):
— External guest state (EGS) replaces MSR[GS] in forming the virtual address and is compared
against TLB[TGS] during translation. EGS is writable only in hypervisor state.
— External logical partition ID (ELPID) replaces LPIDR and is compared against TLB[TLPID].
ELPID is writable only in hypervisor state.
— External load context AS (EAS) replaces MSR[DS] and is compared against TLB[TS].
— External load context process ID (EPID) replaces PID and is compared against TLB[TID].
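The field substitutions listed above can be sketched as a selection function that returns the values
compared against TLB[TGS], TLB[TLPID], TLB[TS], and TLB[TID]. The regs dictionary, the VirtualAddressTag
type, and the function name are hypothetical stand-ins for the registers; field widths are not modeled.

```python
from typing import NamedTuple


class VirtualAddressTag(NamedTuple):
    gs: int
    lpid: int
    as_: int
    pid: int


def translation_context(regs, access, external_pid=False):
    """Select the GS, LPID, AS, and PID values used to form a virtual address.

    `access` is "fetch", "load", or "store".
    """
    if access not in ("fetch", "load", "store"):
        raise ValueError(f"unknown access type {access!r}")
    if external_pid:
        if access == "fetch":
            raise ValueError("external PID applies to loads and stores only")
        # Loads take their context from EPLC, stores from EPSC.
        ctx = regs["EPLC"] if access == "load" else regs["EPSC"]
        return VirtualAddressTag(ctx["EGS"], ctx["ELPID"], ctx["EAS"], ctx["EPID"])
    msr = regs["MSR"]
    as_bit = msr["IS"] if access == "fetch" else msr["DS"]
    return VirtualAddressTag(msr["GS"], regs["LPIDR"], as_bit, regs["PID"])


regs = {
    "MSR": {"GS": 1, "IS": 0, "DS": 1},
    "LPIDR": 5,
    "PID": 42,
    "EPLC": {"EGS": 0, "ELPID": 0, "EAS": 0, "EPID": 7},
    "EPSC": {"EGS": 1, "ELPID": 5, "EAS": 1, "EPID": 9},
}
print(translation_context(regs, "load"))
print(translation_context(regs, "load", external_pid=True))
```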
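[Figure: External PID virtual address formation. The virtual address is formed from EGS (EPxC[EGS];
0 = hypervisor access, 1 = guest access), ELPID (EPxC[ELPID], the logical partition ID matched against
TLB[TLPID]), EAS (EPLC[EAS] for loads, EPSC[EAS] for stores), EPID (EPxC[EPID], matched against TLB[TID]),
and the 32-bit EA, which is divided into the effective page number (0–20 bits) and the byte address
(12–32 bits).]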
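[Figure: TLB entry match logic. The TLB entry fields TGS, TLPID, TS, TID, EPN, and V are shown with
equality comparisons against MSR[GS], LPIDR, and the EA page number bits, along with compare-to-zero
checks.]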
[Figure: MMU organization, showing the unified L2 MMUs and the instruction and data L1 MMU arrays
(I-L1VSP and D-L1VSP).]
Additionally, Figure 6-4 shows that when the L2 MMU is checked for a TLB entry, both TLB1 and TLB0
are checked in parallel. It also identifies the L1 MMUs as invisible to the programming model (not
accessible to the operating system); they are managed completely by the hardware as inclusive caches of
the corresponding L2 MMU TLB entries. Conversely, the L2 MMU is managed by the TLB instructions
by way of the MAS registers.
A hit to multiple TLB entries in the L1 MMU (even if they are in separate arrays) is considered to be a
programming error. This is also the case if an access results in a hit to multiple TLB entries in the L2
MMU.
Table 6-2 lists the various TLBs and describes their characteristics.
Table 6-2. Index of TLBs
Location Name Page Sizes Supported Associativity Number of TLB Entries Translations Filled by
Figure 6-5 shows the organization of the L1 TLBs in both the instruction and data L1 MMUs.
[Figure 6-5: L1 MMU TLB organization. Virtual addresses are compared against the L1VSP entries (0–7) and
against way 0 through way 3 of the selected L1TLB4K set (sets 0–15); a hit selects the RPN through a MUX.]
[Figure: Virtual addresses are compared against the TLB1 entries (0–63); a hit selects the RPN (translated
bits, depending on page size) through a MUX to form the real address.]
Figure 6-6. L2 MMU TLB Organization
TLB entry. Note that successful invalidation operations in the L2 MMU also invalidate matching entries
in the L1 MMU.
2-bit
counter
On an L1 MMU miss, L1 MMU array entries are automatically reloaded using entries from their level 2
array equivalent. For example, if the L1 data MMU misses but there is a hit for a virtual address in TLB1,
the matching entry is automatically loaded into the data L1VSP array. Likewise, if the L1 data MMU
misses, but there is a hit for the access in TLB0, the matching entry is automatically loaded into the data
L1TLB4K array.
NOTE
Writing to LPIDR or PID causes all L1 entries to be invalidated.
Writing to EPLC or EPSC causes all data-side L1 entries to be invalidated.
Any tlbilx with T = 0 or 1 that clears an L2 MMU TLB0/1 entry causes all
L1 TLBs to be invalidated.
NOTE
When any L2 TLB entry is written or invalidated (through any invalidation
mechanism), the corresponding entries in any L1 TLB will also be
invalidated. Changing PID, LPID, EPLC, or EPSC may cause all L1 entries
to be invalidated.
Field Comments
TS Translation address space (compared with the AS bit of the current access). For external PID accesses, TS is
compared with EPLC[EAS] or EPSC[EAS].
TID[0–7] Translation ID (compared with PID). For external PID accesses, TID is compared with EPLC[EPID] or
EPSC[EPID].
EPN[0–19] Effective page number (compared with EA[32–51] for 4-Kbyte pages)
RPN[0–23] Real page number: Translated address RA[28–51] for 4-Kbyte pages
PERMIS[0–5] Supervisor execute, write, and read permission bits, and user execute, write, and read permission bits.
WIMGE Memory/cache attributes (write-through, cache-inhibit, memory coherence required, guarded, endian)
U0–U3 User attribute bits—used only by software. These bits exist in the L2 MMU TLBs only (TLB1 and TLB0)
IPROT Invalidation protection (exists in TLB1 only)
VF Virtualization fault. If set, a DSI occurs on data accesses to this page, regardless of the setting of the permission
bits.
0 Data accesses translated by this TLB entry occur normally.
1 Data accesses translated by this TLB entry always cause a Data Storage Interrupt that is directed to the
hypervisor state.
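As a minimal sketch of the EPN and RPN rows above for a 4-Kbyte page, the following C fragment extracts the EPN from a 32-bit effective address (EA[32–51]) and forms a real address by concatenating RPN[0–23] (RA[28–51]) with the untranslated byte offset EA[52–63]. The function names, and the use of a uint64_t to hold the 36-bit real address, are assumptions made for illustration. Larger page sizes translate fewer bits and are not covered here.

```c
#include <stdint.h>

/* Hypothetical helper: EPN of a 4-Kbyte page, that is EA[32-51],
 * the upper 20 bits of a 32-bit effective address. */
static inline uint32_t epn_4k(uint32_t ea)
{
    return ea >> 12;
}

/* Hypothetical helper: real address for a 4-Kbyte page.
 * RA[28-51] comes from the 24-bit RPN; RA[52-63] is the byte offset
 * taken unchanged from EA[52-63]. */
static inline uint64_t real_addr_4k(uint32_t rpn, uint32_t ea)
{
    return ((uint64_t)(rpn & 0xFFFFFFu) << 12) | (ea & 0xFFFu);
}
```

For example, an RPN of 0xABCDEF combined with an effective address of 0x12345678 yields the real address 0xABCDEF678.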
The tlbre, tlbwe, tlbsx, tlbivax, tlbsync, and tlbilx instructions are summarized in this section.
written with the following MAS fields: V, IPROT, TID, TS, TSIZE, EPN, WIMGE, RPN, U0–U3, X0, X1,
TLPID, TGS, VF, and permission bits. If the TLB array supports NV, it is written with the NV value.
The effects of updating the TLB entry are not guaranteed to be visible to the programming model until the
completion of a context synchronizing operation. Writing a TLB entry that is used by the programming
model prior to a context synchronizing operation produces undefined behavior.
e500mc does not provide a logical to real translation (LRAT) mechanism so tlbwe can only be executed
by the hypervisor regardless of the state of EPCR[DGTMI]. If the guest supervisor attempts a tlbwe, an
embedded hypervisor privilege interrupt occurs.
Note that architecturally, if the instruction specifies a TLB entry that is not found, the results are undefined.
However, for e500mc, the TLBSEL, ESEL and EPN fields always index to an existing L2 TLB entry and
that indexed entry is written. Note that EPN bits are only used to index into TLB0. In the case of TLB1,
the EPN field is unused for tlbre. See the EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors for information at the architecture level.
Executing tlbsx updates the MAS registers conditionally based on the success or failure of a TLB lookup
in the L2 MMU. The values placed into MAS registers differ, depending on whether the search is
successful. Section 6.7.1, “MAS Register Updates,” describes how MAS registers are updated.
NOTE
Note that rA = 0 is the preferred form for tlbsx and that some Freescale
implementations, including the e500mc, take an illegal instruction
exception program interrupt if rA!=0.
If EA[61] is 0, invalidation is partitioned and only TLB entries whose EPN field matches the EA[0:51]
and whose TLPID field matches the MAS5[SLPID] and whose TGS field matches MAS5[SGS] of the
processor executing the tlbivax instruction are invalidated. Other TLB entries may be invalidated by the
implementation, but in no case will any TLB entries (including ones that match the invalidation criteria)
with the IPROT attribute set be invalidated.
Software should also set MAS6[SPID] and MAS6[SAS] to further identify the entry which is to be
invalidated although the e500mc does not use these values in the comparison (and will thus invalidate
entries regardless of the content of their TID and TS fields). If portability of software to future
implementations is desired, software should not assume that any TLB entry will be invalidated except the
entry corresponding to the EA, MAS5[SLPID], MAS5[SGS], MAS6[SPID], and MAS6[SAS] as future
implementations may invalidate to the stricter MAS6[SPID] and MAS6[SAS] criteria.
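The partitioned match described above (EA[61] = 0) can be sketched as a predicate over a single TLB entry. This is an illustrative software model only, not the hardware implementation: the struct layout, field widths, and function name are assumptions, and the page-size-dependent EPN comparison is simplified to a whole-field compare.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the TLB entry fields used by the match. */
struct tlb_entry {
    bool     v;      /* entry valid */
    bool     iprot;  /* invalidation protection (TLB1 only) */
    bool     tgs;    /* translation guest state */
    uint8_t  tlpid;  /* translation logical partition ID */
    uint32_t epn;    /* effective page number */
};

/* Returns true if a partitioned tlbivax must invalidate this entry.
 * TID and TS are deliberately not compared, which mirrors the e500mc
 * behavior described in the text. The implementation may invalidate
 * additional entries, but never entries with IPROT set. */
static bool tlbivax_matches(const struct tlb_entry *e, uint32_t ea_epn,
                            uint8_t mas5_slpid, bool mas5_sgs)
{
    if (!e->v || e->iprot)
        return false;
    return e->epn == ea_epn && e->tlpid == mas5_slpid && e->tgs == mas5_sgs;
}
```

Because the predicate ignores TID and TS, two entries that differ only in those fields both match, which is why portable software should not rely on either outcome for entries outside the full MAS5/MAS6 criteria.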
Because the virtual address can be much larger than the physical address, a subset of the full virtual address
is broadcast that fits within the space of the implemented physical addressing mode.
Because the tlbivax does not compare PID or AS values, one tlbivax can invalidate multiple entries in a
targeted TLB. A tlbivax targeting TLB0 can invalidate up to all four ways, and up to all four ways within
an L1TLB4K index. A tlbivax targeting TLB1 can invalidate up to all 64 entries in the array, or up to all
8 entries of the L1VSPs (instruction and data). Section 6.3.2.1, “IPROT Invalidation Protection in TLB1,”
describes how to protect TLB1 entries from this type of invalidation.
The tlbivax instruction invalidates all matching entries in the instruction and data L1 TLBs
simultaneously. Also, the core always snoops TLB invalidate transactions and invalidates matching TLB
entries accordingly.
NOTE
Note that rA = 0 is the preferred form for tlbivax and that some Freescale
implementations take an illegal instruction exception program interrupt if
rA!=0.
NOTE
Software must ensure that only one tlbsync operation is active at a given
time. A second tlbsync issued (from any core in the coherence domain)
before the first one has completed can cause processors to hang. Software
should make sure that the tlbsync and its associated synchronization are
contained within a mutual exclusion lock that all processors must acquire
before executing tlbsync.
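The locking discipline in the note above might be sketched as follows, using a C11 atomic flag as the mutual exclusion lock. This is a host-runnable illustration under stated assumptions: issue_tlbsync() is a placeholder stub standing in for the tlbsync instruction and its associated synchronization, the lock is assumed to reside in coherent memory shared by all cores, and all names are hypothetical.

```c
#include <stdatomic.h>

/* One lock shared by every core in the coherence domain (assumption:
 * it lives in coherent memory visible to all of them). */
static atomic_flag tlbsync_lock = ATOMIC_FLAG_INIT;

/* Counts calls so the sketch can be exercised on a host. */
static unsigned tlbsync_count;

/* Placeholder for the real operation. On e500mc this would be the
 * tlbsync instruction plus its associated synchronization. */
static void issue_tlbsync(void)
{
    tlbsync_count++;
}

/* Hypothetical wrapper: at most one tlbsync is in flight at a time. */
static void tlbsync_serialized(void)
{
    while (atomic_flag_test_and_set_explicit(&tlbsync_lock,
                                             memory_order_acquire))
        ;  /* spin until the current holder releases the lock */

    issue_tlbsync();

    atomic_flag_clear_explicit(&tlbsync_lock, memory_order_release);
}
```

A simple spinlock is used here only because it is short; any lock that every processor acquires before executing tlbsync satisfies the requirement.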
V 1 Entry is valid
TS 0 Address space 0
TGS 0 Hypervisor address space
TID[0–7] 0 TID value for shared (global) page
TLPID 0 TLPID value for hypervisor page
TID[0–7] 0x00 TID value for shared (global) page
Registers Section/Page
• The TLB entry specified by TLBSEL and ESEL is referred to as TLB0 (if the value comes only
from TLB0), TLB1 (if the value comes only from TLB1), or TLB if the value can come from the
selected TLB array and the field is stored the same regardless of which array it is in.
Table 6-6. MMU Assist Register Field Updates
MAS0
MAS1
V 1 1 0 TLB[V]
MAS2
MAS3
MAS4
WIMGED — — — —
WIMGED, — — — —
X0D,X1D,
TIDSELD,
TLBSELD,
TSIZED
MAS5
SGS — — — —
SLPID — — — —
MAS6
MAS8
VF — TLB[VF] — TLB[VF]
The relationship of these timer facilities (except for the ATB) to each other is shown in Figure 7-1.
[Figure 7-1 diagram: TBU (bits 32–63) and TBL (bits 32–63) are driven by the timer clock (time base clock, core_tbclk). The decrementer (DEC) raises a decrementer event on 1/0 detect and can auto-reload from DECAR (bits 32–63).]
for the integrated device for details on what occurs on a watchdog timer expiration that should result in
reset.
8.1 Overview
A complete power management scheme for a system using the e500mc requires the support of the
integrated device. The programming model and control of power management states for the core is
provided by the integrated device. With the exception of the wait state, which is entered by executing the
wait instruction, all power management states are entered through registers provided by the integrated device.
Power management consists of separate states, shown in Table 8-1, which correspond to the power
management states documented in the integrated device reference manual. These states map directly to
core activity states, shown in Table 8-2, which describe in more detail how the core
transitions between states. These transitions are driven by power management signals, shown in Table 8-3,
sent to the e500mc by the integrated device. In general, software does not need to concern itself with core
activity states or the power management signals, since the transitions are handled by signals from the
integrated device.
State Description
wait The core stops fetching and execution of instructions. All core clocks are active. Timebase continues to
increment and timer functions are active. All state is retained and snooping activity for the caches and other
broadcast CoreNet operations such as msgsnd and tlbivax are still active. The wait state is entered when the
core executes a wait instruction. The wait state is terminated and normal operation resumes when any
asynchronous interrupt is ready to be taken by the core. When the wait state terminates, the core will take the
interrupt, and the save/restore register indicating the address to return to after the interrupt is processed will point
to the instruction following the wait instruction. Note that an external interrupt that is pending, but is not enabled
by the core, will not cause the wait state to be terminated. The wait state is solely initiated by the core and as
such does not participate in the protocol between the integrated device and the core with respect to pm_halted
and pm_stopped core activity states.
Because state is retained in the caches and core registers, and the caches continue to participate in snooping
activities, software does not need to perform any specific actions prior to entering the wait state to ensure that
coherent state is maintained.
doze The doze state provides a similar level of power savings as the wait state, but is controlled by the integrated
device and will terminate when an external asynchronous interrupt is pending, even if the core does not have
that interrupt enabled. The core stops fetching and execution of instructions. All core clocks are active. Timebase
continues to increment and timer functions are active. All state is retained and snooping activity for the caches
and other broadcast CoreNet operations such as msgsnd and tlbivax are still active. The doze state is entered
when the integrated device is programmed to signal the core to enter the doze state. To enter the doze state, the
integrated device signals the core to enter the pm_halted activity state.
The doze state is terminated and normal operation resumes when an asynchronous external interrupt to be
signalled by the integrated device is pending. The doze state may also be terminated when one of the following
internally generated asynchronous interrupts is pending: decrementer, fixed interval timer, watchdog timer,
machine check, performance monitor, processor doorbell, processor doorbell critical, guest processor doorbell,
guest processor doorbell critical, and guest processor doorbell machine check. When the doze state terminates
the integrated device signals the core to exit the pm_halted activity state. The core resumes fetching and
executing instructions from the point at which it stopped executing instructions. If the interrupt condition which
caused the core to exit the doze state is enabled and the interrupt is still pending, the interrupt will immediately
be taken and the save/restore register indicating the address to return to after the interrupt is processed will point
to the instruction which would have executed next after the core entered the pm_halted activity state.
Because state is retained in the caches and core registers, and the caches continue to participate in snooping
activities, software does not need to perform any specific actions prior to entering the doze state to ensure that
coherent state is maintained.
nap The core stops fetching and execution of instructions. Core clocks are turned off by the integrated device, except
for the timebase. The core retains all its state, however with clocks off the core will not receive and process
transactions from CoreNet. Operations such as snoops, acceptance of messages from a msgsnd operation, and
TLB invalidations from tlbivax operations will not be seen by the core and will be lost with respect to the core.
The nap state is entered when the integrated device is programmed to signal the core to enter the nap state. To
enter the nap state, the integrated device signals the core to enter the pm_halted activity state and then signals
the core to enter the pm_stopped state.
The nap state is terminated and normal operation resumes when an asynchronous external interrupt to be
signalled by the integrated device is pending. The nap state may also be terminated when one of the following
internally generated asynchronous interrupts is pending: decrementer, fixed interval timer, watchdog timer,
machine check, and performance monitor. When the nap state terminates the integrated device signals the core
to transition from the pm_stopped activity state to the pm_halted activity state, then exits the core_halted activity
state. The core resumes fetching and executing instructions from the point at which it stopped executing
instructions. If the interrupt condition which caused the core to exit the nap state is enabled and the interrupt is
still pending, the interrupt will immediately be taken and the save/restore register indicating the address to return
to after the interrupt is processed will point to the instruction which would have executed next after the core
entered the pm_halted activity state.
Because state is retained in the caches and core registers, but the caches no longer continue to participate in
snooping activities, software should always flush, then invalidate the caches prior to initiating nap state to ensure
that any modified data is written out to backing store. Upon exit from nap state, software must update any TLB
entries that may have changed due to invalidations that were missed while the core was in the pm_stopped
activity state. In general, this will require the flushing of any dynamic TLB entries and reloading them from the
software page table. Because the core must flush its caches immediately prior to entering the nap state, the nap
state will generally only be initiated by writing the appropriate integrated device registers by the specific core
which will enter the nap state (that is, a core will generally nap itself, not another core).
sleep The sleep state is the same as the nap state, except that the timebase functions are also turned off.
All software activities required of the nap state are also required by the sleep state. In addition, since the
timebase is also turned off during sleep, upon exit from sleep state, software will have to reload the timebase
from some source external to the core. During sleep, the core will not wake from internally generated
asynchronous interrupts because the core is not processing any events that might cause a wakeup condition to
be noted.
The notion of nap, doze, and sleep modes (or states) pertains to, and is defined by, the integrated device
as a whole. As shown in Figure 8-1, an integrated device may define the terms nap, doze, and sleep to mean
different things. However, the integrated device controls the core power management by requesting the
core to enter the core activity states pm_halted, pm_stopped, and by manipulating the timebase enable
(tben) signal.
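[Figure 8-1 diagram: core activity states. full_on: remains while ¬halt; halt moves the core to pm_halted. pm_halted (device doze state): remains while halt & ¬stop; ¬halt returns the core to full_on; stop moves the core to pm_stopped. pm_stopped (device nap or sleep state).]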
In addition to the power-management states, dynamic power management automatically stops clocking
individual internal functional units whenever they are idle.
Table 8-2 describes the core activity states.
Table 8-2. Core Activity States
State Descriptions
full_on Default. All internal units are operating at the full clock speed defined at power-up. Dynamic power management
(default) automatically stops clocking individual internal functional units that are idle.
pm_halted Initiated by asserting the halt input. The e500mc responds by stopping instruction execution. It then asserts the
halted output to indicate that it is in the core_halted state. Core clocks continue running, and snooping continues
to maintain cache coherency. As Figure 8-1 shows, the e500mc is in pm_halted state when the integrated device
is in doze state.
The following occur once the core is in core_halted state:
• Suspend instruction fetching.
• Complete all previously fetched instructions and associated data transactions.
pm_stopped Initiated when stop is asserted to the core while it is in pm_halted state. The core responds by inhibiting clock
distribution to most of its functional units (after the CoreNet interface idles), and then asserting the stopped
output.
tben Disabling the timebase facilities. Additional power reduction is achieved by negating the time base enable (tben)
input, which stops timebase operations. Note that tben controls the timebase in all power management states.
Timer operation is independent of power management except for software considerations required for processing
timer interrupts that occur during pm_stopped state. For example, if the timer facility is stopped, software
ordinarily uses an external time reference to update the various timing counters upon restart.
halt I Asserted by the integrated device to initiate actions that cause the core to enter halted state.
stop I Asserted by the integrated device to initiate the required actions that cause the core to go from pm_halted
into pm_stopped state (as described in Table 8-2).
stopped O Asserted by the core anytime the internal functional clocks of the e500mc are stopped (for example after
integrated device asserts stop).
wake_req O Asserted when the core detects an internally generated asynchronous interrupt is enabled and pending.
This prompts the integrated device to bring the core to a full_on activity state to service the interrupt. The
interrupts that can assert wake_req are: decrementer, fixed interval timer, watchdog timer, machine
check, performance monitor, processor doorbell, processor doorbell critical, guest processor doorbell,
guest processor doorbell critical, and guest processor doorbell machine check.
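The transitions among the full_on, pm_halted, and pm_stopped core activity states, as driven by the halt and stop inputs described above, can be sketched as a small state machine. This is a simplified model under stated assumptions: the enum and function names are hypothetical, the halted and stopped handshake outputs and the tben input are not modeled, and the return from pm_stopped to pm_halted when stop is negated is inferred from the nap exit sequence in Table 8-1.

```c
#include <stdbool.h>

enum core_state { FULL_ON, PM_HALTED, PM_STOPPED };

/* Next core activity state given the halt and stop inputs from the
 * integrated device. stop is only honored while in pm_halted. */
static enum core_state next_core_state(enum core_state s, bool halt, bool stop)
{
    switch (s) {
    case FULL_ON:
        return halt ? PM_HALTED : FULL_ON;
    case PM_HALTED:
        if (!halt)
            return FULL_ON;
        return stop ? PM_STOPPED : PM_HALTED;
    case PM_STOPPED:
        /* Assumption: negating stop returns the core to pm_halted,
         * as in the nap exit sequence. */
        return stop ? PM_STOPPED : PM_HALTED;
    }
    return s;
}
```

In this model a doze request corresponds to asserting halt only, while a nap or sleep request asserts halt and then stop.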
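[Figure residue: halt and stop handshake flow. When halt is asserted, the core stops fetching instructions; once instruction execution has stopped, it asserts halted. When stop is asserted, the core stops clock distribution; once clock distribution has stopped, it asserts stopped.]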
9.1.1 Terminology
This chapter uses certain terminology that has the specific meanings defined in this section. This
terminology is used elsewhere in this manual and in the EREF: A Programmer’s Reference Manual for
Freescale Power Architecture® Processors and has the same definition. Some of this terminology, such
as ‘debug event’ appears in Power ISA™ and has a more limited scope, since Power ISA does not define
external debug capabilities.
The term ‘debug condition’ indicates that a set of specific criteria have been met such that the
corresponding debug event occurs in the absence of any gating or masking. The criteria for debug
conditions are obtained from debug control registers.
The term ‘debug event’ means the setting of a bit in either the debug status register (DBSR) or external
debug status register 0 (EDBSR0) upon the occurrence of the associated debug condition. However, a
debug condition does not always result in a debug event. Conditions are prioritized with respect to
exceptions. Except for some special UDE (Unconditional Debug Event) debug conditions, exceptions that
have higher priority than a debug condition prevent the debug condition from being recorded as a debug
event. In internal debug mode (IDM), debug events cause a debug interrupt if the debug enable bit is set
(MSR[DE] = 1). It is possible that a UDE debug event can occur at the same time another debug event
occurs.
The term ‘debug interrupt’ refers to the action of saving old context (machine state register and next
instruction address) into the debug save/restore registers (DSRR0 and DSRR1) and beginning execution
at a predetermined interrupt handler address. For additional information, see Section 4.9.16, “Debug
Interrupt—IVOR15.”
In external debug mode (EDM) (EDBCR0[EDM] = 1), debug events cause the processor to halt.
has been halted to prevent the processor from committing to an interrupt. For example, if, after the
processor is halted, the debugger jams an mtmsr instruction that sets MSR[EE] while the external input
pin is signalling that an external input interrupt is present, the core is committed to take that interrupt and
takes it when the processor is taken out of the halted state, even if the processor sets the
EDMEO, EDCEO, or EDEEO bits prior to resuming execution.
EDBCR0, shown in Figure 9-1, contains bits for enabling external debug features.
Offset 0xBASE_100 External debugger
Bits         32   33      34   35     36     37     38–63
Field (R/W)  EDM  DNH_EN  EFT  EDMEO  EDCEO  EDEEO  —
Reset        All zeros
Figure 9-1. External Debug Control Register 0 (EDBCR0)
35 EDMEO Debugger Machine Check Interrupt Enable Override. When this bit is set, no asynchronous machine check
interrupts will occur. Exception conditions for asynchronous machine check interrupts which occur will
remain pending. This bit has no effect on error report interrupts, nor does it disable the NMI interrupt which
is taken on the machine check level.
0 = Asynchronous machine check interrupts are enabled as described by the architecture. MSR[ME] and
MSR[GS] are used to determine if an asynchronous machine check interrupt can be taken.
1 = Asynchronous machine check interrupts are disabled. MSR[ME] and MSR[GS] are not used to determine
whether an asynchronous machine check interrupt can be taken. NMI interrupts are not affected by the
setting of this bit.
This bit should only be set when the processor is in External Debug mode by the external debugger.
Architecturally, the behavior of this bit is undefined when the processor is not in EDM mode. For e500mc,
this bit behaves the same regardless of whether the processor is in EDM mode.
36 EDCEO Debugger Critical Interrupt Enable Override. When this bit is set, no asynchronous critical interrupts (critical
input, processor doorbell critical, guest processor doorbell critical, guest processor doorbell machine check,
or watchdog timer) will occur. Exception conditions for critical interrupts which occur will remain pending
unless the pending condition is cleared.
0 = Critical interrupts are enabled as described by the architecture. MSR[CE] and MSR[GS] are used to
determine if an asynchronous critical interrupt can be taken.
1 = Asynchronous critical interrupts are disabled. MSR[CE] and MSR[GS] are not used to determine
whether an asynchronous critical interrupt can be taken.
This bit should only be set when the processor is in External Debug mode by the external debugger.
Architecturally, the behavior of this bit is undefined when the processor is not in EDM mode. For e500mc,
this bit behaves the same regardless of whether the processor is in EDM mode.
37 EDEEO Debugger External Interrupt Enable Override. When this bit is set, no asynchronous external interrupts
(external input, decrementer, fixed interval timer, performance monitor, processor doorbell, or guest
processor doorbell) will occur. Exception conditions for external interrupts which occur will remain pending
unless the pending condition is cleared.
0 = External interrupts are enabled as described by the architecture. MSR[EE] and MSR[GS] are used to
determine if an asynchronous external interrupt can be taken.
1 = Asynchronous external interrupts are disabled. MSR[EE] and MSR[GS] are not used to determine whether
an asynchronous external interrupt can be taken.
This bit should only be set when the processor is in External Debug mode by the external debugger.
Architecturally, the behavior of this bit is undefined when the processor is not in EDM mode. For e500mc,
this bit behaves the same regardless of whether the processor is in EDM mode.
Note: EREF: A Programmer’s Reference Manual for Freescale Power Architecture® Processors allows
implementations to consider a delayed floating-point enabled interrupt to be asynchronous; however,
the taking of a delayed floating-point enabled interrupt is not enabled by MSR[EE] and is unaffected by the setting of
EDEEO.
38–63 — Reserved
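The EDBCR0 bit positions above can be turned into masks as in the following sketch. It assumes the register is held in a 32-bit value whose bits are numbered 32 (most significant) through 63 (least significant), as in Figure 9-1. The macro and function names are illustrative and are not defined by the manual, and how the external debugger actually reads and writes the register at offset 0xBASE_100 is outside the scope of the sketch.

```c
#include <stdint.h>

/* Power Architecture numbers bits from the most significant end, so in a
 * 32-bit register shown as bits 32-63, bit 32 is the MSB and bit 63 the LSB. */
#define PPC_BIT32(n)   (UINT32_C(1) << (63 - (n)))

/* Bit positions taken from Figure 9-1; the macro names are illustrative. */
#define EDBCR0_EDM     PPC_BIT32(32)
#define EDBCR0_DNH_EN  PPC_BIT32(33)
#define EDBCR0_EFT     PPC_BIT32(34)
#define EDBCR0_EDMEO   PPC_BIT32(35)
#define EDBCR0_EDCEO   PPC_BIT32(36)
#define EDBCR0_EDEEO   PPC_BIT32(37)

/* Value an external debugger might write to hold off asynchronous machine
 * check, critical, and external interrupts while the core is halted. */
static inline uint32_t edbcr0_mask_async_interrupts(uint32_t edbcr0)
{
    return edbcr0 | EDBCR0_EDMEO | EDBCR0_EDCEO | EDBCR0_EDEEO;
}
```

With EDM already set (0x80000000), applying the helper gives 0x9C000000.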
32 — Reserved
34–35 — Reserved
42–43 — Reserved
49–56 — Reserved
57 CIRPT Critical Interrupt Taken Debug Event.
Set if a critical interrupt debug condition occurred (DBCR0[CIRPT] = 1) and DBCR0[EDM] = 1
59–63 — Reserved
Upon resuming, the status bit remains asserted until the next enabled debug event halts the core. This
provides software visibility (read only) into the event that caused the most recent halt.
Table 9-3 provides bit definitions for external debug status register mask 0.
Table 9-3. EDBSRMSK0 Field Descriptions
32–39 — Reserved
42–43 — Reserved
48–63 — Reserved
44 — Reserved
47–63 — Reserved
1
The e500mc only supports jamming one instruction at a time
2
EDBSR1[IJAE] is also available at the SoC. Refer to the SoC reference manual for details on external polling of this bit.
3
EDBSR1[IJBUSY] is also available at the SoC. Refer to the SoC reference manual for details on external polling of this bit.
48 63
R
—
W1
Reset All zeros
Figure 9-5. Processor Run Status Register (PRSR)
1
The writable bits of this register support a write-1-to-clear functionality. Writing zeros has no effect.
32 HALTED Halted state. Set whenever the core is halted. Cleared whenever the core resumes program execution.
33 PM_HALT Power management halt. Set whenever the processor is halted in response to a power management
request from the system. This is a non-debug halt.
34 — Reserved
35 DNH_HALT Debugger notify halt event. Set whenever the processor is halted in response to the dnh instruction.
This bit should be cleared by the debugger prior to issuing a resume command.
36 DE_HALT Debug event halt. Set whenever the processor is halted due to the occurrence of an enabled debug
condition in EDM. This bit should be cleared by the debugger prior to issuing a resume command.
37 EDB_HALT External debug halt request event. Set whenever the processor core receives a debug halt request from
the system. This bit should be cleared by the debugger prior to issuing a resume command.
38–40 — Reserved
42 STOPPED Stopped state. Set whenever the core is stopped. Cleared whenever the core resumes program
execution.
43 PM_STOP Power management stop. Set whenever the processor is stopped in response to a power management
stop request from the system. This is a non-debug stop.
44–58 — Reserved
59–63 DNHM Debugger notify halt message. Contains the additional information provided by the dnh instruction. The
information is derived from the DUI operand of the dnh instruction.
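As a sketch of how a debugger might interpret a PRSR value using the field positions above, the following C fragment defines masks and a few helpers. It assumes the register is held in a 32-bit value with bit 32 as the most significant bit and bit 63 as the least significant bit. All names are hypothetical, and treating the three halt-cause bits as the ones cleared by writing 1 is an assumption based on the write-1-to-clear footnote to Figure 9-5.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRSR_BIT(n)     (UINT32_C(1) << (63 - (n)))
#define PRSR_HALTED     PRSR_BIT(32)
#define PRSR_PM_HALT    PRSR_BIT(33)
#define PRSR_DNH_HALT   PRSR_BIT(35)
#define PRSR_DE_HALT    PRSR_BIT(36)
#define PRSR_EDB_HALT   PRSR_BIT(37)
#define PRSR_STOPPED    PRSR_BIT(42)
#define PRSR_PM_STOP    PRSR_BIT(43)
#define PRSR_DNHM_MASK  UINT32_C(0x1F)  /* bits 59-63 */

#define PRSR_DEBUG_HALT_CAUSES \
    (PRSR_DNH_HALT | PRSR_DE_HALT | PRSR_EDB_HALT)

/* True if the core is halted for a debug reason (dnh, debug event, or
 * external halt request) rather than only for power management. */
static inline bool prsr_debug_halted(uint32_t prsr)
{
    return (prsr & PRSR_HALTED) != 0 && (prsr & PRSR_DEBUG_HALT_CAUSES) != 0;
}

/* Value to write back so that the halt-cause bits currently set are
 * cleared before a resume command (write 1 to clear; zeros have no effect). */
static inline uint32_t prsr_resume_clear_value(uint32_t prsr)
{
    return prsr & PRSR_DEBUG_HALT_CAUSES;
}

/* DNHM: information derived from the DUI operand of the dnh instruction. */
static inline uint32_t prsr_dnhm(uint32_t prsr)
{
    return prsr & PRSR_DNHM_MASK;
}
```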
48 63
R
—
W
Reset All zeros
Figure 9-6. Extended External Debug Control Register 0 (EEDCR0)
32–34 — Reserved
35 forced_halt Forced halt. If a debug halt has been requested, but has not completed, writing a 1 to this field will force
the processor to halt. When halting the processor using this mechanism, the processor may not be put
back into a run state unless the entire integrated device is reset.
If this bit is set and a debug halt is not in progress, the request will be ignored.
Forcing the processor to halt using this control should only be used when a normal halt command does
not complete. The normal halt mechanism may fail to complete if there are problems in the CoreNet
fabric whereby transactions are not completed. The processor in this case will fail to halt because part
of the protocol for halting the core is to force all queued memory transactions to complete and wait until
CoreNet has fully accepted those transactions. If the CoreNet fabric does not acknowledge the
transactions, the halt sequence will hang. This control could then be used to force the processor into the
halt state to examine the state of the processor. After a forced_halt is commanded, the external debugger
should not take any action or jam any instructions which would cause the processor to attempt a
transaction on the CoreNet interface, as doing so will likely cause the processor to hang.
This field is not present on e500mc Rev 1.x or e500mc Rev 2.x.
36–63 — Reserved
32–38 — Reserved
40–48 — Reserved
50–51 TSEN Timestamp Enable
0x = Timestamp is disabled
10 = Timestamp is enabled for all messages (timestamp is applied to all messages)
11 = Coarse timestamp is enabled (timestamp is periodically applied every 32 messages)
54–55 EIC Event In Control (footnote 1)
00 = EVTI0 assertion causes Program Trace Sync message (trigger use of uncompressed address information
on next message)
01 = Reserved
10 = EVTI0 disabled for this module
11 = Reserved (should not be used to ensure future compatibility)
56–57 — Reserved
58–63 TM Trace Mode (footnote 2)
000000 = All trace disabled
xxxxx1 = Ownership Trace enabled
xxxx1x = Data Trace enabled
xxx1xx = Program Trace enabled
xx1xxx = Watchpoint Trace enabled
x1xxxx = Reserved (writing x1xxxx may not read back the same at this bit position - software should not set
TM1 to a non-zero value)
1xxxxx = Data Acquisition Trace enabled
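The TM encodings above are independent enable bits, so they can be combined. The following sketch shows one way to build and test a TM value, assuming TM occupies bits 58–63 (the six least significant bits) of a 32-bit register value. The enum and function names are hypothetical, and the register that contains TM is not named here because it is not identified in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* TM occupies bits 58-63, the six least significant bits of the register. */
enum trace_mode {
    TM_OWNERSHIP        = 0x01, /* xxxxx1 */
    TM_DATA             = 0x02, /* xxxx1x */
    TM_PROGRAM          = 0x04, /* xxx1xx */
    TM_WATCHPOINT       = 0x08, /* xx1xxx */
    TM_RESERVED         = 0x10, /* x1xxxx: software should leave this clear */
    TM_DATA_ACQUISITION = 0x20  /* 1xxxxx */
};

#define TM_FIELD_MASK 0x3Fu

/* Replace the TM field of reg with the requested modes. The reserved
 * bit is always written as zero. */
static inline uint32_t tm_set(uint32_t reg, unsigned modes)
{
    modes &= TM_FIELD_MASK & ~(unsigned)TM_RESERVED;
    return (reg & ~TM_FIELD_MASK) | modes;
}

/* True if the given trace mode bit is set in reg. */
static inline bool tm_enabled(uint32_t reg, enum trace_mode m)
{
    return (reg & (uint32_t)m) != 0;
}
```

For instance, tm_set(reg, TM_PROGRAM | TM_WATCHPOINT) enables program and watchpoint trace while leaving the other register bits unchanged.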
NOTE
Further details describing the full debug functionality of this device are
beyond the scope of this document. The debugging and performance
monitoring capability enabled by the device hardware coexists within a
debug ecosystem that offers a rich variety of tools at different levels of the
hardware/software stack. Software development and debug tools from
Freescale (CodeWarrior), as well as third-party vendors, provide a rich set
of options for configuring, controlling and analyzing debug- and
performance-related events.
48–49 — Reserved
33–52 — Reserved
48–63 — Reserved
Note: Using start and stop triggers for data trace may preclude the ability to correlate data trace and
program trace if watchpoint messages are used with DACs as a means to try and correlate data trace
messages to the appropriate region of code (that is, the program trace).
32–49 — Reserved
32–33 — Reserved
34–35 SPTHOLD Suppression Threshold
00 = Suppression threshold is when message queues are 1/4 full
01 = Suppression threshold is when message queues are 1/2 full
10 = Suppression threshold is when message queues are 3/4 full
11 = Reserved
36–41 — Reserved
42–47 SPEN Suppression Enable
000000 = Suppression is disabled
xxxxx1 = Ownership Trace message suppression is enabled
xxxx1x = Data Trace message suppression is enabled
xxx1xx = Program Trace message suppression is enabled
xx1xxx = Watchpoint Trace message suppression is enabled
x1xxxx = Reserved
1xxxxx = Data Acquisition message suppression is enabled
48–49 — Reserved
50–51 STTHOLD Stall Threshold
00 = Stall threshold is when message queues are 1/4 full
01 = Stall threshold is when message queues are 1/2 full
10 = Stall threshold is when message queues are 3/4 full
11 = Reserved
52–62 — Reserved
63 STEN Stall Enable
0 = Processor stalling is disabled
1 = Processor stalling is enabled
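The SPTHOLD and STTHOLD encodings above share the same 1/4, 1/2, 3/4 scheme, which can be modeled with a single comparison. The sketch below is a software illustration only: the function name and the queue occupancy parameters are hypothetical, and treating the threshold as "at or above the stated fraction" is an assumption, since the text does not say how the hardware compares queue depth.

```c
#include <stdbool.h>

/* Returns true when a queue holding `used` of `capacity` messages has
 * reached the threshold selected by a 2-bit SPTHOLD/STTHOLD encoding.
 * The reserved encoding 11 never triggers in this sketch. */
static bool threshold_reached(unsigned encoding, unsigned used,
                              unsigned capacity)
{
    if (encoding > 2)
        return false;  /* 11 = reserved */

    /* 00 -> 1/4, 01 -> 1/2, 10 -> 3/4; compare without division. */
    return 4u * used >= (encoding + 1u) * capacity;
}
```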
32–51 — Reserved
52–55 IJRA Real Address (bits 28:31). If the jammed instruction is a load or store instruction, and IJER = 1, these
4 bits are prepended to the 32-bit effective address to form a 36-bit physical address (PA[28:63] =
IJRA[0:3] || EA[32:63]). This field is only used when the jammed instruction is a load or store instruction.
57 — Reserved
58–62 WIMGE Page attributes for any storage access instruction (current access) when IJCFG[IJER] = 1. The
meaning of these attributes is the same as defined when the processor is executing storage accesses
through normal instruction execution. The definition of the WIMGE attributes can be found in the EREF:
A Programmer’s Reference Manual for Freescale Power Architecture® Processors -Cache and MMU
Architecture.
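The IJRA concatenation described above, PA[28:63] = IJRA[0:3] || EA[32:63], amounts to placing the 4-bit field above the 32-bit effective address. A minimal sketch follows; the function name is hypothetical and the 36-bit result is held in a uint64_t for convenience.

```c
#include <stdint.h>

/* PA[28:63] = IJRA[0:3] || EA[32:63]: the 4-bit IJRA field supplies the
 * top four bits of a 36-bit physical address. */
static inline uint64_t jam_physical_addr(uint8_t ijra, uint32_t ea)
{
    return ((uint64_t)(ijra & 0xFu) << 32) | ea;
}
```

For example, IJRA = 0xF with EA = 0x80001000 gives the physical address 0xF80001000.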
Offset PMCC0: 0xBASE_030, PMCC1: 0xBASE_034, PMCC2: 0xBASE_038, PMCC3: 0xBASE_03C External debugger
32 63
R Captured counter value
W
Reset All zeros
Figure 9-16. Performance Monitor Counter Capture Registers (PMCC0–PMCC3)
32–63 Counter Value Value of the PMCn counter upon occurrence of the EVTO[4] trigger
32–63 Captured program counter value Value of the program counter upon occurrence of the EVTO[4] trigger
debug interrupts from occurring) when executing in embedded hypervisor state when the guest operating
system is using the debug facility.
When EPCR[DUVD] = 1 and MSR[GS] = 0, all debug events and associated exceptions do not occur
except for the unconditional debug event, and no debug events are posted in the DBSR. Refer to the EREF:
A Programmer’s Reference Manual for Freescale Power Architecture® Processors for more details on the
embedded hypervisor.
HALTED DBCR0[EDM] DBCR0[IDM] MSR[DE] Action
1 x x x No action
0 0 x 0 No action
0 0 0 x No action
Table 9-17 lists responses for unconditional debug event (UDE), interrupt taken (IRPT), trap (TRAP),
return from interrupt (RET), critical interrupt taken (CIRPT), and critical return from interrupt (CRET)
conditions.
Table 9-17. Response—UDE, IRPT, TRAP, RET, CIRPT, CRET
HALTED DBCR0[EDM] DBCR0[IDM] MSR[DE] Action
1 x x x No action.
0 0 0 x No action.
0 0 1 0 Set DBSR[x], DBSR[IDE]. More than one DBSR bit may be set for imprecise debug events
for various operations. Table 9-20 lists possible combinations of concurrent imprecise debug
events for all operations.
Table 9-18 lists responses for instruction address compare 1 and 2 conditions.
Table 9-18. Response—IAC1, IAC2
HALTED DBCR0[EDM] DBCR0[IDM] MSR[DE] Action
1 x x x No action.
0 0 1 0 Set DBSR[IAC1/2], DBSR[IDE]. Generate IAC1/2 watchpoint. More than one DBSR bit
may be set for imprecise debug events for various operations. Table 9-20 lists possible
combinations of concurrent imprecise debug events for all operations.
Table 9-19 lists responses for debug address compare 1 and 2 conditions.
Table 9-19. Response—DAC1, DAC2
1 x x x x No action.
Table 9-20 lists the combinations of debug events that may be simultaneously recorded in the DBSR
(DBCR0[EDM] = 0, DBCR0[IDM] = 1) when debug interrupts are disabled (MSR[DE] = 0) for every
operation type. Multiple debug events can be recorded as a result of a single operation if the DBSR[IDE]
is set for that operation.
Table 9-20. Recording of Imprecise Debug Events (IDEs)
Interrupt Operations
IAC1, and IAC2. These debug conditions cause debug events to be recorded in DBSR if MSR[DE] = 1
and no higher priority exception exists or according to Table 9-20 if MSR[DE] = 0. When MSR[DE] = 1,
the IAC debug conditions are logged when the debug IAC1/2 interrupt is taken. When MSR[DE] = 0, IAC
debug conditions are recorded in DBSR when an instruction marked with an IAC1/2 condition takes an
interrupt or completes, according to Table 9-20. MSR[DE] has no effect on the updates to EDBSR0.
Instruction address compares may specify user/supervisor mode and instruction space (MSR[IS]), along
with an effective address, masked effective address, or range of effective addresses for comparison. Refer
to Section 2.17.3, “Debug Control Register 1 (DBCR1),” for details on the controls for the various IAC
event modes.
IAC conditions are masked from generating IAC events if DBCR2[DACLINK1/2] are set. The IAC fields
of DBSR and EDBSR0 are not updated. In this case, a DAC event occurs if an instruction generates both
a DAC condition and an IAC condition and no exceptions of higher priority are present.
In EDM, an unmasked IAC debug condition is recorded as a debug event in EDBSR0[IAC1, IAC2], the
execution of the instruction causing the debug event is suppressed, the processor halts, and NIA is set to
the address of the excepting instruction.
In IDM, an unmasked IAC debug condition is recorded as a debug event in DBSR[IAC1, IAC2] if
MSR[DE] = 1 and no higher priority exception exists, or according to Table 9-20, if MSR[DE] = 0. If
the instruction address compare mode is not exact address compare mode, more than one bit is set in
DBSR, because the DBSR bits corresponding to both IAC1 and IAC2 are set.
If debug interrupts are enabled (MSR[DE] = 1) and the debug event is recorded, a debug interrupt is
generated, the execution of the instruction causing the debug event is suppressed, and DSRR0 is set to the
address of the excepting instruction.
If debug interrupts are disabled (MSR[DE] = 0), the IAC event is conditionally recorded in the DBSR
according to the type of operation associated with the event. See Table 9-20 for a complete list of
operations and their effect on the recording of imprecise debug events as well as which imprecise debug
events can be simultaneously recorded for a given operation. If the IAC event is recorded in the DBSR,
DBSR[IDE] is also set to indicate that the debug interrupt (if later enabled) is an imprecise event. In the
case of a delayed debug interrupt, the DSRR0 contains the address of the instruction following the one that
enabled debug interrupts. Software in the debug interrupt handler can use the DBSR[IDE] information to
determine how to interpret the contents of the DSRR0.
where GS = 1. Refer to the EREF: A Programmer’s Reference Manual for Freescale Power
Architecture® Processors for details.
One or more data address compare debug conditions (DAC1R, DAC1W, DAC2R, DAC2W) occur if they
are enabled, execution is attempted of a data storage access instruction, and the type and address of the
data storage access meet the criteria specified in the DBCR0, DBCR2, DAC1, and DAC2. These
conditions cause debug events to be recorded in DBSR if MSR[DE] = 1 and no higher priority exception
exists, or according to Table 9-20, if MSR[DE] = 0. MSR[DE] has no effect on the updates to EDBSR0.
Data address compares may specify user/supervisor mode and data space (MSR[DS]), along with an
effective address, masked effective address, or range of effective addresses for comparison. Refer to
Section 2.17.4, “Debug Control Register 2 (DBCR2),” for details on the controls for the various DAC
event modes.
DBCR0[DAC1] determines whether DAC1 comparisons are performed on read-type accesses, write-type
accesses, or both. Similarly, DBCR0[DAC2] determines if DAC2 comparisons are performed on read-type
accesses, write-type accesses, or both.
All load instructions are considered reads with respect to debug conditions, while all store instructions are
considered writes with respect to debug conditions. In addition, the cache management instructions and
certain special cases are handled as follows:
dcbt[ls], dcbtst, dcbtep, dcbtstep, icbt[ls], icbi, icbiep, and icblc are all considered reads with respect to
debug events. Note that dcbt[ep], dcbtst[ep], and icbt are treated as NOPs when they report data storage
or data TLB miss exceptions, instead of being allowed to cause interrupts. However, these instructions
cause debug interrupts, even when they would otherwise have been NOPed due to a data storage or data
TLB miss exception.
dcbz[ep], dcbi, dcbf[ep], dcba, dcbst[ep], dcbtstls, and dcblc are all considered writes with respect to
debug events. Note that dcbf and dcbst are considered reads with respect to data storage exceptions,
because they do not actually change the data at a given address. However, because execution of these
instructions may result in write activity on the processor’s data bus, they are treated as writes with respect
to debug events. See Table 4-2 for the list of exceptions for all load, store, and cache management
instructions.
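The read/write classification of cache management instructions described above can be captured in a lookup table, as in this minimal sketch. The mnemonics are expanded from the bracketed forms in the text (for example, dcbt[ls] is taken to mean dcbt and dcbtls); the function name dac_access_type is hypothetical, and ordinary loads and stores are deliberately left out because the text does not enumerate them.

```python
# Cache management instructions treated as reads with respect to debug events.
DAC_READS = {
    "dcbt", "dcbtls", "dcbtst", "dcbtep", "dcbtstep",
    "icbt", "icbtls", "icbi", "icbiep", "icblc",
}

# Cache management instructions treated as writes with respect to debug events.
DAC_WRITES = {
    "dcbz", "dcbzep", "dcbi", "dcbf", "dcbfep", "dcba",
    "dcbst", "dcbstep", "dcbtstls", "dcblc",
}


def dac_access_type(mnemonic: str):
    """Return "read" or "write" for a listed cache management instruction, else None."""
    if mnemonic in DAC_READS:
        return "read"
    if mnemonic in DAC_WRITES:
        return "write"
    return None  # not a cache management instruction covered by the lists above
```

Note that dcbf and dcbst map to "write" here even though they are reads with respect to data storage exceptions, which matches the distinction drawn in the text.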
lmw or stmw operations may partially complete if a DAC event occurs after the initial transfer has started.
DAC events may be further qualified by requiring an IAC condition on the corresponding data storage
access instruction by setting DBCR2[DACLINK1/2]. When DACs are linked to IACs in this way, a DAC
event occurs only if an instruction generates both a DAC condition and an IAC condition (IAC1 or IAC2
debug condition). These linked events are recorded in DBSR[DAC1,DAC2], according to which DAC
comparator generated the debug condition. For e500mc, a DACLINK1/2 event will only occur if the DAC
condition matches the first word of a lmw or stmw instruction.
In EDM, if no higher priority exception is associated with the instruction, a DAC debug condition is
recorded as a debug event in EDBSR0[DAC1R, DAC1W, DAC2R, DAC2W], the execution of the
instruction causing the debug event is suppressed, the processor halts, and NIA is set to the address of the
excepting instruction.
In IDM, a DAC debug condition is recorded as a debug event in DBSR[DAC1R,DAC1W,DAC2R,
DAC2W] if MSR[DE] = 1 and no higher priority exception exists, or according to Table 9-20, if
MSR[DE] = 0. If the data address compare mode is not exact address compare mode, more than one bit
is set in DBSR, because the DBSR bits corresponding to both DAC1 and DAC2 are set.
If debug interrupts are enabled (MSR[DE] = 1) and the debug event is recorded, a debug interrupt is
generated, the execution of the instruction causing the debug condition is suppressed, and DSRR0 is set to
the address of the excepting instruction.
If debug interrupts are disabled (MSR[DE] = 0), the DAC event is conditionally recorded in the DBSR
according to the type of operation associated with the event. See Table 9-20 for a complete list of
operations and their effect on the recording of imprecise debug events as well as which imprecise debug
events can be simultaneously recorded for a given operation. If the DAC event is recorded in the DBSR,
DBSR[IDE] is also set to indicate that the debug interrupt (if later enabled) is an imprecise event. In the
case of a delayed debug interrupt, the DSRR0 contains the address of the instruction following the one that
enabled debug interrupts. Software in the debug interrupt handler can use the DBSR[IDE] information to
determine how to interpret the contents of the DSRR0.
instructions which enable or disable instruction complete debug events through the side effect of a change
to MSR[DE] is not applied to the return instruction itself, but takes effect on the next instruction following
the return.
When an instruction complete debug event is recorded in internal debug mode, a debug interrupt is
generated and the address of the next instruction to be executed is recorded in DSRR0.
If debug interrupts are disabled (MSR[DE] = 0) at the time of the execution of the rfi (that is, before the
MSR is updated by the rfi), the RET event is conditionally recorded in the DBSR according to the type of
operation associated with the event. See Table 9-20 for a complete list of operations and their effect on the
recording of imprecise debug events as well as which imprecise debug events can be simultaneously
recorded for a given operation. If the RET debug event is recorded in the DBSR, DBSR[IDE] is also set to
1 to record the imprecise debug event. In the case of a delayed debug interrupt, the DSRR0 contains the
address of the instruction following the one that enabled debug interrupts. Software in the debug interrupt
handler can use the DBSR[IDE] information to determine how to interpret the contents of the DSRR0.
If debug interrupts are disabled (MSR[DE] = 0), the CIRPT debug event is conditionally recorded in the
DBSR according to the type of operation associated with the event. See Table 9-20 for a complete list of
operations and their effect on the recording of imprecise debug events as well as which imprecise debug
events can be simultaneously recorded for a given operation. If the CIRPT debug event is recorded in the
DBSR, DBSR[IDE] is also set to indicate that the debug interrupt (if later enabled) is an imprecise event.
In the case of a delayed debug interrupt, the DSRR0 contains the address of the instruction following the
one that enabled debug interrupts. Software in the debug interrupt handler can use the DBSR[IDE]
information to determine how to interpret the contents of the DSRR0.
9.9.1.1 Halt
When the e500mc is in the halted state, the clocks are still running, but the core is not fetching or executing
instructions. While in this state, an external debugger can jam instructions into the pipeline, and they are
executed. The core also continues to snoop the core complex bus and maintains cache coherency.
Assertion of pm_halt causes the core to enter the halted state. PRSR[PM_HALT] is asserted to indicate
that pm_halt has been asserted, and PRSR[HALTED] indicates that the core is in the halted state. When
pm_halt is deasserted, PRSR[PM_HALT] transitions to zero and, if the processor has not also been halted
for a halt condition in the debug class, the core resumes immediately.
There are several mechanisms that halt the core. These are described in Table 9-21.
Table 9-21. Methods for Halting the Core
DNH Debug EDBCR0[DNH_EN] Section 9.9.3, “Debugger Notify Halt (dnh) Instruction”
Most external debug operations can only be performed when the processor is halted. Note that if the core
is halted only because pm_halt is asserted (that is, no other halt requests are active in PRSR), it resumes
immediately if pm_halt is deasserted. Therefore, the core should always be halted with some other debug
mechanism (for example, setting a system debug event halt) before accessing the contents of the core.
The Processor Run Status Register (PRSR) indicates whether or not the core is halted for debug.
9.9.1.3 Wait
When the processor executes the wait instruction, it discontinues fetching and executing instructions, and
waits for an asynchronous interrupt. This is the program wait state. This state does not have any effect on
the processor while it is in the debug halted state, but affects resuming from the halted state. If the
processor is in the program wait state when the core_resume signal is asserted to exit the halted state, the
core does not fetch or execute any instructions until an asynchronous interrupt occurs. Otherwise, it begins
fetching and executing instructions immediately.
If the processor is in the program wait state when the debug halted state is entered, the processor remains
in the program wait state. Jamming an mtspr to the NIA causes the processor to exit the program wait
state. Jamming a wait instruction causes the processor to enter the program wait state.
The debugger can examine PRSR[WAIT] to determine whether or not the processor is in the program wait
state.
2. Assert core_resume.
Similarly, assume that the processor has been stopped by one of the stop conditions in the debug class. To
resume from this state, the debugger must:
1. Clear all of the bits in PRSR that correspond to stop requests in the debug class.
2. Assert core_resume.
Normally, when the processor has been halted for power management by asserting pm_halt, the processor
resumes execution when pm_halt is deasserted. Similarly, the processor normally exits the power
management stopped state whenever core_stop is deasserted. However, if the core has been halted or
stopped for a halt or stop condition in the debug class, deassertion of pm_halt or core_stop does not cause
the processor to resume until core_resume is asserted.
If core_resume is asserted while pm_halt or core_stop is asserted, the core remains in the halted or stopped
for power management state.
If any of the debug-related halt status bits are set in the PRSR, indicating that the core has been
halted or stopped for a debug condition, core_resume must be asserted before the core resumes execution.
If the core has been halted or stopped only by assertion of pm_halt or core_stop, simply releasing pm_halt
or core_stop allows the processor to resume execution.
If the core is in the stopped state, and some halt requests are active in PRSR, then an attempt to resume
causes the processor to go directly from the stopped to the halted state. If no halt requests are active, the
processor goes directly from the stopped to the running state.
In order to be able to resume from a stopped state, special steps must be taken when stopping the core.
These steps are:
1. Flush the caches so that they do not contain any modified data. This prevents coherency problems.
2. Discontinue any snoop traffic.
3. Halt the core.
4. Stop the core.
9.9.2 Singlestep
An external development tool can singlestep through code using the instruction complete (ICMP),
interrupt taken (IRPT) and critical interrupt taken (CIRPT) debug events in EDM. If a resume command
is issued while the ICMP, IRPT, and CIRPT events are enabled in EDM, the processor does one of the
following:
• Execute and complete one instruction, then halt before executing the next instruction.
• Execute one instruction and take a synchronous interrupt, then halt before executing the first
instruction of the interrupt handler.
• Immediately take an asynchronous interrupt and halt on the first instruction of the interrupt handler.
Therefore, to single step, set ICMP, IRPT, and CIRPT, set EDM, clear PRSR, and resume. Note that
PRSR must be cleared prior to each resume command.
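The single-step sequence above can be sketched as follows. This is only an illustration of the ordering: RecordingTarget and all of its methods are hypothetical stand-ins for whatever register access a real debug probe provides, and a real tool would also wait for the core to halt again between steps.

```python
class RecordingTarget:
    """Hypothetical stand-in for a debug probe; it only records the requested actions."""

    def __init__(self):
        self.log = []

    def enable_edm_events(self, events):
        self.log.append(("enable_events", tuple(events)))

    def set_edm(self):
        self.log.append(("set_edm",))

    def clear_prsr(self):
        self.log.append(("clear_prsr",))

    def resume(self):
        self.log.append(("resume",))


def single_step(target, steps=1):
    """Enable ICMP, IRPT, and CIRPT in EDM, then clear PRSR before every resume."""
    target.enable_edm_events(["ICMP", "IRPT", "CIRPT"])
    target.set_edm()
    for _ in range(steps):
        target.clear_prsr()  # PRSR must be cleared prior to each resume command
        target.resume()
```

The loop keeps the PRSR clear and the resume command paired, which is the point most easily missed when stepping repeatedly.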
Refer to the SoC reference manual for specifics on accessing the internal memory mapped resources via
the external JTAG interface as well as the Aurora high speed serial interface.
Figure 9-18 shows the address bit fields used in accessing debug/expert resources and Table 9-23
summarizes the debug/expert resource memory map.
23 12 11 8 7 0
0x04 Reserved
Clock Control/Status 0x40 Test Mode Configuration Register R/W E?, H? 32 POR
0x10 Reserved
Jammed instructions have no instruction address. Therefore, they do not require translation of an instruction
address, and there is no way to have an ITLB miss or ISI. Furthermore, a jammed instruction does not
increment the NIA.
Jammed instructions can have undesired effects, particularly if the jammed instruction causes an
exception. The processor provides some facilities that reduce the number of architectural registers that are
affected by a jammed instruction that causes an exception. See Section 9.9.5.5, “Exception Conditions and
Affected Architectural Registers,” for details.
NOTE
Instruction jamming operations require the processor to be halted.
Instruction jamming may change architecture-defined processor state. It is
the responsibility of the external debug facility to save and restore any
critical state.
the IJRA field of IJCFG. The 36-bit address is formed by prepending the 4-bit IJRA field to the effective
address calculated by the jammed load/store instruction (PA[28:63] = IJRA[0:3] || EA[32:63]).
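As a minimal sketch of the concatenation PA[28:63] = IJRA[0:3] || EA[32:63], the following hypothetical helper (jam_physical_address is not a name from the manual) forms the 36-bit physical address from a 4-bit IJRA value and a 32-bit effective address.

```python
def jam_physical_address(ijra: int, ea: int) -> int:
    """Prepend the 4-bit IJRA field to a 32-bit effective address (36-bit result)."""
    if not 0 <= ijra <= 0xF:
        raise ValueError("IJRA is a 4-bit field")
    if not 0 <= ea <= 0xFFFF_FFFF:
        raise ValueError("the effective address is 32 bits")
    return (ijra << 32) | ea
```

For example, IJRA = 0xF with EA = 0x0000_1000 gives the physical address 0xF_0000_1000.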
Because the WIMGE bits are not supplied by the MMU, they are supplied by the IJCFG[WIMGE] bits
when IJER = 1. Care must be taken to specify the correct page attributes for a given real address so that
cache paradoxes do not occur (that is, specifying a page attribute of cache-inhibited for a real address
which has been previously accessed as cacheable may result in the load or store not accessing memory
coherently with previous accesses or other processors or agents in the system).
When IJCFG[IJER] = 0, a data TLB miss error occurs if the MMU does not contain an entry that matches
the virtual address. However, in real addressing mode, MMU translation is not performed and TLB miss
errors do not occur.
Table 9-25. Instruction Jamming Addressing Modes
Mnemonic Description
dcbtstls Data Cache Block Touch for Store and Lock Set
Table 9-26. Implemented IJAM Instructions when the Processor Is Halted (continued)
Mnemonic Description
wait Wait
mfspr NIA Illegal instruction exception Executed Move from SPR, NIA
SPRN = 559
mtspr NIA Illegal instruction exception Executed Move to SPR, NIA
occurs. Table 9-28 lists some interesting architectural registers and indicates whether or not they are
affected by an exception on a jammed instruction.
Table 9-28. Effect of Exceptions on Machine State
SRR0, SRR1 No —
CSRR0, CSRR1 No —
DSRR0, DSRR1 No —
MCSRR0, MCSRR1 No —
MSR No —
MCSR Yes —
MCAR Yes —
DEAR No —
MAS registers No —
DBSR No —
As Table 9-28 shows, the NIA is not updated when an exception occurs on a jammed instruction. Instead,
EDBSR1 indicates the IVOR number of the exception that occurred. Similarly, the ESR is not updated,
but the EDESR contains the information that would have been in the ESR if the exception had occurred in
functional mode.
Data TLB misses are the most likely exceptions to occur on jammed instructions. They happen if no
translation is available for a jammed load or store instruction. As can be seen in Table 9-28, the MAS
registers and DEAR are not updated by a DTLB miss.
Asynchronous interrupts are always disabled when the processor is halted. Therefore, asynchronous
interrupts do not occur around the time that the processor is executing a jammed instruction.
The core should be halted for debug before jamming instructions. If an IJAM is performed while the core
is not halted for debug, an internal bus error is generated. The IJAM may be performed, and the results are
undefined.
If an access error occurs while jamming instructions, EDBSR1[IJAE] is set.
NOTE
For 8-bit (byte) and 16-bit (halfword) writes, data should always be written
to IJDATA1 right-justified (least significant) independent of the specific
address accessed.
The following procedure is used for instructions with associated data (output):
1. Confirm that the processor is halted. If not halted, issue a HALT command and wait until the
processor is halted.
2. Write [MODE] =1 to configure load/store operation.
3. Write IJIR to load instruction and run.
4. Check for IJAM completion status (one of two options):
— Scan the SoC-level JTAG IR and capture the status that is shifted out in the process. If the status
is IJAM not done, repeat this step.
— Read EDBSR1[IJBUSY] to determine status.
5. On error, check EDBSR1 and EDESR.
6. If no error, read IJDATA0—most significant word (if 64-bit data).
7. If no error, read IJDATA1—least-significant word, halfword, or byte.
NOTE
For 8-bit (byte) and 16-bit (halfword) reads, data is always read from
IJDATA1 right-justified (least significant) independent of the specific
address accessed.
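The procedure above for instructions with output data can be sketched as follows. Everything about the probe interface is an assumption: FakeTarget, its methods, and the way an error is reported (ijam_error) are hypothetical stand-ins for real JTAG or memory-mapped accesses, and the key "MODE" simply mirrors the field name given in step 2.

```python
class FakeTarget:
    """Hypothetical stand-in for a debug probe, preloaded with example register values."""

    def __init__(self):
        self.regs = {"IJDATA0": 0, "IJDATA1": 0x1234, "EDBSR1": 0, "EDESR": 0}
        self.halted = True
        self.busy_polls = 2  # report "IJAM not done" for the first two polls

    def is_halted(self):
        return self.halted

    def halt(self):
        self.halted = True

    def write(self, name, value):
        self.regs[name] = value

    def read(self, name):
        return self.regs[name]

    def ijam_busy(self):
        if self.busy_polls > 0:
            self.busy_polls -= 1
            return True
        return False

    def ijam_error(self):
        return False  # how an error is signalled is an assumption of this sketch


def jam_with_output(target, instruction, is_64bit=False):
    """Jam one instruction and return its output data as an integer."""
    if not target.is_halted():              # step 1: confirm the processor is halted
        target.halt()
        while not target.is_halted():
            pass
    target.write("MODE", 1)                 # step 2: configure load/store operation
    target.write("IJIR", instruction)       # step 3: load instruction and run
    while target.ijam_busy():               # step 4: poll for IJAM completion
        pass
    if target.ijam_error():                 # step 5: on error, check EDBSR1 and EDESR
        raise RuntimeError(
            f"IJAM error: EDBSR1={target.read('EDBSR1'):#x}, EDESR={target.read('EDESR'):#x}"
        )
    high = target.read("IJDATA0") if is_64bit else 0  # step 6: most significant word
    low = target.read("IJDATA1")                      # step 7: least significant word
    return (high << 32) | low
```

Reading IJDATA0 only for 64-bit data follows step 6; bytes and halfwords come back right-justified in IJDATA1, so no further shifting is needed for them.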
For instructions with no associated data, use the following procedure:
1. Confirm that the processor is halted. If not halted, issue a HALT command and wait until the
processor is halted.
2. Write IJIR to load instruction and run.
3. Check for IJAM completion status (one of two options):
— Scan the SoC-level JTAG IR and capture the status that is shifted out in the process. If the status
is IJAM not done, repeat this step.
— Read EDBSR1[IJBUSY] to determine status
4. On error, check EDBSR1 and EDESR.
• EDESR = effective value of the ESR if the exception had been processed
• EDBSR1[IJAE] = 1
vendor-specific features outside the scope of the public messages. The Nexus block currently supports the
public and vendor defined TCODEs shown in Table 9-29.
Table 9-29. Supported TCODEs
encodings. Software decoding Nexus messages should account for this difference.
3 There is a microarchitected (implementation-specific) amount of “skid” in the specific instruction address that is
transmitted relative to the sync condition. Subsequent program trace message fields (I-CNT / HIST) are based on this
messaged PC value, maintaining a coherent trace flow.
4 Program Trace -Indirect Branch History Message w/ Sync is the message that is generated periodically with the F-ADDR
Table 9-30. Data Trace Size (DSZ) Encodings (TCODE = 13) (continued)
0101 5 bytes
1100-1111 Reserved
1 Implied data instructions and cache management instructions utilize these
encodings. Refer to Section 9.10.13.3, “Data Trace Size Field (DSZ).”
xxxxxxx1 Watchpoint Trace Message(s) lost. Applies only to Error Type 0 (ETYPE = 0000)
xxxxxx1x Data Trace Message(s) lost. Applies only to Error Type 0 (ETYPE = 0000)
xxxx1xxx Ownership Trace Message(s) lost. Applies only to Error Type 0 (ETYPE = 0000)
xxx1xxxx Status message(s) lost (Debug Status). Applies only to Error Type 0 (ETYPE = 0000)
x1xxxxxx Reserved
1xxxxxxx Reserved
1 Note: e500mc uses only 8 bit ECODE encodings, whereas other Nexus clients on the integrated device may use 12 bit ECODE
encodings. Software decoding Nexus messages should account for this difference.
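Because the 8-bit ECODE for Error Type 0 is a bit mask, several message types can be reported as lost in one error message. The sketch below decodes only the bit positions listed in this excerpt and assumes that the right-most character of each pattern (for example, xxxxxxx1) is the least significant bit; the name decode_ecode is hypothetical.

```python
# Bit positions listed in this excerpt; other positions are omitted here.
ECODE_LOST = {
    0x01: "Watchpoint Trace Message(s) lost",
    0x02: "Data Trace Message(s) lost",
    0x08: "Ownership Trace Message(s) lost",
    0x10: "Status message(s) lost (Debug Status)",
}


def decode_ecode(ecode: int) -> list:
    """Return the descriptions of all listed message types flagged in an 8-bit ECODE."""
    return [text for mask, text in ECODE_LOST.items() if ecode & mask]
```

A decoder that also handles other Nexus clients on the device would need a 12-bit variant, as the footnote above points out.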
0001 Contention with higher priority messages caused one or more messages to be lost
0010–1111 Reserved
00 Branch instruction
01 Interrupt
1x Reserved
0001 EVCODE #2 Entry into halted or stopped state for power management
0010–0011 — Reserved
0101–1000 — Reserved
1001 EVCODE #10 Begin masking of program trace due to MSR[PMM] = 0. This event applies only if
DC4[PTMARK] = 1.
1010 EVCODE #11 Branch and link occurrence (direct branch function call)
1011–1111 — Reserved
Bit 0 1 2 3 4 5
MSB LSB
When a message pends due to contention with other message types, a 4-bit counter is used to keep track
of how long the message pends until it actually enters the message queues. This 4-bit correction value is
concatenated with the 24-bit timestamp and can be used to correct the timestamp value for that pending
latency by subtracting the correction value from the 24-bit timestamp value. If a message pends for 15 or
more cycles, the timestamp correction indicates a value of 0xF. A timestamp correction value of 0xF
should be taken to mean that the timestamp value for that message is unreliable.
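The correction described above can be sketched as follows. The function takes the 24-bit timestamp and the 4-bit correction value as separate arguments because their order within the concatenated field is not stated here; the name corrected_timestamp is hypothetical, and wrapping the subtraction modulo 2^24 is an assumption made so that a timestamp sampled just after a counter overflow still yields a 24-bit result.

```python
TIMESTAMP_MASK = 0xFF_FFFF  # 24-bit timestamp counter


def corrected_timestamp(timestamp: int, correction: int):
    """Subtract the pending latency from the timestamp; return None if it is unreliable."""
    if correction == 0xF:
        # The message pended for 15 or more cycles, so the timestamp cannot be trusted.
        return None
    return (timestamp - correction) & TIMESTAMP_MASK
```

For example, a timestamp of 100 with a correction of 3 yields 97, while any timestamp with a correction of 0xF yields None.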
Whenever the 24-bit timestamp counter overflows, a Resource Full Message (RFM) is generated with a
resource code of 0x8 and an RDATA field of 0xFF_FFFF. Timestamp Resource Full Messages do not
pend. Clearing DC1[TSEN] disables the timestamp counter, preventing Resource Full
Messages from being generated due to timestamp overflow.
Table 9-36. Message Type Priority and Message Dropped Responses (continued)
Program Trace Indirect Branch with History (IHM) Y BTM error message
(port 1) sync upgrade next IHM
3 Resource Full Message (RFM) for
instruction counter, history buffer and Y
timestamp overflow
Program Trace Program Correlation Message (PCM) BTM error message
Y
4 (port 2) sync upgrade next IHM
The specific resource that has become full is indicated by the resource code (RCODE) within the Resource
Full message. The data associated with the specific resource is captured in the resource data field
(RDATA). These fields and their values are outlined in Table 9-33, “Resource Code (RCODE) Encoding
(TCODE = 27).”
Not taken direct branches 0 b, ba, bc, bca, bla, bcla, bl, bcl —
Taken direct branches 1 b, ba, bc, bca, bla, bcla, bl, bcl If EVCODE for direct branch function calls
is not masked in DC4, taken bl and bcl instructions generate Program
Correlation Messages and are not logged in the history buffer.
Reserved 11 N/A
instruction count (I-CNT) and branch history (HIST) values starting from the program counter (PC) value
transmitted within this message. Hard sync cases are outlined in Table 9-39.
Table 9-39. Hard Synchronization Conditions
Hard SYNC Condition Description
EVTI0 Assertion The e500mc EVTI0 pin has been asserted (high to low transition) and DC1[EIC] determines that
EVTI0 generates trace synchronization messages.
Exit from Debug The embedded processor has exited from the debug HALT state.
Program Trace Enable Program Trace is enabled during normal execution of the embedded processor.
FIFO Overrun An overrun condition had previously occurred in which one or more trace occurrences were
discarded by the debug logic. To inform the tool that an overrun condition occurred, the target
outputs an Error Message (TCODE = 8) prior to a sync message. The error message contains
an ECODE value indicating the type(s) of messages lost due to the overrun condition.
Message Contention One or more messages were lost due to contention with a higher priority message. To inform the
tool that this condition occurred, the target outputs an Error Message (TCODE = 8) prior to a
sync message. The error message contains an ECODE value indicating the type of message
lost due to the contention. See Section 9.10.7, “Nexus Message Priority.”
Conditions which do not create a discontinuity are considered “soft” sync cases. These conditions cause
the next branch trace message to use an uncompressed target address (TCODE 29). Soft sync cases are
outlined in Table 9-40.
Table 9-40. Soft Synchronization Conditions
Soft SYNC Condition Description
EVTI1 Assertion The e500mc EVTI1 pin has been asserted (high to low transition).
Periodic Message Counter The periodic trace message counter has expired, indicating that there have been 255 program
trace messages without an uncompressed address. This upgrades the next program trace message to a sync type.
This ensures that, with a sufficiently large sample of trace information, there is guaranteed to be a reference
address that can be used to meaningfully interpret the remainder of the program trace.
• Each address compare is limited to a maximum of 4 Kbytes on exact match (see Section 2.17.5,
“Debug Control Register 4 (DBCR4),” for detail on programming extended DAC ranges).
• Misaligned stores are not combined, meaning that each half that has an associated DAC set is sent
as an independent data trace message.
• Store multiple word instructions (stmw) produce a separate data trace message for each word
stored that meets the trace criteria.
• Clearing the appropriate DC1[TM] bit (DC1[62]). Note that resetting the Nexus module clears all
Nexus registers, disabling data trace as a side effect.
• Programming WT1[DTE] to disable data trace on the occurrence of a watchpoint condition.
NOTE
The latter two mechanisms defined above will disable additional stores from
entering the e500mc store queue, but accesses which have already entered
the queue (that is, accesses in flight) are messaged out before the DTMs are
actually disabled.
Data trace is effectively suppressed whenever the processor core is in the debug halted or debug stopped
state. Instruction jamming operations do not produce any Data Trace Messages. Whenever the processor
core leaves the debug halted state, data trace enable state reverts to the status of DC1[62].
MSB LSB
A value of 0 for the DAC tag indicates that this store matched only the DAC1 conditions or matched both
the DAC1 and DAC2 conditions. A value of 1 for the DAC tag indicates that this store matched only the
DAC2 conditions. The full effective address can be reconstructed by concatenating the DAC information
with the data trace address field information as follows:
The upper address information should be selected from DAC1 or DAC2 according to the DAC tag bit in
the data trace address field and the DAC settings. Note that setting the DAC conditions to include regions
in excess of 4 Kbytes results in address aliasing making precise reconstruction of the full effective address
impossible (without other implied restrictions or information that can remove the ambiguity).
Cache management instructions that are treated as store-type data storage accesses
dcba, dcbal, dcbz, dcbzl, dcbzep, dcbzl, dcbzlep
Treated like data storage writes with write data value of zero (refer to Section 9.10.13.3,
“Data Trace Size Field (DSZ)”).
Store instructions that produce data storage write accesses
stb[u][x], stbepx, stfd[u][x], stfdepx, stfiwx, stfs[u][x], sth[u][x], sthbrx,
sthepx, stmw, stw[u][x], stwbrx, stwepx
For stmw, a separate data trace message is generated for each word stored that meets
the trace criteria.
Conditional store instruction
stwcx.
Only generates data trace messages if the associated store is successful (that is, the
condition evaluates to true).
PID Index (4 Bits) PID Index Description PID Value (up to 40 bits)
IDTAG is sampled from DEVENT[32:39] when a write to DDAM is performed via mtspr operations.
The IDTAG is left to the discretion of the development tool to be used in whatever manner is deemed
appropriate for the application.
When the processor is in EDM, IAC, DAC, return from interrupt, and return from critical interrupt debug
conditions cause bits to be set in EDBSR0 and the processor to halt instead of taking a debug interrupt. In
these cases, the watchpoint for the respective events will trigger on the update to EDBSR0 rather than the
debug interrupt. For e500mc Rev 1.x and Rev 2.x, IACs and DACs that cause halts in EDM mode will not
trigger watchpoints.
WP15 WP14 WP13 WP12 WP11 WP10 WP9 WP8 WP7 WP6 WP5 WP4 WP3 WP2 WP1
9.11.1 Overview
The performance monitor provides the ability to count predefined events and processor clocks associated
with particular operations, for example cache misses, mispredicted branches, or the number of cycles an
execution unit stalls. The count of such events can be used to trigger the performance monitor interrupt.
The performance monitor can be used to do the following:
• Improve system performance by monitoring software execution and then recoding algorithms for
more efficiency. For example, memory hierarchy behavior can be monitored and analyzed to
optimize task scheduling or data distribution algorithms.
• Characterize processors in environments not easily characterized by benchmarking.
• Help system developers bring up and debug their systems.
The performance monitor uses the following resources:
• The performance monitor mark bit, MSR[PMM], controls which programs are monitored.
• The move to/from performance monitor registers (PMR) instructions, mtpmr and mfpmr.
• The external input, pm_event.
Figure 9-24 shows a detailed view of one of the PMC counters available within the core performance
monitor. Blue highlights special triggering controls that are available for e500mc.
[Figure 9-24: PMC counter detail. Labels: PMLCA0[EVENT], DVT7, PMLCB0[TRIGONCTL], PMLCB0[TRIGOFFCTL].]
...
...
• PMRs:
— The performance monitor counter registers (PMC0–PMC3), described in Section 2.18.4, “Performance
Monitor Counter Registers (PMC0–PMC3/UPMC0–UPMC3),” are 32-bit counters used to
Marked 0 0 0 1
Not marked 0 0 1 0
Supervisor 0 1 0 0
User 1 0 0 0
All 0 0 0 0
None X X 1 1
None 1 1 X X
[Figure: PerfMon counters and program counter memory-mapped interface. Labels: 32-bit PCC Capture Reg and 32-bit PMC0–PMC3 Capture Regs; IAC matches, DAC matches, core watchpoint, and other control inputs; EVTO[0:3], EVTO4, EVTI0, EVTI1.]
NOTE
The EVTI signal, provided from the SoC, can be used to capture PMC counter
and program counter values not only from a single core, but from all cores
(or a subset of cores) on the SoC, as well as the SoC-level
performance counters located in the event processing unit (EPU).
9.11.5 Examples
The following sections provide examples of how to use the performance monitor facility.
The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The
above sequence is not necessary if the counters are frozen.
9.11.5.2 Thresholding
Threshold event measurement enables the counting of duration and usage events. For example, data line
fill buffer (DLFB) load miss cycles (event C0:76 and C1:76) require a threshold value. A DLFB load miss
cycles event is counted only when the number of cycles spent recovering from the miss is greater than the
threshold. Because this event is counted on two counters and each counter has an individual threshold, one
execution of a performance monitor program can sample two different threshold values. Measuring code
performance with multiple concurrent thresholds expedites code profiling significantly.
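The effect of per-counter thresholds can be modeled in a few lines. The sketch below is a software model only, not hardware behavior or driver code: given hypothetical per-miss recovery durations in cycles, it counts the events a counter would record for a given threshold, using the "greater than the threshold" rule stated above. The function name and the sample data are illustrative assumptions.

```python
def count_threshold_events(durations, threshold):
    """Count events whose duration in cycles is strictly greater than threshold."""
    return sum(1 for cycles in durations if cycles > threshold)

# Hypothetical DLFB load-miss recovery times, in cycles.
miss_cycles = [12, 40, 7, 95, 40, 130]

# Two counters observing the same event with different thresholds,
# as C0:76 and C1:76 would in a single run.
pmc0 = count_threshold_events(miss_cycles, threshold=32)
pmc1 = count_threshold_events(miss_cycles, threshold=64)
print(pmc0, pmc1)  # 4 2
```

Comparing the two counts gives a coarse two-point view of the latency distribution from one run, which is the profiling shortcut the paragraph describes.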
establishes a correlation between each counter, events to be traced and the pattern required for the desired
selection.
For the purposes of event descriptions, the following definitions of micro-ops apply:
• A micro-op is defined to be:
— 2 for load and store instructions that use an update form (such as lwzu);
— 1 to 32 for load and store multiple instructions (lmw, stmw) depending on the number of
registers processed;
— 1 for all other instructions
• A store micro-op is defined to be:
— 1 to 32 for store multiple instructions (stmw) depending on the number of registers processed;
— 2 for any misaligned store that crosses a doubleword boundary;
— 1 for all other store instructions including store with update forms;
— 1 for all other instructions that are treated as a store or are processed as an entry in the store
queue by the implementation:
– dcba*, dcbf*, dcbst*, dcbz*;
– dcbt (CT=1), dcbtst (CT=1);
– icbi*;
– icbt (CT=1);
– dcbtls, dcbtstls, dcblc, icbtls, icblc;
– msgsnd, mbar, sync, tlbivax, tlbilx
— dcbt* instructions that are processed as a NOP are not counted
• A load micro-op is defined to be:
— 1 to 32 for load multiple instructions (lmw) depending on the number of registers processed;
— 2 for any misaligned load that crosses a doubleword boundary;
— 1 for all other load instructions including load with update forms;
— 1 for all other instructions that are treated as a load by the implementation:
– dcbt (CT=0), dcbtst (CT=0)
— dcbt* instructions that are processed as a NOP are not counted
• A cacheable store micro-op is defined to be a store micro-op to an address that is marked with
WIMGE = 0b00xxx (not write-through and not caching-inhibited).
• A cacheable load micro-op is defined to be a load micro-op to an address that is marked with
WIMGE = 0bx0xxx (not caching-inhibited).
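The counting rules above can be sketched as a small model. This is an illustration under stated assumptions, not the core's decode logic: the function names are hypothetical, the update-form set lists common Power ISA update-form mnemonics and may not be exhaustive for the e500mc, and the register count for lmw/stmw is taken as 32 minus the first register number (these instructions process the named register through r31).

```python
# Update-form loads and stores (assumed list; may not be exhaustive).
UPDATE_FORMS = {
    "lbzu", "lbzux", "lhzu", "lhzux", "lhau", "lhaux", "lwzu", "lwzux",
    "stbu", "stbux", "sthu", "sthux", "stwu", "stwux",
    "lfsu", "lfsux", "lfdu", "lfdux", "stfsu", "stfsux", "stfdu", "stfdux",
}

def _multiple_count(first_reg):
    """Registers processed by lmw/stmw starting at first_reg (through r31)."""
    if first_reg is None or not 0 <= first_reg <= 31:
        raise ValueError("lmw/stmw need a first register in 0..31")
    return 32 - first_reg

def micro_ops(mnemonic, first_reg=None):
    """General micro-op count: 1 to 32 for lmw/stmw, 2 for update forms, else 1."""
    if mnemonic in ("lmw", "stmw"):
        return _multiple_count(first_reg)
    return 2 if mnemonic in UPDATE_FORMS else 1

def crosses_doubleword(addr, size):
    """True if the access [addr, addr + size) spans an 8-byte boundary."""
    return addr // 8 != (addr + size - 1) // 8

def store_micro_ops(mnemonic, addr=0, size=4, first_reg=None):
    """Store micro-op count for an ordinary store instruction."""
    if mnemonic == "stmw":
        return _multiple_count(first_reg)
    # Update forms count as a single store micro-op unless they cross.
    return 2 if crosses_doubleword(addr, size) else 1
```

Note the asymmetry the definitions imply: stwu counts as 2 general micro-ops but only 1 store micro-op, because the address update is not itself a store.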
The Spec/Nonspec column indicates whether the event count includes any occurrences due to processing
that was not architecturally required by the Power ISA sequential execution model (speculative
processing).
• Speculative counts include speculative instructions that were later flushed.
• Nonspeculative counts do not include speculative operations that are later flushed.
Table 9-46 describes how event types are indicated in Table 9-47.
Common | Com:# | Shared across counters PMC0–PMC3. Fairly specific to e500 microarchitectures.
Counter-specific | C[0–3]:# | Counted only on one or more specific counters. The notation indicates the counter to which an event is assigned. For example, an event assigned to counter PMC2 is shown as C2:#.
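For tooling that consumes the event list, the label notation can be decoded mechanically. The sketch below is a hypothetical helper, not part of any Freescale tool: it maps a label to the counters on which the event may be selected and to the event number, and tolerates the optional space after the colon that appears in some rows of the event table.

```python
import re

# "Com:<n>" is shared by PMC0-PMC3; "C<k>:<n>" is specific to counter k.
_EVENT_LABEL = re.compile(r"^(Com|C([0-3])):\s*(\d+)$")

def parse_event_label(label):
    """Return (counters, event_number) for a label such as 'Com:7' or 'C1:76'."""
    match = _EVENT_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized event label: {label!r}")
    number = int(match.group(3))
    if match.group(1) == "Com":
        return (0, 1, 2, 3), number
    return (int(match.group(2)),), number
```

A row that lists two labels, such as C0:76 and C1:76, would be parsed once per label.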
Number | Event | Spec/Nonspec | Count Description
General Events
Com:7 | PM_EVENT cycles | Spec | Processor cycles that occur when the pm_event input is asserted.
Com:11 | Number of CQ redirects | Nonspec | Fetch redirects initiated from the completion unit (for example, resulting from sc, rfi, rfci, rfdi, rfmci, isync, and interrupts).
Com:13 | Taken branches finished | Spec | Includes all taken branch instructions.
Com:14 | Finished unconditional branches that miss the BTB | Spec | Includes all taken branch instructions not allocated in the BTB.
Com:15 | Branches mispredicted (for any reason) | Spec | Counts branch instructions mispredicted due to direction, target (for example, if the CTR contents change), or IAB prediction. Does not count instructions that the branch predictor incorrectly predicted to be branches.
Com:16 | Branches in the BTB mispredicted due to direction prediction | Spec | Counts branch instructions mispredicted due to direction prediction.
Com:17 | BTB hits and pseudo-hits | Spec | Branch instructions that hit in the BTB or miss in the BTB and are not-taken (a pseudo-hit). Characterizes upper bound on prediction rate.
Pipeline Stalls
Com:18 | Cycles decode stalled | Spec | Cycles the IQ is not empty but 0 instructions decoded
Com:19 | Cycles issue stalled | Spec | Cycles the issue buffer is not empty but 0 instructions issued
Com:20 | Cycles branch issue stalled | Spec | Cycles the branch buffer is not empty but 0 instructions issued
Com:21 | Cycles SFX0 schedule stalled | Spec | Cycles SFX0 is not empty but 0 instructions scheduled
Com:22 | Cycles SFX1 schedule stalled | Spec | Cycles SFX1 is not empty but 0 instructions scheduled
Com:23 | Cycles MU schedule stalled | Spec | Cycles MU is not empty but 0 instructions scheduled
Com:24 | Cycles LRU schedule stalled | Spec | Cycles LRU is not empty but 0 instructions scheduled
Com:25 | Cycles BU schedule stalled | Spec | Cycles BU is not empty but 0 instructions scheduled
Load/Store, Data Cache, and Data Line Fill Buffer (DLFB) Events
Com:26 | Total translated | Spec | Total of load and store micro-ops that reach the second stage of the LSU (see footnote 1).
Com:27 | Loads translated | Spec | Cacheable load micro-ops translated (see footnote 1).
Com:30 | Cacheops translated | Spec | dcba*, dcbf*, dcbi, dcbst*, and dcbz* instructions translated.
Com:36 | Loads translated and allocated to DLFB | Spec | Applies to same class of instructions as loads translated.
Com:37 | Stores completed and allocated to DLFB | Nonspec | Applies to same class of instructions as stores translated.
Com:38 | Touches translated and allocated to DLFB | Spec | Applies to same class of instructions as touches translated.
Com:40 | Data L1 cache locks | Nonspec | Cache lines locked in the data L1 cache. (Counts a lock even if an overlock condition occurs.)
Com:41 | Data L1 cache reloads | Spec | Counts cache reloads for any reason. Typically used to determine data cache miss rate (along with loads/stores completed).
Com:42 | Data L1 cache castouts | Spec | Does not count castouts due to dcbf*.
Com:43 | Load miss with DLFB full. | Spec | Counts number of stalls; Com:51 counts cycles stalled.
Com:44 | Load miss with load queue full. | Spec | Counts number of stalls; Com:52 counts cycles stalled.
Com:45 | Load guarded miss when the load is not yet at the bottom of the CQ. | Spec | Counts number of stalls; Com:53 counts cycles stalled.
Com:46 | Translate a store when the store queue is full. | Spec | Counts number of stalls; Com:54 counts cycles stalled.
Com:47 | Address collision. | Spec | Counts number of stalls; Com:55 counts cycles stalled.
Com:48 | Data MMU miss. | Spec | Counts number of stalls; Com:56 counts cycles stalled.
Com:49 | Data MMU busy. | Spec | Counts number of stalls; Com:57 counts cycles stalled.
Com:50 | Second part of misaligned access when first part missed in cache. | Spec | Counts number of stalls; Com:58 counts cycles stalled.
Com:51 | Load miss with DLFB full. | Spec | Counts cycles stalled; Com:43 counts number of stalls.
Com:52 | Load miss with load queue full. | Spec | Counts cycles stalled; Com:44 counts number of stalls.
Com:53 | Load guarded miss when the load is not yet at the bottom of the CQ. | Spec | Counts cycles stalled; Com:45 counts number of stalls.
Com:54 | Translate a store when the store queue is full. | Spec | Counts cycles stalled; Com:46 counts number of stalls.
Com:55 | Address collision. | Spec | Counts cycles stalled; Com:47 counts number of stalls.
Com:56 | Data MMU miss. | Spec | Counts cycles stalled; Com:48 counts number of stalls.
Com:57 | Data MMU busy. | Spec | Counts cycles stalled; Com:49 counts number of stalls.
Com:58 | Second part of misaligned access when first part missed in cache. | Spec | Counts cycles stalled; Com:50 counts number of stalls.
Fetch, Instruction Cache, Instruction Line Fill Buffer (ILFB), and Instruction Prefetch Events
Com:59 | Instruction L1 cache locks | Nonspec | Counts cache lines locked in the instruction L1 cache. (Counts a lock even if an overlock occurs.)
Com:60 | Instruction L1 cache reloads from fetch | Spec | Counts reloads due to demand fetch. Typically used to determine instruction cache miss rate (along with instructions completed).
Com:61 | Number of fetches | Spec | Counts fetches that write at least one instruction to the IQ. (With instructions fetched (Com:4), can be used to compute instructions per fetch.)
Com:62 | Instruction MMU TLB4K reloads | Spec | Counts reloads in the level 1 instruction MMU TLB4K. A reload in the level 2 MMU TLB4K is not counted.
Com:63 | Instruction MMU VSP reloads | Spec | Counts reloads in the level 1 instruction MMU VSP. A reload in the level 2 MMU VSP is not counted.
Com:64 | Data MMU TLB4K reloads | Spec | Counts reloads in the level 1 data MMU TLB4K. A reload in the level 2 MMU TLB4K is not counted.
Com:65 | Data MMU VSP reloads | Spec | Counts reloads in the level 1 data MMU VSP. A reload in the level 2 MMU VSP is not counted.
Com:66 | L2MMU misses | Nonspec | Counts instruction TLB/data TLB error interrupts
Com:67 | BIU master requests | Spec | Master transaction starts (number of Aout sent to CoreNet)
Com:68 | BIU master global requests | Spec | Master transaction starts that are global (M=1) (number of Aout with M=1 sent to CoreNet). For e500mc Rev 1.x and Rev 2.x this event is not supported.
Com:69 | BIU master data-side requests | Spec | Master data-side transaction starts (number of D-side Aout sent to CoreNet)
Com:70 | BIU number of stash requests received | Spec | Stash request on Ain matches stash IDs for the core and are sent to LFB. For e500mc Rev 1.x and Rev 2.x this event is not supported.
Com:71 | BIU number of stash accepts | N/A | LFB signals snarf snoop response for ACRout for stash request. For e500mc Rev 1.x and Rev 2.x this event is not supported.
Snoop
Com:72 | Snoop requests | N/A | Externally generated snoop requests (number of Ain from CoreNet not from self).
Com:73 | Snoop hits | N/A | Snoop hits on all data-side resources regardless of the cache state (modified, shared, or exclusive).
Com:74 | Snoop pushes | N/A | Snoop pushes from all data-side resources (number of ACRout to CoreNet for any snoop push). For e500mc Rev 1.x and Rev 2.x this event is not supported.
Com:75 | Snoop sharing | N/A | Number of ACRout when the core retains a copy of the coherency granule. For e500mc Rev 1.x and Rev 2.x this event is not supported.
Threshold Events
C0:76, C1:76 | Data line fill buffer load miss cycles | Spec | Instances when the number of cycles between a load allocation in the data line fill buffer (entry 0) and write-back to the data L1 cache exceeds the threshold.
C0:77, C1:77 | ILFB fetch miss cycles | Spec | Instances when the number of cycles between allocation in the ILFB (entry 0) and write-back to the instruction L1 cache exceeds the threshold.
C0:78, C1:78 | External input interrupt latency cycles | N/A | Instances when the number of cycles between request for interrupt (int) asserted (but possibly masked/disabled) and redirecting fetch to external interrupt vector exceeds threshold.
C0:79, C1:79 | Critical input interrupt latency cycles | N/A | Instances when the number of cycles between request for critical interrupt (cint) is asserted (but possibly masked/disabled) and redirecting fetch to the critical interrupt vector exceeds threshold.
C0:80, C1:80 | External input interrupt pending latency cycles | N/A | Instances when the number of cycles between external interrupt pending (enabled and pin asserted) and redirecting fetch to the external interrupt vector exceeds the threshold. Note that this and the next event may count multiple times for a single interrupt if the threshold is very small and the interrupt is masked a few cycles after it is asserted and later becomes unmasked.
C0:81, C1:81 | Critical input interrupt pending latency cycles | N/A | Instances when the number of cycles between pin request for critical interrupt pending (enabled and pin asserted) and redirecting fetch to the critical interrupt vector exceeds the threshold. See note for previous event.
Chaining Events (see footnote 2)
Com:82 | PMC0 overflow | N/A | PMC0[32] transitions from 1 to 0.
Interrupt Events
Misc Events
Com:90 | Transitions of TBL bit selected by PMGC0[TBSEL]. | Nonspec | Counts transitions of the TBL bit selected by PMGC0[TBSEL].
Com:93 | Castouts released | — | Speculative reservations in castout buffer that are not needed
Com:96 | Store retries due to misc | — | Retries to store queue, excluding MBAR case
Stashing Events
Backside L2 Events
Com:110 | L2 cache accesses | Spec | L2 cache accesses, which include the following: load, store, fetch, dcba*, dcbz*, dcblc CT=1, icblc CT=1, dcblc CT=2, icblc CT=2, dcbf*, dcbst*, dcbi, CI store, icbi*, lwarx, stwcx., write-through store, CI stwcx., mbar, sync, tlbsync, tlbivax, tlbilx, prefetch requests (dcbt, dcbtst, dcbtls, dcbtstls CT=0,1,2 & icbt CT=1,2 & icbtls CT=0,1,2), L2 cache allocation
Com:111 | L2 hit cache accesses | Spec | L2 cache accesses, which include the following: load, store, fetch, dcba*, dcbz*, dcblc CT=1, icblc CT=1, dcblc CT=2, icblc CT=2, dcbf*, dcbst*, dcbi, CI store, icbi*, lwarx, stwcx., write-through store, CI stwcx., mbar, sync, tlbsync, tlbivax, tlbilx, prefetch requests (dcbt, dcbtst, dcbtls, dcbtstls CT=0,1,2 & icbt CT=1,2 & icbtls CT=0,1,2) && L2_hit
Com:118 | L2 cache dirty data allocations | Spec | L2 cache dirty data allocations
Com:123 | L2 cache clean redundant updates | Spec | L2 cache clean redundant updates
Com:124 | L2 cache dirty redundant updates | Spec | L2 cache dirty redundant updates
Com:127 | L2 cache data dirty hits | Spec | L2 cache data dirty hits
Com:128 | Instruction lfb went high priority | Spec | Instruction lfb went high priority
Com:133 | Coherent lookup miss due to valid but incoherent (address matches) | Spec | Coherent lookup miss due to valid but incoherent (address matches)
DVT Events
Com:148 | DVT0 detected | Nonspec | Detection of a write to DEVENT SPR with DVT0 set
Com:149 | DVT1 detected | Nonspec | Detection of a write to DEVENT SPR with DVT1 set
Com:150 | DVT2 detected | Nonspec | Detection of a write to DEVENT SPR with DVT2 set
Com:151 | DVT3 detected | Nonspec | Detection of a write to DEVENT SPR with DVT3 set
Com:152 | DVT4 detected | Nonspec | Detection of a write to DEVENT SPR with DVT4 set
Com:153 | DVT5 detected | Nonspec | Detection of a write to DEVENT SPR with DVT5 set
Com:154 | DVT6 detected | Nonspec | Detection of a write to DEVENT SPR with DVT6 set
Com:155 | DVT7 detected | Nonspec | Detection of a write to DEVENT SPR with DVT7 set
Com:156 | Cycles completion stalled (Nexus) | Nonspec | Number of completion cycles stalled due to Nexus FIFO full
FPU Events
Com:160 | FPU double pump | Spec | Double pump penalized ops finished through the pipe. Counts once for every multiply family double pump operation.
Com:162 | FPU divide cycles | Spec | Counts once for every cycle of divide execution (fdivs and fdiv).
Com:163 | FPU denorm input | Spec | Counts extra cycles of delay due to denormalized inputs. If there is one, this is incremented 4 times; two operands increment it 5 times. This shows the real penalty due to denorms, not just how often they occur.
Com:164 | FPU result stall | Spec | Counts extra cycles due to denorm results, overflow, mass cancellation, zero results, carry-in mispredict, exponent range check.
Com:166 | FPU pipe sync stall | Spec | Synchronization-op stalls: count once for each cycle that a “break-before” FPU op is in the RS/issue stage but cannot issue. Also count once for each cycle that an FPU op is in the RS/issue stage but cannot issue due to “break-after” of an FPU op currently in progress.
Com:167 | FPU input data stall | Spec | FPU data-ready stall: cycles in which there is an op in the RS/issue stage that cannot issue because one or more of its operands is not yet available.
Com:176 | Decorated loads | Nonspec | Number of decorated loads to cache inhibited memory performed
Com:177 | Decorated stores | Nonspec | Number of decorated stores to cache inhibited memory performed
PMC2 is selected to count PMC2 overflow events, PMC2 does not increment.
in both that issue queue and the completion queue. If space is available, it decodes
instructions supplied by the instruction queue, renames any source/target
operands, and dispatches them to the appropriate issue queues.
Dispatch Dispatch is the event at the end of the decode stage during which instructions are
passed to the issue queues and tracking of program order is passed to the
completion queue.
Fetch The process of bringing instructions from memory (such as a cache or system
memory) into the instruction queue.
Finish An executed instruction finishes by signaling the completion queue that execution
has concluded. An instruction is said to be finished (but not complete) when the
execution results have been saved in rename registers and made available to
subsequent instructions, but the completion unit has not yet updated the
architected registers.
Issue The stage responsible for reading source operands from rename registers and
register files. This stage also assigns instructions to the proper execution unit.
Latency The number of clock cycles necessary to execute an instruction and make the
results of that execution available to subsequent instructions.
Pipeline In the context of instruction timing, this term refers to interconnected stages. The
events necessary to process an instruction are broken into several cycle-length
tasks to allow work to be performed on several instructions
simultaneously—analogous to an assembly line. As an instruction is processed, it
passes from one stage to the next. When work at one stage is done and the
instruction passes to the next stage, another instruction can begin work in the
vacated stage.
Although an individual instruction may have multiple-cycle latency, pipelining
makes it possible to overlap processing so the number of instructions processed
per clock cycle (throughput) is greater than if pipelining were not implemented.
Program order The order of instructions in an executing program. More specifically, this term is
used to refer to the original order in which program instructions are fetched into
the instruction queue from the cache.
Rename registers Temporary buffers for holding results of instructions that have finished execution
but have not completed. The ability to forward results to rename registers allows
subsequent instructions to access the new values before they have been written
back to the architectural registers.
Reservation station A buffer between the issue and execute stages that allows instructions to be issued
even though resources necessary for execution or results of other instructions on
which the issued instruction may depend are not yet available.
Retirement Removal of a completed instruction from the completion queue at the end of the
completion stage. (In other documents, this is often called deallocation.)
Speculative instruction Any instruction that is currently behind an older branch instruction that has not
been resolved.
Stage Used in two different senses, depending on whether the pipeline is being discussed
as a physical entity or a sequence of events. As a physical entity, a stage can be
viewed as the hardware that handles operations on an instruction in that part of the
pipeline. When viewing the pipeline as a sequence of events, a stage is an element
in the pipeline during which certain actions are performed, such as decoding the
instruction, performing an arithmetic operation, or writing back the results.
Typically, the latency of a stage is one processor clock cycle. Some events, such
as dispatch, write-back, and completion, happen instantaneously and may be
thought to occur at the end of a stage.
An instruction can spend multiple cycles in one stage; for example, a divide takes
multiple cycles in the execute stage.
An instruction can also be represented in more than one stage simultaneously,
especially in the sense that a stage can be seen as a physical resource. For example,
when instructions are dispatched, they are assigned a place in the CQ at the same
time they are passed to the issue queues.
Stall An occurrence when an instruction cannot proceed to the next stage. Such a delay
is initiated to resolve a data or resource hazard, that is, a situation in which a
planned instruction cannot execute in the proper clock cycle because data or
resources needed to process the instruction are not yet available.
Superscalar A superscalar processor is one that can issue multiple instructions concurrently
from a conventional linear instruction stream. In a superscalar implementation,
multiple instructions can execute in parallel at the same time.
Throughput The number of instructions processed per cycle. In particular, throughput
describes the performance of a multiple-stage pipeline where a sequence of
instructions may pass through with a throughput that is much faster than the
latency of an individual instruction.
Write-back Write-back (in the context of instruction handling) occurs when a result is written
into the architecture-defined registers (typically the GPRs). On the e500mc,
write-back occurs in the clock cycle after the completion stage. Results in the
write-back buffer cannot be flushed. If an exception occurs, results from previous
instructions must write back before the exception is taken.
The e500mc can complete as many as two instructions on each clock cycle.
The instruction pipeline stages are described as follows:
• Instruction fetch—Includes the clock cycles necessary to request an instruction and the time the
memory system takes to respond to the request. Fetched instructions are latched into the instruction
queue (IQ) for consideration by the dispatcher.
The fetcher tries to initiate a fetch in every cycle in which it is guaranteed that the IQ has room for
fetched instructions. Instructions are typically fetched from the L1 instruction cache; if caching is
disabled or the fetch misses in the cache, instructions are fetched from the instruction line fill buffer
(ILFB). Likewise, on a cache miss, as many as four instructions can be forwarded to the fetch unit
from the ILFB as the cache line is passed to the instruction cache.
Fetch timing is affected by many things, such as whether an instruction is in the on-chip instruction
cache or an L2 cache. More factors come into play when instructions must be fetched from system
memory, including the processor-to-bus clock ratio, the amount of bus traffic, and whether any
cache coherency operations are required.
Fetch timing is also affected by whether effective address translation is available in a TLB, as
described in Section 10.3.1.1, “L1 and L2 TLB Access Times.”
• The decode/dispatch stage fully decodes each instruction; most instructions are dispatched to the
issue queues, but isync, rfi, rfgi, rfci, rfdi, rfmci, sc, ehpriv, dnh, wait, and nops are not. Every
dispatched instruction is assigned a GPR rename register, an FPR rename register, and a CR field
rename register, even if they do not specify a GPR, FPR, or CR operand. There is a set of
GPR/FPR/CRF rename registers for each CQ entry.
The three issue queues, BIQ, GIQ, and FIQ, can accept as many as one, two, and two instructions,
respectively, in a cycle. Instruction dispatch requires the following:
— Instructions dispatch only from IQ0 and IQ1.
— As many as two instructions can be dispatched per clock cycle.
— Space must be available in the CQ and the target issue queue for an instruction to decode and
dispatch.
— Dispatch is in order; if IQ0 cannot dispatch, IQ1 does not dispatch.
In this chapter, dispatch is treated as an event at the end of the decode stage.
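The dispatch requirements above can be sketched as a one-cycle model. This is a simplified illustration, not a description of the actual dispatch logic: the function name and the free-entry inputs are hypothetical, and instructions that are not dispatched to an issue queue (such as isync or nops) are outside the model.

```python
# Per-cycle acceptance limits stated above for the three issue queues.
ACCEPT_PER_CYCLE = {"BIQ": 1, "GIQ": 2, "FIQ": 2}

def dispatch_cycle(iq_targets, cq_free, queue_free):
    """Return how many instructions dispatch this cycle (0, 1, or 2).

    iq_targets: target issue queue names in program order; index 0 is IQ0.
    cq_free:    free completion queue entries (hypothetical input).
    queue_free: free entries per issue queue, e.g. {"BIQ": 1, "GIQ": 4, "FIQ": 2}.
    """
    accepted = {name: 0 for name in ACCEPT_PER_CYCLE}
    dispatched = 0
    for target in iq_targets[:2]:  # only IQ0 and IQ1 can dispatch
        if cq_free - dispatched < 1:
            break  # no CQ entry left
        if accepted[target] >= ACCEPT_PER_CYCLE[target]:
            break  # queue cannot accept another instruction this cycle
        if queue_free[target] - accepted[target] < 1:
            break  # target issue queue is full
        accepted[target] += 1
        dispatched += 1
    return dispatched
```

Stopping at the first instruction that cannot dispatch is what keeps the model in order: a stalled IQ0 also holds back IQ1, even when IQ1's own target queue has room.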
• The issue stage reads source operands from rename registers and register files and determines when
instructions are latched into reservation stations.
The general behavior of the issue queues is described as follows:
— The GIQ accepts as many as two instructions from the dispatch unit per cycle. SFX0, SFX1,
CFX, and all LSU instructions are dispatched to the GIQ, shown in Figure 10-1.
[Figure 10-1: GIQ. Entries GIQ3–GIQ0 are filled from IQ0/IQ1; GIQ1 issues to SFX1, CFX, or LSU, and GIQ0 issues to SFX0, CFX, or LSU.]
– SFX0 executes all integer simple unit instructions (that is, all that can be dispatched to
simple units).
– SFX1 executes most, but not all of the instructions that can be executed in SFX0.
Most SFX instructions execute in 1 cycle. However, some instructions can take more than 1
cycle.
— Complex unit (CFX) executes integer multiplication and division instructions.
The execution unit executes the instruction (perhaps over multiple cycles), writes results on its
result bus, and notifies the CQ when the instruction finishes. The execution unit reports any
exceptions to the completion stage. Instruction-generated exceptions are not taken until the
excepting instruction is next to retire.
Most integer instructions have a 1-cycle latency, so results of these instructions are available
1 clock cycle after an instruction enters the execution unit. The LSU, FPU, and CFX are pipelined.
• The complete and write-back stages maintain the correct architectural machine state and commit
results to the architecture-defined registers in the proper order. If completion logic detects an
instruction containing an exception status or a mispredicted branch, all following instructions are
cancelled, their execution results in rename registers are discarded, and the correct instruction
stream is fetched.
The complete stage ends when the instruction is retired. Two instructions can be retired per clock
cycle. If no dependencies exist, as many as two instructions are retired in program order. The
write-back stage occurs in the clock cycle after the instruction is retired.
If a later instruction needs the result as a source operand, the result is simultaneously made available to the
appropriate execution unit, which allows a data-dependent instruction to be decoded and dispatched
without waiting to read the data from the architected register file. Results are then stored into the correct
architected GPR, CR, or FPR during the write-back stage. Branch instructions that update either the LR or
CTR write back their results in a similar fashion.
To resolve branch instructions and improve the accuracy of branch predictions, the e500mc implements a
dynamic branch prediction mechanism using the 512-entry BTB, a four-way set associative cache of
branch target effective addresses. A BTB entry is allocated whenever a branch resolves as
taken—unallocated branches are always predicted as not taken. Each BTB entry holds a 2-bit saturating
branch history counter whose value is incremented or decremented depending on whether the branch was
taken. These bits can take four values: strongly taken, weakly taken, weakly not taken, and strongly not
taken. This mechanism is described in Section 10.4.1.2, “BTB Branch Prediction and Resolution.”
The e500mc ignores static branch prediction hints; a and t bits in the BO field in branch encodings are
ignored.
Dynamic branch prediction is enabled by setting BUCSR[BPEN]. Clearing BUCSR[BPEN] disables
dynamic branch prediction, in which case the e500mc predicts every branch as not taken.
Branch instructions are treated like any other instruction and are assigned CQ entries to ensure that the
CTR and LR are updated sequentially.
The dispatch rate is affected by the serializing behavior of some instructions and the availability of issue
queues and CQ entries. Instructions are dispatched in program order; an instruction in IQ1 cannot be
dispatched ahead of one in IQ0.
misses, the execution of TLB instructions, and TLB snoop operations (snooping of TLB invalidate
operations from tlbivax instructions on CoreNet).
Note that when a TLB invalidate operation is detected, the L2 MMU arrays become inaccessible due to
the snooping activity caused by the invalidate.
If the MMU is busy due to a higher priority operation, such as a tlbivax or tlbilx, instructions cannot be
fetched until that operation completes.
If the page translation is in neither TLB, an instruction TLB error interrupt occurs, as described in
Section 4.9.15, “Instruction TLB Error Interrupt—IVOR14/GIVOR14.”
TLBs are described in detail in Chapter 6, “Memory Management Units (MMUs).”
If the cache is busy due to a higher priority operation, such as an icbi or a cache line reload,
instructions cannot be fetched until that operation completes.
• If an instruction fetch misses in the instruction cache, it is fetched from the L2 cache.
• If an instruction fetch misses in the instruction cache and the L2 cache, the e500mc initiates a bus
transaction to the off-core memory system.
The architecture defines WIM (of WIMGE) bits that define caching characteristics for the corresponding
page. Fetching instructions as caching-inhibited (I=1) produces the following actions:
• The ILFB may hit, and the instructions returned from the ILFB will be used, even if the ILFB entry
was established by an earlier cacheable access.
• The instruction cache will perform an access and may hit, and if a hit occurs the instructions will
be used.
• The L2 cache will not attempt to perform an access if the access is caching-inhibited.
• If the ILFB and instruction cache do not hit, the fetch is performed by performing bus transactions
to memory and the fetch will return and use the entire fetch group that was requested. Fetching
using caching-inhibited accesses will therefore not produce a bus transaction for each instruction,
but instead one bus transaction for each fetch group.
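The lookup behavior listed above can be summarized in a small decision function. This is a sketch under assumptions: the function name and string labels are hypothetical, and the ILFB and instruction cache are checked in sequence here only for simplicity (the text does not state their relative order or whether they are probed in parallel).

```python
def fetch_source(ilfb_hit, icache_hit, l2_hit, caching_inhibited):
    """Return where a fetch group is supplied from in this simplified model."""
    if ilfb_hit:
        # May hit even if an earlier cacheable access established the entry.
        return "ILFB"
    if icache_hit:
        return "L1 instruction cache"
    if not caching_inhibited and l2_hit:
        # The L2 cache is only accessed for cacheable fetches.
        return "L2 cache"
    # One bus transaction per fetch group, not per instruction.
    return "bus transaction"
```

The first two branches are the reason aliasing matters: a caching-inhibited fetch can still be satisfied by stale contents left in the ILFB or instruction cache by an earlier cacheable access.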
Software should not alias cacheable and caching-inhibited real addresses without first invalidating the
caches and performing an isync before fetching from those same addresses as caching-inhibited.
completion ensures the correct architectural state when the e500mc must recover from a mispredicted
branch or exception.
Instructions are retired much as they are dispatched: as many as two can be retired simultaneously, but
never out of order.
NOTE
• Instructions must be nonspeculative to complete.
• As many as two rename registers can be updated per clock cycle.
Because load and store with update instructions require two rename
registers they are broken into two instructions at dispatch (lwzu is
broken into lwz and addi). These two instructions are assigned two CQ
entries and each is assigned CR and GPR renames at dispatch.
• Some instructions have retirement restrictions, such as retiring only out
of CQ0. See Section 10.3.2.1, “Instruction Serialization.”
Program-related exceptions are signaled when the instruction causing the exception reaches CQ0.
Previous instructions are allowed to complete before the exception is taken, which ensures that any
exceptions those instructions may cause are taken.
Move-to serialization A move-to serialized instruction cannot execute until the cycle after it is in CQ0,
that is, the cycle after it becomes the oldest instruction. This serialization is
weaker than move-from serialization in that the instruction need not spend an
extra cycle in the reservation station. Move-to serializing instructions include
tlbre, tlbsx, tlbwe, mtmsr, wrtee, wrteei, and all mtspr instructions.
Refetch serialization Refetch-serialized instructions force refetching of subsequent instructions after
completion. Refetch serialization is used when an instruction has changed or may
change a particular context needed by subsequent instructions. Examples include
isync, sc, rfi, rfci, rfmci, rfdi, rfgi, wait, ehpriv, dnh, and any instruction that
causes the summary-overflow XER(SO) bit to change state.
Store serialization Applies to stores and some LSU instructions that access the data cache.
Store-serialized instructions are dispatched and held in the LSU’s finished store
queue. They are not committed to memory until all prior instructions have
completed. Although a store-serialized instruction waits in the finished store
queue, other load/store instructions can be freely executed. Some store-serialized
instructions are further restricted to complete only from CQ0. Only one
store-serialized instruction can complete per cycle, although nonserialized
instructions can complete in the same cycle as a store-serialized instruction. In
general, all stores and cache operation instructions are store serialized.
Unit serialization Unit-serialized instructions proceed down the execution pipeline in a normal
manner, but block the reservation station for the execution unit. This prevents
other instructions from issuing to the reservation station while the unit serialized
instruction executes. Normally such instructions will modify the architectural
state of a renamed register and the serialization ensures that no other instruction
will be accessing the renamed register when the unit serialized instruction
executes.
Execution of sync also generates a SYNC command on the CoreNet interface after which the sync
instruction may be allowed to complete. Subsequent instructions can execute out of order, but they can
complete only after sync completes.
It is the responsibility of the system to guarantee the intention of the SYNC command on the CoreNet
interface—usually by ensuring that any transactions received before the SYNC command from the
e500mc complete in its queues or at their destinations before completing the SYNC command on the
CoreNet interface.
10.4 Execution
The following sections describe instruction execution behavior within each of the respective execution
units in the e500mc.
The e500mc minimizes penalties associated with flow control operations by features such as the branch
target buffer (BTB), dynamic branch prediction, speculative link and counter registers, and nonblocking
caches.
Queue   Entries, left to right in clock-cycle order
BIQ0    bc
GIQ1    add2
GIQ0    cmp | add1 | add3
CQ3     add2 | add2 (SFX1) | add3 (SFX0)
CQ2     add1 | add1 (SFX0) | add2√
CQ1     bc | bc (BU) | bc (BU) | add1√
CQ0     cmp | cmp (SFX0) | cmp√ | bc√
BIQ1, GIQ2, GIQ3, and CQ4 through CQ13 are empty in every cycle shown.
√ indicates that the instruction has finished execution.
Figure 10-2. Branch Completion (LR/CTR Write-Back)
In this example, the bc depends on cmp and is predicted as not taken. At the end of clock cycle 1, cmp
and bc are dispatched to the GIQ and BIQ, respectively, and are issued to SFX0 and the BU at the end of
clock 2.
In clock cycle 3, the cmp executes in SFX0 but the bc cannot resolve and complete until the cmp results
are available; add1 and add2 are dispatched to the GIQ.
In cycle 4, the bc resolves as correctly predicted; add1 and add2 are issued to SFX0 and SFX1 and are marked as
nonspeculative, and add3 is dispatched to the GIQ. The cmp is retired from the CQ at the end of cycle 4.
In cycle 5, bc, add1, and add2 finish execution, and bc and add1 retire.
NOTE
Unconditional branches are allocated in the BTB the first time they are
encountered. This example shows how the prediction is updated depending
on whether a branch is taken.
The BPU detects whether a fetch group includes any branches that hit in the BTB, and if so, determines
the fetching path based on the prediction and the target address.
If the prediction is wrong, subsequent instructions and their results are purged. Instructions ahead of the
predicted branch proceed normally, instruction fetching resumes along the correct path, and the history bits
are revised.
The number of speculative branches that have not yet been allocated (and are predicted as not taken) is
limited only by the space available in the pipeline (the branch execute unit, the BIQ, and the IQ). The
presence of speculative branches allocated in the BTB slightly reduces speculation depth.
Instructions after an unresolved branch can execute speculatively, but in-order completion ensures that
mispredicted speculative instructions do not complete. When misprediction occurs, the e500mc easily
redirects fetching and repairs its machine state because the architectural state is not updated. Any
instructions dispatched after a mispredicted branch instruction are flushed from the CQ, and any results
are flushed from the rename registers.
divwx, divwux    rA or rB is 0, or rA < rB         4
                 rA is representable in 8 bits     11
                 rA is representable in 16 bits    19
                 rA is representable in 32 bits    35
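The operand-dependent divide rows above can be read as a lookup on the magnitude of rA. The following is a minimal sketch under stated assumptions: the right-hand numbers are taken to be cycle counts (the column headings are not reproduced in this excerpt), the operands are treated as unsigned 32-bit values as for divwux, "representable in N bits" is taken to mean less than 2^N, and the function name is hypothetical.

```python
def divide_cycles(ra, rb):
    """Cycle count for an unsigned word divide, following the rows listed above."""
    if ra == 0 or rb == 0 or ra < rb:
        return 4            # trivial cases finish early
    if ra < (1 << 8):
        return 11           # rA fits in 8 bits
    if ra < (1 << 16):
        return 19           # rA fits in 16 bits
    return 35               # rA needs the full 32 bits


for ra, rb in [(3, 7), (200, 3), (1000, 3), (100000, 3)]:
    print(ra, rb, divide_cycles(ra, rb))
```

Under these assumptions, keeping dividends small where the algorithm allows it shortens the divide considerably.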
model described in the EREF: A Programmer’s Reference Manual for Freescale Power Architecture®
Processors.
The effect of alignment on memory operation performance is the same for big- and little-endian addressing
modes, including load-multiple and store-multiple operations.
In Table 10-3, optimal means that one effective address (EA) calculation occurs during the memory
operation. Good means that multiple EA calculations occur during the operation, which may cause
additional cache or bus activities with multiple transfers. Poor means that an alignment interrupt is
generated by the memory operation.
Table 10-3. Performance Effects of Operand Placement in Memory
8 byte 8 optimal — —
<4 good good good
4 byte 4 optimal — —
<4 good good good
2 byte 2 optimal — —
<2 good good good
1 byte 1 optimal — —
Information contained in Table 10-4 does not address all effects of the core pipeline, but is intended as a
guide for instruction scheduling.
• The latency is the execution latency, measured from when the instruction begins execution in an
execution unit until the execution unit has produced the intended result (that is, when it finishes
execution).
• Other results of the instruction, such as flags (like XER[OV] or the CR result of a “.” instruction),
may take 1 extra cycle after execution is finished to be available as inputs to other instructions.
• Other cycles taken for things such as instruction fetch, decode, dispatch, and completion are not
represented in this table.
• The repeat rate specifies how many cycles it takes before another instruction dispatched to the unit
can begin execution. For example, an instruction with a latency of 3 and a repeat rate of 1 means
that even though it takes 3 cycles to produce the result, several of these instructions back to back
can produce a result every cycle. This indicates how the particular execution unit is pipelined.
• The type of serialization performed on instructions is described in Section 10.3.2.1, “Instruction
Serialization”.
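The relationship between latency and repeat rate described in the list above can be illustrated with a short calculation. This is a simplified sketch that assumes the instructions are independent, are all issued to the same unit, and encounter no other stalls; the function name is hypothetical.

```python
def cycles_for_back_to_back(count, latency, repeat_rate):
    """Cycles until the last of `count` independent instructions finishes execution."""
    if count <= 0:
        return 0
    # A new instruction can start every `repeat_rate` cycles; the last one
    # started still needs its full `latency` to produce a result.
    return (count - 1) * repeat_rate + latency


# Latency 3, repeat rate 1: after the first result, one result per cycle.
print(cycles_for_back_to_back(4, latency=3, repeat_rate=1))
# Latency 10, repeat rate 4 (the values listed for fadd in Table 10-4).
print(cycles_for_back_to_back(3, latency=10, repeat_rate=4))
```

A repeat rate equal to the latency, as listed for fdiv, means the unit is not pipelined for that instruction, so the total grows by the full latency for each additional instruction.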
Table 10-4. e500mc Instruction Latencies
Instruction Unit Serialization Repeat Rate Latency Notes
b BU — 1 1 —
ba BU — 1 1 —
bc BU — 1 1 —
bca BU — 1 1 —
bcctr BU — 1 1 —
bcctrl BU — 1 1 —
bcl BU — 1 1 —
bcla BU — 1 1 —
bclr BU — 1 1 —
bclrl BU — 1 1 —
bl BU — 1 1 —
bla BU — 1 1 —
cmp SFX0, SFX1 — 1 1 or 2 EQ bit is 1 cycle to branch unit; other results are 2 cycles
cmpi SFX0, SFX1 — 1 1 or 2 EQ bit is 1 cycle to branch unit; other results are 2 cycles
cntlzw SFX0 — 1 1 —
cntlzw. SFX0 — 1 1 —
crand BU — 1 1 —
crandc BU — 1 1 —
creqv BU — 1 1 —
crnand BU — 1 1 —
crnor BU — 1 1 —
cror BU — 1 1 —
crorc BU — 1 1 —
crxor BU — 1 1 —
dcbt LSU — 1 3 —
dcbtep LSU — 1 3 —
dcbtst LSU — 1 3 —
dcbtstep LSU — 1 3 —
fabs FPU — 2 8 —
fabs. FPU — 2 8 —
fadd FPU — 4 10 —
fadd. FPU — 4 10 —
fadds FPU — 2 8 —
fadds. FPU — 2 8 —
fcmpo FPU — 2 8 —
fcmpu FPU — 2 8 —
fctiw FPU — 2 8 —
fctiw. FPU — 2 8 —
fctiwz FPU — 2 8 —
fctiwz. FPU — 2 8 —
fdiv FPU — 68 68
fdiv. FPU — 68 68
fdivs FPU — 38 38
fdivs. FPU — 38 38
fmadd FPU — 4 10 —
fmadd. FPU — 4 10 —
fmadds FPU — 2 8 —
fmadds. FPU — 2 8 —
fmr FPU — 2 8 —
fmr. FPU — 2 8 —
fmsub FPU — 4 10 —
fmsub. FPU — 4 10 —
fmsubs FPU — 2 8 —
fmsubs. FPU — 2 8 —
fmul FPU — 4 10 —
fmul. FPU — 4 10 —
fmuls FPU — 2 8 —
fmuls. FPU — 2 8 —
fnabs FPU — 2 8 —
fnabs. FPU — 2 8 —
fneg FPU — 2 8 —
fneg. FPU — 2 8 —
fnmadd FPU — 4 10 —
fnmadd. FPU — 4 10 —
fnmadds FPU — 2 8 —
fnmadds. FPU — 2 8 —
fnmsub FPU — 4 10 —
fnmsub. FPU — 4 10 —
fnmsubs FPU — 2 8 —
fnmsubs. FPU — 2 8 —
fres FPU — 38 38 —
fres. FPU — 38 38 —
frsp FPU — 2 8 —
frsp. FPU — 2 8 —
frsqrte FPU — 2 8 —
frsqrte. FPU — 2 8 —
fsel FPU — 2 8 —
fsel. FPU — 2 8 —
fsub FPU — 4 10 —
fsub. FPU — 4 10 —
fsubs FPU — 2 8 —
fsubs. FPU — 2 8 —
lbepx LSU — 1 3 —
lbz LSU — 1 3 —
lbzu LSU — 1 3 —
lbzux LSU — 1 3 —
lbzx LSU — 1 3 —
lfd LSU — 1 4 —
lfdepx LSU — 1 4 —
lfdu LSU — 1 4 —
lfdux LSU — 1 4 —
lfdx LSU — 1 4 —
lfs LSU — 1 4 —
lfsu LSU — 1 4 —
lfsux LSU — 1 4 —
lfsx LSU — 1 4 —
lha LSU — 1 3 —
lhau LSU — 1 3 —
lhaux LSU — 1 3 —
lhax LSU — 1 3 —
lhbrx LSU — 1 3 —
lhepx LSU — 1 3 —
lhz LSU — 1 3 —
lhzu LSU — 1 3 —
lhzux LSU — 1 3 —
lhzx LSU — 1 3 —
lmw LSU — r+3 r+3 r indicates the number of registers loaded. lmw
stalls in decode while completion
queue entries are allocated for it each cycle.
lwbrx LSU — 1 3 —
lwepx LSU — 1 3 —
lwz LSU — 1 3 —
lwzu LSU — 1 3 —
lwzux LSU — 1 3 —
lwzx LSU — 1 3 —
mbar LSU Store 1 3 In general, mbar will take several more cycles to
perform the ordering
mcrf BU — 1 1 —
mcrfs FPU — 2 8 —
mcrxr BU Presync, postsync 1 1 —
mfmsr SFX0 — 4 4 —
mfpmr SFX0 — 4 4 —
mfspr (CTR) SFX0, SFX1 — 1 1 mfctr stalls in decode until any other mtctr instruction finishes execution.
mfspr (LR) SFX0, SFX1 — 1 1 mflr stalls in decode until any other mtlr instruction finishes execution.
mfspr (other) SFX0 — 4 4 —
mftb SFX0 — 4 4 —
mtcrf SFX0 Presync, postsync, move-to 4 2 If only a single field is moved, latency and repeat rate are the
same as for mtocrf and there is no serialization.
mtspr (LR) SFX0, SFX1 Move-to 1 1 mtlr stalls in decode until any other mtlr instruction finishes execution.
or SFX0, SFX1 — 1 1 —
stmw LSU Store r+1 r+3 r indicates the number of registers stored. stmw stalls in decode while completion
queue entries are allocated for it each cycle.
sync (msync) LSU Postsync, store 1 3 In general, sync will take several more cycles to perform the ordering.
tlbilx LSU — 1 or 128 3 or 131 When T=0 or T=1, tlbilx has a latency of 131 cycles and a repeat rate of
128 cycles.
tlbivax LSU — 1 3 —
twi SFX0 — 2 2 —
11.3.1 GPRs
After reset GPRs may contain random values that may differ from core to core or may differ from reset to
reset. Practically, a GPR should not be used as a source input until it has been previously set to a value by
software. However, to aid in debugging boot software, the GPRs should be set to known values at the start
of reset. This can be accomplished by performing an xor instruction for each register using the same
register as the rA, rS, and rB operands:
xor r0,r0,r0 // set r0 to 0
xor r1,r1,r1 // set r1 to 0
... // do for all 32 GPRs
11.3.2 FPRs
At reset FPRs may contain random values that may differ from core to core or may differ from reset to
reset. Practically, an FPR should not be used as a source input until it has been previously set to a value by
software. However, FPRs contain hidden tag bits that describe the type of information that the FPR holds,
and using an FPR that has never been properly initialized may give unpredictable results. Therefore, the
FPRs should be set to known values at the start of reset. This can be accomplished by loading the FPRs
with a known value from memory. Note that this operation may have to wait until later in
the boot process, when software has properly initialized memory, or possibly even until the start of the
operating system or hypervisor. The following code sequence can be used, assuming that r3 points to a
doubleword-aligned scratch memory location:
mfmsr r5 // get current MSR
ori r4,r5,0x2000 // r4 = current MSR with MSR[FP] set
mtmsr r4
isync
xor r4,r4,r4 // set to 0
stw r4,0(r3) // clear first word of memory location
stw r4,4(r3) // clear second word of memory location
lfd fr0,0(r3) // set fr0 to 0
fmr fr1,fr0 // set fr1 to 0
fmr fr2,fr0 // set fr2 to 0
... // set rest of FPRs using fmr from fr0
mtmsr r5 // restore MSR (turn off FP if desired)
isync
11.3.3 SPRs
At reset SPRs are generally set to 0, except for certain SPRs that contain either configuration values or that
reflect special state out of reset. SPRs that have initial values other than 0 out of reset are shown in
Table 11-1.
Table 11-1. SPRs with Non-Zero Reset Values
DBSR DBSR[MRR] is set to reflect the most recent reset, which after a hard reset will be 0b10.
L1CFG0 Set to configuration information describing the L1 cache capabilities and organization.
L1CFG1 Set to configuration information describing the L1 cache capabilities and organization.
L2CFG0 Set to configuration information describing the L2 cache capabilities and organization.
MMUCFG Set to configuration information describing the MMU capabilities and organization.
PIR Set to a unique identifier of the core distinct from other cores in the system. This value is set from signal inputs
from the integrated device. The initial value reflects the core’s location in the device’s topology and all cores in
an integrated device contain unique values for that device.
PVR Set to a value that distinguishes this version of the core from other Power Architecture® cores.
SVR Set to a unique identifier of the integrated device distinct from other SoC products and versions of the same SoC
from Freescale Semiconductor. This value is set from signal inputs from the integrated device. All cores in the
integrated device contain the same value.
TLB0CFG Set to configuration information describing the TLB0 capabilities and organization.
TLB1CFG Set to configuration information describing the TLB1 capabilities and organization.
Other SPRs must be set up by software, particularly those SPRs that enable and control various
aspects of how the core operates. Table 11-2 lists SPRs that software should initialize to
appropriate values at boot time.
Table 11-2. SPRs to Configure the e500mc
BUCSR Branch unit control and status register. See Section 2.11, “Branch Unit Control and Status Register (BUCSR).”
L1CSR0 L1 control and status register. See Section 2.14, “L1 Cache Registers.”
L1CSR1 L1 control and status register. See Section 2.14, “L1 Cache Registers.”
L1CSR2 L1 control and status register. See Section 2.14, “L1 Cache Registers.”
L2CSR0 L2 control and status register. See Section 2.15, “L2 Cache Registers.”
L2CSR1 L2 control and status register. See Section 2.15, “L2 Cache Registers.”
HID0 Error management can be controlled with HID0. Software can set EMCP in order to receive asynchronous errors
from the SoC. EN_L2MMU_MHD can also be set to have hardware detect multiple hits during translation which
can result from MMU programming errors or soft errors in the TLB arrays.
The core can be configured to strongly order all guarded caching-inhibited loads and stores by setting CIGLSO,
which allows device drivers that perform memory-mapped accesses to caching-inhibited guarded memory to omit
memory barriers.
EN_MAS7_UPDATE should be set to 1 in order to use physical addresses larger than 32 bits. In general this will
be the case for many SoCs in which e500mc operates.
Both the Time Base and the Alternate Time Base are set to 0 out of reset. The Alternate Time Base begins
counting immediately out of reset. However, because Time Base ticks are externally signaled to the core,
the Time Base begins counting only once the integrated device is programmed to enable Time Base ticks to
the core. See the integrated device reference manual for more information on enabling Time Base ticks to
the core.
// L1 instruction cache
xor r4,r4,r4 // set r4 to 0
ori r5,r4,0x0102 // set ICFI and ICFLC bits
sync
isync // synchronize setting of L1CSR1
mtspr L1CSR1,r5 // flash invalidate L1 instruction cache
isync // synchronize setting of L1CSR1
iloop:
mfspr r4,L1CSR1 // get current value
and. r4,r4,r5 // test ICFI and ICFLC bits
bne iloop // check again if not complete
isync // discard prefetched instructions
After the caches have been invalidated, they can be enabled by setting the L1CSR0[CE] and L1CSR1[ICE]
bits respectively. Parity checking and/or write shadow mode can be enabled as well by setting the
appropriate bits in L1CSR0 and L1CSR1. See Section 2.14, “L1 Cache Registers” for descriptions of
L1CSR0 and L1CSR1.
After the L2 cache has been invalidated, it can be enabled by setting the L2CSR0[L2E] bit. Error detection
and correction can be enabled as well by setting the appropriate bits in the L2CSR0 register. See
Section 2.15, “L2 Cache Registers” for descriptions of L2CSR0 and L2 error management registers.
5.4.2/5-8 After paragraph two, added the following text: “Only certain configurations of
cache operation are supported when using write shadow mode. Invalid
configurations are not guaranteed to preserve coherency for store operations
performed by the processor.”
5.4.2/5-8 Added Table 5-1, “Valid Write Shadow Mode Configurations (when
L1CSR2[DCWS] = 1).”
5.7/5-23 Rewrote section and added text to clarify the description of the cache flushing
operation.
B.1 Overview
Simplified (or extended) mnemonics allow an assembly-language programmer to program using more
intuitive mnemonics and symbols than the instructions and syntax defined by the instruction set
architecture. For example, to code the conditional call “branch to an absolute target if CR4 specifies a
greater than condition, setting the LR” without simplified mnemonics, the programmer would write the
branch conditional instruction, bcla 12,17,target. The simplified mnemonic, branch if greater than, bgtla
cr4,target, incorporates the conditions. Not only is it easier to remember the symbols than the numbers
when programming, it is also easier to interpret simplified mnemonics when reading existing code.
Although the Power ISA documents include a set of simplified mnemonics, these are not a formal part of
the architecture, but rather a recommendation for assemblers that support the instruction set.
Many simplified mnemonics have been added to those originally included in the architecture
documentation. Some assemblers created their own, and others have been added to support extensions to
the instruction set. Simplified mnemonics have been added for new architecturally defined and new
implementation-specific special-purpose registers (SPRs). These simplified mnemonics are described
only in a very general way.
B.2.2 Subtract
Subtract from instructions subtract the second operand (rA) from the third (rB). The simplified
mnemonics in Table B-2 use the more common order in which the third operand is subtracted from the
second.
Table B-2. Subtract Simplified Mnemonics
Extract and left justify word immediate extlwi rA,rS,n,b (n > 0) rlwinm rA,rS,b,0,n – 1
Extract and right justify word immediate extrwi rA,rS,n,b (n > 0) rlwinm rA,rS,b + n, 32 – n,31
Insert from left word immediate inslwi rA,rS,n,b (n > 0) rlwimi rA,rS,32 – b,b,(b + n) – 1
Insert from right word immediate insrwi rA,rS,n,b (n > 0) rlwimi rA,rS,32 – (b + n),b,(b + n) – 1
Rotate left word immediate rotlwi rA,rS,n rlwinm rA,rS,n,0,31
Shift left word immediate slwi rA,rS,n (n < 32) rlwinm rA,rS,n,0,31 – n
Shift right word immediate srwi rA,rS,n (n < 32) rlwinm rA,rS,32 – n,n,31
Clear left word immediate clrlwi rA,rS,n (n < 32) rlwinm rA,rS,0,n,31
Clear right word immediate clrrwi rA,rS,n (n < 32) rlwinm rA,rS,0,0,31 – n
Clear left and shift left word immediate clrlslwi rA,rS,b,n (n ≤ b ≤ 31) rlwinm rA,rS,n,b – n,31 – n
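The operand arithmetic in the rows above can be checked mechanically. The sketch below expands a simplified rotate or shift mnemonic into the SH, MB, and ME operands of the underlying rlwinm or rlwimi, and models rlwinm on a 32-bit value to confirm the result. The function names are hypothetical, wrapping SH to 5 bits (so that 32 becomes 0) is an assumption about how an assembler encodes the field, and wrap-around masks (MB greater than ME) are not modeled.

```python
MASK32 = 0xFFFFFFFF


def expand(mnemonic, n, b=None):
    """Return (base instruction, SH, MB, ME) for a simplified mnemonic."""
    forms = {
        "extlwi": lambda: ("rlwinm", b, 0, n - 1),
        "extrwi": lambda: ("rlwinm", b + n, 32 - n, 31),
        "inslwi": lambda: ("rlwimi", 32 - b, b, (b + n) - 1),
        "insrwi": lambda: ("rlwimi", 32 - (b + n), b, (b + n) - 1),
        "rotlwi": lambda: ("rlwinm", n, 0, 31),
        "slwi": lambda: ("rlwinm", n, 0, 31 - n),
        "srwi": lambda: ("rlwinm", 32 - n, n, 31),
        "clrlwi": lambda: ("rlwinm", 0, n, 31),
        "clrrwi": lambda: ("rlwinm", 0, 0, 31 - n),
        "clrlslwi": lambda: ("rlwinm", n, b - n, 31 - n),
    }
    base, sh, mb, me = forms[mnemonic]()
    return base, sh % 32, mb, me  # assumption: SH is a 5-bit field


def rlwinm(rs, sh, mb, me):
    """Rotate left by sh, then AND with a mask of ones in bits mb..me (bit 0 = MSB)."""
    sh %= 32
    rotated = ((rs << sh) | (rs >> (32 - sh))) & MASK32
    mask = (MASK32 >> mb) & (MASK32 << (31 - me)) & MASK32
    return rotated & mask


base, sh, mb, me = expand("srwi", n=8)
print(base, sh, mb, me, hex(rlwinm(0x12345678, sh, mb, me)))
```

Running the model on srwi with n = 8 yields the same value as a plain logical right shift by 8, which is the behavior the simplified mnemonic promises.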
The BO and BI operands correspond to two fields in the instruction opcode, as Figure B-1 shows for
Branch Conditional (bc, bca, bcl, and bcla) instructions.
Bits     0–5      6–10   11–15   16–29   30   31
Field    010000   BO     BI      BD      AA   LK
The BO operand specifies branch operations that involve decrementing CTR. It is also used to determine
whether testing a CR bit causes a branch to occur if the condition is true or false.
The BI operand identifies a CR bit to test (whether a comparison is less than or greater than, for example).
The simplified mnemonics avoid the need to memorize the numerical values for BO and BI.
For example, bc 16,0,target is a conditional branch that, as a BO value of 16 (0b1_0000) indicates,
decrements CTR, then branches if the decremented CTR is not zero. The operation specified by BO is
abbreviated as d (for decrement) and nz (for not zero), which replace the c in the original mnemonic; so
the simplified mnemonic for bc becomes bdnz. The branch does not depend on a condition in the CR, so
BI can be eliminated, reducing the expression to bdnz target.
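To make the field layout concrete, the following sketch packs BO, BI, the branch displacement, AA, and LK into a 32-bit instruction word. It is an illustration only: the function name is hypothetical, the primary opcode value of 16 for bc comes from the Power ISA definition rather than from the text above, and the displacement is given in bytes relative to the branch instruction.

```python
def encode_bc(bo, bi, displacement, aa=0, lk=0):
    """Assemble the instruction word for bc, bca, bcl, or bcla."""
    if displacement % 4 != 0 or not -(1 << 15) <= displacement < (1 << 15):
        raise ValueError("displacement must be word aligned and fit in 16 bits")
    bd = (displacement >> 2) & 0x3FFF       # 14-bit BD field, bits 16-29
    return (
        (16 << 26)      # primary opcode, bits 0-5
        | (bo << 21)    # BO, bits 6-10
        | (bi << 16)    # BI, bits 11-15
        | (bd << 2)
        | (aa << 1)     # AA, bit 30
        | lk            # LK, bit 31
    )


# bc 16,0,target (bdnz target) with the target 4 bytes before the branch
print(hex(encode_bc(16, 0, -4)))
```

The simplified mnemonic bdnz produces exactly the same word as bc 16,0,target; only the assembly-language spelling differs.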
In addition to CTR operations, the BO operand provides an optional prediction bit, and a true or false
indicator can be added. For example, if the previous instruction should also branch only on an equal condition
in CR0, the instruction becomes bc 8,2,target. To incorporate a true condition, the BO value becomes 8
(as shown in Table B-6); the CR0 equal field is indicated by a BI value of 2 (as shown in Table B-7).
Incorporating the branch-if-true condition adds a ‘t’ to the simplified mnemonic, bdnzt. The equal
condition, which is specified by a BI value of 2 (indicating the EQ bit in CR0), is replaced by the eq symbol.
Using the simplified mnemonic and the eq operand, the expression becomes bdnzt eq,target.
This example tests CR0[EQ]; however, to test the equal condition in CR5 (CR bit 22), the expression
becomes bc 8,22,target. The BI operand of 22 indicates CR[22] (CR5[2], or BI field 0b10110), as shown
in Table B-7. This can be expressed as the simplified mnemonic bdnzt 4 * cr5 + eq,target.
The notation 4 * cr5 + eq may at first seem awkward, but it eliminates computing the value of the CR bit.
It can be seen that (4 * 5) + 2 = 22. Note that although the bits of 32-bit registers in Power ISA processors
are numbered 32–63, only values 0–31 are valid (or possible) for BI operands. The encoding of the field in the
instruction uses numbering from 0 to 31, and the instruction converts this into the architecturally described
bit number by adding 32.
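The arithmetic behind an expression such as 4 * cr5 + eq can be written out directly. This is a small sketch with hypothetical function names; the bit offsets within a CR field (lt = 0, gt = 1, eq = 2, so or un = 3) follow the Power ISA definition of the CR.

```python
CR_BIT = {"lt": 0, "gt": 1, "eq": 2, "so": 3, "un": 3}


def bi_operand(cr_field, condition):
    """Value of the BI operand for a condition bit in CR field 0 to 7."""
    return 4 * cr_field + CR_BIT[condition]


def power_isa_bit(bi):
    """Architectural CR bit number (32 to 63) that a BI operand refers to."""
    return bi + 32


bi = bi_operand(5, "eq")
print(bi, format(bi, "05b"), power_isa_bit(bi))
```

For CR5[EQ] this gives a BI operand of 22, the field encoding 0b10110, and architectural bit number 54.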
0 1 2 3 4
BO Bit Description
1 If set, the CR bit comparison is against true, if not set the CR bit comparison is against false
3 If BO[2] is clear (the CTR is decremented), this bit determines whether the CTR comparison is for equal to zero or not equal to zero.
4 Used for static branch prediction. Use of this bit is optional and independent from the interpretation of the rest of the
BO operand. Because simplified branch mnemonics eliminate the BO operand, this bit (the t bit) and other branch
prediction hint bits (the “a” bit) are programmed by adding a plus or minus sign to the simplified mnemonic, as described
in Section B.4.3, “Incorporating the BO Branch Prediction.”
Thus, a BO encoding of 10100 (decimal 20) means ignore the CR bit comparison and do not decrement
the CTR—in other words, branch unconditionally. Encodings for the BO operand are shown in Table B-6.
A z bit indicates that the bit is ignored. However, these bits should be cleared, as they may be assigned a
meaning in a future version of the architecture.
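The bit-by-bit interpretation of BO can be sketched as a decoder. The function name and the returned dictionary keys are hypothetical; the meanings of BO[0] (ignore the CR bit comparison) and BO[2] (do not decrement the CTR) are inferred from the 10100 example above and agree with the Power ISA definition. Prediction hint bits are ignored in this sketch.

```python
def decode_bo(bo):
    """Split a 5-bit BO value into its CTR and CR tests (BO[0] is the MSB)."""
    ignore_cr = bool(bo & 0b10000)       # BO[0]
    cr_true = bool(bo & 0b01000)         # BO[1]
    no_decrement = bool(bo & 0b00100)    # BO[2]
    ctr_zero = bool(bo & 0b00010)        # BO[3]
    return {
        "decrement_ctr": not no_decrement,
        # The CTR test only applies when the CTR is decremented.
        "ctr_test": None if no_decrement else ("zero" if ctr_zero else "nonzero"),
        # The CR test only applies when the comparison is not ignored.
        "cr_test": None if ignore_cr else cr_true,
    }


print(decode_bo(0b10100))   # branch unconditionally
print(decode_bo(16))        # the bdnz case
print(decode_bo(8))         # the bdnzt case
```

A BO value of 20 decodes to no CTR decrement and no CR test, which is the unconditional branch described above.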
As shown in Table B-6, the ‘c’ in the standard mnemonic is replaced with the operations otherwise
specified in the BO field, (d for decrement, z for zero, nz for nonzero, t for true, and f for false).
NOTE
The test for when the CTR reaches 0 varies between 32-bit mode and 64-bit
mode. M = 32 in 32-bit mode (of a 64-bit implementation) and M = 0 in
64-bit mode.
Table B-6. BO Operand Encodings
BO Field   Value¹ (Decimal)   Description                                                                            Symbol
0000z²     0                  Decrement the CTR, then branch if the decremented CTR ≠ 0; condition is FALSE.         dnzf
0001z      2                  Decrement the CTR, then branch if the decremented CTR = 0; condition is FALSE.         dzf
001at³     4                  Branch if the condition is FALSE.⁴ Note that ‘false’ and ‘four’ both start with ‘f’.   f
0100z      8                  Decrement the CTR, then branch if the decremented CTR ≠ 0; condition is TRUE.          dnzt
0101z      10                 Decrement the CTR, then branch if the decremented CTR = 0; condition is TRUE.          dzt
011at      12                 Branch if the condition is TRUE.² Note that ‘true’ and ‘twelve’ both start with ‘t’.   t
1a00t⁵     16                 Decrement the CTR, then branch if the decremented CTR ≠ 0.                             dnz⁶
1a01t⁵     18                 Decrement the CTR, then branch if the decremented CTR = 0.                             dz⁶
• A branch conditional with a nonnegative displacement field is predicted not to be taken (fall
through).
• A branch conditional to an address in the LR or CTR is predicted not to be taken (fall through).
If the likely outcome (branch or fall through) of a given branch conditional instruction is known, a suffix can
be added to the mnemonic that tells the assembler how to set the at bits. That is, ‘+’ indicates that the branch
is to be taken and ‘–’ indicates that the branch is not to be taken. This suffix can be added to any branch
conditional mnemonic, standard or simplified.
For relative and absolute branches (bc[l][a]), the setting of the at bits depends on whether the displacement
field is negative or nonnegative. For negative displacement fields, coding the suffix ‘+’ causes the bit to
be cleared, and coding the suffix ‘–’ causes the bit to be set. For nonnegative displacement fields, coding
the suffix ‘+’ causes the bit to be set, and coding the suffix ‘–’ causes the bit to be cleared.
For branches to an address in the LR or CTR (bclr[l] or bcctr[l]), coding the suffix ‘+’ causes the at bits
to be set to 0b11, and coding the suffix ‘–’ causes the at bits to be set to 0b10.
Examples of branch prediction follow:
1. Branch if CR0 reflects less than condition, specifying that the branch should be predicted as taken.
blt+ target
2. Same as (1), but target address is in the LR and the branch should be predicted as not taken.
bltlr–
These mnemonics are described in Section B.4.6, “Simplified Mnemonics that Incorporate CR
Conditions (Eliminates BO and Replaces BI with crS).”
Integer record-form instructions update CR0 and floating-point record-form instructions update CR1 as
described in Table B-7.
BI
CRn Bit CR Bits (Operand) Description
0–2 3–4
Some simplified mnemonics incorporate only the BO field (as described in Section B.4.2, “Eliminating the
BO Operand”). If one of these simplified mnemonics is used and the CR must be accessed, the BI operand
can be specified either as a numeric value or by using the symbols in Table B-8.
Compare word instructions (described in Section B.5, “Compare Word Simplified Mnemonics”),
floating-point compare instructions, move to CR instructions, and others can also modify CR fields, so
CR0 and CR1 may hold values that do not adhere to the meanings described in Table B-7. CR logical
instructions, described in Section B.7, “Condition Register Logical Simplified Mnemonics,” can update
individual CR bits.
Table B-8. BI Operand Settings for CR Fields for Branch Comparisons
CRn Bit   Bit Expression                  CR Bit          Power ISA     BI     BI
                                          (BI Operand)    Bit Number    0–2    3–4
CRn[0]    Less than or floating-point less than (LT, FL).
          For integer compare instructions: rA < SIMM or rB (signed comparison) or rA < UIMM or
          rB (unsigned comparison).
          For floating-point compare instructions: frA < frB.
          4 * cr0 + lt (or lt)            0               32            000    00
          4 * cr1 + lt                    4               36            001    00
          4 * cr2 + lt                    8               40            010    00
          4 * cr3 + lt                    12              44            011    00
          4 * cr4 + lt                    16              48            100    00
          4 * cr5 + lt                    20              52            101    00
          4 * cr6 + lt                    24              56            110    00
          4 * cr7 + lt                    28              60            111    00
CRn[1]    Greater than or floating-point greater than (GT, FG).
          For integer compare instructions: rA > SIMM or rB (signed comparison) or rA > UIMM or
          rB (unsigned comparison).
          For floating-point compare instructions: frA > frB.
          4 * cr0 + gt (or gt)            1               33            000    01
          4 * cr1 + gt                    5               37            001    01
          4 * cr2 + gt                    9               41            010    01
          4 * cr3 + gt                    13              45            011    01
          4 * cr4 + gt                    17              49            100    01
          4 * cr5 + gt                    21              53            101    01
          4 * cr6 + gt                    25              57            110    01
          4 * cr7 + gt                    29              61            111    01
CRn[2]    Equal or floating-point equal (EQ, FE).
          For integer compare instructions: rA = SIMM, UIMM, or rB.
          For floating-point compare instructions: frA = frB.
          4 * cr0 + eq (or eq)            2               34            000    10
          4 * cr1 + eq                    6               38            001    10
          4 * cr2 + eq                    10              42            010    10
          4 * cr3 + eq                    14              46            011    10
          4 * cr4 + eq                    18              50            100    10
          4 * cr5 + eq                    22              54            101    10
          4 * cr6 + eq                    26              58            110    10
          4 * cr7 + eq                    30              62            111    10
CRn[3]    Summary overflow or floating-point unordered (SO, FU).
          For integer compare instructions, this is a copy of XER[SO] at instruction completion.
          For floating-point compare instructions, one or both of frA and frB is a NaN.
          4 * cr0 + so/un (or so/un)      3               35            000    11
          4 * cr1 + so/un                 7               39            001    11
          4 * cr2 + so/un                 11              43            010    11
          4 * cr3 + so/un                 15              47            011    11
          4 * cr4 + so/un                 19              51            100    11
          4 * cr5 + so/un                 23              55            101    11
          4 * cr6 + so/un                 27              59            110    11
          4 * cr7 + so/un                 31              63            111    11
To provide simplified mnemonics for every possible combination of BO and BI (that is, including bits that
identify the CR field) would require 2^10 = 1024 mnemonics, most of which would be only marginally
useful. The abbreviated set in Section B.4.5, “Simplified Mnemonics that Incorporate the BO Operand,”
covers useful cases. Unusual cases can be coded using a standard branch conditional syntax.
To identify a CR bit, an expression in which a CR field symbol is multiplied by 4 and then added to a
bit-number-within-CR-field symbol can be used, (for example, cr0 * 4 + eq).
Branch if condition true bt bta btlr btctr btl btla btlrl btctrl
Branch if condition false bf bfa bflr bfctr bfl bfla bflrl bfctrl
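The two rows above follow a regular spelling pattern: the ‘c’ of the standard mnemonic is replaced by the BO symbol, and suffixes select the branch form. The sketch below regenerates each row from its symbol. The function name is hypothetical, and the column order (bc, bca, bclr, bcctr, followed by the same four forms with LR update) is inferred from the rows themselves. It is checked only against the t and f rows shown; CTR-decrementing symbols do not have a valid bcctr form.

```python
def simplified_mnemonics(symbol):
    """Return the eight spellings for one BO symbol, in the row order shown above."""
    forms = ["", "a", "lr", "ctr"]              # bc, bca, bclr, bcctr
    without_lr = ["b" + symbol + form for form in forms]
    with_lr = []
    for form in forms:
        # The LR-update 'l' precedes the 'a' of the absolute form
        # and follows the 'lr' or 'ctr' of the register forms.
        suffix = "la" if form == "a" else form + "l"
        with_lr.append("b" + symbol + suffix)
    return without_lr + with_lr


print(" ".join(simplified_mnemonics("t")))
print(" ".join(simplified_mnemonics("f")))
```

The printed lists match the mnemonic columns of the branch-if-true and branch-if-false rows.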
Table B-10 shows the syntax for basic simplified branch mnemonics.
Table B-11. Branch Instructions
Instruction          Standard Mnemonic    Syntax               Simplified Mnemonic    Syntax
Branch Conditional   bc (bca bcl bcla)    BO,BI,target_addr    bx¹ (bxa bxl bxla)     BI,² target_addr
The simplified mnemonics in Table B-10 that test a condition require a corresponding CR bit as the first
operand (as examples 2–5 below illustrate). The symbols in Table B-9 can be used in place of a numeric
value.
Branch Semantics bc Simplified Mnemonic bca Simplified Mnemonic
Branch unconditionally — — — —
Decrement CTR, branch if CTR ≠ 0 bc 16,0,target bdnz target² bca 16,0,target bdnza target²
Decrement CTR, branch if CTR ≠ 0 and bc 8,BI,target bdnzt BI,target bca 8,BI,target bdnzta BI,target
condition true
Decrement CTR, branch if CTR ≠ 0 and bc 0,BI,target bdnzf BI,target bca 0,BI,target bdnzfa BI,target
condition false
Decrement CTR, branch if CTR = 0 bc 18,0,target bdz target² bca 18,0,target bdza target²
Decrement CTR, branch if CTR = 0 and bc 10,BI,target bdzt BI,target bca 10,BI,target bdzta BI,target
condition true
Decrement CTR, branch if CTR = 0 and bc 2,BI,target bdzf BI,target bca 2,BI,target bdzfa BI,target
condition false
¹ Instructions for which BO is either 12 (branch if condition true) or 4 (branch if condition false) do not depend on the
CTR value and can be alternately coded by incorporating the condition specified by the BI field, as described in
Section B.4.6, “Simplified Mnemonics that Incorporate CR Conditions (Eliminates BO and Replaces BI with crS).”
² Simplified mnemonics for branch instructions that do not test CR bits should specify only a target. Otherwise a
Table B-13 lists simplified mnemonics and syntax for bclr and bcctr without LR updating.
Table B-13. Simplified Mnemonics for bclr and bcctr without LR Update
Branch Semantics                                      bclr        Simplified Mnemonic  bcctr  Simplified Mnemonic
Decrement CTR, branch if CTR ≠ 0 and condition true   bclr 8,BI   bdnztlr BI           —      —
Decrement CTR, branch if CTR = 0 and condition false  bclr 2,BI   bdzflr BI            —      —
1 Simplified mnemonics for branch instructions that do not test a CR bit should not specify one; doing so may cause a programming error.
2 Instructions for which BO is 12 (branch if condition true) or 4 (branch if condition false) do not depend on a CTR value and can
be alternately coded by incorporating the condition specified by the BI field. See Section B.4.6, “Simplified Mnemonics that
Incorporate CR Conditions (Eliminates BO and Replaces BI with crS).”
Table B-14 provides simplified mnemonics and syntax for bcl and bcla.
Table B-14. Simplified Mnemonics for bcl and bcla with LR Update
Branch Semantics                                      bcl               Simplified Mnemonic  bcla               Simplified Mnemonic
Branch unconditionally                                —                 —                    —                  —
Branch if condition true¹                             bcl 12,BI,target  btl BI,target        bcla 12,BI,target  btla BI,target
Branch if condition false¹                            bcl 4,BI,target   bfl BI,target        bcla 4,BI,target   bfla BI,target
Decrement CTR, branch if CTR ≠ 0                      bcl 16,0,target   bdnzl target²        bcla 16,0,target   bdnzla target²
Decrement CTR, branch if CTR ≠ 0 and condition true   bcl 8,BI,target   bdnztl BI,target     bcla 8,BI,target   bdnztla BI,target
Decrement CTR, branch if CTR ≠ 0 and condition false  bcl 0,BI,target   bdnzfl BI,target     bcla 0,BI,target   bdnzfla BI,target
Decrement CTR, branch if CTR = 0                      bcl 18,0,target   bdzl target²         bcla 18,0,target   bdzla target²
Decrement CTR, branch if CTR = 0 and condition true   bcl 10,BI,target  bdztl BI,target      bcla 10,BI,target  bdztla BI,target
Decrement CTR, branch if CTR = 0 and condition false  bcl 2,BI,target   bdzfl BI,target      bcla 2,BI,target   bdzfla BI,target
1 Instructions for which BO is either 12 (branch if condition true) or 4 (branch if condition false) do not depend on the CTR value
and can be alternately coded by incorporating the condition specified by the BI field. See Section B.4.6, “Simplified Mnemonics
that Incorporate CR Conditions (Eliminates BO and Replaces BI with crS).”
2 Simplified mnemonics for branch instructions that do not test CR bits should specify only a target. Otherwise, a programming
error may occur.
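The mapping from BO values to simplified mnemonics in the bc, bca, bcl, and bcla tables can be sketched as a lookup. This is an illustrative sketch only: the names BO_PREFIX and simplify_bc are hypothetical, and the sketch covers just the BO values listed in this appendix (12, 4, 16, 8, 0, 18, 10, and 2).

```python
# BO value -> simplified mnemonic stem, as listed in the tables of this appendix.
BO_PREFIX = {
    12: "bt",     # branch if condition true
    4: "bf",      # branch if condition false
    16: "bdnz",   # decrement CTR, branch if CTR != 0
    8: "bdnzt",   # ... and condition true
    0: "bdnzf",   # ... and condition false
    18: "bdz",    # decrement CTR, branch if CTR == 0
    10: "bdzt",   # ... and condition true
    2: "bdzf",    # ... and condition false
}
# BO values whose simplified form keeps the BI operand (a CR bit is tested).
BO_TESTS_CR = {12, 4, 8, 0, 10, 2}


def simplify_bc(mnemonic: str, bo: int, bi: int, target: str) -> str:
    """Rewrite 'bc/bca/bcl/bcla BO,BI,target' as its simplified mnemonic."""
    if mnemonic not in ("bc", "bca", "bcl", "bcla"):
        raise ValueError("only bc, bca, bcl and bcla are handled in this sketch")
    name = BO_PREFIX[bo] + mnemonic[2:]  # carry over the 'a', 'l' or 'la' suffix
    if bo in BO_TESTS_CR:
        return f"{name} {bi},{target}"
    # The tables pair the CTR-only forms with BI = 0; reject anything else
    # rather than silently dropping the operand.
    if bi != 0:
        raise ValueError("BI must be 0 when no CR bit is tested")
    return f"{name} {target}"
```

For instance, simplify_bc("bcl", 8, 2, "loop") yields the bdnztl form with BI = 2, and simplify_bc("bc", 16, 0, "loop") yields the target-only bdnz form.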
Table B-15 provides simplified mnemonics and syntax for bclrl and bcctrl with LR updating.
Table B-15. Simplified Mnemonics for bclrl and bcctrl with LR Update
Branch Semantics                                      bclrl         Simplified Mnemonic  bcctrl  Simplified Mnemonic
Decrement CTR, branch if CTR ≠ 0 and condition true   bclrl 8,BI    bdnztlrl BI          —       —
Decrement CTR, branch if CTR ≠ 0 and condition false  bclrl 0,BI    bdnzflrl BI          —       —
Decrement CTR, branch if CTR = 0                      bclrl 18,0    bdzlrl¹              —       —
Decrement CTR, branch if CTR = 0 and condition true   bclrl 10,BI   bdztlrl BI           —       —
Decrement CTR, branch if CTR = 0 and condition false  bclrl 2,BI    bdzflrl BI           —       —
1 Simplified mnemonics for branch instructions that do not test a CR bit should not specify one; doing so may cause a programming error.
lt Less than — LT
le Less than or equal (equivalent to ng) ng GT
eq Equal — EQ
ge Greater than or equal (equivalent to nl) nl LT
gt Greater than — GT
nl Not less than (equivalent to ge) ge LT
ne Not equal — EQ
ng Not greater than (equivalent to le) le GT
so Summary overflow — SO
ns Not summary overflow — SO
Table B-17 shows the syntax for simplified branch mnemonics that incorporate CR conditions. Here, crS
replaces a BI operand to specify only a CR field (because the specific CR bit within the field is now part
of the simplified mnemonic). If no crS is specified, CR0 is used by default.
Table B-17. Branch Instructions and Simplified Mnemonics that Incorporate CR Conditions
Instruction                           Standard Mnemonic   Syntax             Simplified Mnemonic   Syntax
Branch Conditional                    bc (bca bcl bcla)   BO,BI,target_addr  bx¹ (bxa bxl bxla)    crS²,target_addr
Branch Conditional to Link Register   bclr (bclrl)        BO,BI              bxlr (bxlrl)          crS
Branch Conditional to Count Register  bcctr (bcctrl)      BO,BI              bxctr (bxctrl)        crS
1 x stands for one of the symbols in Table B-16, where applicable.
2 BI can be a numeric value or an expression as shown in Table B-9.
Branch if less than blt blta bltlr bltctr bltl bltla bltlrl bltctrl
Branch if less than or equal ble blea blelr blectr blel blela blelrl blectrl
Branch if equal beq beqa beqlr beqctr beql beqla beqlrl beqctrl
Branch if greater than or equal bge bgea bgelr bgectr bgel bgela bgelrl bgectrl
Branch if greater than bgt bgta bgtlr bgtctr bgtl bgtla bgtlrl bgtctrl
Branch if not less than bnl bnla bnllr bnlctr bnll bnlla bnllrl bnlctrl
Branch if not equal bne bnea bnelr bnectr bnel bnela bnelrl bnectrl
Branch if not greater than bng bnga bnglr bngctr bngl bngla bnglrl bngctrl
Branch if summary overflow bso bsoa bsolr bsoctr bsol bsola bsolrl bsoctrl
Branch if not summary overflow bns bnsa bnslr bnsctr bnsl bnsla bnslrl bnsctrl
Branch if unordered bun buna bunlr bunctr bunl bunla bunlrl bunctrl
Branch if not unordered bnu bnua bnulr bnuctr bnul bnula bnulrl bnuctrl
Instructions using the mnemonics in Table B-18 indicate the condition bit, but not the CR field. If no field
is specified, CR0 is used. The CR field symbols defined in Table B-9 (cr0–cr7) are used for this operand,
as shown in examples 2–4 below.
Table B-20 shows simplified branch mnemonics and syntax for bclr and bcctr without LR updating.
Table B-20. Simplified Mnemonics for bclr and bcctr without Comparison Conditions
or LR Update
Branch Semantics                 bclr         Simplified Mnemonic  bcctr         Simplified Mnemonic
Branch if less than              bclr 12,BI¹  bltlr crS            bcctr 12,BI¹  bltctr crS
Branch if less than or equal     bclr 4,BI²   blelr crS            bcctr 4,BI²   blectr crS
Branch if not greater than       bclr 4,BI²   bnglr crS            bcctr 4,BI²   bngctr crS
Branch if equal                  bclr 12,BI³  beqlr crS            bcctr 12,BI³  beqctr crS
Branch if greater than or equal  bclr 4,BI¹   bgelr crS            bcctr 4,BI¹   bgectr crS
Branch if not less than          bclr 4,BI¹   bnllr crS            bcctr 4,BI¹   bnlctr crS
Branch if greater than           bclr 12,BI²  bgtlr crS            bcctr 12,BI²  bgtctr crS
Branch if not equal              bclr 4,BI³   bnelr crS            bcctr 4,BI³   bnectr crS
Branch if summary overflow       bclr 12,BI⁴  bsolr crS            bcctr 12,BI⁴  bsoctr crS
Branch if not summary overflow   bclr 4,BI⁴   bnslr crS            bcctr 4,BI⁴   bnsctr crS
1 The value in the BI operand selects CRn[0], the LT bit.
2 The value in the BI operand selects CRn[1], the GT bit.
3 The value in the BI operand selects CRn[2], the EQ bit.
4 The value in the BI operand selects CRn[3], the SO bit.
Table B-21 shows simplified branch mnemonics and syntax for bcl and bcla.
Table B-21. Simplified Mnemonics for bcl and bcla with Comparison Conditions and
LR Update
Branch Semantics                 bcl                Simplified Mnemonic  bcla                Simplified Mnemonic
Branch if less than              bcl 12,BI¹,target  bltl crS,target      bcla 12,BI¹,target  bltla crS,target
Branch if less than or equal     bcl 4,BI²,target   blel crS,target      bcla 4,BI²,target   blela crS,target
Branch if not greater than       bcl 4,BI²,target   bngl crS,target      bcla 4,BI²,target   bngla crS,target
Branch if equal                  bcl 12,BI³,target  beql crS,target      bcla 12,BI³,target  beqla crS,target
Branch if greater than or equal  bcl 4,BI¹,target   bgel crS,target      bcla 4,BI¹,target   bgela crS,target
Branch if not less than          bcl 4,BI¹,target   bnll crS,target      bcla 4,BI¹,target   bnlla crS,target
Branch if greater than           bcl 12,BI²,target  bgtl crS,target      bcla 12,BI²,target  bgtla crS,target
Branch if not equal              bcl 4,BI³,target   bnel crS,target      bcla 4,BI³,target   bnela crS,target
Branch if summary overflow       bcl 12,BI⁴,target  bsol crS,target      bcla 12,BI⁴,target  bsola crS,target
Branch if not summary overflow   bcl 4,BI⁴,target   bnsl crS,target      bcla 4,BI⁴,target   bnsla crS,target
1 The value in the BI operand selects CRn[0], the LT bit.
2 The value in the BI operand selects CRn[1], the GT bit.
3 The value in the BI operand selects CRn[2], the EQ bit.
4 The value in the BI operand selects CRn[3], the SO bit.
Table B-22 shows the simplified branch mnemonics and syntax for bclrl and bcctrl with LR updating.
Table B-22. Simplified Mnemonics for bclrl and bcctrl with Comparison Conditions
and LR Update
Branch Semantics                 bclrl         Simplified Mnemonic  bcctrl         Simplified Mnemonic
Branch if less than              bclrl 12,BI¹  bltlrl crS           bcctrl 12,BI¹  bltctrl crS
Branch if less than or equal     bclrl 4,BI²   blelrl crS           bcctrl 4,BI²   blectrl crS
Branch if not greater than       bclrl 4,BI²   bnglrl crS           bcctrl 4,BI²   bngctrl crS
Branch if equal                  bclrl 12,BI³  beqlrl crS           bcctrl 12,BI³  beqctrl crS
Branch if greater than or equal  bclrl 4,BI¹   bgelrl crS           bcctrl 4,BI¹   bgectrl crS
Branch if not less than          bclrl 4,BI¹   bnllrl crS           bcctrl 4,BI¹   bnlctrl crS
Branch if greater than           bclrl 12,BI²  bgtlrl crS           bcctrl 12,BI²  bgtctrl crS
Branch if not equal              bclrl 4,BI³   bnelrl crS           bcctrl 4,BI³   bnectrl crS
Branch if summary overflow       bclrl 12,BI⁴  bsolrl crS           bcctrl 12,BI⁴  bsoctrl crS
Branch if not summary overflow   bclrl 4,BI⁴   bnslrl crS           bcctrl 4,BI⁴   bnsctrl crS
1 The value in the BI operand selects CRn[0], the LT bit.
2 The value in the BI operand selects CRn[1], the GT bit.
3 The value in the BI operand selects CRn[2], the EQ bit.
4 The value in the BI operand selects CRn[3], the SO bit.
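The relationship between a condition mnemonic, the crS operand, and the underlying BO and BI values can be sketched as follows. This is a hedged illustration, not assembler behavior: the names CONDITIONS and expand_branch are hypothetical, and the BO values and bit positions are read from the tables and footnotes above.

```python
# Condition code -> (BO value, bit within the CR field), per the tables above.
CONDITIONS = {
    "lt": (12, 0), "ge": (4, 0), "nl": (4, 0),   # LT bit
    "gt": (12, 1), "le": (4, 1), "ng": (4, 1),   # GT bit
    "eq": (12, 2), "ne": (4, 2),                 # EQ bit
    "so": (12, 3), "ns": (4, 3),                 # SO bit
}


def expand_branch(cond: str, suffix: str = "", crs: int = 0, target=None) -> str:
    """Expand b<cond><suffix> [crS][,target] into its bc-family equivalent.

    `suffix` is one of "", "a", "l", "la", "lr", "lrl", "ctr", "ctrl".
    `crs` defaults to 0 because CR0 is used when no crS is specified.
    """
    bo, bit = CONDITIONS[cond]
    bi = crs * 4 + bit
    operands = f"{bo},{bi}"
    if target is not None:
        operands += f",{target}"
    return f"bc{suffix} {operands}"
```

With these assumptions, expand_branch("ne", "la", 2, "t") computes BI as 2 * 4 + 2, and expand_branch("so", "lr", 7) produces a bclr form with no target operand.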
As with branch mnemonics, the crD field of a compare instruction can be omitted if CR0 is used, as shown
in examples 1 and 3 below. Otherwise, the target CR field must be specified as the first operand. The
following examples use word compare mnemonics:
1. Compare rA with immediate value 100 as signed 32-bit integers and place result in CR0.
cmpwi rA,100 equivalent to cmpi 0,0,rA,100
2. Same as (1), but place results in CR4.
cmpwi cr4,rA,100 equivalent to cmpi 4,0,rA,100
3. Compare rA and rB as unsigned 32-bit integers and place result in CR0.
cmplw rA,rB equivalent to cmpl 0,0,rA,rB
As with branch mnemonics, the crD field of a compare instruction can be omitted if CR0 is used, as shown
in examples 1 and 3 below. Otherwise, the target CR field must be specified as the first operand. The
following examples use doubleword compare mnemonics:
1. Compare rA with immediate value 100 as signed 64-bit integers and place result in CR0.
cmpdi rA,100 equivalent to cmpi 0,1,rA,100
2. Same as (1), but place results in CR4.
cmpdi cr4,rA,100 equivalent to cmpi 4,1,rA,100
3. Compare rA and rB as unsigned 64-bit integers and place result in CR0.
cmpld rA,rB equivalent to cmpl 0,1,rA,rB
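The equivalences in the word and doubleword examples follow one pattern: the simplified mnemonic fixes the L operand, and an omitted crD defaults to 0. A minimal sketch of that expansion follows; the names SIMPLIFIED and expand_compare are hypothetical, and only the four mnemonics that appear in the examples above are covered.

```python
# Simplified compare mnemonic -> (base instruction, L operand), from the examples above.
SIMPLIFIED = {
    "cmpwi": ("cmpi", 0),
    "cmpdi": ("cmpi", 1),
    "cmplw": ("cmpl", 0),
    "cmpld": ("cmpl", 1),
}


def expand_compare(line: str) -> str:
    """Expand a simplified compare into its four-operand base form."""
    mnemonic, operand_text = line.split(None, 1)
    operands = [op.strip() for op in operand_text.split(",")]
    base, l_operand = SIMPLIFIED[mnemonic]
    if len(operands) == 3:
        # An explicit CR field symbol such as cr4 becomes the numeric crD value.
        crd = operands[0][2:] if operands[0].startswith("cr") else operands[0]
        first, second = operands[1], operands[2]
    elif len(operands) == 2:
        crd = "0"  # crD omitted: CR0 is used
        first, second = operands
    else:
        raise ValueError("expected two or three operands")
    return f"{base} {crd},{l_operand},{first},{second}"
```

For example, expand_compare("cmpwi cr4,rA,100") reproduces the second word-compare example, and expand_compare("cmpld rA,rB") reproduces the third doubleword example.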
lt Less than 16 1 0 0 0 0
le Less than or equal 20 1 0 1 0 0
eq Equal 4 0 0 1 0 0
ge Greater than or equal 12 0 1 1 0 0
gt Greater than 8 0 1 0 0 0
nl Not less than 12 0 1 1 0 0
ne Not equal 24 1 1 0 0 0
ng Not greater than 20 1 0 1 0 0
llt Logically less than 2 0 0 0 1 0
lle Logically less than or equal 6 0 0 1 1 0
lge Logically greater than or equal 5 0 0 1 0 1
lgt Logically greater than 1 0 0 0 0 1
lnl Logically not less than 5 0 0 1 0 1
lng Logically not greater than 6 0 0 1 1 0
— Unconditional 31 1 1 1 1 1
1
The symbol ‘<U’ indicates an unsigned less-than evaluation is performed.
2
The symbol ‘>U’ indicates an unsigned greater-than evaluation is performed.
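The TO encodings in the rows above are the bitwise OR of five condition bits, weighted 16 (<), 8 (>), 4 (=), 2 (<U), and 1 (>U). The sketch below reproduces those values and evaluates a trap condition for 32-bit operands; the names TO_BITS, to_operand, and trap_taken are hypothetical and are not part of any assembler.

```python
# Weight of each TO condition bit: <, >, =, <U (unsigned less), >U (unsigned greater).
TO_BITS = {"lt": 16, "gt": 8, "eq": 4, "llt": 2, "lgt": 1}
MASK32 = 0xFFFFFFFF


def to_operand(*conditions: str) -> int:
    """OR together the TO bits for the named basic conditions."""
    value = 0
    for name in conditions:
        value |= TO_BITS[name]
    return value


def trap_taken(to: int, a: int, b: int) -> bool:
    """Return True if any condition selected by `to` holds.

    `a` and `b` are signed 32-bit values; the unsigned tests reinterpret
    them as 32-bit unsigned quantities.
    """
    ua, ub = a & MASK32, b & MASK32
    return bool(
        (to & 16 and a < b)
        or (to & 8 and a > b)
        or (to & 4 and a == b)
        or (to & 2 and ua < ub)
        or (to & 1 and ua > ub)
    )
```

Combining "lt" and "eq" gives the le encoding, and combining "lt" and "gt" gives the ne encoding. The signed and unsigned tests can disagree: comparing -1 with 0 satisfies the signed less-than test but the unsigned greater-than test.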
The mnemonics in Table B-27 are variations of trap instructions, with the most useful TO values
represented in the mnemonic rather than specified as a numeric operand.
Table B-27. Trap Simplified Mnemonics
32-Bit Comparison
Trap Semantics
twi Immediate tw Register
2 Equal
mtsprgn rS mfsprgn rD
The following instruction complements the contents of rS and places the result into rA. This mnemonic
can be coded with a dot (.) suffix to cause the Rc bit to be set in the underlying instruction.
not rA,rS equivalent to nor rA,rS,rS
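The equivalence holds because NOR of a value with itself is that value's complement. A brief sketch for 32-bit register values follows; the function names nor32 and not32 are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit general-purpose register


def nor32(a: int, b: int) -> int:
    """Bitwise NOR of two 32-bit values."""
    return ~(a | b) & MASK32


def not32(s: int) -> int:
    """Complement of s, computed as nor rA,rS,rS would compute it."""
    return nor32(s, s)
```

For example, not32(0x0000FFFF) gives 0xFFFF0000.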
Lightweight sync
lwsync equivalent to sync 1
Heavyweight sync
hwsync equivalent to sync 0
Book E / PowerPC compatibility
sync equivalent to sync 0
msync equivalent to sync 0