Pentium 4 Structure and Architecture
Pentium Processor
Pentium 4 Diagram (Simplified)
Pentium 4 Core Processor
Fetch/Decode Unit
• Fetches instructions from L2 cache
• Decodes them into micro-ops
• Stores micro-ops in L1 cache (the trace cache)
Out-of-order execution logic
• Schedules micro-ops
• Based on data dependence and resources
• May speculatively execute
Execution units
• Execute micro-ops
• Take data from L1 data cache
• Write results to registers
Memory subsystem
• L2 cache and system bus
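The scheduling rule above (issue a micro-op when its inputs are ready and a unit is free, not in program order) can be sketched as a toy model. This is not Intel's scheduler: the `MicroOp` format, one-cycle latency, and the `issue_width` parameter standing in for "resources" are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    name: str
    dest: str
    srcs: tuple

def schedule(uops, issue_width=2):
    """Toy out-of-order scheduler. Returns one list of issued micro-op names per cycle.

    Assumptions (illustrative only): each micro-op takes one cycle, its result can be
    used in the next cycle, and only true (read-after-write) dependences are tracked,
    as if registers had already been renamed.
    """
    # For each micro-op, record the earlier micro-ops that produce its sources.
    last_writer = {}
    deps = []
    for i, u in enumerate(uops):
        deps.append({last_writer[s] for s in u.srcs if s in last_writer})
        last_writer[u.dest] = i

    done, pending, cycles = set(), list(range(len(uops))), []
    while pending:
        # Oldest first: pick up to issue_width micro-ops whose producers have finished.
        issued = [i for i in pending if deps[i] <= done][:issue_width]
        for i in issued:
            pending.remove(i)
        done.update(issued)  # results become usable next cycle
        cycles.append([uops[i].name for i in issued])
    return cycles

program = [
    MicroOp("load", "r1", ("r5",)),
    MicroOp("add", "r2", ("r1", "r3")),  # needs the load result
    MicroOp("mul", "r4", ("r6", "r7")),  # independent of everything above
    MicroOp("sub", "r8", ("r2",)),       # needs the add result
]
print(schedule(program))  # [['load', 'mul'], ['add'], ['sub']]
```

The independent `mul` overtakes the `add`, which must wait for the load: that reordering is what "based on data dependence" buys.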
System bus speed: 400 MHz (a 100 MHz clock transferring data four times per cycle)
The datapath between the L2 cache and the L1 data cache is 256 bits wide;
the datapath between the L2 cache and the prefetch unit remains 64 bits wide.
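A back-of-the-envelope calculation puts these widths in perspective. It assumes, beyond what is stated above, that the system bus carries 64 data bits per transfer at 400 million transfers per second, and that the 256-bit L2-to-L1 path moves one transfer per core clock; the 1.5 GHz core clock is only an example value.

```python
def bandwidth_bytes_per_s(width_bits, transfers_per_s):
    """Peak bandwidth = bytes per transfer x transfers per second."""
    return width_bits // 8 * transfers_per_s

# System bus: assumed 64-bit data path at 400 million transfers/s (100 MHz x 4).
bus = bandwidth_bytes_per_s(64, 400_000_000)

# L2 -> L1 data cache: 256-bit path, assuming one transfer per core clock.
# 1.5 GHz is an illustrative clock rate, not a figure from the text.
l2_to_l1 = bandwidth_bytes_per_s(256, 1_500_000_000)

print(f"system bus: {bus / 1e9:.1f} GB/s")       # system bus: 3.2 GB/s
print(f"L2 -> L1:   {l2_to_l1 / 1e9:.1f} GB/s")  # L2 -> L1:   48.0 GB/s
```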
128 internal registers
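The 128 internal registers back register renaming: each write to one of the eight x86 registers is given a fresh internal register. The sketch models this with a rename table and a free list; the `RenameTable` class, its method names, and the simple policy (registers are never reclaimed) are hypothetical simplifications.

```python
ARCH_REGS = ["EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP"]

class RenameTable:
    """Toy rename table mapping the 8 x86 registers onto a pool of internal registers."""

    def __init__(self, num_physical=128):
        # Initially, architectural register i lives in physical register i.
        self.mapping = {name: i for i, name in enumerate(ARCH_REGS)}
        self.free = list(range(len(ARCH_REGS), num_physical))

    def rename(self, dest, srcs):
        """Read current mappings for the sources, then give dest a fresh register."""
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free internal registers; the front end must stall")
        new_dest = self.free.pop(0)
        self.mapping[dest] = new_dest
        return new_dest, phys_srcs

rt = RenameTable()
print(rt.rename("EAX", ["EBX", "ECX"]))  # (8, [1, 2])
# The second write to EAX gets its own register, so it cannot clobber a value an
# older instruction still needs; its true dependence on the first result is kept (8).
print(rt.rename("EAX", ["EAX", "EDX"]))  # (9, [8, 3])
```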
• The Pentium 4 has five execution units
working in parallel and two units for
loading and storing data in memory.
• The branch target buffer (BTB) was increased to 4,096 entries
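A BTB can be pictured as a table indexed by low-order bits of the branch address. Intel does not publish the Pentium 4 BTB's organization, so only the 4,096-entry size below comes from the text; the direct-mapped layout, tag check, and 2-bit saturating counter are textbook assumptions, and the class is hypothetical.

```python
BTB_ENTRIES = 4096  # size from the text; everything else here is assumed

class BranchTargetBuffer:
    """Toy direct-mapped BTB: each entry holds [tag, target, 2-bit counter]."""

    def __init__(self, entries=BTB_ENTRIES):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        # Low-order address bits pick the entry; the remaining bits form the tag.
        return pc % self.entries, pc // self.entries

    def predict(self, pc):
        """Predicted target address if the branch is predicted taken, else None."""
        idx, tag = self._slot(pc)
        entry = self.table[idx]
        if entry is not None and entry[0] == tag and entry[2] >= 2:
            return entry[1]
        return None

    def update(self, pc, taken, target):
        """Train the entry once the real branch outcome is known."""
        idx, tag = self._slot(pc)
        entry = self.table[idx]
        if entry is None or entry[0] != tag:
            if taken:  # allocate only for taken branches, starting "weakly taken"
                self.table[idx] = [tag, target, 2]
            return
        if taken:
            entry[1] = target
            entry[2] = min(3, entry[2] + 1)
        else:
            entry[2] = max(0, entry[2] - 1)

btb = BranchTargetBuffer()
pc, target = 0x401000, 0x400F00
print(btb.predict(pc))       # None: never seen
btb.update(pc, True, target)
print(hex(btb.predict(pc)))  # 0x400f00
btb.update(pc, False, target)
print(btb.predict(pc))       # None: counter dropped to "weakly not taken"
```

A larger table means fewer unrelated branches collide on the same entry, which is one reason to grow it.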
Each CPU uses its own RISC-like
microinstructions, which are not publicly
documented and are incompatible
with the microinstructions of other
CPUs. For example, Pentium III
microinstructions differ from
Pentium 4 microinstructions.
Intel does not disclose the depth (size) of
this queue.
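To see what decoding into micro-ops means, consider one x86 instruction with a memory destination, such as `add [ebx], eax`: it can be split into a load, an ALU operation, and a store, each doing one simple thing. Because Intel's real micro-op encoding is undocumented, the tuple format and the `decode` function below are purely hypothetical and handle only a tiny `add` subset.

```python
def decode(instr):
    """Hypothetical decoder for 'op dst, src', where either operand
    (but not both) may be a memory reference written as [reg]."""
    op, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    uops = []
    if src.startswith("["):
        uops.append(("load", "tmp1", src[1:-1]))  # tmp1 <- mem[reg]
        src = "tmp1"
    if dst.startswith("["):
        addr = dst[1:-1]
        uops.append(("load", "tmp0", addr))       # tmp0 <- mem[addr]
        uops.append((op, "tmp0", "tmp0", src))    # tmp0 <- tmp0 op src
        uops.append(("store", addr, "tmp0"))      # mem[addr] <- tmp0
    else:
        uops.append((op, dst, dst, src))          # register destination
    return uops

for uop in decode("add [ebx], eax"):
    print(uop)
# ('load', 'tmp0', 'ebx')
# ('add', 'tmp0', 'tmp0', 'eax')
# ('store', 'ebx', 'tmp0')
```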
Pentium 4 Design Reasoning
Decodes instructions into RISC-like micro-ops before the L1
cache
Micro-ops are fixed length
• Eases superscalar pipelining and scheduling
Pentium instructions are long and complex
Performance improved by separating decoding from
scheduling and pipelining
Data cache is write-back
• Can be configured to write-through
L1 cache controlled by 2 bits in control register CR0
• CD = cache disable
• NW = not write-through
• 2 instructions: invalidate (flush) the cache (INVD), and write back then
invalidate (WBINVD)
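CD is bit 30 and NW is bit 29 of CR0. The sketch decodes the two bits into the cache operating modes that Intel documents for the Pentium 4 and later (descriptions paraphrased). The `cache_mode` function is hypothetical and works on an ordinary integer, not on the real register, which only privileged code can read.

```python
CR0_NW = 1 << 29  # Not Write-through
CR0_CD = 1 << 30  # Cache Disable

def cache_mode(cr0):
    """Name the cache operating mode selected by the CD and NW bits of a CR0 value."""
    cd = bool(cr0 & CR0_CD)
    nw = bool(cr0 & CR0_NW)
    if not cd and not nw:
        return "normal cache mode"
    if not cd and nw:
        return "invalid combination (general-protection fault)"
    if cd and not nw:
        return "no-fill cache mode, coherency maintained"
    return "coherency not maintained"

print(cache_mode(0x00000011))  # normal cache mode
print(cache_mode(0x60000010))  # coherency not maintained (CR0 value after reset)
```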
Pentium Data Types
8-bit byte
16-bit word
32-bit doubleword
64-bit quadword
Addressing is in 8-bit units (byte addressing)
A 32-bit doubleword is accessed most efficiently at
addresses divisible by 4 (aligned); unaligned access is permitted but slower
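Because memory is byte-addressed, a multi-byte value occupies consecutive addresses. The sketch shows the alignment test described above and the little-endian byte order x86 uses (least significant byte at the lowest address); the `SIZES` table and `is_aligned` helper are illustrative names.

```python
import struct

SIZES = {"byte": 1, "word": 2, "doubleword": 4, "quadword": 8}

def is_aligned(address, kind):
    """A value is naturally aligned when its address is a multiple of its size."""
    return address % SIZES[kind] == 0

print(is_aligned(0x1000, "doubleword"))  # True
print(is_aligned(0x1002, "doubleword"))  # False: allowed on x86, but may cost extra cycles

# Little-endian: the doubleword 0x12345678 stored at address A occupies A..A+3 as 78 56 34 12.
print(struct.pack("<I", 0x12345678).hex(" "))  # 78 56 34 12
```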
Specific Data Types
General - arbitrary binary contents
Integer - signed binary value (two's complement)
Floating Point
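The same 32 bits mean different things depending on which of these types an instruction assumes. The sketch reinterprets one arbitrarily chosen bit pattern as general bits, a two's-complement integer, and an IEEE 754 single-precision float, using Python's standard `struct` module.

```python
import struct

raw = bytes.fromhex("0000c0bf")  # four bytes in memory order (little-endian)

general = raw.hex()                    # general: just bits
signed = struct.unpack("<i", raw)[0]   # integer: two's complement of 0xBFC00000
single = struct.unpack("<f", raw)[0]   # floating point: IEEE 754 single precision

print(general)  # 0000c0bf
print(signed)   # -1077936128
print(single)   # -1.5
```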
Pentium Floating Point Data Types
Pentium Operation Types
Arithmetic
Logical
Data Movement
Control Transfer
String operations
MMX
Segment Register
Protection
Cache management
Pentium Addressing Modes
• Immediate
• Register operand
• Displacement
• Base
• Base with displacement
• Scaled index with displacement
• Base with index and displacement
• Base scaled index with displacement
• Relative
Pentium Addressing Mode Calculation
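The memory modes listed above are special cases of one formula: effective address = base + (index × scale) + displacement, with scale restricted to 1, 2, 4, or 8 and unused components taken as 0. In protected mode, adding the segment base to the effective address gives the linear address. The function names and defaults below are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def effective_address(base=0, index=0, scale=1, displacement=0):
    """EA = base + index*scale + displacement, wrapped to 32 bits.
    Omitted components default to 0, which yields the simpler modes."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK32

def linear_address(segment_base, ea):
    """Linear address = segment base + effective address (32-bit wraparound)."""
    return (segment_base + ea) & MASK32

# Base scaled index with displacement: [ebx + esi*4 + 8] with EBX = 0x1000, ESI = 3
ea = effective_address(base=0x1000, index=3, scale=4, displacement=8)
print(hex(ea))                                      # 0x1014
print(hex(linear_address(0, ea)))                   # 0x1014 (segment base 0)
print(hex(effective_address(displacement=0x2000)))  # displacement only: 0x2000
```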
Pentium Instruction Format
Pentium 4 Registers
EFLAGS Register
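As an illustration of what the EFLAGS status bits record, the sketch computes CF, ZF, SF, and OF for a 32-bit ADD and places them at their EFLAGS bit positions (CF bit 0, ZF bit 6, SF bit 7, OF bit 11). PF and AF are omitted for brevity, and `add32_flags` is a software model, not the hardware logic.

```python
CF, ZF, SF, OF = 1 << 0, 1 << 6, 1 << 7, 1 << 11
MASK32 = 0xFFFFFFFF

def add32_flags(a, b):
    """Return (result, flags) for a 32-bit ADD of unsigned bit patterns a and b."""
    full = a + b
    result = full & MASK32
    flags = 0
    if full > MASK32:
        flags |= CF  # carry out of bit 31 (unsigned overflow)
    if result == 0:
        flags |= ZF  # result is zero
    if result & 0x80000000:
        flags |= SF  # sign bit of the result
    # Signed overflow: both operands have the same sign and the result's sign differs.
    if ~(a ^ b) & (a ^ result) & 0x80000000:
        flags |= OF
    return result, flags

r, f = add32_flags(0xFFFFFFFF, 1)
print(hex(r), hex(f))  # 0x0 0x41  (CF and ZF)
r, f = add32_flags(0x7FFFFFFF, 1)
print(hex(r), hex(f))  # 0x80000000 0x880  (SF and OF)
```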
Control Registers
MMX Register Mapping
MMX uses several 64-bit data types
Uses 3-bit register address fields
• 8 registers
No MMX-specific registers
• Aliased to the lower 64 bits of the existing
80-bit floating-point registers
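The packed 64-bit data types let one MMX instruction operate on several small values at once. The sketch models a packed byte add on eight 8-bit lanes, in the wraparound form (as PADDB does) and the unsigned-saturating form (as PADDUSB does); representing a 64-bit register as a Python int, and the helper names, are assumptions of the model.

```python
def lanes(x):
    """Split a 64-bit value into eight unsigned bytes, lowest lane first."""
    return [(x >> (8 * i)) & 0xFF for i in range(8)]

def pack(values):
    """Inverse of lanes(): combine eight bytes into one 64-bit value."""
    out = 0
    for i, v in enumerate(values):
        out |= (v & 0xFF) << (8 * i)
    return out

def paddb(a, b):
    """Packed byte add with wraparound: each lane is added modulo 256."""
    return pack([(x + y) & 0xFF for x, y in zip(lanes(a), lanes(b))])

def paddusb(a, b):
    """Packed byte add with unsigned saturation: each lane clamps at 255."""
    return pack([min(x + y, 0xFF) for x, y in zip(lanes(a), lanes(b))])

a = 0x01_02_03_04_F0_F0_F0_F0
b = 0x01_01_01_01_20_20_20_20
print(hex(paddb(a, b)))    # 0x203040510101010
print(hex(paddusb(a, b)))  # 0x2030405ffffffff
```

Saturation suits pixel data, where wrapping from bright to dark would be visibly wrong.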
MMX Register Mapping Diagram
Pentium 4 Diagram
BRIEF DESCRIPTION OF EACH PIPELINE STAGE
PIPELINE STAGES
TC Nxt IP: Trace cache next instruction pointer. Looks at the BTB for the next
microinstruction to be executed. This step
takes two stages.
TC Fetch: Trace cache fetch. Loads this
microinstruction from the trace cache. This
step takes two stages.
Drive: Sends the microinstruction to be
processed to the resource allocator and
register renaming circuit.
Alloc: Allocate. Checks which CPU resources will
be needed by the microinstruction.
Rename: If the program uses one of the eight
standard x86 registers, it is renamed to
one of the 128 internal registers present on the
Pentium 4. This step takes two stages.
Que: Queue. The microinstructions are put in
queues according to their type (for example,
integer or floating point).
Sch: Schedule. Microinstructions are scheduled
for execution according to their type (integer,
floating point, etc.). Up to this stage,
all instructions are in program order; from here on they
may execute out of order. This step takes
three stages.
Disp: Dispatch. Sends the microinstructions to their
corresponding execution engines. This step takes two
stages.
RF: Register file. The internal registers, stored in the
instruction pool, are read. This step takes two stages.
Ex: Execute. Microinstructions are executed.
Flgs: Flags. The microprocessor flags are updated.
Br Ck: Branch check. Checks whether the branch taken by the
program is the same as the one predicted by the branch prediction
circuit.
Drive: Sends the results of this check to the branch target
buffer (BTB) at the processor's front end.
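Adding up the stage counts given above yields the 20-stage depth commonly cited for this Pentium 4 pipeline. Steps whose length is not stated (Drive, Alloc, Que, Ex, Flgs, Br Ck) are assumed here to take one stage each.

```python
# (step, number of pipeline stages); steps with no stated count are assumed to take 1.
STAGES = [
    ("TC Nxt IP", 2), ("TC Fetch", 2), ("Drive", 1), ("Alloc", 1),
    ("Rename", 2), ("Que", 1), ("Sch", 3), ("Disp", 2),
    ("RF", 2), ("Ex", 1), ("Flgs", 1), ("Br Ck", 1), ("Drive", 1),
]

layout = []  # (step, first stage, last stage)
position = 1
for name, count in STAGES:
    layout.append((name, position, position + count - 1))
    position += count

depth = position - 1
print(depth)  # 20
for name, first, last in layout:
    span = str(first) if first == last else f"{first}-{last}"
    print(f"{span:>5}  {name}")
```

Under these counts the branch check lands at stage 19, so a mispredicted branch is discovered only near the end of the pipeline; this is one reason a deep pipeline depends on accurate branch prediction.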
PowerPC Processors Summary
Model          First ship  Clock speeds   L1 cache                 L2 cache      Transistors (millions)
601            1993        50-120 MHz     -                        -             2.8
603/603e       1994        100-300 MHz    16 KB inst, 16 KB data   -             1.6-2.6
604/604e       1994        166-350 MHz    32 KB inst, 32 KB data   -             3.6-5.1
740/750 (G3)   1997        200-366 MHz    32 KB inst, 32 KB data   256 KB-1 MB   6.35