
Pentium 4 Structure

Computer Organization and Architecture

Pentium Processor
Pentium 4 Diagram (Simplified)
Pentium 4 Core Processor
„ Fetch/Decode Unit
• Fetches instructions from L2 cache
• Decodes them into micro-ops
• Stores micro-ops in L1 cache
„ Out of order execution logic
• Schedules micro-ops
• Based on data dependence and resources
• May speculatively execute
„ Execution units
• Execute micro-ops
• Data from L1 cache
• Results in registers
„ Memory subsystem
• L2 cache and system bus
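The scheduling idea above (issue a micro-op once its data dependences are satisfied and an execution unit is free) can be sketched roughly as follows. This is a toy model, not the Pentium 4's actual scheduler: the `MicroOp` class, the `schedule` function, the register names and the limit of two units per cycle are all hypothetical, and each destination is assumed to be written only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    dest: str            # register written by this micro-op
    srcs: tuple = ()     # registers read by this micro-op


def schedule(ops, units=2):
    """Return a list of cycles; each cycle lists the micro-ops issued in it.

    Toy model: every micro-op takes one cycle, and at most `units`
    micro-ops issue per cycle (hypothetical resource limit).
    """
    produced = {op.dest for op in ops}   # values computed inside this window
    ready = set()                        # values already computed
    pending = list(ops)                  # kept in program order
    cycles = []
    while pending:
        issued = []
        for op in pending:
            if len(issued) == units:     # no free execution unit left
                break
            # A source is available if it was computed earlier, or if it
            # is an input that no micro-op in the window produces.
            if all(s in ready or s not in produced for s in op.srcs):
                issued.append(op)
        if not issued:
            raise ValueError("unsatisfiable dependence")
        for op in issued:
            pending.remove(op)
        ready.update(op.dest for op in issued)
        cycles.append([op.name for op in issued])
    return cycles


program = [
    MicroOp("load_a", "t1", ("r1",)),
    MicroOp("add", "t2", ("t1", "r2")),   # depends on load_a
    MicroOp("load_b", "t3", ("r3",)),
    MicroOp("mul", "t4", ("t3", "r4")),   # depends on load_b
]
print(schedule(program))
```

In this example `load_b` issues ahead of `add`, even though `add` comes first in program order, because `add` is still waiting on `load_a`.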
Pentium 4 Core Processor
„ System bus speed: 400 MHz
„ Datapath between the L2 memory cache
and the L1 data cache is 256 bits wide
„ Datapath between the L2 memory cache and the
prefetch unit remains 64 bits wide
„ 128 internal registers
• Pentium 4 has five execution units
working in parallel and two units for
loading data from and storing data to RAM
• Branch target buffer (BTB) was increased to 4,096 entries
Pentium 4 Core Processor
„ Each CPU uses its own RISC-like
microinstructions, which are not publicly
documented and are incompatible
with microinstructions from other
CPUs. For example, Pentium III
microinstructions are different from
those of the Pentium 4
„ Intel does not disclose the depth (size) of
this queue.
Pentium 4 Design Reasoning
„ Decodes instructions into RISC like micro-ops before L1
cache
„ Micro-ops fixed length
• Superscalar pipelining and scheduling
„ Pentium instructions long & complex
„ Performance improved by separating decoding from
scheduling & pipelining
„ Data cache is write back
• Can be configured to write through
„ L1 cache controlled by 2 bits in register
• CD = cache disable
• NW = not write through
• 2 instructions: INVD invalidates (flushes) the cache; WBINVD writes back, then
invalidates
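As a rough illustration of the two control bits, the sketch below extracts CD and NW from a register value. The slide does not name the register; treating it as CR0, with NW at bit 29 and CD at bit 30, is an assumption taken from Intel's architecture manuals, and the function name is hypothetical.

```python
# ASSUMPTION: the control register is CR0, with NW at bit 29 and CD at bit 30.
CR0_NW = 1 << 29   # NW = not write through
CR0_CD = 1 << 30   # CD = cache disable


def cache_control_bits(cr0):
    """Return the state of the two cache-control bits in a CR0 value."""
    return {"CD": bool(cr0 & CR0_CD), "NW": bool(cr0 & CR0_NW)}


print(cache_control_bits(0x00000000))   # both bits clear
print(cache_control_bits(0x60000000))   # both bits set
```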
Pentium Data Types

„ 8 bit Byte
„ 16 bit word
„ 32 bit double word
„ 64 bit quad word
„ Addressing is by 8 bit unit
„ A 32 bit double word is read at
addresses divisible by 4
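A small sketch of byte addressing and double-word alignment, assuming a flat byte array stands in for memory. The helper names are hypothetical; the little-endian byte order is the x86 convention and is not stated on the slide.

```python
import struct


def is_aligned(address, size):
    """True if `address` is a multiple of `size` bytes."""
    return address % size == 0


def read_dword(memory, address):
    """Read a 32-bit double word, little-endian (x86 byte order)."""
    return struct.unpack_from("<I", memory, address)[0]


# Eight bytes of memory, addressed one byte (8 bits) at a time.
memory = bytes([0x78, 0x56, 0x34, 0x12, 0xEF, 0xBE, 0xAD, 0xDE])

print(is_aligned(4, 4), hex(read_dword(memory, 4)))   # True 0xdeadbeef
print(is_aligned(6, 4))                               # False
```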
Specific Data Types
„ General - arbitrary binary contents
„ Integer - signed binary value

„ Ordinal - unsigned integer

„ Unpacked BCD - One digit per byte

„ Packed BCD - 2 BCD digits per byte
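The two BCD layouts can be compared with a short sketch. The function names are hypothetical, and digits are emitted most significant first purely for readability; the slide does not specify a byte order.

```python
def to_unpacked_bcd(n):
    """One decimal digit per byte."""
    if n < 0:
        raise ValueError("non-negative integers only")
    return bytes(int(d) for d in str(n))


def to_packed_bcd(n):
    """Two decimal digits per byte: high nibble first, then low nibble."""
    if n < 0:
        raise ValueError("non-negative integers only")
    digits = str(n)
    if len(digits) % 2:              # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


print(to_unpacked_bcd(1234).hex())   # 01020304
print(to_packed_bcd(1234).hex())     # 1234
```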

Floating Point
Pentium Floating Point Data Types
Pentium operations Types
„ Arithmetic
„ Logical
„ Data Movement
„ Control Transfer
„ String operations
„ MMX
„ Segment Register
„ Protection
„ Cache management
Pentium Addressing Modes

• Immediate
• Register operand
• Displacement
• Base
• Base with displacement
• Scaled index with displacement
• Base with index and displacement
• Base scaled index with displacement
• Relative
Pentium Addressing Mode Calculation
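The memory addressing modes listed above are all special cases of one sum: base + (index x scale) + displacement. A minimal sketch, assuming 32-bit wraparound and the x86 scale factors 1, 2, 4 and 8 (neither is stated on the slide); the function name is hypothetical and the segment base is left out.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Effective address = base + index * scale + displacement (mod 2**32)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Displacement only
print(hex(effective_address(displacement=0x2000)))
# Base with displacement
print(hex(effective_address(base=0x1000, displacement=8)))
# Base, scaled index and displacement
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=8)))
```

Leaving an argument at its default selects the simpler modes, so one function covers displacement, base, base with displacement, and the scaled-index variants.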
Pentium Instruction Format
Pentium 4 Registers
EFLAGS Register
Control Registers
MMX Register Mapping
„ MMX uses several 64 bit data types
„ Use 3 bit register address fields
• 8 registers
„ No MMX specific registers
• Aliasing to lower 64 bits of existing
floating point registers
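A sketch of the aliasing described above: an MMX register is modelled as the lower 64 bits of an 80-bit floating-point register, and a 3-bit field selects one of 8 registers. The integer model of the registers and the helper names are hypothetical.

```python
NUM_MMX_REGISTERS = 1 << 3     # a 3-bit register field addresses 8 registers
MASK_64 = (1 << 64) - 1


def mmx_view(fp_register_80bit):
    """Return the lower 64 bits of an 80-bit floating-point register."""
    return fp_register_80bit & MASK_64


def unpack_bytes(mmx_value):
    """Split a 64-bit MMX value into eight packed bytes, lowest byte first."""
    return list(mmx_value.to_bytes(8, "little"))


fp_reg = (0xFFFF << 64) | 0x0102030405060708   # upper 16 bits are not visible to MMX
print(hex(mmx_view(fp_reg)))
print(unpack_bytes(mmx_view(fp_reg)))
```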
MMX Register Mapping Diagram
Pentium 4 Diagram
BRIEF DESCRIPTION OF EACH
PIPELINE STAGE
PIPELINE STAGES
„ TC Nxt IP: Trace cache next instruction pointer. Looks up
the BTB for the next microinstruction to be executed. This step
takes two stages.
„ TC Fetch: Trace cache fetch. Loads this microinstruction
from the trace cache. This
step takes two stages.
„ Drive: Sends the microinstruction to be
processed to the resource allocator and
register renaming circuit.
„ Alloc: Allocate. Checks which CPU resources will
be needed by the microinstruction.
„ Rename: If the program uses one of the eight
standard x86 registers, it is renamed to
one of the 128 internal registers present on the
Pentium 4. This step takes two stages.
„ Que: Queue. The microinstructions are put in
queues according to their types (for example,
integer or floating point).
„ Sch: Schedule. Microinstructions are scheduled
for execution according to their type (integer,
floating point, etc.). Until they reach this stage,
all instructions are in program order. This step takes
three stages.
„ Disp: Dispatch. Sends the microinstructions to their
corresponding execution engines. This step takes two
stages.
„ RF: Register file. The internal registers, stored in the
instruction pool, are read. This step takes two stages.
„ Ex: Execute. Microinstructions are executed.
„ Flgs: Flags. The microprocessor flags are updated.
„ Br Ck: Branch check. Checks whether the branch taken by the
program is the same as the one predicted by the branch prediction
circuit.
„ Drive: Sends the results of this check to the branch target
buffer (BTB) at the entrance of the pipeline.
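Adding up the stage counts given in the descriptions above gives the overall pipeline depth. In the sketch below, counts of two or three come from the text; a count of one is an assumption wherever the text gives no number. Under that assumption the total comes to 20.

```python
# (step name, number of pipeline stages it occupies).
# ASSUMPTION: steps with no stated count occupy a single stage.
PIPELINE = [
    ("TC Nxt IP", 2),
    ("TC Fetch", 2),
    ("Drive", 1),
    ("Alloc", 1),
    ("Rename", 2),
    ("Que", 1),
    ("Sch", 3),
    ("Disp", 2),
    ("RF", 2),
    ("Ex", 1),
    ("Flgs", 1),
    ("Br Ck", 1),
    ("Drive", 1),
]


def total_stages(pipeline):
    """Sum the stage counts of all steps."""
    return sum(count for _, count in pipeline)


def stage_numbers(pipeline):
    """Return (name, first_stage, last_stage) for each step, numbered from 1."""
    result = []
    next_stage = 1
    for name, count in pipeline:
        result.append((name, next_stage, next_stage + count - 1))
        next_stage += count
    return result


print(total_stages(PIPELINE))   # 20
for name, first, last in stage_numbers(PIPELINE):
    print(f"{name:10s} stages {first}-{last}")
```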
PowerPC Processors Summary
Processor      First Ship Date   Clock Speeds   L1 Cache                 L2 Cache      Transistors (10^6)
601            1993              50-120MHz      -                        -             2.8
603/603e       1994              100-300MHz     16KB inst, 16KB data     -             1.6-2.6
604/604e       1994              166-350MHz     32KB inst, 32KB data                   3.6-5.1
740/750 (G3)   1997              200-366MHz     32KB inst, 32KB data     256KB-1MB     6.35
G4             1999              500MHz         32KB inst, 32KB data     256KB-1MB
G5             2003              2.5GHz         32KB inst, 64KB data     512KB         58
PowerPC Block Diagram
PowerPC G5 Cache
„ L1: eight-way set associative
„ L2: two-way set associative (256KB, 512KB or 1MB)
„ L3: off-chip, up to 1MB
PowerPC Data Types
„ 8 (byte), 16 (halfword), 32 (word) and 64
(doubleword) length data types
„ Fixed point processor recognises:
• Unsigned byte, unsigned halfword, signed
halfword, unsigned word, signed word,
unsigned doubleword, byte string
• Floating point
• IEEE 754
• Single or double precision
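The data-type widths above can be checked with Python's `struct` module. The big-endian byte order (`>`) is an assumption reflecting PowerPC's default mode and is not stated on the slide; the `FORMATS` table and the helper name are hypothetical.

```python
import struct

# ASSUMPTION: big-endian layout (">"), PowerPC's default byte order.
FORMATS = {
    "byte": ">B",
    "halfword": ">H",
    "word": ">I",
    "doubleword": ">Q",
    "single": ">f",   # IEEE 754 single precision
    "double": ">d",   # IEEE 754 double precision
}


def size_in_bits(type_name):
    """Width of a data type in bits."""
    return struct.calcsize(FORMATS[type_name]) * 8


for name in FORMATS:
    print(name, size_in_bits(name))

# IEEE 754 encodings of 1.0
print(struct.pack(">f", 1.0).hex())   # 3f800000
print(struct.pack(">d", 1.0).hex())   # 3ff0000000000000
```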
PowerPC Addressing Modes
„ Load/store architecture
• Indirect
„ Instruction includes 16 bit displacement to be added to base
register (may be GP register)
„ Can replace base register content with new address
• Indirect indexed
„ Instruction references base register and index register (both may
be GP)
„ EA is sum of contents
„ Branch address
• Absolute
• Relative
• Indirect
„ Arithmetic
• Operands in registers or part of instruction
• Floating point is register only
PowerPC Memory Operand
Addressing Modes
„ lwz r3, 4(r1) (without update)
„ r3 = mem[r1+4]
„ lwzu r3, 4(r1) (with update)
„ r3 = mem[r1+4]
r1 = r1 + 4
„ lwzx r3, r1, r2
„ r3 = memory[r1+r2]
„ lwzux r3, r1, r2
„ r3 = memory[r1+r2]
r1 = r1 + r2
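The four load forms above can be mimicked with a small simulator. This is a simplified sketch: the `Machine` class is hypothetical, memory is a big-endian byte array (an assumption), and architectural restrictions on which registers may be used in the update forms are ignored.

```python
import struct


class Machine:
    """Toy model of registers and memory for the lwz family of loads."""

    def __init__(self, memory, registers):
        self.mem = bytearray(memory)
        self.r = dict(registers)

    def _load_word(self, ea):
        # ASSUMPTION: 32-bit words are stored big-endian.
        return struct.unpack_from(">I", self.mem, ea)[0]

    def lwz(self, rt, d, ra):
        """rt = mem[ra + d]"""
        self.r[rt] = self._load_word(self.r[ra] + d)

    def lwzu(self, rt, d, ra):
        """rt = mem[ra + d]; ra = ra + d"""
        ea = self.r[ra] + d
        self.r[rt] = self._load_word(ea)
        self.r[ra] = ea

    def lwzx(self, rt, ra, rb):
        """rt = mem[ra + rb]"""
        self.r[rt] = self._load_word(self.r[ra] + self.r[rb])

    def lwzux(self, rt, ra, rb):
        """rt = mem[ra + rb]; ra = ra + rb"""
        ea = self.r[ra] + self.r[rb]
        self.r[rt] = self._load_word(ea)
        self.r[ra] = ea


memory = struct.pack(">4I", 0x11111111, 0x22222222, 0x33333333, 0x44444444)
m = Machine(memory, {"r1": 0, "r2": 8, "r3": 0})
m.lwz("r3", 4, "r1")       # r3 = mem[4], r1 unchanged
m.lwzu("r3", 4, "r1")      # r3 = mem[4], r1 becomes 4
m.lwzx("r3", "r1", "r2")   # r3 = mem[4 + 8]
print(hex(m.r["r3"]), m.r["r1"])
```

The update forms save a separate add instruction when stepping through an array, since the base register already holds the new address for the next load.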
PowerPC instruction format
PowerPC User Visible Registers
PowerPC Register Formats
