IA-32 Architecture: Computer Organization and Assembly Language Dr. Aiman El-Maleh
COE 205
Computer Organization and Assembly Language
Dr. Aiman El-Maleh
[Adapted from slides of Dr. Kip Irvine: Assembly Language for Intel-Based Computers]
Next ...
Intel Microprocessors
IA-32 Registers
Instruction Execution Cycle
IA-32 Memory Management
IA-32 Architecture COE 205 – Computer Organization and Assembly Language – KFUPM slide 2
Intel Microprocessors
Intel introduced the 8086 microprocessor in 1978
8086, 8087, 8088, and 80186 processors
16-bit processors with 16-bit registers
16-bit data bus and 20-bit address bus
Physical address space = 2^20 bytes = 1 MB
8087 Floating-Point co-processor
Uses segmentation and real-address mode to address memory
Each segment can address 2^16 bytes = 64 KB
8088 is a less expensive version of 8086
Uses an 8-bit data bus
80186 is a faster version of 8086
Intel 80286 and 80386 Processors
80286 was introduced in 1982
24-bit address bus ⇒ 2^24 bytes = 16 MB address space
Introduced protected mode
Segmentation in protected mode is different from segmentation in real mode
Pentium 4 and Xeon Family
Pentium 4 is a seventh-generation x86 architecture
Introduced in 2000
New microarchitecture design called Intel NetBurst
Very deep instruction pipeline, scaling to very high frequencies
Introduced the SSE2 instruction set (extension to SSE)
Tuned for multimedia and operating on the 128-bit XMM registers
Basic Program Execution Registers
Registers are high-speed memory inside the CPU
Eight 32-bit general-purpose registers
Six 16-bit segment registers
Processor Status Flags (EFLAGS) and Instruction Pointer (EIP)
General-Purpose Registers
Used primarily for arithmetic and data movement
mov eax, 10    ; move constant 10 into register EAX
Specialized uses of Registers
EAX – Accumulator register
Automatically used by multiplication and division instructions
ECX – Counter register
Automatically used by LOOP instructions
ESP – Stack Pointer register
Used by PUSH and POP instructions, points to top of stack
ESI and EDI – Source Index and Destination Index registers
Used by string instructions
EBP – Base Pointer register
Used to reference parameters and local variables on the stack
Accessing Parts of Registers
EAX, EBX, ECX, and EDX are 32-bit Extended registers
Programmers can access their 16-bit and 8-bit parts
Lower 16 bits of EAX are named AX
AX is further divided into AH and AL (8 bits + 8 bits)
AL = lower 8 bits
AH = upper 8 bits
Special-Purpose & Segment Registers
EIP = Extended Instruction Pointer
Contains address of next instruction to be executed
EFLAGS = Extended Flags Register
Contains status and control flags
Each flag is a single binary bit
Six 16-bit Segment Registers
Support segmented memory
Six segments accessible at a time
Segments contain distinct contents
Code
Data
Stack
EFLAGS Register
Status Flags
Status of arithmetic and logical operations
Control and System flags
Control the CPU operation
Programs can set and clear individual bits in the EFLAGS register
Status Flags
Carry Flag
Set when unsigned arithmetic result is out of range
Overflow Flag
Set when signed arithmetic result is out of range
Sign Flag
Copy of sign bit, set when result is negative
Zero Flag
Set when result is zero
Auxiliary Carry Flag
Set when there is a carry from bit 3 to bit 4
Parity Flag
Set when parity is even
Least-significant byte in result contains even number of 1s
Floating-Point, MMX, XMM Registers
Floating-point unit performs high speed FP operations
Eight 80-bit floating-point data registers
ST(0), ST(1), . . . , ST(7)
Arranged as a stack
Used for floating-point arithmetic
Eight 64-bit MMX registers
Used with MMX instructions
Eight 128-bit XMM registers
Used with SSE instructions
Fetch-Execute Cycle
Each machine language instruction is first fetched from memory and stored in an Instruction Register (IR).
The address of the instruction to be fetched is stored in a register called the Program Counter, or simply PC. In some computers this register is called the Instruction Pointer, or IP.
After the instruction is fetched, the PC (or IP) is incremented to point to the address of the next instruction.
The fetched instruction is decoded (to determine what needs to be done) and executed by the CPU.
Instruction Execute Cycle
Instruction Fetch: obtain instruction from program storage
Instruction Decode: determine required actions and instruction size
Operand Fetch: locate and obtain operand data
Execute: compute result value and status (flags)
Result Writeback: deposit results in storage for later use
These steps repeat as an infinite cycle
Instruction Execution Cycle – cont'd
[Figure: the cycle mapped onto the datapath: the PC selects instruction I1 from program memory (I1, I2, I3, I4, ...) and fetches it into the instruction register; the instruction is decoded; operands op1 and op2 are read from registers; the ALU executes, writing flags and its output; the result is written back to registers]
Pipelined Execution
Instruction execution can be divided into stages
Pipelining makes it possible to start an instruction before
completing the execution of previous one
[Figure: two instructions, I-1 and I-2, passing through six stages; without pipelining they take 12 clock cycles, with pipelining 7]
Wasted Cycles (pipelined)
When one of the stages requires two or more clock
cycles to complete, clock cycles are again wasted
As more instructions enter the pipeline, wasted cycles occur
For k stages, where one stage requires 2 cycles, n instructions require k + 2n - 1 cycles
[Figure: pipelined execution of instructions I-1, I-2, and I-3 where one stage takes two clock cycles; the third instruction completes in cycle 11]
Superscalar Architecture
A superscalar processor has multiple execution pipelines
The Pentium processor has two execution pipelines
Called U and V pipes
Modes of Operation
Real-Address mode (original mode provided by 8086)
Only 1 MB of memory can be addressed, from 0 to FFFFF (hex)
Programs can access any part of main memory
MS-DOS runs in real-address mode
Protected mode (introduced with the 80286; 32-bit protected mode with the 80386)
Each program can address a maximum of 4 GB of memory
The operating system assigns memory to each running program
Programs are prevented from accessing each other’s memory
Native mode used by Windows NT, 2000, XP, and Linux
Virtual 8086 mode
Processor runs in protected mode and creates a virtual 8086
machine with 1 MB of address space for each running program
Real Address Mode
A program can access up to six segments
at any time
Code segment
Stack segment
Data segment
Extra segments (up to 3)
Each segment is 64 KB
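Logical address
Segment = 16 bits
Offset = 16 bits
Linear address = segment × 16 + offset (20 bits)
Example: what linear address corresponds to logical address A1F0:04C0?
Solution:
A1F00 (add 0 to segment in hex)
+ 04C0 (offset in hex)
A23C0 (20-bit linear address in hex)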
Your turn . . .
What linear address corresponds to logical address
028F:0030?
Programmer View of Flat Memory
Same base address for all segments
All segments are mapped to the same linear address space
EIP register
Points at next instruction
ESI and EDI registers
Contain data addresses
Also used to index arrays
ESP and EBP registers
ESP points at top of stack
EBP is used to address parameters and variables on the stack
[Figure: the linear address space of a program (up to 4 GB), with CODE, DATA, and STACK regions reached through 32-bit addresses held in EIP, ESI/EDI, and ESP/EBP; segment registers CS, DS, SS, and ES all have base address = 0; the rest of the space is unused]
Protected Mode Architecture
Logical address consists of
16-bit segment selector (CS, SS, DS, ES, FS, GS)
32-bit offset (EIP, ESP, EBP, ESI, EDI, EAX, EBX, ECX, EDX)
Segment unit translates logical address to linear address
Using a segment descriptor table
Linear address is 32 bits (also called a virtual address)
Paging unit translates linear address to physical address
Using a page directory and a page table
Logical to Linear Address Translation
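Upper 13 bits of the segment selector are used to index the descriptor table
TI = Table Indicator (bit 2 of the selector)
Selects the descriptor table
0 = Global Descriptor Table
1 = Local Descriptor Table
[Figure: the selector indexes a descriptor table whose location is held in GDTR or LDTR]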
Segment Descriptor Tables
Global descriptor table (GDT)
The operating system provides only one GDT
The GDT contains segment descriptors for all programs
Also used by the operating system itself
Table is initialized during boot up
The GDT address is stored in the GDTR register
Modern operating systems (e.g., Windows XP) use one GDT
Segment Visible and Invisible Parts
Visible part = 16-bit Segment Register
CS, SS, DS, ES, FS, and GS are visible to the programmer
Invisible part = Segment Descriptor (64 bits)
Automatically loaded from the descriptor table
Paging
Paging divides the linear address space into …
Fixed-size blocks called pages; Intel IA-32 uses 4 KB pages
Operating system allocates main memory for pages
Pages can be spread all over main memory
Pages in main memory can belong to different programs
If main memory is full, pages are stored on the hard disk
OS has a Virtual Memory Manager (VMM)
Uses page tables to map the pages of each running program
Manages the loading and unloading of pages
As a program runs, the CPU performs address translation
Page fault: raised by the CPU when a page is not in main memory
Paging – cont’d
The operating system uses page tables to map the pages in the linear virtual address space onto main memory
Each running program has its own page table
Pages that cannot fit in main memory are stored on the hard disk
The operating system swaps pages between memory and the hard disk
[Figure: the linear address spaces of Program 1 and Program 2 (Page 0, Page 1, Page 2, ... up to Page m and Page n) mapped onto main memory and the hard disk]