BLM 3622- Microprocessors 1
A Historical Background
• The idea of calculating with a machine dates to before
500 B.C. when the Babylonians invented the abacus, the
first mechanical calculator.
Blaise Pascal (1623-1662)
• The abacus was not improved upon until 1642, when
Blaise Pascal invented a calculator constructed of
gears and wheels.
Charles Babbage
• One early pioneer of mechanical computing machinery was
Charles Babbage, who began work on a programmable calculating
machine in 1823. He designed the “Analytical Engine”, a
mechanical computer intended to store 1000 20-digit decimal
numbers and a variable program that could modify the function of the
machine.
• In 1889, Herman Hollerith developed the punched card for
storing data, along with a mechanical machine driven
by one of the new electric motors. His company later
became part of IBM Corporation.
Herman Hollerith
Konrad Zuse
• In 1941 Konrad Zuse completed the Z3, a relay-based
(electromechanical) calculating machine and the first working
programmable computer, designed to solve complex engineering
equations. It was also one of the first machines to work in the
binary system, as opposed to the more familiar decimal system.
His calculating machines were used in aircraft and missile design
during World War II for the German war effort.
Binary System
• The first truly electronic computer was placed into
operation in 1943 to break secret German military
codes. This vacuum-tube machine, Colossus, was built
at Bletchley Park, where the British mathematician
Alan Turing also worked on code breaking; it was
designed by the engineer Tommy Flowers. A problem with
Colossus was that, although its design allowed it to
break secret German military codes generated by the
mechanical Lorenz cipher machine, it could not solve
other problems. Colossus was not programmable: it was
a fixed-program computer system.
Alan Turing
Sample Turing Machine
• The first general-purpose programmable computer
system was developed in 1946 and called ENIAC.
The ENIAC was a huge machine (30 tons) and
performed about 100 000 operations per second.
ENIAC
John von Neumann
• In 1945, von Neumann contributed a new
understanding of how practical fast computers
should be organized and built; these ideas, often
referred to as the stored-program technique,
became fundamental for future generations of
high-speed digital computers and were
universally adopted.
Transistor
INTEL 4004
• The transistor was developed in 1948, followed by
the invention of the integrated circuit in 1958. In
1971, Intel developed the first microprocessor, the 4004.
The 4004 was a 4-bit microprocessor whose instruction
set contained 45 instructions. It performed about
50 000 instructions per second.
• In 1978, Intel released the 8086, a 16-bit
microprocessor that performed 2.5 million
instructions per second.
• Microprocessors like this were called CISC (Complex
Instruction Set Computers) because of the number
and complexity of their instructions.
• The popularity of the Intel family was ensured in 1981
when IBM Corp. decided to use the 8088/8086
microprocessors in its personal computers.
Intel 8086
Intel 8086/8088 Microprocessors
• Intel 8086 and 8088 microprocessors are
the basis of all IBM-PC compatible
computers
(8086 introduced in 1978, first IBM-PC released in 1981)
• All Intel, AMD and other advanced x86
microprocessors are based on and are
compatible with the original 8086/8088
• At power-up and reset, Pentiums, Athlons, etc.
all start in real mode and look like 8086 processors
Intel 8086/8088 Microprocessors
• Intel 8086 is a 16-bit microprocessor
• 16-bit data registers
• 16- or 8-bit external data bus
• Some techniques to optimise the CPU
performance when it’s executing programs
• Segment:offset memory model
• Little-endian data format
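The little-endian format stores the least significant byte of a multi-byte value at the lowest address. A minimal Python sketch, using the standard `struct` module to stand in for the memory layout; the helper names `store_word_le` and `load_word_le` are hypothetical:

```python
import struct

def store_word_le(value: int) -> bytes:
    """Lay out a 16-bit word the way the 8086 stores it in memory."""
    # "<H" = little-endian unsigned 16-bit integer: low byte first
    return struct.pack("<H", value & 0xFFFF)

def load_word_le(memory: bytes, address: int) -> int:
    """Read a 16-bit word whose low byte is at `address`."""
    return memory[address] | (memory[address + 1] << 8)

mem = store_word_le(0x1234)
print(mem.hex())                  # 3412 (34h sits at the lower address)
print(hex(load_word_le(mem, 0)))  # 0x1234
```

Storing then loading gives back the original value; only the byte order in memory differs from the way the number is written.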
8086/8088 (1)
• The original IBM PC used the 8088 microprocessor
• The 8088 is similar to the 8086 microprocessor, but it
has an external 8-bit bus and only a 4-byte-deep queue
• For cost-reduction reasons
• We can consider the 8086 and 8088 together
• PC clones often used the 8086 for better
performance
• The 8-bit bus reduces performance, but meant
cheaper computers
8086/8088 (2)
• Remember the fetch-decode-execute cycle?
• Fetching from EXTERNAL MEMORY is SLOW
• The 8086/8088 used an instruction queue to
speed up performance
• While the processor is decoding and
executing an instruction, its bus interface
can be reading new instructions, since at
that time the bus is not actually in use
8086/8088 Functional Units
[Figure: the 8086/8088 MPU consists of an Execution Unit (EU) and a Bus Interface Unit (BIU); the BIU fetches opcodes, reads operands and writes data.]
8086/8088 (3)
• The 8086/8088 consists of two internal units
• The execution unit (EU) executes the
instructions
• The bus interface unit (BIU) fetches
instructions, reads operands and writes results
• The 8086 has a 6-byte prefetch queue
• The 8088 has a 4-byte prefetch queue
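The division of labour between the BIU and the EU can be sketched with a simplified, non-cycle-accurate Python model of the queue. It assumes that the 8086 BIU fetches a 2-byte word only when at least two queue bytes are free, that the 8088 fetches one byte at a time, and that a jump simply discards the queue; the class and method names are hypothetical:

```python
from collections import deque

class PrefetchQueue:
    """Simplified (not cycle-accurate) model of the 8086/8088 instruction queue."""

    def __init__(self, size: int, fetch_width: int):
        self.size = size                # 6 bytes on the 8086, 4 bytes on the 8088
        self.fetch_width = fetch_width  # bytes per bus fetch: 2 (8086) or 1 (8088)
        self.queue = deque()
        self.next_fetch = 0             # address of the next byte to prefetch

    def biu_cycle(self, memory: bytes) -> None:
        """BIU: if there is room in the queue, prefetch the next bytes."""
        if self.size - len(self.queue) >= self.fetch_width:
            for _ in range(self.fetch_width):
                self.queue.append(memory[self.next_fetch])
                self.next_fetch += 1

    def eu_take(self) -> int:
        """EU: take the next instruction byte from the queue (no bus cycle needed)."""
        return self.queue.popleft()

    def flush(self, target: int) -> None:
        """A jump makes the prefetched bytes useless: empty the queue, refetch from target."""
        self.queue.clear()
        self.next_fetch = target


memory = bytes(range(32))
q86 = PrefetchQueue(size=6, fetch_width=2)
for _ in range(5):
    q86.biu_cycle(memory)
print(len(q86.queue))  # 6: filling stops once the queue is full
```

The `flush` method illustrates why jumps cost time: everything prefetched past the jump has to be thrown away and fetched again from external memory.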
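8086/8088 Internal Organisation
[Figure: 8086/8088 internal organisation. The EU contains the general registers AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, DI and SI, temporary registers, the ALU, the flags and EU control. The BIU contains the segment registers CS, DS, SS and ES, the instruction pointer IP, internal communications registers, the address summation logic driving the 20-bit address bus, bus control for the data bus, and the instruction queue (4 bytes on the 8088).]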
BIU Elements
• Instruction Queue: the next instructions or data can be
fetched from memory while the processor is executing
the current instruction
• The memory interface is slower than the processor execution
time, so this speeds up overall performance
• Segment Registers:
• CS, DS, SS and ES are 16-bit registers
• Combined with a 16-bit offset (held in IP or in a base/index
register) to generate the 20-bit physical address
• Allow the 8086/8088 to address 1 MB of memory
• Changed under program control to point to different segments
as a program executes
• Instruction Pointer (IP) contains the offset address of
the next instruction, the distance in bytes from the
address given by the current CS register
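The BIU's address generation is simple arithmetic: the segment value is multiplied by 16 (shifted left four bits) and the 16-bit offset is added. A minimal Python sketch with hypothetical helper names; the CS=123Ah, IP=341Bh values are the ones used on the exercise slide of this section:

```python
from typing import Tuple

def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shifting left by 4 appends 0000 to the segment; only A19..A0 exist,
    # so results above FFFFFh wrap around to the bottom of memory.
    return ((segment << 4) + offset) & 0xFFFFF

def segment_range(segment: int) -> Tuple[int, int]:
    """First and last physical address of a 64 KB segment (wrap-around not handled)."""
    base = segment << 4
    return base, base + 0xFFFF

print(hex(physical_address(0x123A, 0x341B)))     # 0x157bb
print([hex(a) for a in segment_range(0x123A)])  # ['0x123a0', '0x2239f']
print(hex(physical_address(0xFFFF, 0x0010)))     # 0x0 (wraps at 1 MB)
```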
8086/8088 20-bit Addresses
[Figure: the 16-bit segment base address (e.g. CS) with 0000 appended, i.e. shifted left by four bits, is added to the 16-bit offset address (e.g. IP) to give the 20-bit physical address.]
Exercise: 20-bit Addressing
[Figure: memory map from 00000h with CS=123Ah and IP=341Bh. The code segment occupies 123A0h to 2239Fh (223A0h is the first address beyond it), and the offset 341Bh selects physical address 157BBh.]
Exercise: 20-bit Addressing
1. CS contains 0A820h, IP contains 0CE24h.
What is the resulting physical address?
2. CS contains 0B500h, IP contains 0024h.
What is the resulting physical address?
Segment Registers
The utilization of the segment registers
essentially divides the memory space into
overlapping segments, with each segment
being 64K bytes long and at an address that
is divisible by 16.
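Because a new segment can begin at every 16-byte boundary and each one spans 64 KB, a single physical address can be reached through many segment:offset combinations. A brute-force Python sketch (the helper name `aliases` is hypothetical) that lists them:

```python
from typing import List, Tuple

def aliases(physical: int) -> List[Tuple[int, int]]:
    """Every segment:offset pair that names the given 20-bit physical address.

    Wrap-around above FFFFFh is ignored for simplicity.
    """
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:  # the offset must fit in 16 bits
            pairs.append((segment, offset))
    return pairs

pairs = aliases(0x12345)
print(len(pairs))                                            # 4096
print((0x1234, 0x0005) in pairs, (0x1000, 0x2345) in pairs)  # True True
```

Away from the bottom of memory the count is always 65536 / 16 = 4096, which is exactly the degree of overlap between segments.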
Advantages of using segment registers
• Allow the memory capacity to be 1 MB even
though the addresses associated with the individual
instructions are only 16 bits wide.
• Allow the instruction, data or stack portion of a
program to be more than 64 KB long by allowing
more than one code, data or stack segment.
• Facilitate the use of separate memory areas for a
program, its data and the stack.
• Permit a program and/or its data to be put into
different areas of memory each time the program is
executed.
8086/8 In Circuit (1)
• 8086/8088 microprocessors need support
circuits in a microcomputer system
• The 8086/8088 multiplex the address and data
buses on the same pins
• This saves pins, but at a price:
• Demultiplexing logic is needed to build
separate address and data buses to interface
with RAMs and ROMs
[Figure: 8086 40-pin DIP pinout. Pins 1-20: GND, AD14-AD0, NMI, INTR, CLK, GND. Pins 21-40: RESET, READY, /TEST, eight mode-dependent pins (24-31), then /RD, MN,/MX, /BHE,S7, A19,S6 to A16,S3, AD15 and Vcc. In maximum mode pins 24-31 are QS1, QS0, /S0, /S1, /S2, /LOCK, /RQ,/GT1 and /RQ,/GT0; in minimum mode they are /INTA, ALE, /DEN, DT/R, M/IO, /WR, HLDA and HOLD.]
Pin Connections
AD15-AD0: (I/O-3)
The 8086 address/data bus lines form the
multiplexed address/data bus of the
8086. These lines contain address bits
whenever ALE is logic 1 and carry data at other times.
These pins enter a high-impedance state whenever a hold
acknowledge occurs.
Pin Connections
A19/S6-A16/S3: (O-3)
The address/status bus bits are multiplexed to
provide address signals A19-A16 and also status
bits S6-S3. The pins also enter a high-impedance
state during the hold acknowledge. S4 and S3
show which segment is accessed during the
current bus cycle.
Pin Connections
S4 S3 Function
0 0 Extra segment
0 1 Stack segment
1 0 Code or no segment
1 1 Data segment
Pin Connections
/RD: (O-3)
Whenever the read signal is logic 0, the data bus
is receptive to data from the memory or I/O
devices connected to the system.
READY: (I)
This input is controlled to insert wait states into
the timing of the microprocessor.
READY=0: the µP enters wait states and remains idle
READY=1: it has no effect on the operation of the µP
Pin Connections
/TEST: (I)
The test pin is an input that is tested by the WAIT
instruction: if /TEST is logic 0, WAIT acts as a NOP;
if it is logic 1, the µP waits.
NMI: (I)
The non-maskable interrupt input is similar to
INTR except that NMI does not check whether
the IF flag bit is logic 1. This interrupt input uses
interrupt vector 2.
Pin Connections
RESET: (I)
The reset input causes the µP to reset itself if this pin
is held high for a minimum of four clock periods.
It then begins executing instructions at memory location
FFFF0H and disables maskable interrupts by clearing
the IF flag bit.
MN,/MX: (I)
Minimum/maximum mode select pin.
/BHE,S7: (O-3)
The bus high enable pin is used to enable the most significant
data bus bits (D15-D8) during a read or write operation.
Minimum mode Pins
M/IO: (O-3)
This pin selects memory or I/O: it indicates whether
the microprocessor address bus contains a memory
address or an I/O port address.
/WR: (O-3)
This line indicates that the 8086 is outputting data
to a memory or I/O device.
Minimum mode Pins
/INTA: (O-3)
The interrupt acknowledge signal is a response to
the INTR input pin. This pin is normally used to
gate the interrupt vector number onto the data
bus in response to an interrupt request.
ALE: (O)
Address latch enable shows that the 8086
address/data bus contains address information.
This address can be a memory address or an
I/O port number.
Minimum mode Pins
DT/R: (O-3)
The data transmit/receive signal shows whether the
microprocessor data bus is transmitting or
receiving data.
/DEN: (O-3)
Data bus enable activates external data bus
buffers.
Minimum mode Pins
HOLD: (I)
The hold input requests a direct memory access
(DMA). If the HOLD signal is logic 1, the
microprocessor stops executing software and places
its address, data and control buses in the high-impedance state.
HLDA: (O)
Hold acknowledge indicates that the 8086
microprocessor has entered the hold state.
Maximum mode Pins
• Maximum mode is used with external coprocessors or
in multiprocessing applications.
/S2, /S1 and /S0: (O)
The status bits indicate the function of
the current bus cycle. These signals
are normally decoded by the 8288 bus
controller.
Status bits
S2 S1 S0 Function
0 0 0 Interrupt acknowledge
0 0 1 I/O read
0 1 0 I/O write
0 1 1 Halt
1 0 0 Opcode fetch
1 0 1 Memory read
1 1 0 Memory write
1 1 1 Passive
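At its core, a bus controller such as the 8288 is a decoder for these three status bits. A minimal Python lookup-table sketch of that decoding; the command column assumes the usual 8288 behaviour (opcode fetches and memory reads both assert MRDC#, halt and passive states issue no command) and leaves out the advanced-write outputs AMWC# and AIOWC#:

```python
from typing import Optional

# S2 S1 S0 -> bus cycle, as listed in the status-bit table
BUS_CYCLES = {
    0b000: "Interrupt acknowledge",
    0b001: "I/O read",
    0b010: "I/O write",
    0b011: "Halt",
    0b100: "Opcode fetch",
    0b101: "Memory read",
    0b110: "Memory write",
    0b111: "Passive",
}

# Command output the bus controller asserts for each cycle (None = no command)
COMMANDS = {
    0b000: "INTA#",
    0b001: "IORC#",
    0b010: "IOWC#",
    0b011: None,
    0b100: "MRDC#",
    0b101: "MRDC#",
    0b110: "MWTC#",
    0b111: None,
}

def decode_status(s2: int, s1: int, s0: int) -> str:
    return BUS_CYCLES[(s2 << 2) | (s1 << 1) | s0]

def command_for(s2: int, s1: int, s0: int) -> Optional[str]:
    return COMMANDS[(s2 << 2) | (s1 << 1) | s0]

print(decode_status(1, 0, 1), command_for(1, 0, 1))  # Memory read MRDC#
```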
[Figure: 8088 40-pin DIP pinout shown beside the 8086. The 8088 multiplexes only AD7-AD0; pins 2-8 and 39 carry the address-only outputs A14-A8 and A15. Pin 34 is /SS0 in minimum mode (held high in maximum mode) instead of /BHE,S7, and pin 28 in minimum mode is IO/M rather than the 8086's M/IO. The remaining pins match the 8086.]
8086/8 In Circuit (2)
• In maximum mode the 8086/8088 needs at
least the following: 8288 bus controller,
8284A clock generator, 74HC373s and
74HC245s
• With the aid of these devices the 8086
begins to look like the ideal
microprocessor we looked at earlier
[Figure: i8086 circuit, maximum mode. The 8284A clock generator (RDY input) supplies CLK, READY and RESET to the 8086, whose MN/MX# pin is strapped for maximum mode, and CLK to the 8288 bus controller. The 8288 decodes S0#-S2# into MRDC#, MWTC#, AMWC#, IORC#, IOWC#, AIOWC# and INTA#, and generates ALE, DEN and DT/R#. Three 74LS373 latches (LE driven by ALE) capture A19:A0 and BHE# from the multiplexed AD15:AD0 and A19:A16 lines; two 74LS245 transceivers (EN#, DIR) buffer the data bus D15:D0. INTR is the interrupt request input.]
8086/8 Maximum Mode
• In maximum mode, the 8288 uses a set of
status signals (S0, S1, S2) to rebuild the
normal bus control signals of the
microprocessor
• MRDC#, MWTC#, IORC#, IOWC#, etc.
• Equivalent to MEMR#, etc.
• Look at some special signals briefly
74LS373 Octal Transparent Latch
with 3-state Outputs
74LS245 Octal Bus Transceiver
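RESET Signal
• The active-high RESET input (generated by the 8284A
clock generator from its active-low RES# input) puts
the 8086/8088 into a defined state
• Clears the flags register, the DS, SS and ES segment
registers and IP, and empties the instruction queue
• Sets CS to 0FFFFh, so the effective program address is 0FFFF0h
(CS=0FFFFh, IP=0000h)
• 8086/8088 programs always start at FFFF0H after
reset has been asserted and removed
• Later CPUs keep the convention of starting 16 bytes below the
top of their address space (FFFFFFF0h on the 80386 and later)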
BHE# Signal (8086 Only)
• The 8086 processor can address memory a
byte at a time
• Its data bus is 16 bits wide
• It uses the BHE# signal and A0 (sometimes
called BLE#) to address bytes using its 16-bit bus
Use of BHE#/A0(BLE#)
[Figure: 8086 memory organised as two 512 KB byte-wide banks addressed by A19..A1. The odd bank (addresses 00001h, 00003h, ..., FFFFFh) connects to D15:D8 and is selected by BHE#; the even bank (addresses 00000h, 00002h, ..., FFFFEh) connects to D7:D0 and is selected by A0/BLE#. The 8088, by contrast, uses a single byte-wide memory (00000h to FFFFFh).]
Use of BHE#/BLE#
BHE#  A0/BLE#  Selection
0     0        Whole word (16 bits)
0     1        High byte to/from odd address
1     0        Low byte to/from even address
1     1        No selection
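The table above can be turned into a small Python function that returns the (BHE#, A0) pair for each bus cycle of an access. This is a sketch with a hypothetical helper name; it assumes the 8086 behaviour in which a word at an odd address is split into two byte cycles:

```python
from typing import List, Tuple

def bus_cycles(address: int, width: int) -> List[Tuple[int, int]]:
    """(BHE#, A0) driven in each 8086 bus cycle for a byte (1) or word (2) access.

    Both signals select their bank when low (0), as in the table above.
    """
    odd = (address & 1) == 1
    if width == 1:
        return [(0, 1)] if odd else [(1, 0)]  # odd bank via BHE#, even bank via A0
    if width == 2:
        if not odd:
            return [(0, 0)]  # aligned word: both banks in one cycle
        # Word at an odd address: low byte from the odd bank first,
        # then the high byte from the even bank at address + 1
        return [(0, 1), (1, 0)]
    raise ValueError("width must be 1 or 2")

print(bus_cycles(0x1000, 2))  # [(0, 0)]
print(bus_cycles(0x1001, 2))  # [(0, 1), (1, 0)]
```

The misaligned case costs an extra bus cycle, which is why word data is usually placed at even addresses on the 8086.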
ALE and Address/Data Bus
Multiplexing
• The 8086/8088 multiplexes the address and data
signals onto the same set of pins
• Off-chip logic is needed to separate the signals
• Transparent latches designed just for
address demultiplexing
ALE and 74HC373 Transparent Latch
[Figure: address demultiplexing. During address time at the start of the bus cycle ALE is high, so the 74HC373 (inputs In0:In7 from the address/data bus, outputs Q0:Q7 to the system address bus, LE driven by ALE) passes the address through; when ALE falls, before data time, the latch keeps the address on the microcomputer address bus. The tri-state control signal OE# is shown connected to GND for simplicity.]
Use of ALE (Address Latch Enable)
• ALE is used with an external latch
(74HC373) to demultiplex the address and
data lines
• The 74HC373 is transparent when its LE input
(connected to ALE) is high
• When ALE goes low, the ‘373 holds the last
data until ALE goes high again
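The latch behaviour described above fits in a few lines of Python. This is a behavioural sketch of one 8-bit 74HC373 with OE# tied low (not a timing model); the class name and the bus values 34h/A5h are hypothetical:

```python
class TransparentLatch:
    """Behavioural model of an 8-bit 74HC373 with OE# tied to GND (outputs always driven)."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, le: int) -> int:
        """Apply input D with latch enable LE (driven by ALE); return output Q."""
        if le:            # LE high: transparent, Q follows D
            self.q = d & 0xFF
        return self.q     # LE low: Q keeps the last latched value

latch = TransparentLatch()
# (value on AD7..AD0, ALE) over a hypothetical bus cycle:
# address 34h while ALE is high, then data A5h after ALE has fallen
phases = [(0x34, 1), (0x34, 0), (0xA5, 0), (0xA5, 0)]
print([hex(latch.update(ad, ale)) for ad, ale in phases])
# ['0x34', '0x34', '0x34', '0x34']
```

The system address bus keeps seeing 34h even while the shared pins carry data, which is exactly what demultiplexing requires.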
8288 Bus Controller and Bus
Transceivers
[Figure: the 8288 bus controller's DEN# and DT/R# outputs drive the EN# and DIR inputs of two 74HC245 transceivers, which connect CPU [D15:D8] and CPU [D7:D0] to buffered [D15:D8] and [D7:D0] going to the memory and I/O systems.]
The 8288 bus controller also generates the direction and enable signals for the bidirectional transceivers, supporting buffering of the system data bus.
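8086 Read Cycle
[Timing diagram, bus states T1-T4 against CLK: the status bits (S2 S1 S0) are 001 (I/O read) or 101 (memory read); A16..A19 and /BHE carry the address in T1, then status bits S3..S6; ALE pulses in T1 so that the latched A0..A19 hold a valid address for the rest of the cycle; AD0..AD15 carry the address in T1, float, carry valid data from memory or I/O, then float again; DT/R, DEN and /MRDC or /IORC control the transfer.]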
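8086 Write Cycle
[Timing diagram, bus states T1-T4 against CLK: the status bits (S2 S1 S0) are 010 (I/O write) or 110 (memory write); the address is output in T1 and latched with ALE; AD0..AD15 then carry valid write data; DT/R and DEN control the data transceivers, and /MWTC or /IOWC strobes the write.]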
8086 Read Cycle (1 Wait State)
[Timing diagram: the read cycle (status 001 or 101) stretched by one wait state Tw between T3 and T4. The 8284 synchronises its RDY input onto the 8086 READY line; while READY is low the processor inserts Tw states with /MRDC or /IORC still active, and it finishes with T4 once READY is high again.]
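Each T-state, and each inserted wait state Tw, lasts one clock period, so the length of a bus cycle follows directly from the clock frequency. A small Python sketch; the helper name is hypothetical and the 5 MHz clock is an assumed example value (a standard 8086 speed grade):

```python
def bus_cycle_ns(clock_hz: float, wait_states: int = 0) -> float:
    """Length of one bus cycle: T1..T4 plus any Tw states, each one clock period long."""
    t_state_ns = 1e9 / clock_hz
    return (4 + wait_states) * t_state_ns

print(bus_cycle_ns(5e6))     # 800.0 (ns, no wait states)
print(bus_cycle_ns(5e6, 1))  # 1000.0 (ns, one wait state)
```

One wait state therefore lengthens every access to slow memory by 25% at this clock rate.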
8086/8088 Summary
• First generation (introduced June 1978)
• One of the first 16-bit processors on the
market
• 16-bit internal registers
• 16/8-bit external data bus
• 20-bit address bus (1 MB addressable)
• Used in first-generation IBM PCs (1981)
80186/80188
• Evolution of the 8086/8088: 80186/80188
• Increased instruction set
• On-chip system components (clock
generator, DMA, interrupt controller, timers…)
• Unsuccessful in PCs
• Popular in embedded systems…
2nd Generation Processor 286
• P2 (286) = 2nd generation processor
• Introduced in 1982
• CPU behind the IBM AT
• Throughput of the original IBM AT (6 MHz) was about
500% of the IBM PC (4.77 MHz)
• Level of integration: 134k transistors (vs 29k in
the 8086)
• Still a 16-bit processor…
• Available in higher clock frequencies: up to 25 MHz
2nd Generation Processors 286
• Fully backwards compatible with the 8086
The 80286 runs 8086 software without modification
• Improved instruction execution
Average instruction takes 4.5 cycles vs. 12 cycles (8086)
• Improved instruction set
• Real mode and protected mode
Multitasking support: what happens in one area of memory doesn’t affect
other programs. Protected mode was supported by Windows 3.0.
• 16 MB addressable physical memory
• On-chip MMU (1 GB virtual memory)
• Non-multiplexed address bus and data bus
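The cycles-per-instruction (CPI) figures above plug into the standard performance relations: instruction rate = clock / CPI, and CPU time = instruction count × CPI / clock. A Python sketch with hypothetical helper names; it ignores memory wait states and the 8088's narrower bus, so it does not attempt to reproduce the AT-versus-PC comparison:

```python
def mips(clock_hz: float, cpi: float) -> float:
    """Millions of instructions per second = clock frequency / average cycles per instruction."""
    return clock_hz / cpi / 1e6

def cpu_time_s(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x CPI / clock frequency."""
    return instruction_count * cpi / clock_hz

# Same 6 MHz clock, CPI from the comparison above: 12 (8086) vs 4.5 (80286)
speedup = mips(6e6, 4.5) / mips(6e6, 12)
print(round(speedup, 2))  # 2.67
```

At an equal clock, the lower CPI alone makes the 286 about 2.7 times faster; a higher clock multiplies this further.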
Improving Computer Performance
• We’ve seen how 16-bit computer
technology based on the 8086 and
80286 processors developed
• These computers are not powerful
enough for today’s applications
• How do you improve the performance
of your computer?
• Let’s start with the CPU
CPU Performance (1)
• MOST OBVIOUS: processor clock frequency
• Increased frequency gives an increased execution
rate
• State of the art: >2 GHz (Jan 2002)
• Memory and I/O access times can be a
performance bottleneck, unless you take
some special measures
CPU Performance (2)
• ALU register width
• A processor is an n-bit processor, where n represents
the precision of the ALU; n can be 4, 8, 16, 32 or 64
• The wider the registers, the more processing per
clock
• Data bus width
• The wider the data bus, the faster we can transfer data
• Since the memory and I/O device access times are
finite, the more bits transferred per cycle the better
CPU Performance (3)
• Address bus width
• Increased address width doesn’t provide a
‘speed’ increase as such
• The CPU can directly address more memory
• PCs use big programs, which would not fit in a
smaller address space
• Overcoming a small address space takes time
• Impacts on overall system performance
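The amount of directly addressable memory doubles with each extra address line: an n-bit address bus selects 2^n bytes. A short Python sketch using the bus widths of the 8086/8088 (20 bits), the 80286 (24 bits, giving the 16 MB quoted above) and the 80386 (32 bits); the helper name is hypothetical:

```python
def addressable_bytes(address_bits: int) -> int:
    """An n-bit address bus can select 2**n distinct byte addresses."""
    return 1 << address_bits

MB = 1 << 20
for cpu, bits in [("8086/8088", 20), ("80286", 24), ("80386", 32)]:
    print(f"{cpu}: {bits}-bit address bus -> {addressable_bytes(bits) // MB} MB")
# 8086/8088: 20-bit address bus -> 1 MB
# 80286: 24-bit address bus -> 16 MB
# 80386: 32-bit address bus -> 4096 MB
```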
3rd Generation Processor 386
• P3 (386) = 3rd generation processor
• Introduced: 10/1985
• Full 32-bit processor
(32-bit registers, 32-bit internal and external data bus, 32-bit address bus)
• 275k transistors, CMOS, 132-pin PGA package
(Supply current Icc=400 mA, roughly the same as the 8086!)
• Clock speeds: 16-33 MHz
• P3 processors were far ahead of their time:
it took 10 years before 32-bit operating systems became mainstream!
• First 386 PCs in late 1986
(COMPAQ)
3rd Generation Processor 386
• Modes of operation:
• Real, protected, virtual real
• Protected mode of the 386 is fully compatible
with the 286
Protected mode = native mode of operation. Chips are designed for
advanced operating systems such as Windows NT
• New virtual real mode
The processor can run with hardware memory protection while simulating
the 8086’s real-mode operation. Multiple copies of e.g. DOS can run
simultaneously, each in a protected area of memory. If a program in
one memory area crashes, the rest of the system is protected.
Intel 32-bit Architecture: IA-32
[Figure: 80386 block diagram with a Bus Unit (BU) connected to the address and data buses, a Prefetch Queue, an Instruction Unit (IU), an Execution Unit (EU) containing the Control Unit (CU), registers and ALU, and an Addressing Unit (AU).]
The 80386 includes a bus interface unit for reading and providing data and instructions,
with a prefetch queue; an IU for controlling the EU with its registers; and an AU for
generating memory and I/O addresses.
80386 Features
• 32-bit general and offset registers
• 16-byte prefetch queue
• Memory management unit with segmentation unit and
paging unit
• 32-bit address and data bus
• 4-Gbyte physical address space
• 64-Tbyte virtual address space
• Support for the external i387 numeric coprocessor
• Implementation of real, protected and virtual 8086 modes
80386 Operating Modes
• Protected mode for multitasking support
• Real mode (native 8086 mode)
• Processor powers up in real mode
• System Management Mode (added with the 386SL)
• Power management or system security
• Processor switches to a separate address space, while
saving the entire context of the currently running
program or task
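80386 Register Set
[Figure: 32-bit general-purpose registers EAX, EBX, ECX and EDX, whose low 16 bits are AX, BX, CX and DX, each further split into AH/AL, BH/BL, CH/CL and DH/DL (bits 15-8 and 7-0); 32-bit ESI, EDI, EBP and ESP with low halves SI, DI, BP and SP; the 32-bit instruction pointer EIP (low half IP); the 32-bit EFLAGS register (low half FLAGS); and the 16-bit segment registers CS, SS, DS, ES, FS and GS.]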
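The register names overlap: AX is the low 16 bits of EAX, and AH/AL are the high and low bytes of AX. A Python sketch of that aliasing using plain bit masks (the helper names are hypothetical):

```python
def split_eax(eax: int) -> dict:
    """Return the overlapping register views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF  # low 16 bits of EAX
    return {"EAX": eax, "AX": ax, "AH": ax >> 8, "AL": ax & 0xFF}

def write_al(eax: int, al: int) -> int:
    """Writing AL replaces only bits 7..0; bits 31..8 of EAX are unchanged."""
    return (eax & 0xFFFFFF00) | (al & 0xFF)

print({k: hex(v) for k, v in split_eax(0x12345678).items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```

This overlap is what lets 8086 code that only knows AX, AH and AL run unchanged on the 32-bit register file.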
80386 Prefetch Queue
[Figure: the bus interface unit reads from slow off-chip memory over the 32-bit data bus into a 16-byte-deep instruction queue; the execution unit fetches from the fast on-chip queue.]
80386 Prefetch Queue
• The 80386 prefetch queue is 16 bytes deep
1. The instruction fetch can read from the
prefetch queue faster than from memory
2. The prefetcher can do some work while
the execution unit is doing other tasks in
parallel
Coprocessor: i387
• The hardware implementation of floating-point
processing in the i387 means floating-point
operations run at much higher speed.
• Without an i387, the i386 can still evaluate all
mathematical expressions by emulating the i387 in
software, more slowly.
80386: Classic CISC Processor
• CISC = Complex Instruction Set Computer
• Complex instructions
• ...but code-size efficient
• Micro-encoding of the machine instructions
• Extensive addressing capabilities for
memory operations
• Few, but very useful, CPU registers
80386 Execution Sequence
[Figure: CISC processor: bus interface, prefetch queue, decoding unit, microcode queue, control unit with microcode ROM, and an execution unit with registers, ALU and coprocessor.]
In a microprogrammed CISC the processor fetches the instructions via the bus interface into a
prefetch queue, which transfers them to a decoding unit. The decoding unit breaks each machine
instruction into many elementary micro-instructions and applies them to a microcode queue. The
micro-instructions are transferred from the microcode queue to the control and execution unit, which
drives the ALU and the registers.
80386 Complex Instructions
• CISC drawback: most instructions are so
complicated that they have to be broken into a
sequence of micro-steps
• These steps are called microcode
• Stored in a ROM in the processor core
• Microcode ROM: access time and size...
• They require extra ROM and decode logic
RISC: Less is More
• RISC = Reduced Instruction Set Computer
• 20/80 rule: 20% of the instructions take up
80% of the time
• Sometimes executing a sequence of simple
instructions runs quicker than a single
complex machine instruction that has the
same effect
RISC Ideas (1)
• Reduce the instruction set to simplify the
decoding
• Smaller instruction set -> simpler logic ->
smaller logic -> faster execution
• Eliminate microcode: hardwire all
instruction execution
• Pipeline instruction decoding and executing:
do more operations in parallel
RISC Ideas (2)
• Load/store architecture: only the load and
store instructions can access memory
• All other instructions work with the processor’s
internal registers
• This is necessary for single-cycle execution: the
execution unit can’t wait for data to be
read/written
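The difference between a CISC memory-operand instruction and a load/store sequence can be sketched with a tiny Python interpreter. The mnemonics `LOAD`, `ADD` and `STORE` and the register names `r1`/`r2` are hypothetical and do not belong to any real instruction set:

```python
from typing import Dict, List, Tuple

Memory = Dict[int, int]
Registers = Dict[str, int]

def cisc_add_to_memory(mem: Memory, regs: Registers, addr: int, src: str) -> None:
    """CISC style: one instruction reads memory, adds a register and writes memory back."""
    mem[addr] = mem[addr] + regs[src]

def run_load_store(program: List[Tuple], mem: Memory, regs: Registers) -> None:
    """Load/store style: only LOAD and STORE touch memory; ADD works on registers only."""
    for op, *args in program:
        if op == "LOAD":
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "ADD":
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "STORE":
            rs, addr = args
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown opcode {op}")

mem_a, regs_a = {0x100: 5}, {"r2": 7}
cisc_add_to_memory(mem_a, regs_a, 0x100, "r2")

mem_b, regs_b = {0x100: 5}, {"r1": 0, "r2": 7}
run_load_store([("LOAD", "r1", 0x100),
                ("ADD", "r1", "r1", "r2"),
                ("STORE", "r1", 0x100)], mem_b, regs_b)

print(mem_a[0x100], mem_b[0x100])  # 12 12
```

Both end with the same memory contents, but the load/store version takes three simple instructions, each of which either touches memory or uses the ALU, never both.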
RISC Ideas (3)
• Increase the number of internal registers due to the
load/store architecture
• Also, registers are more general purpose and less
associated with specific functions
• The compiler is designed along with the RISC processor
design. The compiler has to be aware of the
processor architecture to produce code that can
be executed efficiently
Instruction Pipelining - Operations
That Can Be Carried Out in Parallel
• Read the instruction from memory or the
prefetch queue (instruction fetch phase)
• Decode the instruction (decode phase)
• Where necessary, fetch the operands
(operand fetch phase)
• Execute the instruction (execute phase)
• Write back the result (write-back phase)
Pipelined Execution
(entries are instruction numbers)
Stage               Cycle n   Cycle n+1   Cycle n+2   Cycle n+3   Cycle n+4
Instruction fetch   k         k+1         k+2         k+3         k+4
Decode              k-1       k           k+1         k+2         k+3
Operand fetch       k-2       k-1         k           k+1         k+2
Execution           k-3       k-2         k-1         k           k+1
Write-back          k-4       k-3         k-2         k-1         k
Result              k-4       k-3         k-2         k-1         k
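The table can be generated for any number of instructions. A Python sketch of an ideal five-stage pipeline, assuming no stalls, hazards or branches (the function name is hypothetical):

```python
from typing import List, Optional

STAGES = ["Instruction fetch", "Decode", "Operand fetch", "Execution", "Write-back"]

def schedule(num_instructions: int) -> List[List[Optional[int]]]:
    """Ideal pipeline: row = cycle, column = stage, entry = instruction index (None = empty)."""
    depth = len(STAGES)
    table = []
    for cycle in range(num_instructions + depth - 1):
        row = []
        for stage in range(depth):
            instr = cycle - stage  # instruction i enters the fetch stage in cycle i
            row.append(instr if 0 <= instr < num_instructions else None)
        table.append(row)
    return table

table = schedule(10)
print(len(table))  # 14 cycles for 10 instructions (50 without pipelining)
print(table[4])    # [4, 3, 2, 1, 0]: pipeline full, one result per cycle
```

After the four-cycle fill, one instruction completes every cycle, so N instructions take N + 4 cycles instead of 5N.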
Superscalar Architecture:
• The processor may have more than one
pipeline (Pentium…)
• Where possible, each pipeline works
independently
• Not always possible
• May achieve an average completed execution
of more than one instruction per clock
cycle
Pipelining problems
• More logic per pipeline stage: the same
resource can’t be used twice
• E.g. can’t re-use the ALU for computing implied
addresses
• Synchronisation problems
• Delayed jump/branch
• Data and register dependency, e.g.
ADD reg1, reg2, reg7
AND reg6, reg1, reg3
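In the example above, AND reads reg1 before ADD has written it back: a read-after-write dependency. A Python sketch that detects such pairs; it assumes the destination-first, three-operand format that the example implies and checks only adjacent instructions (the function name is hypothetical):

```python
from typing import List, Tuple

Instruction = Tuple[str, str, str, str]  # (mnemonic, destination, source1, source2)

def raw_hazards(program: List[Instruction]) -> List[Tuple[int, int, str]]:
    """Report read-after-write dependencies between adjacent instructions."""
    hazards = []
    for i in range(1, len(program)):
        dest_prev = program[i - 1][1]
        sources = program[i][2:]
        if dest_prev in sources:
            hazards.append((i - 1, i, dest_prev))
    return hazards

program = [("ADD", "reg1", "reg2", "reg7"),
           ("AND", "reg6", "reg1", "reg3")]
print(raw_hazards(program))  # [(0, 1, 'reg1')]
```

A pipeline (or the compiler) has to stall, forward the result, or reorder instructions whenever such a pair is found.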
Getting the Benefits of Pipelining
• Simplified instruction decoding
• Simpler, faster logic
• On-chip cache memories
• Local memory on-chip to avoid memory
access bottlenecks
• Floating-point pipeline for the FP coprocessor
• Speculative execution to get around pipeline
flushes
Software Implications of RISCs
• An optimising compiler must know how the
pipeline works
(The compiler must be aware of pipeline delays, and
insert NOPs if need be)
• Lower code density in RISC, because each
instruction does less work
• PowerPC code takes up to 30% more space
to do the same tasks as x86 code
• More memory accesses, potential
performance impact...
80486: IA-32 with RISC elements
• Introduced 04/89
• Greatly improved 80386 CPU
• Hard-wired implementation of frequently used instructions
(as in RISCs). On average 2 clock cycles/instruction.
• 5-stage instruction pipeline
• Internal L1 cache memory (8 kB) + cache controller
• On-chip floating-point coprocessor (FPU)
• Longer prefetch queue (32 bytes as opposed to 16 on the
80386)
• Higher frequency operation: up to 120 MHz
• >1.2M transistors, 0.8 µm CMOS, 168-pin PGA
80486 Block Diagram
[Figure: i486 CPU block diagram: bus interface (A31-A0, D31-D0, control and status signals), 8 KB cache, prefetcher with 32-byte queue, decoding unit, control unit, segmentation and paging units, register file and ALU, and floating-point unit.]
80486 Pipeline
Example instruction: ADD eax, mem32
Cycle n, instruction fetch: fetch ADD eax, mem32
Cycle n+1, Decode1 (memory access): decode ADD, fetch mem32
Cycle n+2, Decode2: decode ADD (continued)
Cycle n+3, execution: add eax and mem32
Cycle n+4, write-back: write result into eax

More Related Content

8086 cpu 1

  • 1. BLM 3622- Microprocessors 1 A Historical Background • The idea of calculating with a machine dates to before 500 B.C. when the Babylonians invented the abacus, the first mechanical calculator.
  • 2. BLM 3622- Microprocessors 2 Blaise Pascal (1623-1662)  The abacus was not improved until 1642, when Blaise Pascal invented a calculator constructed a gear and wheels.
  • 3. BLM 3622- Microprocessors 3 Charles Babbage  One early pioneer of mechanical computing machinery was Charles Babbage and produce a programmable calculating machine in 1823. He create the “Analytical Engine”. This machine was a mechanical computer that stored 1000 20-digit decimal numbers and variable program that could modify the function of the machine.
  • 4. BLM 3622- Microprocessors 4  In 1889, Herman Hollerith developed the punched card for storing data and also developed a mechanical machine-driven by one of the new electric motors. He was the former of IBM Corporation. Herman Hollerith
  • 5. BLM 3622- Microprocessors 5 Konrad Zuse  The first electronic calculating machine invented in 1941 by Konrad Zuse. He had released the first programmable computer designed to solve complex engineering equations. It was also the first machine to work on the binary system, as opposed to the more familiar decimal system. His calculating computer was used in aircraft and missile design during World War II for the German war effort.
  • 6. BLM 3622- Microprocessors 6 Binary System
  • 7. BLM 3622- Microprocessors 7  The first truly electronic computer was places into operation in 1943 to break secret German military codes. The first electronic computer system, which used vacuum tubes, was invented by Alan Turing who is a British mathematician . Turing called this machine Colossus, most likely because its size. A problem with Colossus was that although its design allowed it to break secret German military codes generated by mechanical Enigma machine, it could not solve other problems. Colossus was not programmable- it was a fixed program computer system. Alan Turing
  • 8. BLM 3622- Microprocessors 8 Sample Turing Machine
  • 9. BLM 3622- Microprocessors 9  The first general purpose programmable computer system was developed in 1946 and called ENIAC. The ENIAC was a huge machine(30 tons) and performed about 100 000 operation per second. ENIAC
  • 10. BLM 3622- Microprocessors 10 John von Neumann  In 1945, Von Neumann contributed a new understanding of how practical fast computers should be organized and built; these ideas, often referred to as the stored- program technique, became fundamental for future generations of high-speed digital computers and were universally adopted.
  • 11. BLM 3622- Microprocessors 11 Transistor
  • 12. BLM 3622- Microprocessors 12 INTEL 4004  The development of transistor in 1948, followed by the invention of the integrated circuits in 1958. In 1971, the first microprocessor Intel 4004 was developed. 4004 was a 4-bit microprocessor and instruction set contains 45 instruction. It performed about 50 000 instruction per second.
  • 13. BLM 3622- Microprocessors 13  In 1978, Intel released the 8086 microprocessor which was 16-bit microprocessor and performed 2.5 million instruction per second.  This microprocessor were called CISC(Complex Instruction Set Computers) because of the number and complexity of instructions.  The popularity of Intel family was ensured in 1981 when IBM Corp. decided to use 8088/8086 microprocessors in its personal computers. Intel 8086
  • 14. BLM 3622- Microprocessors 14 Intel 8086/8088 Microprocessors  Intel 8086 and 8088 Microprocessors are the basis of all IBM-PC compatible computers (8086 introduced in 1978, first IBM-PC released in 1981)  All Intel, AMD and other advanced microprocessors are based on and are compatible with the original 8086/8  At Power Up and Reset time, Pentiums, Athlons etc all look like 8086 processors
  • 15. Intel 8086/8088 Microprocessors
    - The Intel 8086 is a 16-bit microprocessor
    - 16-bit data registers
    - 16-bit (8086) or 8-bit (8088) external data bus
    - Techniques to improve CPU performance while it executes programs (such as the instruction prefetch queue)
    - Segment:Offset memory model
    - Little-endian data format
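A minimal sketch of the little-endian format, using Python's standard `struct` module as a stand-in for 8086 memory: the low-order byte of a 16-bit word goes to the lower address. The helpers `store_word` and `load_word` and the 4-byte `memory` are hypothetical, for illustration only.

```python
import struct

def store_word(memory: bytearray, address: int, value: int) -> None:
    """Store a 16-bit value little-endian: low byte at address, high byte at address + 1."""
    memory[address:address + 2] = struct.pack("<H", value)

def load_word(memory: bytearray, address: int) -> int:
    """Read a 16-bit little-endian value back from memory."""
    return struct.unpack_from("<H", memory, address)[0]

memory = bytearray(4)              # a tiny hypothetical memory, all zeros
store_word(memory, 0, 0x1234)
print(memory.hex())                # 34120000: byte 0 = 34h (low), byte 1 = 12h (high)
print(hex(load_word(memory, 0)))   # 0x1234
```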
  • 16. 8086/8088 (1)
    - The original IBM PC used the 8088 microprocessor
    - The 8088 is similar to the 8086, but it has an 8-bit external data bus and only a 4-byte-deep instruction queue
      - This was done for cost reasons: the 8-bit bus reduces performance, but meant cheaper computers
    - We can therefore consider the 8086 and 8088 together
    - PC clones often used the 8086 for better performance
  • 17. 8086/8088 (2)
    - Remember the fetch-decode-execute cycle?
    - Fetching from external memory is slow
    - The 8086/8088 uses an instruction queue to speed up performance
    - While the processor is decoding and executing an instruction, its bus interface can read the next instructions, because the bus is not in use at that time
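A toy timing model, under simplifying assumptions, of why overlapping fetch with execution helps: it looks only one instruction ahead, treats the bus as free whenever an instruction is executing, and ignores events such as jumps that empty the queue. The cycle counts and function names are hypothetical.

```python
def sequential_time(fetch, execute):
    """Clock cycles when every instruction is fetched, then executed, with no overlap."""
    return sum(f + e for f, e in zip(fetch, execute))

def overlapped_time(fetch, execute):
    """Idealized prefetching: the next fetch runs while the current instruction executes.

    Only the first fetch is fully exposed; after that, each step costs whichever
    is longer, the current execution or the next fetch.
    """
    total = fetch[0]
    for i, e in enumerate(execute):
        next_fetch = fetch[i + 1] if i + 1 < len(fetch) else 0
        total += max(e, next_fetch)
    return total

fetch = [4, 4, 4]      # hypothetical cycles to fetch each instruction from memory
execute = [3, 5, 2]    # hypothetical cycles to decode and execute each instruction
print(sequential_time(fetch, execute))   # 22
print(overlapped_time(fetch, execute))   # 15
```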
  • 18. 8086/8088 Functional Units [Figure: the 8086/8088 MPU consists of the Execution Unit (EU) and the Bus Interface Unit (BIU); the BIU fetches opcodes, reads operands and writes data]
  • 19. 8086/8088 (3)
    - The 8086/8088 consists of two internal units:
      - the execution unit (EU), which executes the instructions
      - the bus interface unit (BIU), which fetches instructions, reads operands and writes results
    - The 8086 has a 6-byte prefetch queue
    - The 8088 has a 4-byte prefetch queue
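  • 20. 8086/8088 Internal Organisation [Figure: block diagram. Execution unit (EU): general registers AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, SI, DI; ALU; flags; temporary registers; EU control. Bus interface unit (BIU): segment registers CS, DS, SS, ES; instruction pointer IP; internal communications registers; address summation; bus control for the 20-bit address bus and the data bus; 4-byte instruction queue (8088 version shown).]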
  • 21. BIU Elements
    - Instruction queue: the next instructions or data can be fetched from memory while the processor is executing the current instruction
      - The memory interface is slower than the processor execution time, so this speeds up overall performance
    - Segment registers:
      - CS, DS, SS and ES are 16-bit registers
      - Used with the 16-bit base registers to generate the 20-bit address
      - Allow the 8086/8088 to address 1 MB of memory
      - Changed under program control to point to different segments as a program executes
    - The instruction pointer (IP) contains the offset address of the next instruction: the distance in bytes from the address given by the current CS register
  • 22. 8086/8088 20-bit Addresses [Figure: the 16-bit segment base address (e.g. CS) is extended with four 0 bits (shifted left by one hex digit) and added to the 16-bit offset address (e.g. IP) to form the 20-bit physical address]
  • 23. Exercise: 20-bit Addressing [Figure: memory map starting at 00000h. With CS=123Ah, the code segment ranges from 123A0h to 2239Fh (the next byte is 223A0h); with IP=341Bh, the next instruction is at 123A0h + 341Bh = 157BBh.]
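The address calculation in the two slides above can be sketched as follows: segment × 16 + offset, truncated to 20 bits as on the 8086. The function name `physical_address` is ours, not Intel's; the values CS=123Ah and IP=341Bh come from the exercise.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment shifted left 4 bits plus offset, kept to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF   # the 8086 wraps around above FFFFFh

cs, ip = 0x123A, 0x341B
start = physical_address(cs, 0x0000)   # first byte of the code segment
end = physical_address(cs, 0xFFFF)     # last byte of the 64 KB code segment
print(f"code segment: {start:05X}h to {end:05X}h")          # 123A0h to 2239Fh
print(f"next instruction: {physical_address(cs, ip):05X}h")  # 157BBh
```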
  • 24. Exercise: 20-bit Addressing
    1. CS contains 0A820h, IP contains 0CE24h. What is the resulting physical address?
    2. CS contains 0B500h, IP contains 0024h. What is the resulting physical address?
  • 25. Segment Registers: Using segment registers essentially divides the memory space into overlapping segments, each 64 KB long and starting at an address that is divisible by 16.
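Because segments start every 16 bytes and are 64 KB long, one physical address can be reached through many segment:offset pairs. The brute-force sketch below (hypothetical helper `aliases`, ignoring the 1 MB wrap-around) lists them for 157BBh, the instruction address in the code-segment exercise.

```python
def aliases(physical: int):
    """Return every (segment, offset) pair that maps to `physical` without wrap-around."""
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)   # offset needed with this segment
        if 0 <= offset <= 0xFFFF:            # must fit in a 16-bit offset
            pairs.append((segment, offset))
    return pairs

pairs = aliases(0x157BB)
print(len(pairs))                                 # 4096
print(f"{pairs[0][0]:04X}:{pairs[0][1]:04X}")     # 057C:FFFB
print(f"{pairs[-1][0]:04X}:{pairs[-1][1]:04X}")   # 157B:000B
```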
  • 26. The Advantages of Using Segment Registers
    - They allow the memory capacity to be 1 MB even though the addresses associated with individual instructions are only 16 bits wide.
    - They allow the instruction, data or stack portion of a program to be more than 64 KB long by using more than one code, data or stack segment.
    - They facilitate the use of separate memory areas for a program, its data and the stack.
    - They permit a program and/or its data to be placed in different areas of memory each time the program is executed.
  • 27. 8086/8088 In Circuit (1)
    - 8086/8088 microprocessors need support circuits in a microcomputer system
    - The 8086/8088 multiplexes the address and data buses on the same pins
    - This saves pins, but at a price: demultiplexing logic is needed to build separate address and data buses to interface with RAMs and ROMs
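  • 28. 8086 Pin Connections [Figure: 40-pin package; maximum-mode names are given, with minimum-mode names marked "min:"]
    - Pins 1-20: 1 GND, 2-16 AD14-AD0, 17 NMI, 18 INTR, 19 CLK, 20 GND
    - Pins 21-31: 21 RESET, 22 READY, 23 TEST#, 24 QS1 (min: INTA#), 25 QS0 (min: ALE), 26 S0# (min: DEN#), 27 S1# (min: DT/R#), 28 S2# (min: M/IO#), 29 LOCK# (min: WR#), 30 RQ#/GT1# (min: HLDA), 31 RQ#/GT0# (min: HOLD)
    - Pins 32-40: 32 RD#, 33 MN/MX#, 34 BHE#/S7, 35-38 A19/S6-A16/S3, 39 AD15, 40 Vcc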
  • 29. Pin Connections: AD15-AD0: (I/O-3) These lines form the multiplexed address/data bus of the 8086. They contain address bits whenever ALE is logic 1 and carry data during the rest of the bus cycle. These pins enter a high-impedance state whenever a hold acknowledge occurs.
  • 30. Pin Connections: A19/S6-A16/S3: (O-3) The address/status bus bits are multiplexed to provide address signals A19-A16 and status bits S6-S3. These pins also enter a high-impedance state during hold acknowledge. S4 and S3 show which segment is accessed during the current bus cycle.
  • 31. Pin Connections: Segment Status Bits
    S4  S3  Function
    0   0   Extra segment
    0   1   Stack segment
    1   0   Code or no segment
    1   1   Data segment
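A small decoder for the segment status table, assuming the status nibble has been read from A19/S6-A16/S3 after the address phase of the bus cycle, with S3 as the least significant bit. The function name and the sample value are hypothetical.

```python
def segment_in_use(s4: int, s3: int) -> str:
    """Decode the S4/S3 status bits that the 8086 drives on A17/S4 and A16/S3."""
    table = {
        (0, 0): "Extra segment",
        (0, 1): "Stack segment",
        (1, 0): "Code or no segment",
        (1, 1): "Data segment",
    }
    return table[(s4, s3)]

status_nibble = 0b0001           # hypothetical sample: S6=0, S5=0, S4=0, S3=1
s4 = (status_nibble >> 1) & 1    # S4 appears on pin A17/S4
s3 = status_nibble & 1           # S3 appears on pin A16/S3
print(segment_in_use(s4, s3))    # Stack segment
```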
  • 32. Pin Connections
    - RD#: (O-3) Whenever the read signal is logic 0, the data bus is receptive to data from the memory or I/O devices connected to the system.
    - READY: (I) This input is controlled to insert wait states into the timing of the microprocessor.
      - READY = 0: the microprocessor enters wait states and remains idle
      - READY = 1: no effect on the operation of the microprocessor
  • 33. Pin Connections
    - TEST#: (I) The test pin is an input that is tested by the WAIT instruction.
    - NMI: (I) The non-maskable interrupt input is similar to INTR, except that NMI does not check whether the IF flag bit is logic 1. This interrupt input uses interrupt vector 2.
  • 34. Pin Connections
    - RESET: (I) The reset input causes the microprocessor to reset itself if this pin is held high for a minimum of four clock periods. The microprocessor then begins executing instructions at memory location FFFF0H and disables interrupts by clearing the IF flag bit.
    - MN/MX#: (I) Minimum/maximum mode select.
    - BHE#/S7: (O-3) The bus high enable pin is used to enable the most significant data bus bits (D15-D8) during a read or write operation.
  • 35. Minimum Mode Pins
    - M/IO#: (O-3) This pin selects memory or I/O. It indicates whether the microprocessor address bus contains a memory address or an I/O port address.
    - WR#: (O-3) This line indicates that the 8086 is outputting data to a memory or I/O device.
  • 36. Minimum Mode Pins
    - INTA#: (O-3) The interrupt acknowledge signal is a response to the INTR input pin. It is normally used to gate the interrupt vector number onto the data bus in response to an interrupt request.
    - ALE: (O) Address latch enable shows that the 8086 address/data bus contains address information. This address can be a memory address or an I/O port number.
  • 37. Minimum Mode Pins
    - DT/R#: (O-3) The data transmit/receive signal shows whether the microprocessor data bus is transmitting or receiving data.
    - DEN#: (O-3) Data bus enable activates the external data bus buffers.
  • 38. Minimum Mode Pins
    - HOLD: (I) The hold input requests a direct memory access (DMA). If HOLD is logic 1, the microprocessor stops executing software and places its address, data and control buses in the high-impedance state.
    - HLDA: (O) Hold acknowledge indicates that the 8086 has entered the hold state.
  • 39. Maximum Mode Pins
    - Maximum mode is used with external coprocessors or in multiprocessing applications.
    - S2#, S1# and S0#: (O) The status bits indicate the function of the current bus cycle. These signals are normally decoded by the 8288 bus controller.
  • 40. Status Bits (S2#, S1#, S0#)
    S2  S1  S0  Function
    0   0   0   Interrupt acknowledge
    0   0   1   I/O read
    0   1   0   I/O write
    0   1   1   Halt
    1   0   0   Opcode fetch
    1   0   1   Memory read
    1   1   0   Memory write
    1   1   1   Passive
  • 41. 8088 and 8086 Pinouts [Figure: side-by-side 40-pin packages for minimum and maximum mode. On the 8088, only AD7-AD0 are multiplexed; pins 2-8 and 39 carry the address-only lines A14-A8 and A15, and pin 34 is SS0# (held high in maximum mode) instead of BHE#/S7. All other pins match the 8086.]
  • 42. 8086/8088 In Circuit (2)
    - In maximum mode, the 8086/8088 needs at least the following support devices: an 8288 bus controller, an 8284A clock generator, 74HC373 latches and 74HC245 transceivers
    - With the aid of these devices, the 8086 begins to look like the ideal microprocessor we looked at earlier
  • 44. 8086/8088 Maximum Mode
    - In maximum mode, the 8288 uses the status signals (S0#, S1#, S2#) to rebuild the normal bus control signals of the microprocessor:
      - MRDC#, MWTC#, IORC#, IOWC#, etc.
      - These are equivalent to MEMR#, etc.
    - Next, we look briefly at some special signals
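The 8288's job can be pictured as a lookup on the S2#-S1#-S0# code from the status-bit table. The sketch below is a simplified model: the command names are 8288 outputs, the advanced-write outputs (AMWC#, AIOWC#) and all timing are ignored, and `decode_status` is a hypothetical helper.

```python
# Status code (S2#, S1#, S0#) -> (bus-cycle type, 8288 command strobe); None = no command
STATUS_TABLE = {
    0b000: ("Interrupt acknowledge", "INTA#"),
    0b001: ("I/O read", "IORC#"),
    0b010: ("I/O write", "IOWC#"),
    0b011: ("Halt", None),
    0b100: ("Opcode fetch", "MRDC#"),   # an opcode fetch is a memory read
    0b101: ("Memory read", "MRDC#"),
    0b110: ("Memory write", "MWTC#"),
    0b111: ("Passive", None),
}

def decode_status(s2: int, s1: int, s0: int):
    """Return (bus-cycle type, command strobe or None) for one status code."""
    return STATUS_TABLE[(s2 << 2) | (s1 << 1) | s0]

print(decode_status(1, 0, 1))   # ('Memory read', 'MRDC#')
print(decode_status(0, 1, 0))   # ('I/O write', 'IOWC#')
```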
  • 45. 74LS373 Octal Transparent Latch with 3-State Outputs
  • 46. 74LS245 Octal Bus Transceiver
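  • 47. RESET Signal
    - The active-high RESET input puts the 8086/8088 into a defined state (in a typical system it is driven by the 8284A clock generator, whose RES# input is active low)
    - Clears the flags register, the DS, SS and ES segment registers, IP and the instruction queue, and sets CS to FFFFh
    - The effective program address after reset is therefore FFFF0h (CS=FFFFh, IP=0000h)
    - 8086/8088 programs always start at FFFF0h after reset has been asserted and removed
    - Later x86 CPUs keep the same idea: execution starts 16 bytes below the top of the address space (e.g. at FFFFFFF0h on 32-bit processors)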
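Intel's 8086 documentation gives the reset values CS = FFFFh and IP = 0000h. A quick check with a hypothetical `physical_address` helper shows that this pair and its alias F000h:FFF0h both select FFFF0h, 16 bytes below the top of the 1 MB address space.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: (segment << 4) + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

reset = physical_address(0xFFFF, 0x0000)   # reset values of CS and IP
alias = physical_address(0xF000, 0xFFF0)   # another pair naming the same byte
print(f"{reset:05X}h {alias:05X}h")        # FFFF0h FFFF0h
print(0x100000 - reset)                    # 16 bytes below the 1 MB limit
```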
  • 48. BHE# Signal (8086 Only)
    - The 8086 processor can address memory one byte at a time
    - Its data bus is 16 bits wide
    - It uses the BHE# signal and A0 (sometimes called BLE#) to address individual bytes over its 16-bit bus
  • 49. Use of BHE# and A0 (BLE#) [Figure: on the 8086, memory is split into an odd bank (00001h, 00003h, ..., FFFFFh) on D15:D8, selected by BHE#, and an even bank (00000h, 00002h, ..., FFFFEh) on D7:D0, selected by A0/BLE#; both banks are addressed by A19..A1. The 8088 uses a single byte-wide memory (00000h to FFFFFh).]
  • 50. Use of BHE# and A0 (BLE#)
    BHE#  A0/BLE#  Selection
    0     0        Whole word (16 bits)
    0     1        High byte to/from odd address
    1     0        Low byte to/from even address
    1     1        No selection
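A sketch of how the BHE#/A0 combinations might be chosen for byte and word accesses. It assumes, as Intel's 8086 documentation describes, that a word at an odd address needs two bus cycles; the function `bus_cycles` is hypothetical and ignores timing.

```python
def bus_cycles(address: int, size: int):
    """List the 8086 bus cycles for a byte (size=1) or word (size=2) access.

    Each cycle is (BHE#, A0, description); 0 means the active-low select is asserted.
    """
    if size == 1:
        if address % 2:   # odd byte: upper half of the bus
            return [(0, 1, f"byte at {address:05X}h on D15:D8 (odd bank)")]
        return [(1, 0, f"byte at {address:05X}h on D7:D0 (even bank)")]
    if size == 2:
        if address % 2 == 0:   # aligned word: one cycle, both banks
            return [(0, 0, f"word at {address:05X}h on D15:D0")]
        # Misaligned word: odd-addressed low byte first, then the even-addressed high byte
        return [
            (0, 1, f"low byte at {address:05X}h on D15:D8"),
            (1, 0, f"high byte at {address + 1:05X}h on D7:D0"),
        ]
    raise ValueError("size must be 1 or 2")

for cycle in bus_cycles(0x00003, 2):
    print(cycle)
# (0, 1, 'low byte at 00003h on D15:D8')
# (1, 0, 'high byte at 00004h on D7:D0')
```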
  • 51. ALE and Address/Data Bus Multiplexing
    - The 8086/8088 multiplexes the address and data signals onto the same set of pins
    - Off-chip logic is needed to separate the signals
    - Transparent latches are well suited to address demultiplexing
  • 52. ALE and the 74HC373 Transparent Latch [Figure: timing diagram and circuit. During address time, ALE is high and the 74HC373 (LE driven by ALE) passes the address/data bus (In0:In7) to its outputs (Q0:Q7); when ALE falls, the latch holds the address on the system address bus while the shared pins carry data. The tri-state control OE# is shown connected to GND for simplicity.]
  • 53. Use of ALE (Address Latch Enable)
    - ALE is used with an external latch (74HC373) to demultiplex the address and data lines
    - The 74HC373 is transparent when its LE input (connected to ALE) is high
    - When ALE goes low, the '373 holds the last data until ALE goes high again
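A behavioural sketch (not a gate-level model) of a 74HC373-style latch with OE# tied low: while LE, driven by ALE, is high the outputs follow the multiplexed bus, and when ALE falls the address is held while the same pins carry data. The class name and bus values are hypothetical.

```python
class TransparentLatch:
    """Behavioural model of one 74HC373-style latch (output enable OE# tied low)."""

    def __init__(self):
        self.q = 0   # current output value Q0:Q7

    def update(self, le: int, d: int) -> int:
        # While LE is high the output follows the input (transparent).
        # While LE is low the output keeps the value captured when LE fell.
        if le:
            self.q = d
        return self.q

latch = TransparentLatch()
# (ALE, value on AD7-AD0): address 3Ch during T1, then data A5h and FFh
bus = [(1, 0x3C), (0, 0xA5), (0, 0xA5), (0, 0xFF)]
outputs = [latch.update(ale, ad) for ale, ad in bus]
print([hex(v) for v in outputs])   # ['0x3c', '0x3c', '0x3c', '0x3c']
```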
  • 54. 8288 Bus Controller and Bus Transceivers [Figure: the 8288's direction (DT/R#) and data-enable outputs drive the DIR and EN# inputs of two 74HC245 transceivers, which connect CPU D15:D8 and D7:D0 to the buffered D15:D8 and D7:D0 lines leading to the memory and I/O systems.] The 8288 bus controller also generates the direction and enable signals for the bidirectional transceivers, which supports buffering of the system data bus.
  • 55. 8086 Read Cycle [Figure: timing diagram over clock states T1-T4 showing CLK; S0#-S2# (status 001 for I/O read or 101 for memory read); A16-A19 and BHE# (address, then status S3-S6); ALE; AD0-AD15 (address, float, valid data, float); the latched address A0-A19; DT/R#; DEN; and MRDC# or IORC#.]
  • 56. 8086 Write Cycle [Figure: timing diagram over clock states T1-T4 showing CLK; S0#-S2# (status 010 for I/O write or 110 for memory write); A16-A19 and BHE# (address, then status S3-S6); ALE; AD0-AD15 (address, then valid write data); the latched address A0-A19; DT/R#; DEN; and MWTC# or IOWC#.]
  • 57. 8086 Read Cycle (1 Wait State) [Figure: the read-cycle timing of the previous slide with one wait state Tw inserted between T3 and T4, because the 8284's RDY input, and therefore the CPU's READY input, is not asserted in time.]
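The cost of a wait state is easy to estimate: a bus cycle takes four T-states plus one Tw per wait state. The sketch assumes an example 5 MHz clock (200 ns per T-state); `bus_cycle_ns` is a hypothetical helper.

```python
def bus_cycle_ns(clock_mhz: float, wait_states: int = 0) -> float:
    """Duration of one 8086 bus cycle: four T-states (T1-T4) plus any Tw states."""
    t_state_ns = 1000.0 / clock_mhz   # one clock period in nanoseconds
    return (4 + wait_states) * t_state_ns

print(bus_cycle_ns(5))      # 800.0
print(bus_cycle_ns(5, 1))   # 1000.0
```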
  • 58. 8086/8088 Summary
    - First generation (introduced June 1978)
    - One of the first 16-bit processors on the market
    - 16-bit internal registers
    - 16/8-bit external data bus
    - 20-bit address bus (1 MB addressable)
    - Used in the first-generation IBM PCs (1981)
  • 59. 80186/80188
    - Evolution of the 8086/8088
    - Extended instruction set
    - On-chip system components (clock generator, DMA, interrupt controller, timers, ...)
    - Unsuccessful in PCs
    - Popular in embedded systems
  • 60. 2nd Generation Processor: 286
    - P2 (286) = 2nd-generation processor
    - Introduced in 1982
    - The CPU behind the IBM AT
    - Throughput of the original IBM AT (6 MHz) was about 500% of the IBM PC (4.77 MHz)
    - Level of integration: 134k transistors (vs. 29k in the 8086)
    - Still a 16-bit processor
    - Available at clock frequencies up to 25 MHz
  • 61. 2nd Generation Processors: 286
    - Fully backward compatible with the 8086: the 80286 runs 8086 software without modification
    - Improved instruction execution: an average instruction takes 4.5 cycles vs. 12 cycles on the 8086
    - Improved instruction set
    - Real mode and protected mode
      - Multitasking support: what happens in one area of memory doesn't affect other programs
      - Protected mode is supported by Windows 3.0
    - 16 MB addressable physical memory
    - On-chip MMU (1 GB virtual memory)
    - Non-multiplexed address and data buses
  • 62. Improving Computer Performance
    - We've seen how 16-bit computer technology based on the 8086 and 80286 processors developed
    - These computers are not powerful enough for today's applications
    - How do you improve the performance of your computer?
    - Let's start with the CPU
  • 63. CPU Performance (1)
    - Most obvious: processor clock frequency
      - Increased frequency means an increased execution rate
      - State of the art: >2 GHz (Jan 2002)
    - Memory and I/O access times can be a performance bottleneck, unless you take special measures
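A first-order estimate of execution rate is the clock frequency divided by the average number of clock cycles per instruction. The sketch below plugs in the figures quoted for the IBM PC clock with the 8086's average of 12 cycles, and for the IBM AT's 80286 (6 MHz, 4.5 cycles). It deliberately ignores memory and I/O wait states and the instruction mix, so treat the results as rough estimates, not measurements; `instructions_per_second` is a hypothetical helper.

```python
def instructions_per_second(clock_hz: float, avg_cycles_per_instruction: float) -> float:
    """Rough execution rate: clock frequency divided by average cycles per instruction."""
    return clock_hz / avg_cycles_per_instruction

pc = instructions_per_second(4.77e6, 12)   # 8086-class CPU at 4.77 MHz, 12 cycles average
at = instructions_per_second(6e6, 4.5)     # 80286 at 6 MHz, 4.5 cycles average
print(round(pc), round(at))                # 397500 1333333
```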
  • 64. CPU Performance (2)
    - ALU register width
      - A processor is an n-bit processor, where n is the precision of the ALU; n can be 4, 8, 16, 32 or 64
      - The wider the registers, the more processing per clock
    - Data bus width
      - The wider the data bus, the faster data can be transferred
      - Since memory and I/O device access times are finite, the more bits transferred per cycle the better
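To see why a wider data bus helps, count the bus transfers needed to move a block of data, assuming aligned data and one transfer per bus cycle. With the hypothetical helper `bus_transfers`, the 8088's 8-bit bus needs twice as many transfers as the 8086's 16-bit bus.

```python
import math

def bus_transfers(n_bytes: int, bus_width_bits: int) -> int:
    """Minimum number of bus transfers to move n_bytes over a bus of the given width."""
    bytes_per_transfer = bus_width_bits // 8
    return math.ceil(n_bytes / bytes_per_transfer)

for width in (8, 16, 32):
    print(width, bus_transfers(1024, width))
# 8 1024
# 16 512
# 32 256
```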
  • 65. CPU Performance (3)
    - Address bus width
      - Increased address width doesn't provide a speed increase as such
      - The CPU can directly address more memory
      - PCs use big programs, which would not fit in a smaller address space
      - Overcoming a small address space takes time, which impacts overall system performance
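The directly addressable memory grows as a power of two of the address-bus width. This quick check uses the widths given for the 8086/8088 (20 bits), the 80286 (16 MB, i.e. 24 bits) and the 80386 (32 bits); `addressable_bytes` is a hypothetical helper.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address bus of this width can select."""
    return 2 ** address_bits

for cpu, bits in [("8086/8088", 20), ("80286", 24), ("80386", 32)]:
    size = addressable_bytes(bits)
    print(f"{cpu}: {bits} address lines -> {size:,} bytes ({size // 2**20} MB)")
# 8086/8088: 20 address lines -> 1,048,576 bytes (1 MB)
# 80286: 24 address lines -> 16,777,216 bytes (16 MB)
# 80386: 32 address lines -> 4,294,967,296 bytes (4096 MB)
```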
  • 66. 3rd Generation Processor: 386
    - P3 (386) = 3rd-generation processor
    - Introduced: 10/1985
    - Full 32-bit processor (32-bit registers, 32-bit internal and external data bus, 32-bit address bus)
    - 275k transistors, CMOS, 132-pin PGA package (supply current Icc = 400 mA, roughly the same as the 8086!)
    - Clock speeds: 16-33 MHz
    - 3rd-generation processors were far ahead of their time: it took 10 years before 32-bit operating systems became mainstream!
    - First 386 PCs: late 1986 (COMPAQ Deskpro 386)
  • 67. 3rd Generation Processor: 386
    - Modes of operation: real, protected, virtual real
    - Protected mode of the 386 is fully compatible with 286 protected mode
      - Protected mode is the native mode of operation; the chips are designed for advanced operating systems such as Windows NT
    - New virtual real mode: the processor can run with hardware memory protection while simulating the 8086's real-mode operation
      - Multiple copies of e.g. DOS can run simultaneously, each in a protected area of memory
      - If a program in one memory area crashes, the rest of the system is protected
  • 68. Intel 32-bit Architecture: IA-32 [Figure: 80386 block diagram with the Bus Unit (BU) and prefetch queue, the Instruction Unit (IU), the Execution Unit (EU) containing the control unit (CU), registers and ALU, and the Addressing Unit (AU), connected to the address and data buses.] The 80386 includes a bus interface unit for reading and providing data and instructions, with a prefetch queue; an IU for controlling the EU with its registers; and an AU for generating memory and I/O addresses.
  • 69. 80386 Features
    - 32-bit general and offset registers
    - 16-byte prefetch queue
    - Memory management unit with segmentation unit and paging unit
    - 32-bit address and data bus
    - 4 GB physical address space
    - 64 TB virtual address space
    - Support for the external i387 numerical coprocessor
    - Implementation of real, protected and virtual 8086 modes
  • 70. 80386 Operating Modes
    - Protected mode for multitasking support
    - Real mode (native 8086 mode)
      - The processor powers up in real mode
    - System Management Mode (introduced with the 386SL)
      - Power management or system security
      - The processor switches to a separate address space while saving the entire context of the currently running program or task
  • 71. 80386 Register Set [Figure: the 32-bit instruction pointer EIP (low 16 bits: IP) and flags register EFLAGS (low 16 bits: FLAGS); the general-purpose registers EAX, EBX, ECX, EDX (low 16 bits AX, BX, CX, DX, further split into the 8-bit halves AH/AL, BH/BL, CH/CL, DH/DL) and ESI, EDI, EBP, ESP (low 16 bits SI, DI, BP, SP); and the 16-bit segment registers CS, SS, DS, ES, FS, GS.]
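The register set treats the 8- and 16-bit registers as views of the 32-bit ones. A sketch of that aliasing with bit masks, using EAX as the example; the helper names are hypothetical.

```python
def split_eax(eax: int):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    ax = eax & 0xFFFF         # bits 15-0
    ah = (eax >> 8) & 0xFF    # bits 15-8
    al = eax & 0xFF           # bits 7-0
    return ax, ah, al

def write_al(eax: int, value: int) -> int:
    """Model 'MOV AL, value': only bits 7-0 change; the rest of EAX is preserved."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

eax = 0x12345678
print([hex(v) for v in split_eax(eax)])   # ['0x5678', '0x56', '0x78']
print(hex(write_al(eax, 0xAB)))           # 0x123456ab
```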
  • 72. 80386 Prefetch Queue [Figure: the bus interface unit reads from off-chip memory over the 32-bit data bus into a 16-byte-deep instruction queue, from which the execution unit fetches. Fetching from the on-chip queue is fast; reading from off-chip memory is slow.]
  • 73. 80386 Prefetch Queue
    - The 80386 prefetch queue is 16 bytes deep
      1. The instruction fetch can read from the prefetch queue faster than from memory
      2. The prefetcher can do some work while the execution unit performs other tasks in parallel
  • 74. Coprocessor: i387
    - The hardware implementation of floating-point processing in the i387 means that floating-point operations run at much higher speed
    - Without an i387, the i386 can still execute all mathematical expressions using software emulation of the i387
  • 75. 80386: Classic CISC Processor
    - CISC = Complex Instruction Set Computer
    - Complex instructions...
      - ...but code-size efficient
    - Microcoding of the machine instructions
    - Extensive addressing capabilities for memory operations
    - Few, but very useful, CPU registers
  • 76. 80386 Execution Sequence [Figure: CISC processor flow: bus interface, prefetch queue, decoding unit, control unit with microcode ROM, microcode queue, execution unit with registers and ALU, and coprocessor.] In a microprogrammed CISC, the processor fetches instructions via the bus interface into a prefetch queue, which transfers them to a decoding unit. The decoding unit breaks each machine instruction into many elementary micro-instructions and applies them to a microcode queue. The micro-instructions are transferred from the microcode queue to the control and execution unit, which drives the ALU and the registers.
  • 77. 80386 Complex Instructions
    - CISC drawback: most instructions are so complicated that they have to be broken into a sequence of micro-steps
      - These steps are called microcode
      - Stored in a ROM in the processor core
      - Microcode ROM: access time and size...
      - They require extra ROM and decode logic
  • 78. RISC: Less is More
    - RISC = Reduced Instruction Set Computer
    - 20/80 rule: 20% of the instructions take up 80% of the time
    - Sometimes executing a sequence of simple instructions runs faster than a single complex machine instruction that has the same effect
  • 79. RISC Ideas (1)
    - Reduce the instruction set to simplify decoding
      - Smaller instruction set -> simpler logic -> smaller logic -> faster execution
    - Eliminate microcode: hardwire all instruction execution
    - Pipeline instruction decoding and execution: do more operations in parallel
  • 80. RISC Ideas (2)
    - Load/store architecture: only the load and store instructions can access memory
      - All other instructions work with the processor's internal registers
      - This is necessary for single-cycle execution: the execution unit can't wait for data to be read or written
  • 81. RISC Ideas (3)
    - Increase the number of internal registers because of the load/store architecture
      - Registers are also more general-purpose and less associated with specific functions
    - The compiler is designed along with the RISC processor: it has to be aware of the processor architecture to produce code that can be executed efficiently
  • 82. Instruction Pipelining: Operations Can Be Carried Out in Parallel
    - Read the instruction from memory or the prefetch queue (instruction fetch phase)
    - Decode the instruction (decode phase)
    - Where necessary, fetch the operands (operand fetch phase)
    - Execute the instruction (execute phase)
    - Write back the result (write-back phase)
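Under the idealized assumption that each of the five phases takes one clock and no hazards occur, a pipeline needs (stages + instructions - 1) cycles instead of stages × instructions. The sketch below uses hypothetical abbreviations IF, ID, OF, EX and WB for the fetch, decode, operand fetch, execute and write-back phases, computes both counts, and prints a timing chart.

```python
STAGES = ["IF", "ID", "OF", "EX", "WB"]   # fetch, decode, operand fetch, execute, write-back

def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Ideal pipeline: fill it once, then finish one instruction per cycle."""
    return n_stages + n_instructions - 1

def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Each instruction passes through every phase before the next one starts."""
    return n_stages * n_instructions

def timing_chart(n_instructions: int):
    """One row per instruction showing the phase it occupies in each clock cycle."""
    total = pipelined_cycles(n_instructions)
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name   # instruction i is in stage s during cycle i + s
        rows.append(" ".join(row))
    return rows

print(pipelined_cycles(10), unpipelined_cycles(10))   # 14 50
print("\n".join(timing_chart(3)))
# IF ID OF EX WB -- --
# -- IF ID OF EX WB --
# -- -- IF ID OF EX WB
```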
  • 84. Superscalar Architecture
    - The processor may have more than one pipeline (e.g. the Pentium)
    - Where possible, each pipeline works independently
      - This is not always possible
    - May achieve an average completion rate of more than one instruction per clock cycle
  • 85. Pipelining Problems
    - More logic per pipeline stage: the same resource can't be used twice
      - E.g. the ALU can't be reused for computing implied addresses
    - Synchronisation problems
      - Delayed jump/branch
      - Data and register dependency, e.g.
          ADD reg1, reg2, reg7
          AND reg6, reg1, reg3
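The register-dependency example can be detected mechanically. This sketch assumes the first operand is the destination, as the example suggests, and checks only adjacent instructions for read-after-write dependencies; real pipelines compare against every instruction still in flight. All names are hypothetical.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    sources: tuple

def parse(text: str) -> Instr:
    """Parse 'OP dest, src1, src2', assuming the first operand is the destination."""
    op, operands = text.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(op, regs[0], tuple(regs[1:]))

def raw_hazards(program):
    """Find read-after-write dependencies between adjacent instructions."""
    hazards = []
    for earlier, later in zip(program, program[1:]):
        if earlier.dest in later.sources:   # later reads what earlier has not yet written back
            hazards.append((earlier.op, later.op, earlier.dest))
    return hazards

program = [parse("ADD reg1, reg2, reg7"), parse("AND reg6, reg1, reg3")]
print(raw_hazards(program))   # [('ADD', 'AND', 'reg1')]
```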
  • 86. Getting the Benefits of Pipelining
    - Simplified instruction decoding
      - Simpler, faster logic
    - On-chip cache memories
      - Local memory on-chip to avoid memory access bottlenecks
    - Floating-point pipeline for the FP coprocessor
    - Speculative execution to get around pipeline flushes
  • 87. Software Implications of RISCs
    - The optimising compiler must know how the pipeline works (it must be aware of pipeline delays and insert NOPs if need be)
    - Lower code density in RISC, because individual instructions do less work
      - PowerPC code can take up to 30% more space to do the same tasks as x86 code
      - More memory accesses, potential performance impact...
  • 88. 80486: IA-32 with RISC Elements
    - Introduced 04/1989
    - Greatly improved 80386 CPU
    - Hard-wired implementation of frequently used instructions (as in RISCs): on average 2 clock cycles per instruction
    - 5-stage instruction pipeline
    - Internal L1 cache memory (8 KB) + cache controller
    - On-chip floating-point unit (FPU)
    - Longer prefetch queue (32 bytes, as opposed to 16 on the 80386)
    - Higher-frequency operation: up to 120 MHz
    - >1.2M transistors, 0.8 µm CMOS, 168-pin PGA
  • 89. 80486 Block Diagram [Figure: i486 CPU with bus interface (A31-A0, D31-D0, control and status signals), 8 KB cache, prefetcher (32-byte queue), decoding unit, control unit, paging unit, segmentation unit, registers and ALU, and floating-point unit.]
  • 90. 80486 Pipeline: ADD eax, mem32
    Cycle n      Instruction fetch
    Cycle n+1    Decode 1 (memory access): decode ADD, fetch mem32
    Cycle n+2    Decode 2: decode ADD (continued)
    Cycle n+3    Execution: add eax and mem32
    Cycle n+4    Write-back: write result into eax