Microprocessor Assignment: Priyank Mandal 285/C0/07
PRIYANK MANDAL
285/C0/07
PENTIUM 1
Intel announced that the successor to the 486 would be named "Pentium." The original Pentium
is an extremely modest design by today's standards, and when it was introduced in 1993 it wasn't exactly
a blockbuster by the standards of its RISC contemporaries, either. While its superscalar design (Intel's
first) certainly improved on the performance of its predecessor, the 486, the main thing that the Pentium
had going for it was x86 compatibility.
Other features:
Enhanced debug features with the introduction of the Processor-based debug port
Enhanced self test features like the L1 cache parity check
Pentium II
Unlike previous Pentium and Pentium Pro processors, the Pentium II CPU was
packaged in a slot-based module rather than a CPU socket.
The L2 cache ran at half the processor's clock frequency, unlike the Pentium Pro,
whose off-die L2 cache ran at the same frequency as the processor. However, the
smallest cache size was increased to 512 KB from the 256 KB on the Pentium Pro.
Off-package cache solved the Pentium Pro's low yields, allowing Intel to introduce the
Pentium II at a mainstream price level. This arrangement also allowed Intel to easily vary
the amount of L2 cache, thus making it possible to target different market segments with
cheaper or more expensive processors and accompanying performance levels.
The Pentium II was basically a more consumer-oriented version of the Pentium Pro. It
was cheaper to manufacture because of the separate, slower L2 cache memory.
Pentium III
The most notable difference was the addition of the SSE instruction set (to accelerate floating
point and parallel calculations), and the introduction of a controversial serial number embedded
in the chip during the manufacturing process. The Pentium III was a further development of
the Deschutes Pentium II. The only differences were the addition of execution units, the
modification of instruction decode and issue logic to support SSE, and an improved L1
cache controller; the L2 cache controller was left unchanged.
The Pentium III contains 9.5 million transistors and has dimensions of 12.3 mm by 10.4 mm
(128 mm²). It is fabricated in Intel's P856.5 process, a 0.25 micrometre CMOS process with five
levels of aluminum interconnect.
Since Katmai was built in the same 0.25 µm process as Pentium II "Deschutes", it had to
implement SSE using as little silicon as possible. To achieve this goal, Intel implemented the
128-bit architecture by double-cycling the existing 64-bit data paths and by merging the SIMD-FP
multiplier unit with the x87 scalar FPU multiplier into a single unit. To utilize the existing 64-bit
data paths, Katmai issues each SIMD-FP instruction as two μops. To compensate partially
for implementing only half of SSE’s architectural width, Katmai implements the SIMD-FP adder
as a separate unit on the second dispatch port. This organization allows one half of a SIMD
multiply and one half of an independent SIMD add to be issued together bringing the peak
throughput back to four floating point operations per cycle — at least for code with an even
distribution of multiplies and adds.
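The peak figure above can be checked with a little arithmetic. The following is only a toy model, not a description of the actual hardware: it assumes single-precision (32-bit) elements, and the constant and function names are made up for illustration.

```python
# Toy arithmetic model of Katmai's peak SIMD-FP throughput.
SSE_WIDTH_BITS = 128   # architectural SSE width (from the text)
DATAPATH_BITS = 64     # width of Katmai's existing data paths (from the text)
FLOAT_BITS = 32        # assumption: single-precision elements


def uops_per_simd_instruction():
    """Number of micro-ops needed to cover one 128-bit instruction."""
    return SSE_WIDTH_BITS // DATAPATH_BITS


def flops_per_unit_per_cycle():
    """Elements a single 64-bit unit can process in one cycle."""
    return DATAPATH_BITS // FLOAT_BITS


def peak_flops_per_cycle(units_busy):
    """Peak floating point operations per cycle with `units_busy` units issuing."""
    return units_busy * flops_per_unit_per_cycle()


print(uops_per_simd_instruction())  # two micro-ops per SIMD-FP instruction
print(peak_flops_per_cycle(1))      # multiplier only, or adder only
print(peak_flops_per_cycle(2))      # multiplier and adder issuing together
```

With only one of the two units busy the model yields two operations per cycle; the figure of four is reached only when a multiply half and an add half issue in the same cycle.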
Pentium 4
The Pentium 4 brand refers to Intel's line of single-core desktop and laptop central
processing units (CPUs) introduced on November 20, 2000[1] and shipped through August 8,
2008[2]. They had the 7th-generation x86 microarchitecture, called NetBurst, which was the
company's first all-new design since the introduction of the P6 microarchitecture of the Pentium Pro
CPUs in 1995. NetBurst differed from the preceding P6 (Pentium III, II, etc.) by featuring a
very deep instruction pipeline to achieve very high clock speeds[3] (up to 3.8 GHz), limited only
by TDPs reaching up to 115 W in the 3.4 GHz to 3.8 GHz Prescott and Prescott 2M cores[4]. In
2004, the initial 32-bit x86 instruction set of the Pentium 4 microprocessors was extended by the
64-bit x86-64 set.
The first Pentium 4 cores, codenamed Willamette, were clocked from 1.3 GHz to 2 GHz and the
first Willamette processor was released on November 20, 2000 using Socket 423. Notable with
the introduction of the Pentium 4 was the 400 MHz FSB. It actually operated at 100 MHz but the
FSB was quad-pumped, meaning that the maximum transfer rate was four times the base clock
of the bus, so it was considered to run at 400 MHz. The AMD Athlon's double-pumped FSB was
running at 200 MHz or 266 MHz at that time.
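The relationship between base clock and "effective" bus speed can be sketched as follows. The function names are hypothetical, and the 64-bit bus width used for the bandwidth estimate is an assumption, since the text above does not state it.

```python
def effective_rate_mhz(base_clock_mhz, transfers_per_clock):
    """Marketing "MHz" figure: base clock times transfers per clock."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Peak transfer rate in MB/s.

    bus_width_bits=64 is an assumption for illustration only.
    """
    transfers = effective_rate_mhz(base_clock_mhz, transfers_per_clock)
    return transfers * bus_width_bits // 8


print(effective_rate_mhz(100, 4))   # quad-pumped Pentium 4 FSB
print(effective_rate_mhz(100, 2))   # double-pumped Athlon FSB
print(effective_rate_mhz(133, 2))   # faster double-pumped Athlon FSB
print(peak_bandwidth_mb_s(100, 4))  # estimate under the 64-bit assumption
```

The point of the sketch is that the clock itself never runs at 400 MHz; only the number of transfers per clock changes.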
Pentium 4 CPUs introduced the SSE2 and, in the Prescott-based Pentium 4s, SSE3 instruction
sets to accelerate calculations, transactions, media processing, 3D graphics, and games.
Later versions featured Hyper-Threading Technology (HTT), a feature to make one physical
CPU work as two logical CPUs. Intel also marketed a version of their low-end
Celeron processors based on the NetBurst microarchitecture (often referred to as Celeron
4), and a high-end derivative, Xeon, intended for multiprocessor servers and workstations.
The Pentium 4 has an integrated heat spreader (IHS) that prevents the die from accidentally
getting damaged when mounting and unmounting cooling solutions.
Pentium Pro
While the Pentium and Pentium MMX had 3.1 and 4.5 million transistors, respectively,
the Pentium Pro contained 5.5 million transistors. Its role was later narrowed to that
of a server and high-end desktop processor, and it was used
in supercomputers like ASCI Red. The Pentium Pro was capable of both dual- and
quad-processor configurations. It came in only one form factor, the relatively large
rectangular Socket 8. The Pentium Pro had a completely new microarchitecture, a
departure from the Pentium rather than an extension of it. It has a decoupled, 12-stage,
superpipelined architecture which uses an instruction pool.
The Pentium Pro pipeline had extra decode stages to dynamically translate IA-32
instructions into buffered micro-operation sequences which could then be analysed,
reordered, and renamed in order to detect parallelizable operations that may be issued
to more than one execution unit at once. The Pentium Pro thus featured out-of-order
execution, including speculative execution via register renaming. It also had a wider
36-bit address bus (usable by PAE). The Pentium Pro has an 8 KiB instruction cache, from
which up to 16 bytes are fetched on each cycle and sent to the instruction decoders.
There are three instruction decoders. The decoders are not equal in capability: only one
can decode any x86 instruction, while the other two can only decode simple x86
instructions. This restricts the Pentium Pro's ability to decode multiple instructions
simultaneously, limiting superscalar execution. x86 instructions are decoded into
118-bit micro-operations.
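The effect of the unequal decoders can be illustrated with a toy model. It is a deliberate simplification, not the real decode logic: it assumes instructions are decoded strictly in program order, that the first decoder accepts anything, that the other two accept only simple instructions, and it ignores the 16-byte fetch limit. The function name and the 'S'/'C' encoding are made up for this sketch.

```python
def decode_cycles(instructions):
    """Count decode cycles for a sequence of 'S' (simple) / 'C' (complex) instructions.

    Toy model: per cycle, decoder 0 takes the next instruction whatever it is;
    decoders 1 and 2 each take the following instruction only if it is simple.
    """
    cycles = 0
    i = 0
    n = len(instructions)
    while i < n:
        cycles += 1
        i += 1                      # decoder 0 accepts any instruction
        for _ in range(2):          # decoders 1 and 2 accept only simple ones
            if i < n and instructions[i] == "S":
                i += 1
            else:
                break               # a complex instruction waits for decoder 0
    return cycles


print(decode_cycles("SSSSSS"))  # all simple: three per cycle
print(decode_cycles("CSSCSS"))  # complex instructions land on decoder 0
print(decode_cycles("SCSCSC"))  # complex instructions keep stalling the group
print(decode_cycles("CCC"))     # only one complex instruction per cycle
```

In this model the same six instructions take two cycles or four depending only on their ordering, which is the restriction on superscalar execution described above.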
Performance
Performance with 32-bit code was excellent and well ahead of the older Pentiums at the time,
usually by 25-35%. However, Pentium Pro's 16-bit performance was the same as the original
Pentium. It was this, along with the Pentium Pro's high price, that caused the rather lackluster
reception among PC enthusiasts.
Pentium MMX
TECHNICAL DETAILS
MMX defined eight registers, known as MM0 through MM7 (henceforth referred to as MMn). To
avoid compatibility problems with the context switch mechanisms in existing operating systems,
these registers were aliases for the existing x87 FPU stack registers (so no new registers
needed to be saved or restored). Hence, anything that was done to the floating point stack
would also affect the MMX registers and vice versa. However, unlike the FP stack, the MMn
registers are directly addressable (random access). Each of the MMn registers holds 64 bits
(the mantissa-part of a full 80-bit FPU register). The main usage of the MMX instruction set is
based on the concept of packed data types, which means that instead of using the whole
register for a single 64-bit integer, two 32-bit integers, four 16-bit integers, or eight 8-bit integers
may be processed concurrently.
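The packed data idea can be sketched in software. This is only an illustration using plain Python integers, not MMX code; the helper names are hypothetical, and the wraparound behaviour shown is similar in spirit to the non-saturating packed add instructions (PADDB, PADDW, PADDD).

```python
def pack(lanes, lane_bits):
    """Pack unsigned lane values into one 64-bit integer, lane 0 in the low bits."""
    assert len(lanes) * lane_bits == 64
    lane_mask = (1 << lane_bits) - 1
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & lane_mask) << (index * lane_bits)
    return value


def unpack(value, lane_bits):
    """Split a 64-bit integer back into its unsigned lanes, lane 0 first."""
    lane_mask = (1 << lane_bits) - 1
    return [(value >> shift) & lane_mask for shift in range(0, 64, lane_bits)]


def packed_add_wrap(a, b, lane_bits):
    """Add two packed values lane by lane; each lane wraps around independently."""
    lane_mask = (1 << lane_bits) - 1
    sums = [(x + y) & lane_mask
            for x, y in zip(unpack(a, lane_bits), unpack(b, lane_bits))]
    return pack(sums, lane_bits)


a = pack([1, 2, 3, 4], 16)             # four 16-bit integers in one 64-bit value
b = pack([10, 20, 30, 0xFFFF], 16)
print(unpack(packed_add_wrap(a, b, 16), 16))  # last lane wraps without carrying out
```

The key property is that an overflow in one lane never carries into its neighbour, which is what distinguishes a packed add from an ordinary 64-bit add.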
The mapping of the MMX registers onto the existing FPU registers made it somewhat difficult to
work with floating point and SIMD data in the same application. To maximize performance,
programmers often used the processor exclusively in one mode or the other, deferring the
relatively slow switch between them as long as possible. Because the FPU stack registers are
80 bits wide, the upper 16 bits of the stack registers go unused in MMX, and these bits are set
to all ones, which makes them NaNs or infinities in the floating point representation. This can be
used to decide whether a particular register's content is intended as floating point or SIMD data.
MMX provides only integer operations. When originally developed, for the Intel i860, the use of
integer math made sense (both 2D and 3D calculations required it), but as graphics cards that
did much of this became common, integer SIMD in the CPU became somewhat redundant for
graphical applications. On the other hand, the saturation arithmetic operations in MMX could
significantly speed up some digital signal processing applications.
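Saturation arithmetic clamps a result to the largest representable value instead of letting it wrap around. A minimal sketch for eight unsigned 8-bit lanes follows; it is plain Python for illustration, the function name is made up, and the behaviour is modelled loosely on an unsigned saturating packed add.

```python
def packed_add_unsigned_saturate(a, b, lane_bits=8):
    """Lane-wise unsigned add of two 64-bit packed values, clamping each lane."""
    lane_max = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        x = (a >> shift) & lane_max
        y = (b >> shift) & lane_max
        result |= min(x + y, lane_max) << shift   # clamp instead of wrapping
    return result


# Two 8-bit samples: 200 + 100 would wrap to 44, but saturates to 255.
print(packed_add_unsigned_saturate(200, 100))
# Low lane 0x10 + 0x20 fits; next lane 0xF0 + 0x20 saturates to 0xFF.
print(hex(packed_add_unsigned_saturate(0xF010, 0x2020)))
```

For audio samples or pixel intensities, clamping at the maximum is usually the desired result, since a wrapped value would turn a loud or bright sample into a quiet or dark one.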
ISA BUS ARCHITECTURE
The PC/AT bus, a 16-bit (or 80286) version of the PC/XT bus, was introduced with the IBM
PC/AT and was officially termed I/O Channel by IBM. It extends the XT bus by adding a second shorter
edge connector in-line with the eight-bit XT-bus connector, which is unchanged, retaining
compatibility with most 8-bit cards. The second connector adds four additional address lines for
a total of 24, and eight additional data lines for a total of 16. It also adds new interrupt lines
connected to a second 8259 PIC (connected to one of the lines of the first) and four 16-bit DMA
channels, as well as control lines to select 8 or 16 bit transfers.
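The effect of the extra address and data lines can be worked out directly. This is a small sketch with hypothetical function names; the line counts are taken from the paragraph above.

```python
def addressable_bytes(address_lines):
    """Size of the address space reachable with the given number of address lines."""
    return 1 << address_lines


def bytes_per_transfer(data_lines):
    """Bytes moved in one bus transfer with the given number of data lines."""
    return data_lines // 8


AT_ADDRESS_LINES = 24                    # total after the second connector
XT_ADDRESS_LINES = AT_ADDRESS_LINES - 4  # four fewer on the 8-bit connector

print(addressable_bytes(XT_ADDRESS_LINES) // 2**20, "MiB on the XT bus")
print(addressable_bytes(AT_ADDRESS_LINES) // 2**20, "MiB on the AT bus")
print(bytes_per_transfer(8), "byte per transfer on the XT bus")
print(bytes_per_transfer(16), "bytes per transfer on the AT bus")
```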
Number of devices
Motherboard devices have dedicated IRQs (not present in the slots). 16-bit devices can use
either PC-bus or PC/AT-bus IRQs. It is therefore possible to connect up to 6 devices that use
one 8-bit IRQ each, or up to 5 devices that use one 16-bit IRQ each. At the same time, up to
four devices may use one 8-bit DMA channel each, while up to three devices can use one 16-bit
DMA channel each.
EISA was much favoured by manufacturers due to the proprietary nature of MCA, and even IBM
produced some machines supporting it. It was somewhat expensive to implement (though not
as much as MCA), so it never became particularly popular in desktop PCs. However, it was
reasonably successful in the server market, as it was better suited to bandwidth-intensive tasks
(such as disk access and networking). Most EISA cards produced were either SCSI or network
cards.
Although the EISA bus had a slight performance disadvantage over MCA (bus speed of 8.33
MHz, compared to 10 MHz), EISA contained almost all of the technological benefits that MCA
boasted, including bus mastering, burst mode, software configurable resources, and 32-bit
data/address buses. These brought EISA nearly to par with MCA from a performance
standpoint, and EISA easily defeated MCA in industry support.
EISA replaced the tedious jumper configuration common with ISA cards with software-based
configuration. Every EISA system shipped with an EISA configuration utility; this was usually a
slightly customized version of the standard utilities written by the EISA chipset makers. The user
would boot into this utility, either from floppy disk or on a dedicated hard drive partition. The
utility software would detect all EISA cards in the system, and could configure any hardware
resources (interrupts, memory ports, etc.) on any EISA card (each EISA card would include a
disk with information that described the available options on the card), or on the EISA
system motherboard. The user could also enter information about ISA cards in the system,
allowing the utility to automatically reconfigure EISA cards to avoid resource conflicts.
ARCHITECTURE
PCIe is structured around point-to-point serial links rather than a shared parallel bus; a pair
of these links (one in each direction) makes up a lane. These lanes are routed by a hub on
the mainboard acting as a crossbar switch. This dynamic point-to-point behavior allows more
than one pair of devices to communicate with each other at the same time. In contrast, older PC
interfaces had all devices permanently wired to the same bus; therefore, only one device could
send information at a time. This format also allows channel grouping, where multiple lanes are
bonded to a single device pair in order to provide higher bandwidth.
INTERCONNECT
PCIe devices communicate via a logical connection called an interconnect or link. A link is a point-to-point
communication channel between 2 PCIe ports, allowing both to send/receive ordinary PCI-requests
(configuration read/write, I/O read/write, memory read/write) and interrupts (INTx, MSI, MSI-X). At the
physical level, a link is composed of 1 or more lanes. Low-speed peripherals (such as an 802.11 Wi-Fi
card) use a single-lane (×1) link, while a graphics adapter typically uses a much wider (and thus, faster)
16-lane link.
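How link width translates into bandwidth can be estimated with a short calculation. The per-lane figures are assumptions not stated in the text: the sketch uses first-generation PCIe numbers (2.5 GT/s per lane with 8b/10b encoding), and the function name is hypothetical.

```python
def link_bandwidth_mb_s(lanes, raw_gt_s=2.5, payload_bits=8, encoded_bits=10):
    """Estimated usable bandwidth per direction, in MB/s.

    Defaults are assumed first-generation PCIe values: 2.5 GT/s per lane,
    with 8 payload bits carried in every 10 bits on the wire.
    """
    payload_bits_per_second = raw_gt_s * 1e9 * payload_bits / encoded_bits
    return lanes * payload_bits_per_second / 8 / 1e6


for lanes in (1, 4, 16):
    print(f"x{lanes}: {link_bandwidth_mb_s(lanes):.0f} MB/s per direction")
```

Because lanes are independent serial links, the estimate scales linearly with lane count, which is why a ×16 graphics slot is sixteen times faster than a ×1 slot of the same generation.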