Computer Architecture I Lecture Presentation 1
Lecture Notes
References:
Patterson & Hennessy, "Computer Organization and Design" (4th Ed.), Morgan Kaufmann, 2008.
Stallings, "Computer Organization & Architecture" (7th Ed.), Pearson, 2006.
Mano & Kime, "Logic and Computer Design Fundamentals" (4th Ed.), Prentice Hall, 2008.
Brown & Vranesic, "Fundamentals of Digital Logic with VHDL Design" (2nd Ed.), McGraw Hill, 2005.
Dr. Patterson's and Dr. Mary Jane Irwin's (Penn State) lecture notes.
Introduction
Lecture 1
What computers were…
• Servers
• Robots
• Smart routers
• Phones
• Automobiles
• Supercomputers
What is Computer Architecture?
Figure: a stack of abstraction layers spanning from Application at the top down to Physics at the bottom.
Computer Architecture vs. Computer Organization
• Architecture refers to those attributes visible to the programmer
  – Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
  – e.g. Is there a multiply instruction?
• Organization refers to how those attributes are implemented (per Stallings): control signals, interfaces, memory technology
  – e.g. Is multiply implemented by a dedicated hardware unit or by repeated addition?
Historical: ENIAC - background
• Electronic Numerical Integrator And Computer
• Eckert and Mauchly, University of Pennsylvania
• Built for trajectory tables for weapons
• Started 1943, finished 1946
  – Too late for the war effort
• Used until 1955
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons; 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing them
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies (IAS)
• Completed 1952
IAS - details
• 1000 x 40-bit words
  – Binary numbers
  – 2 x 20-bit instructions per word
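As a small illustration of this word format, here is a Python sketch that splits a 40-bit IAS word into its two 20-bit instructions; the 8-bit opcode / 12-bit address split follows Stallings' description of the IAS, and the example word itself is made up:

```python
# Split a 40-bit IAS word into two 20-bit instructions.
# Each instruction: 8-bit opcode followed by a 12-bit address
# (per Stallings' description of the IAS instruction format).

def decode_ias_word(word40):
    left = (word40 >> 20) & 0xFFFFF    # left instruction (upper 20 bits)
    right = word40 & 0xFFFFF           # right instruction (lower 20 bits)
    def split(instr20):
        return instr20 >> 12, instr20 & 0xFFF  # (opcode, address)
    return split(left), split(right)

# Made-up example word, not a real IAS program:
word = (0x01 << 32) | (500 << 20) | (0x05 << 12) | 501
print(decode_ias_word(word))  # ((1, 500), (5, 501))
```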
Components of a Computer
• 5 classic components of a computer
• Input, Output, Memory, Datapath, Control
• Independent of hardware technology
• Represents the past and the present
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
– Up to 100 devices on a chip
• Medium scale integration - to 1971
– 100-3,000 devices on a chip
• Large scale integration - 1971-1977
– 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978-1991
  – 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration - 1991 on
  – Over 100,000,000 devices on a chip
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every year
• Since the 1970s the pace of development has slowed a little
  – Number of transistors doubles every 18 months (see the sketch below)
• Cost of a chip has remained almost unchanged
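A quick back-of-the-envelope sketch of what an 18-month doubling period implies; the starting point (Intel 4004, 1971, roughly 2,300 transistors) is a well-known historical value, and the projection is purely illustrative, not a fit to real data:

```python
# Projected transistor counts assuming a fixed 18-month doubling
# period (the Moore's Law rate quoted above). Illustrative only.

def transistors(year, base_year=1971, base_count=2300, doubling_months=18):
    months = (year - base_year) * 12
    return base_count * 2 ** (months / doubling_months)

for y in (1971, 1981, 1991, 2001):
    print(f"{y}: ~{transistors(y):,.0f} transistors")
```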
Speeding it up
• Pipelining
• On-board cache
• On-board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution
Performance Balance
Logic and Memory Performance Gap
Solutions to Logic/Memory Gap
• Increase number of bits retrieved at one time
  – Make DRAM "wider" rather than "deeper" (see the sketch below)
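A toy comparison of the wider-vs-deeper trade-off; the part size and organizations below are made-up illustrative numbers, not figures from the slide:

```python
# The same 64 Mbit DRAM organized "deep" (8M x 8) vs "wide" (4M x 16).
# Widening the interface doubles the bits retrieved per access, so
# fewer accesses are needed for the same transfer.

TOTAL_BITS = 64 * 2**20  # a 64 Mbit part in both cases

for name, width in (("deep: 8M x 8", 8), ("wide: 4M x 16", 16)):
    words = TOTAL_BITS // width
    accesses_for_64_bytes = (64 * 8) // width
    print(f"{name}: {words:,} words, "
          f"{accesses_for_64_bytes} accesses to fetch a 64-byte block")
```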
I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can keep up with the computation
• The real problem is moving the data
• Solutions:
– Caching
– Buffering
– Higher-speed interconnection buses
– More elaborate bus structures
– Multiple-processor configurations
Improvements in Chip Organization and Architecture
• Increase hardware speed of the processor
  – Fundamentally due to shrinking logic gate size
    • More gates, packed more tightly, increasing clock rate
    • Propagation time for signals reduced (see the sketch below)
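A rough model of why smaller, faster gates raise the clock rate; the stage depth and gate-delay values here are illustrative assumptions, not measurements:

```python
# Rough model: clock period ~ number of gate levels per pipeline
# stage times the per-gate propagation delay.

def clock_ghz(gate_delay_ps, levels_per_stage=20):
    period_ps = gate_delay_ps * levels_per_stage
    return 1000.0 / period_ps  # a 1000 ps period is 1 GHz

for delay in (50, 35, 25):  # shrinking gates => smaller delay
    print(f"gate delay {delay} ps -> ~{clock_ghz(delay):.2f} GHz")
```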
Intel Microprocessor Performance
Increased Cache Capacity
More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like an assembly line
  – Different stages of execution of different instructions proceed at the same time along the pipeline (see the sketch below)
• Superscalar allows multiple pipelines within a single processor
  – Instructions that do not depend on one another can be executed in parallel
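A minimal sketch of the assembly-line effect, under idealized assumptions (no hazards or stalls; the stage and instruction counts are illustrative):

```python
# Idealized pipeline timing: once the pipe is full, one instruction
# completes per cycle instead of one per n_stages cycles.

def cycles_unpipelined(n_instr, n_stages):
    return n_instr * n_stages        # one instruction at a time

def cycles_pipelined(n_instr, n_stages):
    return n_stages + (n_instr - 1)  # fill the pipe, then 1/cycle

n, k = 1000, 5
u, p = cycles_unpipelined(n, k), cycles_pipelined(n, k)
print(f"{k}-stage pipeline, {n} instructions: "
      f"{u} vs {p} cycles, speedup ~{u / p:.2f}x")
```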
Diminishing Returns
• Internal organization of processors is already highly complex
  – A great deal of parallelism has been extracted
  – Further significant increases are likely to be relatively modest
Uniprocessor Performance
Figure: single-processor performance relative to the VAX-11/780, 1978-2006, on a log scale (from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006). Performance grows at 25%/year in the early years, 52%/year from the mid-1980s to the early 2000s, and ??%/year after that.
Multiple Cores
• Multiple processors on single chip
– Large shared cache
• Within a processor, the increase in performance is roughly proportional to the square root of the increase in complexity
• If software can use multiple processors, doubling the number of processors almost doubles performance
• So, use two simpler processors on the chip rather than one more complex processor (see the sketch below)
• With two processors, larger caches are justified
  – Power consumption of memory logic is less than that of processing logic
• Example: IBM POWER4
– Two cores based on PowerPC
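A sketch of the trade-off above under idealized assumptions: "complexity" is an abstract unit, the square-root rule quoted on this slide holds exactly, and the software scales perfectly across cores:

```python
# Compare spending a 2x complexity budget on one big core
# (performance ~ sqrt(complexity)) vs two simpler cores.
import math

def perf_one_big_core(complexity):
    return math.sqrt(complexity)          # square-root rule from above

def perf_two_small_cores(total_complexity):
    return 2 * math.sqrt(total_complexity / 2)  # perfect parallel scaling

budget = 2.0  # twice the baseline complexity
print(f"one 2x-complex core:  ~{perf_one_big_core(budget):.2f}x")    # ~1.41x
print(f"two 1x-complex cores: ~{perf_two_small_cores(budget):.2f}x") # ~2.00x
```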
Some physics…
• Dynamic power dissipation is P = C·V²·f, where:
  – C is the effective switching capacitance while running an application
  – V is the operating voltage
  – f is the switching frequency
• Also, f is roughly proportional to V
Some math…
• Assume a single-core processor with C = 10 nF, V = 1.2 V, f = 2 GHz
  – Power = C·V²·f = 28.8 W
• What if we get to 2.4 GHz by increasing V to 1.3 V?
  – Power = 40.6 W
• What if we have a 2-core processor (each core with C = 10 nF) operating at 1.6 GHz and 1.1 V?
  – Power = 38.7 W
• What if we now build the original chip to be less complex, i.e. smaller C?
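These numbers can be checked directly; here is a minimal Python sketch, assuming (as the 38.7 W figure implies) that each of the two cores has the same 10 nF switching capacitance:

```python
# Recompute the scenarios above with P = C * V^2 * f.

def dynamic_power_watts(c_farads, v_volts, f_hz, n_cores=1):
    return n_cores * c_farads * v_volts**2 * f_hz

C = 10e-9  # 10 nF effective switching capacitance per core
print(f"1 core,  1.2 V, 2.0 GHz: {dynamic_power_watts(C, 1.2, 2.0e9):.1f} W")           # 28.8 W
print(f"1 core,  1.3 V, 2.4 GHz: {dynamic_power_watts(C, 1.3, 2.4e9):.1f} W")           # 40.6 W
print(f"2 cores, 1.1 V, 1.6 GHz: {dynamic_power_watts(C, 1.1, 1.6e9, n_cores=2):.1f} W") # 38.7 W
```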
POWER4 Chip Organization
Problems with the "sea change"?
• Algorithms, programming languages, compilers, operating systems, architectures, libraries, … are not ready to supply thread-level parallelism or data-level parallelism for 1000 CPUs per chip
• Together with the physics constraints behind the shift, this has driven a reinvigoration of computer architecture from the mid-2000s onward
EEE-445
Our goal will be to acquire a good understanding of:
- Computer System components
- Instruction Set Architecture (ISA) design
- Single-Cycle and Multi-Cycle hardware organization/design to support an ISA
- We will do limited exercises on defining hardware through:
  - Software emulators
  - VHDL/schematic capture description and simulation