Unit I Overview & Instructions: Cs6303-Computer Architecture

Eight ideas; Components of a computer system; Technology; Performance; Power wall;
Uniprocessors to multiprocessors; Instructions: operations and operands; Representing
instructions; Logical operations; Control operations; Addressing and addressing modes.
Machine Structures
Classes of Computing Applications and Their Characteristics
Personal computers (PCs)
A personal computer is a computer designed for use by an individual, usually incorporating a
graphics display, a keyboard, and a mouse. Personal computers emphasize delivery of good
performance to single users at low cost and usually execute third-party software.
Servers
A server is a computer used for running larger programs for multiple users, often simultaneously,
and typically accessed only via a network. Servers are the modern form of what were once much
larger computers. They are oriented to carrying large workloads, which may consist of either
single complex applications (usually a scientific or engineering application) or many small jobs,
such as would occur in building a large web server.
Low-end servers are typically used for file storage, small business applications, or simple
web serving. At the other extreme are supercomputers, which at present consist of tens of
thousands of processors and many terabytes of memory, and cost tens to hundreds of millions
of dollars.
Applications of Supercomputers:
High-end scientific and engineering calculations, such as weather forecasting, oil exploration,
protein structure determination, and other large-scale problems.
Embedded computers
A computer inside another device used for running one predetermined application or collection
of software.
Eight ideas
1. Design for Moore's Law
The number of transistors incorporated in a chip will approximately double every 24 months.
Moore's Law resulted from a 1965 prediction of such growth in IC capacity made by
Gordon Moore, co-founder of Intel.
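As a rough illustration of what a fixed doubling period implies, the Python sketch below projects a transistor count forward in time. The function name, the starting count of one million transistors, and the assumption of a constant 24-month doubling period are made up for the example; real chips do not follow the curve exactly.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_months: float = 24.0) -> float:
    """Project a transistor count, assuming one doubling every doubling_months."""
    doublings = (years * 12.0) / doubling_months
    return initial_count * 2.0 ** doublings


# Hypothetical starting point: a chip with 1 million transistors.
for years in (2, 4, 10):
    print(years, "years:", round(projected_transistors(1_000_000, years)))
```

The point for a designer is that a design started today should target the technology that will exist when the design ships, not the technology available at the start.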
2. Use Abstractions to Simplify Design
A major productivity technique for hardware and software is to use abstractions to represent the
design at different levels of representation; lower-level details are hidden to offer a simpler
model at higher levels.
3. Make the common case fast
Making the common case fast will tend to enhance performance better than optimizing the rare
case. Ironically, the common case is often simpler than the rare case and hence is often easier to
enhance.
4. Performance via Parallelism
Computer architects have offered designs that get more performance by performing operations in
parallel.
Parallel Requests: assigned to a computer, e.g. search "Garcia"
Parallel Threads: assigned to a core, e.g. lookup, ads
Parallel Instructions: more than one instruction at one time, e.g. 5 pipelined instructions
Parallel Data: more than one data item at one time, e.g. add of 4 pairs of words
5. Performance via Pipelining
Pipelining is an implementation technique where multiple instructions are overlapped in
execution. The computer pipeline is divided into stages. Each stage completes a part of an
instruction in parallel. The stages are connected one to the next to form a pipe: instructions enter
at one end, progress through the stages, and exit at the other end.
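The benefit of overlapping instructions can be estimated with a small timing model. The Python sketch below assumes an ideal pipeline: every stage takes the same time, there are no stalls or hazards, and one instruction enters per stage time. The function names and the numbers (5 stages, 200 ps per stage, 1000 instructions) are illustrative assumptions, not values from these notes.

```python
def unpipelined_time_ps(num_instructions: int, num_stages: int,
                        stage_time_ps: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return num_instructions * num_stages * stage_time_ps


def pipelined_time_ps(num_instructions: int, num_stages: int,
                      stage_time_ps: int) -> int:
    """Ideal pipeline: the first instruction needs num_stages stage times,
    and each later instruction finishes one stage time after the previous one."""
    if num_instructions == 0:
        return 0
    return (num_stages + num_instructions - 1) * stage_time_ps


slow = unpipelined_time_ps(1000, 5, 200)   # 1,000,000 ps
fast = pipelined_time_ps(1000, 5, 200)     # 200,800 ps
print("speedup:", slow / fast)             # approaches 5 as the program gets longer
```

Under these assumptions the speedup approaches the number of stages for long instruction streams; a single instruction is not completed any sooner.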
6. Performance via Prediction
In some cases it can be faster on average to guess and start working rather than wait until you
know for sure, assuming that the mechanism to recover from a misprediction is not too expensive
and your prediction is relatively accurate.
7. Hierarchy of memories
Programmers want memory to be fast, large, and cheap, as memory speed often shapes
performance, capacity limits the size of problems that can be solved, and the cost of memory
today is often the majority of computer cost. Architects have found that they can address these
conflicting demands with a hierarchy of memories, with the fastest, smallest, and most expensive
memory per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at the
bottom. Caches give the programmer the illusion that main memory is nearly as fast as the top of
the hierarchy and nearly as big and cheap as the bottom of the hierarchy. A layered triangle icon
is used to represent the memory hierarchy. The shape indicates speed, cost, and size: the closer to
the top, the faster and more expensive per bit the memory; the wider the base of the layer, the
bigger the memory.
8. Dependability via Redundancy
Computers not only need to be fast; they need to be dependable. Since any physical
device can fail, we make systems dependable by including redundant components that
can take over when a failure occurs and that help detect failures.
Components of a computer system
The underlying hardware in any computer performs the same basic functions: inputting
data, outputting data, processing data, and storing data.
Input device
A mechanism through which the computer is fed information, such as a keyboard.
Output device
A mechanism that conveys the result of a computation to a user, such as a display, or to another
computer.
Central processing unit (CPU)
Also called processor. The active part of the computer, which contains the datapath and
control and which adds numbers, tests numbers, signals I/O devices to activate, and so on.
Datapath: The component of the processor that performs arithmetic operations.
Control: The component of the processor that commands the datapath, memory, and I/O
devices according to the instructions of the program.
Memory: The storage area in which programs are kept when they are running and that contains
the data needed by the running programs.
The memory is where the programs are kept when they are running; it also contains the data
needed by the running programs. The memory is built from DRAM chips. DRAM stands for
dynamic random access memory. Multiple DRAMs are used together to contain the
instructions and data of a program. In contrast to sequential access memories, such as magnetic
tapes, the RAM portion of the term DRAM means that memory accesses take basically the same
amount of time no matter what portion of the memory is read.
Dynamic random access memory (DRAM)
Memory built as an integrated circuit; it provides random access to any location. Access times
are 50 nanoseconds and cost per gigabyte in 2012 was $5 to $10.
Hierarchical layers of hardware and software
The figure shows that the layers of software are organized primarily in a hierarchical fashion, with
applications being the outermost ring and a variety of systems software sitting between the
hardware and applications software.
System software: Software that provides services that are commonly useful, including operating
systems, compilers, loaders, and assemblers.
There are many types of systems software, but two types are central to every computer system
today: an operating system and a compiler.
An operating system interfaces between a user's program and the hardware and provides a
variety of services and supervisory functions. Among the most important functions are:
Handling basic input and output operations
Allocating storage and memory
Providing for protected sharing of the computer among multiple applications using it
Examples of operating systems in use today are Linux, iOS, and Windows.
Compiler: A program that translates high-level language statements into assembly language.
Instruction: A command that computer hardware understands and obeys.
Assembler: A program that translates a symbolic version of instructions into the binary version.
Assembly language: A symbolic representation of machine instructions.
Machine language: A binary representation of machine instructions.
Technologies for Building Processors and Memory
Processors and memory have improved at an incredible rate, because computer designers have
long embraced the latest in electronic technology to try to win the race to design a better
computer.
A transistor is simply an on/off switch controlled by electricity. The integrated circuit (IC)
combined dozens to hundreds of transistors into a single chip. To describe the tremendous
increase in the number of transistors from hundreds to millions, the adjective very large scale is
added to the term, creating the abbreviation VLSI, for very large scale integrated circuit.
Manufacturing Process of Integrated Circuits
The process starts with a silicon crystal ingot, which looks like a giant sausage. An ingot is finely
sliced into wafers no more than 0.1 inches thick. These wafers then go through a series of
processing steps, during which patterns of chemicals are placed on each wafer, creating the
transistors, conductors, and insulators.
The patterned wafer is then chopped up, or diced, into components called dies, more
informally known as chips.
Dicing enables you to discard only those dies that were unlucky enough to contain the flaws,
rather than the whole wafer. This concept is quantified by the yield of a process, which is defined
as the percentage of good dies from the total number of dies on the wafer.
The cost of an integrated circuit rises quickly as the die size increases, due both to the lower
yield and the smaller number of dies that fit on a wafer. To reduce the cost, using the next
generation process shrinks a large die, as it uses smaller sizes for both transistors and wires. This
improves the yield and the die count per wafer.
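The cost of an IC can be expressed in three simple equations:
Cost per die = Cost per wafer / (Dies per wafer × Yield)
Dies per wafer ≈ Wafer area / Die area
Yield = 1 / (1 + (Defects per area × Die area / 2))^2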
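A minimal Python sketch of the die cost calculation follows. It assumes the usual textbook form of the three cost relations (spelled out in the comments), which is an approximation: it ignores the unusable area near the wafer edge and uses an empirical yield model. The function names and the wafer cost, areas, and defect density are made-up numbers for illustration.

```python
def dies_per_wafer(wafer_area_cm2: float, die_area_cm2: float) -> float:
    """Dies per wafer ~= wafer area / die area (edge losses ignored)."""
    return wafer_area_cm2 / die_area_cm2


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2 (empirical model)."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(wafer_cost: float, wafer_area_cm2: float,
                 die_area_cm2: float, defects_per_cm2: float) -> float:
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    good_dies = (dies_per_wafer(wafer_area_cm2, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies


# Hypothetical wafer: cost 1000, area 100 cm^2, 1 defect per cm^2.
small = cost_per_die(1000.0, 100.0, 2.0, 1.0)   # 2 cm^2 die
large = cost_per_die(1000.0, 100.0, 4.0, 1.0)   # 4 cm^2 die
print(small, large)
```

Doubling the die area in this example more than doubles the cost per die, because fewer dies fit on the wafer and a smaller fraction of them are good.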
Performance
Accurately measuring and comparing different computers is critical. Performance can be
defined in different ways.
Response time: Also called execution time. The total time required for the computer to
complete a task, including disk accesses, memory accesses, I/O activities, operating system
overhead, CPU execution time, and so on.
Throughput: Also called bandwidth. The total amount of work done in a given time. Datacenter
managers are often interested in increasing throughput.
Throughput and Response Time
Do the following changes to a computer system increase throughput, decrease response time, or
both?
1. Replacing the processor in a computer with a faster version
2. Adding additional processors to a system that uses multiple processors
for separate tasks (for example, searching the web)
Decreasing response time almost always improves throughput. Hence, in case 1, both response
time and throughput are improved. In case 2, no one task gets work done faster, so only
throughput increases.
Performance of computers is primarily concerned with response time. To maximize performance,
minimize the response time or execution time for some task. Thus, performance and execution
time can be related for a computer X as:
Performance_X = 1 / Execution time_X
Performance of two different computers can be related quantitatively: "X is n times faster
than Y", or equivalently "X is n times as fast as Y", means
Performance_X / Performance_Y = Execution time_Y / Execution time_X = n
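The relation between performance and execution time can be checked with a few lines of Python. This is a sketch with hypothetical execution times (10 s on computer X, 15 s on computer Y); the function names are chosen for the example.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(exec_time_x_s: float, exec_time_y_s: float) -> float:
    """Return n such that X is n times as fast as Y:
    n = Performance_X / Performance_Y = Execution time_Y / Execution time_X."""
    return exec_time_y_s / exec_time_x_s


# Hypothetical measurements: the same program takes 10 s on X and 15 s on Y.
print("X is", times_faster(10.0, 15.0), "times as fast as Y")
```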
Measuring Performance
Time is the measure of computer performance: the computer that performs the same
amount of work in the least time is the fastest. Program execution time is measured in seconds
per program.
The most straightforward definition of time is called wall clock time, response time, or elapsed
time. These terms mean the total time to complete a task, including disk accesses, memory
accesses, input/output (I/O) activities, and operating system overhead; in short, everything.
CPU execution time: Also called CPU time. The actual time the CPU spends computing
for a specific task.
User CPU time: The CPU time spent in a program itself.
System CPU time: The CPU time spent in the operating system performing tasks on behalf of the
program.
Almost all computers are constructed using a clock that determines when events take place in the
hardware. These discrete time intervals are called clock cycles (or ticks, clock ticks, clock
periods, clocks, cycles). Designers refer to the length of a clock period both as the time for a
complete clock cycle (e.g., 250 picoseconds, or 250 ps) and as the clock rate (e.g., 4 gigahertz, or
4 GHz), which is the inverse of the clock period.
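Since clock rate and clock period are inverses, converting between them is a one-line calculation. The Python sketch below uses the 250 ps and 4 GHz figures from the text; the function names are chosen for the example.

```python
def clock_rate_hz(clock_period_s: float) -> float:
    """Clock rate is the inverse of the clock period."""
    return 1.0 / clock_period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period is the inverse of the clock rate."""
    return 1.0 / rate_hz


period = 250e-12                      # 250 ps, expressed in seconds
rate = clock_rate_hz(period)          # roughly 4e9 Hz, i.e. 4 GHz
print(f"{rate / 1e9:.2f} GHz")
print(f"{clock_period_s(4e9) * 1e12:.0f} ps")
```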
CPU Performance and Its Factors
Users and designers often examine performance using different metrics. A simple formula
relates the most basic metrics (clock cycles and clock cycle time) to CPU time:
CPU execution time for a program = CPU clock cycles for a program × Clock cycle time
This formula makes it clear that the hardware designer can improve performance by reducing the
number of clock cycles required for a program or the length of the clock cycle.
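A short Python sketch makes the two levers concrete. It writes CPU time as clock cycles divided by clock rate, which is the same as clock cycles multiplied by clock cycle time. The cycle count (20 billion) and the clock rates are hypothetical values chosen for the example.

```python
def cpu_time_s(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = clock cycles * cycle time = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


baseline = cpu_time_s(20e9, 2e9)        # hypothetical program on a 2 GHz clock
fewer_cycles = cpu_time_s(10e9, 2e9)    # same clock, half the cycles
faster_clock = cpu_time_s(20e9, 4e9)    # same cycles, clock twice as fast
print(baseline, fewer_cycles, faster_clock)
```

In practice the two factors are not independent: a design change that shortens the clock cycle often increases the number of cycles a program needs.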
Instruction Performance
The number of clock cycles required for a program can be written as:
CPU clock cycles = Instructions for a program × Average clock cycles per instruction
The term clock cycles per instruction, which is the average number of clock cycles each
instruction takes to execute, is often abbreviated as CPI. Since different instructions may take
different amounts of time depending on what they do, CPI is an average of all the instructions
executed in the program. CPI provides one way of comparing two different implementations of
the same instruction set architecture, since the number of instructions executed for a program
will, of course, be the same.
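The comparison of two implementations of the same instruction set can be sketched in Python. CPU time is taken as instruction count × CPI × clock cycle time, which follows from combining the cycle count with the cycle time. The two machines below (A: CPI 2.0 at 250 ps; B: CPI 1.2 at 500 ps) and the instruction count are illustrative assumptions.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time = instruction count * average CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Same program and same ISA, so the instruction count is identical on both.
instructions = 1_000_000
time_a = cpu_time_ps(instructions, cpi=2.0, cycle_time_ps=250.0)
time_b = cpu_time_ps(instructions, cpi=1.2, cycle_time_ps=500.0)

# The ratio does not depend on the instruction count, only on CPI and cycle time.
print("A is", time_b / time_a, "times as fast as B")
```

Machine B has the lower CPI but still loses here, which is why CPI alone is not a complete performance measure.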
The performance of a program depends on the algorithm, the language, the compiler, the
architecture, and the actual hardware. The following table summarizes how these components
affect the factors in the CPU performance equation.
Amdahl's Law
Amdahl's Law states that the performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster mode can be used.
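Amdahl's Law is commonly written as overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time that can use the faster mode and s is the speedup of that mode. The Python sketch below uses this standard form; the function name and the sample values are assumptions for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up.

    fraction_enhanced: share of the original time that benefits (0 to 1).
    speedup_enhanced:  how much faster that share runs (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# If 80% of the time can use the faster mode, the overall speedup
# stays below 1 / (1 - 0.8) = 5 no matter how fast that mode is.
for s in (2, 10, 100, 1_000_000):
    print(s, amdahl_speedup(0.8, s))
```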
The Power wall
Both clock rate and power increased rapidly and grew together, since they are correlated. Battery
life can trump performance in the personal mobile device, and the architects of warehouse scale
computers try to reduce the costs of powering and cooling 100,000 servers, as the costs are high
at this scale. Just as measuring time in seconds is a safer measure of program performance than a
rate like MIPS, the energy metric joules is a better measure than a power rate like watts, which is
just joules/second.
The dominant technology for integrated circuits is called CMOS (complementary metal oxide
semiconductor). For CMOS, the primary source of energy consumption is so-called dynamic
energy, that is, energy that is consumed when transistors switch states from 0 to 1 and vice
versa. The dynamic energy depends on the capacitive loading of each transistor and the voltage
applied:
Energy ∝ Capacitive load × Voltage^2
This is the energy of a pulse (0 to 1 to 0); a single transition takes half of it. The power
required per transistor is the product of the energy of a transition and the frequency of
transitions:
Power ∝ 1/2 × Capacitive load × Voltage^2 × Frequency switched
Frequency switched is a function of the clock rate. The capacitive load per transistor is a
function of both the number of transistors connected to an output (called the fanout) and the
technology, which determines the capacitance of both wires and transistors.
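Because dynamic power is proportional to capacitive load × voltage squared × frequency switched, the effect of a design change is easiest to see as a ratio between a new and an old design. The Python sketch below assumes that standard proportionality; the function name and the 15% reductions are hypothetical values for illustration.

```python
def relative_dynamic_power(load_ratio: float, voltage_ratio: float,
                           frequency_ratio: float) -> float:
    """Ratio new/old of dynamic power, given new/old ratios of each factor.

    Power is proportional to load * voltage^2 * frequency, so the constant
    factor (including the 1/2) cancels when two designs are compared.
    """
    return load_ratio * voltage_ratio ** 2 * frequency_ratio


# Hypothetical redesign: capacitive load, voltage, and frequency all cut by 15%.
ratio = relative_dynamic_power(0.85, 0.85, 0.85)
print(f"new design uses about {ratio:.2f} of the old dynamic power")

# Voltage enters squared: halving it alone cuts dynamic power to a quarter.
print(relative_dynamic_power(1.0, 0.5, 1.0))
```

The squared voltage term is why lowering the supply voltage was the main tool for holding power down while clock rates rose.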
The switch from Uniprocessors to Multiprocessors
Reasons for switching from unicore processors to multicore processors:
Difficult to make single-core clock frequencies even higher
Deeply pipelined circuits:
    heat problems
    speed of light problems
    difficult design and verification
    large design teams necessary
    server farms need expensive air-conditioning
Many new applications are multithreaded
General trend in computer architecture (shift towards more parallelism)
