This document contains the solutions to an exam for a computer systems architecture course. It provides instructions for taking the exam, which is closed book but allows for one page of notes. The exam contains multiple choice and short answer questions about topics relating to computer systems including caches, prefetching, memory allocation policies, disk architectures, virtualization, multicore processors, and error detection/correction.
Stanford University
June 9th, 2010
EE282 Final Exam Solutions

Exam Instructions: Answer each of the questions included in the exam. Write all of your answers directly on the examination paper, including any work that you wish to be considered for partial credit. The examination is closed book, but you can make use of one page of notes and a calculator. You may not use a computer or browser of any kind.
On Equations: Wherever possible, make sure to first write the equation with symbolic terms, then the equation rewritten with the numerical values, and then the final solution. Partial credit will be weighted appropriately for each component of the problem, and providing more information improves the likelihood that partial credit can be awarded.
On Writing Code: Unless otherwise stated, for any answers that require code examples or fragments, you should write C-like pseudocode. You do not need to optimize your code unless specifically instructed to do so. Comments for any code are not strictly required on the exam, but are highly recommended. They may help you receive partial credit on a problem, if they help us determine what you were trying to do.
On Time: You will have three hours (180 minutes) to complete this exam. Budget your time and try to leave some at the end to go over your work. The point weightings correspond roughly to the difficulty of each problem. If you find a problem too difficult at first, move on to the other problems and revisit it later.
Name (print) ___________________________________________________________________
THE STANFORD UNIVERSITY HONOR CODE
The Honor Code is an undertaking of the students, individually and collectively: (1) that they will not give or receive aid in examinations; that they will not give or receive unpermitted aid in class work, in the preparation of reports, or in any other work that is to be used by the instructor as the basis of grading; (2) that they will do their share and take an active part in seeing to it that others as well as themselves uphold the spirit and letter of the Honor Code. I acknowledge and accept the Honor Code.
Name (sign) __________________________________________________________
Indicate if the following statements are True or False. Provide a single-sentence justification for your answer. Answers without justification will receive no credit. [3 points per statement]
a) Assume that you are doing a DMA transfer from memory to an I/O device. Also assume that, in addition to reading DRAM, you also need to read the processor caches that may be caching addresses involved in the DRAM transfer. If an L2 cache read produces a hit, then it must be forwarded to the L1 cache for an additional lookup at that level.
False - it needs to be forwarded to the L1 cache only if the cache line is dirty (or suspected to be dirty).
b) Software prefetching can only be implemented if the hardware implements non-blocking caches.
True - otherwise the processor would immediately stall on the prefetch instruction.
c) The memory allocation policies used by software can affect the power consumption of the system.
True - it can affect the number of DRAM banks/ranks/DIMMs/channels that are active in order to serve a program (as opposed to being in standby or low-power modes).
d) The most important metric when building a data center is low energy consumption.
False - total cost of ownership (TCO) is the most important one.
e) When scaling down the voltage and clock frequency of a processor, the right order is to first reduce the power supply voltage and then reduce the clock frequency.
False - you first need to reduce the frequency, as electronics work slower at lower voltages.
f) Using virtually-addressed caches in a processor leads to lower energy consumption compared to a processor with physically-addressed caches.
True - potentially yes, because you can skip translation for accesses that hit in the L1.
g) For RAID-5, the actual number of disk accesses necessary to write a single byte is 2.
False - it's actually 4 (read old value/parity, write new value/parity).
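To make the four accesses concrete, here is a minimal in-memory sketch (ours, not part of the official solution; two plain arrays stand in for real block devices) of the RAID-5 small-write path, using new_parity = old_parity XOR old_data XOR new_data:

#include <stdint.h>

enum { NBLOCKS = 16 };
static uint8_t data_disk[NBLOCKS], parity_disk[NBLOCKS];   // stand-ins for disks

void raid5_small_write(int lba, uint8_t new_data) {
    uint8_t old_data   = data_disk[lba];                    // access 1: read old data
    uint8_t old_parity = parity_disk[lba];                  // access 2: read old parity
    uint8_t new_parity = old_parity ^ old_data ^ new_data;  // parity update in memory
    data_disk[lba]   = new_data;                            // access 3: write new data
    parity_disk[lba] = new_parity;                          // access 4: write new parity
}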
h) RAID-1 improves the performance of read accesses.
True - you can send a read access to either disk.
i) In a virtual machine environment, I/O interrupts are first processed by the virtual machine monitor and then by the interrupt handler of the guest OS.
True - the other way around is unsafe, as an I/O interrupt may actually be for another guest.
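A hedged illustration of why the VMM must run first (all names below are invented, not a real hypervisor API): only the VMM knows which guest owns the interrupting device, so it routes the interrupt before any guest handler runs.

enum { NVECTORS = 256 };
static int irq_owner[NVECTORS];                  // guest id that owns each vector

static void inject_virtual_irq(int guest, int vector) {
    (void)guest; (void)vector;                   // would mark the vector pending in
}                                                // that guest's virtual CPU state

void vmm_handle_irq(int vector) {                // physical interrupt lands here first
    inject_virtual_irq(irq_owner[vector], vector);
}                                                // the guest OS handler runs afterwards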
Provide short answers to the following questions. Typically, a few sentences or a short bulleted list will be sufficient. A long explanation is likely to include some incorrect statement, so keep it short and to the point.
a) Provide one specific example for each of the following "top-10" approaches for improving energy efficiency in computer systems. Each example should be no longer than one sentence. The example can be from any class of systems (notebooks, smartphones, datacenters) and can be a system-level, chip-level, or software technique. [10 points]
- Use energy-efficient technologies: use flash instead of disks.
- Match power to work: dynamic voltage-frequency scaling when load is low.
- Match work to power: reduce the frame rate for video playback when low on battery.
- Piggyback energy events: interrupt coalescing to amortize overheads.
- Special-purpose solutions: use GPUs, DSPs, or other special function units.
- Cross-layer efficiency: workload consolidation and scale-down in datacenters.
- Trade off some other metric: store 2 instead of 3 copies of data in a data center.
- Trade off the uncommon case: provision for a lower performance load to avoid excessive energy costs in power supply or cooling.
- Spend somebody else's power: the client sends computation to the server (assuming the communication cost is lower than the computation cost).
- Spend power to save power: compress data to be able to turn off some memory/disk components.
b) Processor vendors are using the exponentially increasing transistor budgets to include multiple cores per chip. Even if we assume that we have a large number of independent programs or tasks to run in parallel on the cores, what are two factors that may limit the usefulness of a multi-core chip? [4 points - 2 points each]
- Power consumption and power density: you may not be able to provide power or remove heat if all the processors in the chip are working concurrently. For example, the heat removal capabilities are proportional to the area of the chip, so they remain fixed as we put an increasing number of processors in the same space.
- Memory & I/O bandwidth: the collective memory and I/O bandwidth of all the applications may exceed the bandwidth available for off-chip communication. The off-chip bandwidth depends on the number of pins of the chip, which in turn depends on the area of the chip. Hence, the bandwidth does not scale with the number of processors we squeeze into one chip.
c) Certain companies propose that we should operate data centers without active air conditioning (aka air-side economization). This implies that the servers in the data center will be operating at a higher temperature. What is the tradeoff you should study to evaluate if this is a good idea? [4 points]
It is an issue of balancing costs. On the one hand, you have the cost of buying machines. In classical data centers with air conditioning, you pay for new machines every 3 years. Without air conditioning, replacements will come more often. If that extra cost per year is less than what you save from not paying for air conditioning (equipment, energy, etc.), then it's a good idea.
2 points to realize it's a cost issue, 2 points to explain the tradeoff between the cost of HW replacement and the cost of cooling.
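A back-of-the-envelope sketch of the break-even test (the numbers below are made up for illustration; none of them come from the exam):

#include <stdio.h>

int main(void) {
    double fleet_cost      = 200e6;  // assumed: $200M worth of servers
    double life_with_ac    = 3.0;    // years, per the answer above
    double life_without_ac = 2.5;    // assumed: shorter lifetime when running hotter
    double annual_cooling  = 15e6;   // assumed: $15M/year for air conditioning

    double extra_replacement = fleet_cost / life_without_ac
                             - fleet_cost / life_with_ac;   // extra $/year for hardware
    printf("air-side economization %s\n",
           extra_replacement < annual_cooling ? "wins" : "loses");
    return 0;
}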
d) Messages in interconnect networks typically use error-detecting but not error-correcting codes (as is the case in memories and disks). Describe briefly how you can provide error correction in networks without the use of error-correcting codes. What are the advantages of your proposal over just using error-correcting codes for the contents of each message? What are the implementation requirements of your proposal? [6 points]
You can use retransmission to do error correction. Once a message is determined to be incorrect or lost, we can retransmit it. Retransmission requires buffering of messages at the sender, reordering capabilities in the receiver, an acknowledgement protocol, and a timeout mechanism to detect lost messages. The advantages are:
- it works even if you get many errors in one message (more than what a cost-effective error-correcting code can handle)
- it works even if the whole message is lost
2 points to mention retransmission, 2 points to explain a little how it works and its requirements, 1 point for each advantage.
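A minimal sender-side sketch of such a protocol (all names invented for illustration): messages stay buffered in a window until acknowledged, and a timeout scan retransmits anything presumed lost.

#include <stdbool.h>

enum { WINDOW = 8, TIMEOUT = 100 };

struct pending {
    bool in_flight;                         // sent but not yet acknowledged
    long deadline;                          // retransmit once now exceeds this
};
static struct pending window[WINDOW];

static void link_send(int slot) { (void)slot; }  // stub: put the packet on the wire

void on_ack(unsigned seq) {                 // ack from the receiver frees the slot
    window[seq % WINDOW].in_flight = false;
}

void on_tick(long now) {                    // timeout scan: assume the message is lost
    for (int i = 0; i < WINDOW; i++)
        if (window[i].in_flight && now > window[i].deadline) {
            link_send(i);                   // retransmit from the sender buffer
            window[i].deadline = now + TIMEOUT;
        }
}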
e) A system can recover from errors by taking periodic checkpoints of its state and reverting to one of them when an error is detected. List the factors you would consider to select the frequency of checkpointing the system state and the number of active checkpoints maintained by the system. [6 points]
- The latency of error detection
- The storage overhead (for both a single checkpoint and all the checkpoints)
- The time to restore one or more checkpoints for recovery
6 points, 2 for each point.

f) Assume an I/O system with a DMA controller that can support multiple, concurrently active DMA requests. Since all DMA requests go over the same memory bus, some arbitration mechanism is necessary. List the factors that the DMA controller could take into account in arbitrating between the requests and why they are important. Note: you should not explain a specific arbitration policy, but the factors that could be taken into account in various policies. [4 points]
- Channel status: some DMA transfers may be blocked. E.g., one channel may be moving data from memory to disk. Since disks are slow, that DMA channel will be blocked quite often.
- Locality/granularity: due to locality effects, it may be faster to group requests from one channel and execute them back-to-back rather than switch between channels after every request.
- Software-defined priorities
- Fairness (if there are no priorities)
g) A processor uses 32-bit physical addresses and 32-bit virtual addresses with 1-KByte pages. The processor's TLB has 128 entries and is 4-way set-associative. What is the storage, as in the number of SRAM bits (or Kbits = 1024 bits), required to implement the TLB? Assume that each entry includes three permission bits (R, W, X) and that replacement uses a randomized algorithm. [6 points]
The 22-bit VPN is the address for the TLB. The TLB has 128 entries organized in 4 ways of 32 entries each. So, we need to extract a 5-bit index from the VPN in order to select one of these 32 entries. The remaining 22 - 5 = 17 bits of the VPN will be the tag for the TLB.
So, each TLB entry has a valid bit, 17 bits of tag, 22 bits of physical page number (PPN - the translation result), and 3 permission bits: 43 bits in total. There is no need for LRU bits (randomized replacement).
So the total cost of the TLB is 128 entries * 43 bits = 5504 bits = 5.375 Kbits.
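As a quick sanity check of the arithmetic (a sketch of ours; every parameter comes straight from the problem statement):

#include <stdio.h>

int main(void) {
    int entries = 128, ways = 4;
    int va_bits = 32, pa_bits = 32, page_bits = 10;    // 1-KByte pages

    int sets = entries / ways;                         // 32 sets
    int index_bits = 0;
    for (int s = sets; s > 1; s >>= 1) index_bits++;   // log2(32) = 5

    int vpn  = va_bits - page_bits;                    // 22-bit VPN
    int tag  = vpn - index_bits;                       // 17-bit tag
    int ppn  = pa_bits - page_bits;                    // 22-bit PPN
    int bits = 1 /*valid*/ + tag + ppn + 3 /*R,W,X*/;  // 43 bits per entry

    printf("%d bits/entry, %d bits total\n", bits, bits * entries);
    return 0;                                          // 43 bits/entry, 5504 total
}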
-1 point for forgetting the valid bit
-1 point for forgetting the permission bits
-1 for adding other random bits to the entry
-1 if you get the PPN length wrong, -2 if you forget it completely
-2 for a 22-bit tag (-1 if the calculation of the 17-bit tag field is wrong)

Problem 3: But in practice, they are different [33 points]
a) Assume the following C code that scans a linked-list data structure.
current = head;                 // start from the head of the linked list
while (current != NULL) {       // while list is not empty
    process(current->element);  // do some work on the current element
    current = current->next;    // go to the next element
}
Given a system with two processors that share the same first-level data cache, how would you prefetch the linked-list data for the above traversal? Under what conditions would the prefetching scheme be successful? [10 points]
A simple prefetching scheme is to have the second processor execute a "simplified" version of the loop that does no work but prefetches elements for the first processor.
The prefetch loop will look like:

current = head;                 // start from the head of the linked list
while (current != NULL) {       // while list is not empty
    fetch(current->element);    // no work, just fetch into the cache
    current = current->next;    // go to the next element
}
This approach will work well if the process() function in the first processor includes a significant amount of work to hide the memory latency of the miss for each element. This allows the 2nd processor to be a few elements ahead of the first one.
We should also note that if the latency of process() is much higher than that of the miss, the 2nd processor may run too far ahead, causing destructive interference in the cache. It probably makes sense to synchronize the two processors periodically.
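For reference, a compilable restatement of the helper loop (our sketch, assuming GCC or Clang, whose __builtin_prefetch intrinsic stands in for the exam's fetch() primitive):

#include <stddef.h>

struct node { void *element; struct node *next; };

void prefetch_thread(const struct node *head) {
    for (const struct node *cur = head; cur != NULL; cur = cur->next)
        __builtin_prefetch(cur->element);   // pull the element into the shared L1 D-cache
}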
6 points for describing a scheme that seems to work, 4 points for discussing the plus/minus.

b) Assume you are designing a new, large-scale datacenter with 100,000 servers. Your goal is to operate the datacenter with a single technician responsible for hardware repairs. Each server repair takes 1 hour and costs $150 in labor and 10% of the server's cost for replacement parts. Assume a full-time technician works 40 hours a week, 48 weeks out of the year. What is the maximum annual failure rate that you can tolerate for the servers? [4 points]
Assume that the failure rate is x. We want the total time needed to repair servers to be less than the time a full-time technician can work in a year. Repair time < technician's time -> x * 1 hour * 100,000 servers < 48 weeks * 40 hours/week -> x < 0.0192, i.e., a 1.92% annual failure rate.
Assume that you have two choices of servers for your datacenter. Server A costs $2,000 per unit and has an annual failure rate of 0.03. Server B costs $2,500 and has an annual failure rate of 0.015. Server B also provides higher performance, so 90,000 servers will be sufficient for the datacenter. Assuming a 3-year lifetime for servers, which server type should you use for your data center? Show your work. [6 points]
For each type of server, there are two costs to consider: capital expenses (the cost of buying the servers) and operational expenses (the cost of repairing the servers). The operational expenses include replacement parts and repair time.
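Plugging in the numbers as a hedged check (the figures come from the problem statement; the helper function and its name are ours):

#include <stdio.h>

double tco(int servers, double price, double afr, double years) {
    double capex  = servers * price;                 // cost of buying the servers
    double repair = 150.0 + 0.10 * price;            // $150 labor + 10% of cost in parts
    double opex   = servers * afr * years * repair;  // expected repair cost over lifetime
    return capex + opex;
}

int main(void) {
    printf("Server A: $%.1fM\n", tco(100000, 2000.0, 0.030, 3) / 1e6);  // ~ $203.2M
    printf("Server B: $%.1fM\n", tco( 90000, 2500.0, 0.015, 3) / 1e6);  // ~ $226.6M
    return 0;
}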
Obviously, the capital expenses dominate, so server A is the right way to go.

c) Consider the following graph that explores the latency of strided accesses on the cache hierarchy of a well-known microprocessor chip. Given this graph, answer the following questions. Provide a 1-sentence justification for each answer. [13 points]
What is the L1 D-cache line size? 64B
What is the associativity of the L1 D-cache? 4-way