
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SUBJECT: Advanced Computer Architecture SUBJECT CODE: CS903

Prepared By :

V.GOMATHI, AP/CSE

Verified by: Approved by :

UNIT – II
Principles of Scalable Performance - Performance Metrics and Measures, Parallel Processing
Applications, Speedup Performance Laws, Scalability Analysis and Approaches. Processor and
Memory Hierarchy - Advanced Processor Technology, Superscalar and Vector Processors,
Memory Hierarchy Technology, Virtual Memory Technology.

1. SPEEDUP PERFORMANCE LAWS, SCALABILITY ANALYSIS AND APPROACHES

• Speedup is the ratio of the execution time of the best possible serial algorithm on a single
processor, T(1), to the parallel execution time of the chosen algorithm on an n-processor parallel
system, T(n):

S(n) = T(1)/T(n)

• Speedup measures the absolute merit of a parallel algorithm with respect to the "optimal"
sequential version.

Amdahl’s Law

• β is the probability (fraction of time) that the system operates in pure sequential mode.

• 1 - β is the probability that the system operates in fully parallel mode using n processors.

S = T(1)/T(n)

T(n) = T(1)β + T(1)(1 - β)/n

S = 1/(β + (1 - β)/n)

  = n/(βn + (1 - β))
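The derivation above can be checked numerically; a minimal Python sketch (function name is ours, not from the text):

```python
def amdahl_speedup(beta, n):
    """Amdahl's law speedup S = n / (beta*n + (1 - beta)), where beta is
    the sequential fraction and n is the number of processors."""
    return n / (beta * n + (1 - beta))

# With a 5% sequential fraction, speedup saturates far below n:
print(round(amdahl_speedup(0.05, 16), 2))    # 9.14
print(round(amdahl_speedup(0.05, 1024), 2))  # bounded by 1/beta = 20
```

Note that as n grows, S approaches 1/β, which is the well-known Amdahl limit.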

Efficiency

• The system efficiency for an n-processor system is the speedup achieved per processor:

E(n) = S(n)/n = T(1)/(n·T(n))
Communication overhead

• tc is the communication overhead.

• Speedup:

S = n/(βn + (1 - β) + n·tc/T(1))

• Efficiency:

E(n) = S(n)/n = 1/(βn + (1 - β) + n·tc/T(1))
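The same model with the communication term folded in can be sketched as follows (function names are ours):

```python
def speedup_with_overhead(beta, n, tc, t1):
    """Speedup S = n / (beta*n + (1 - beta) + n*tc/T(1)), where tc is the
    communication overhead and t1 is the serial execution time T(1)."""
    return n / (beta * n + (1 - beta) + n * tc / t1)

def efficiency(beta, n, tc, t1):
    """Efficiency E(n) = S(n)/n."""
    return speedup_with_overhead(beta, n, tc, t1) / n

# Communication overhead erodes speedup even when beta = 0:
print(round(speedup_with_overhead(0.1, 8, 1.0, 100.0), 2))
```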
Parallelism Profile in Programs

• Degree of Parallelism For each time period, the number of processors used to execute a
program is defined as the degree of parallelism (DOP).

• The plot of the DOP as a function of time is called the parallelism profile of a given program.

• Fluctuation of the profile during an observation period depends on the algorithmic structure,
program optimization, resource utilization, and run-time conditions of a computer system.

Average Parallelism

• The average parallelism A is computed by:

A = (Σ_{i=1}^{m} i·t_i) / (Σ_{i=1}^{m} t_i)

• where:

– m is the maximum parallelism in the profile

– t_i is the total amount of time that DOP = i, with Σ_{i=1}^{m} t_i = t2 - t1 (the observation
period)
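The average-parallelism formula above can be computed directly from a DOP profile (the profile data here is made up for illustration):

```python
def average_parallelism(profile):
    """A = (sum of i*t_i) / (sum of t_i), where profile maps each degree of
    parallelism i to the total time t_i the program spent at DOP = i."""
    total_work = sum(i * t for i, t in profile.items())
    total_time = sum(profile.values())
    return total_work / total_time

# 4 time units at DOP 1, 3 at DOP 2, 3 at DOP 4:
print(average_parallelism({1: 4, 2: 3, 4: 3}))  # (4 + 6 + 12) / 10 = 2.2
```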
Scalability of Parallel Algorithms

• Scalability analysis determines whether parallel processing of a given problem can offer the
desired improvement in performance.

• A parallel system is scalable if its efficiency can be kept fixed as the number of processors is
increased, assuming that the problem size is also increased.

• Example: Adding m numbers using n processors. Each communication and computation step takes one
unit of time.

• Steps:

1. Each processor adds m/n numbers

2. The processors combine their partial sums, taking 2 log2 n steps of communication and addition

S = m / (m/n + 2 log2 n)
Scalability Example

• Efficiency E = S/n for different values of m and n:

m \ n     2      4      8      16     32
64        0.94   0.80   0.57   0.33   0.167
128       0.97   0.888  0.73   0.50   0.285
256       0.985  0.94   0.84   0.67   0.444
512       0.99   0.97   0.91   0.80   0.615
1024      0.995  0.985  0.955  0.89   0.76
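The table entries follow directly from the speedup formula above; a short Python check:

```python
import math

def efficiency(m, n):
    """Efficiency E = S/n for adding m numbers on n processors, where
    S = m / (m/n + 2*log2(n)) as defined above."""
    s = m / (m / n + 2 * math.log2(n))
    return s / n

# Reproduce the m = 64 row of the table:
print([round(efficiency(64, n), 2) for n in (2, 4, 8, 16, 32)])
```

For fixed m, efficiency falls as n grows; holding E fixed requires growing m with n, which is exactly the scalability condition stated above.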

2. MEMORY HIERARCHY
3. VIRTUAL MEMORY

Provides illusion of very large memory

– sum of the memory of many jobs greater than physical memory

– address space of each job larger than physical memory

 Allows available (fast and expensive) physical memory to be very well utilized
 Simplifies memory management (main reason today)
 Exploits memory hierarchy to keep average access time low.
 Involves at least two storage levels: main and secondary
 Virtual Address -- address used by the programmer
 Virtual Address Space -- collection of such addresses
 Memory Address -- address of word in physical memory also known as “physical address” or
“real address”

Basic Issues in VM System Design

 size of information blocks that are transferred from secondary to main storage

 when a block of information is brought into main memory M and M is full, some region of M must be
released to make room for the new block --> replacement policy

 which region of M is to hold the new block --> placement policy

 missing item fetched from secondary memory only on the occurrence of a fault --> fetch/load policy
Paging Organization

Virtual and physical address spaces are partitioned into blocks of equal size: pages in the virtual
address space and page frames in physical memory. The page table maps a virtual page number to a
physical frame number; the physical address is then formed by concatenating the frame number with
the page offset (concatenation, rather than addition, is what the hardware actually does).

Address Mapping Algorithm

 Protection Fault: access rights violation; causes a trap to a hardware, microcode, or software
fault handler
 Page Fault: page not resident in physical memory; also causes a trap, usually accompanied by a
context switch: the current process is suspended while the page is fetched from secondary storage
 Fragmentation & Relocation

Fragmentation is when areas of memory space become unavailable for allocation.

External Fragmentation: space left between blocks.


Optimal Page Size

Trends that influence the choice of page size:
-- memories get larger as the price of RAM drops
-- the gap between processor speed and disk speed grows wider
-- programmers desire larger virtual address spaces

Relocation: move a program or data to a new region of the address space (possibly fixing up all the
pointers)

Page Replacement Algorithms

Least Recently Used:

-- selects the least recently used page for replacement

-- requires knowledge about past references; more difficult to implement (thread a list through the
page table entries from most recently referenced to least recently referenced; when a page is
referenced it is placed at the head of the list; the page at the end of the list is the one to
replace)

-- good performance; recognizes the principle of locality

Not Recently Used:

Associated with each page is a reference flag such that

ref flag = 1 if the page has been referenced in the recent past
         = 0 otherwise

-- if replacement is necessary, choose any page frame whose reference bit is 0; this is a page that
has not been referenced in the recent past

-- clock implementation of NRU: the page frames are scanned in circular order; a frame whose
reference bit is set has the bit cleared and is passed over, and the first frame found with a clear
bit is the one replaced

An optimization is to search for a page that is both not recently referenced AND not dirty.
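The clock scheme described above can be sketched in a few lines; this is an illustrative model (class and field names are ours), not any particular OS implementation:

```python
class ClockReplacer:
    """Minimal sketch of the clock (NRU) page replacement policy."""

    def __init__(self, n_frames):
        self.pages = [None] * n_frames   # page held by each frame
        self.ref = [0] * n_frames        # reference bits
        self.hand = 0                    # clock hand position

    def access(self, page):
        if page in self.pages:           # hit: set the reference bit
            self.ref[self.pages.index(page)] = 1
            return None
        # miss: advance the hand, clearing set bits, until a clear bit is found
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        victim = self.pages[self.hand]
        self.pages[self.hand] = page
        self.ref[self.hand] = 1
        self.hand = (self.hand + 1) % len(self.pages)
        return victim                    # evicted page (None if frame was free)
```

For example, with 2 frames the sequence A, B, C evicts A, since by the time C arrives both reference bits have been cleared and A's frame is reached first.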
Demand Paging and Prefetching Pages

Fetch Policy

when is the page brought into memory?

if pages are loaded solely in response to page faults, then the

policy is demand paging

An alternative is prefetching:

anticipate future references and load such pages before their actual use

+ reduces page transfer overhead

- removes pages already in page frames, which could adversely affect the page fault rate

- predicting future references usually difficult

Most systems implement demand paging without prepaging

Virtual Address and a Cache

It takes an extra memory access to translate a VA to a PA.

This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast
as possible.

synonym problem: two different virtual addresses map to the same physical address => two different
cache entries holding data for the same physical address!

on an update, all cache entries with the same physical address must be updated or memory becomes
inconsistent

determining this requires significant hardware, essentially an associative lookup on the physical
address tags to see if you have multiple hits

TLBs (Translation Look aside Buffer)

A way to speed up translation is to use a special cache of recently used page table entries -- this
has many names, but the most frequently used is Translation Lookaside Buffer or TLB.

TLB access time is comparable to, though shorter than, cache access time (and still much less than
main memory access time).

TLB: acts as a specialized cache for address translation

Translation Look-Aside Buffers

TLBs are usually small, typically not more than 128 - 256 entries even on high-end machines. This
permits fully associative lookup on these machines. Most mid-range machines use small n-way set
associative organizations.

Translation with a TLB
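Translation with a TLB can be sketched as a small cache in front of the page table. This is an illustrative model assuming 4 KiB pages and LRU replacement; the table contents and constants are made up:

```python
from collections import OrderedDict

PAGE_SHIFT = 12                           # 4 KiB pages
TLB_ENTRIES = 4                           # small, fully associative

page_table = {0: 7, 1: 3, 2: 9}           # virtual page -> physical frame
tlb = OrderedDict()                       # recently used translations

def translate(vaddr):
    """Translate a virtual address, consulting the TLB before the page table."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
    if vpn in tlb:                        # TLB hit: no page-table access needed
        tlb.move_to_end(vpn)
        frame = tlb[vpn]
    else:                                 # TLB miss: walk the page table
        frame = page_table[vpn]           # (a real miss may page-fault instead)
        tlb[vpn] = frame
        if len(tlb) > TLB_ENTRIES:
            tlb.popitem(last=False)       # evict least recently used entry
    return (frame << PAGE_SHIFT) | offset # concatenate frame number and offset

print(hex(translate(0x1234)))  # vpn 1 -> frame 3: 0x3234
```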

4. PERFORMANCE METRICS AND SCALABILITY ANALYSIS

Performance Testing

 System process speed (Max./Min./Average)

o system processes, tasks, transactions, responses.

o data retrieval, data loading

 System throughput (Max./Min./Average)

o loads, messages, tasks, processes

 System latency (Max./Min./Average)

o Message/event, task/process

 System utilization (Max./Min./Average)

o Network, server/client machines.

 System availability (component-level/system-level)

o component/system, services/functions

o system network, computer hardware/software

 System reliability (component-level/system-level)

o component/system, services/functions

o system network, computer hardware/software


 System scalability (component-level/system-level)

o load/speed/throughput boundary

o improvements on process speed, throughput

 System successes/failures rates for

o communications, transactions, connections

o call processing, recovery, …

 Domain-specific/application-specific

o agent performance

o real-time report generation speed

o workflow performance

Performance Test - Tools

Performance test tools can be classified into:

 Simulators and data generators:

o Message-based or table-based simulators

o State-based simulators

o Model-based data generators, such as:

o Pattern-based data generators

o Random data generators

 Performance data collectors and tracking tools

o Performance tracking tools

 Performance evaluation and analysis tool

o Performance metric computation


o Model-based performance evaluation tool

 Performance monitors

o For example, sniffer, Microsoft performance monitor

o External third-party tools

 Performance report generators

Performance Evaluation

What is performance evaluation?

 Using a well-defined approach to study, analyze, and measure the performance of a given
system.

The basic tasks and scope:

 Collect system performance data

 Define system performance metrics

 Model system performance

 Measure, analyze, estimate system performance

 Present and report system performance

Performance Evaluation

– Objectives and Needs

The major objectives:

 Understand product capacity

 Discover system performance issues

 Measure and evaluate system performance

 Estimate and predict system performance

The basic needs are:


 Well-defined performance metrics

 Well-defined performance evaluation models

 Performance evaluation tools and supporting environment

Performance Evaluation - Approaches

 Performance testing: (during production)

o measure and analyze the system performance based on performance test data and results

 Performance simulation: (pre-production)

o study and estimate system performance using a simulation approach

 Performance measurement at the customer site: (post-production)

o measure and evaluate system performance during system operations

Performance Evaluation - Models

What is a performance evaluation model?

 A well-defined formal model which depicts different aspects of the performance of a system.

Why do we need performance evaluation models?

 To present the system performance properties

 To provide a guideline for engineers to find the strategy on performance evaluation.

 To set up a foundation to define performance metrics.

 To identify the needs of the target performance environment.

Performance Evaluation - Models

Type of performance evaluation models:


 Queuing model

 Scenario-based evaluation model

 Architecture-based evaluation model

o Component-based evaluation model

 Process-based evaluation model

 Transaction-based evaluation model

 Transition-based models

o State-based evaluation model

o Domain-oriented evaluation model

Performance Evaluation - Models for Portal V5

Two performance evaluation models are defined:

 Event-Based Function Scenario Model for System Performance Evaluation

 Event-Based Transition Model for System Performance Evaluation

Portal V5 systems can be viewed as a component-based distributed system over the Internet. It
accepts, processes, and supports the following types of event messages.

 System-caller interaction event messages

 System-agent interaction event messages

 Call-oriented event interactions between components

Event-Based Function Scenario Model

Model definition: An event-based function scenario model is an event-based scenario diagram with
tracked time stamps.

 We use G(N, E, T, O) to represent a functional scenario diagram.

• N is a set of component nodes.


• E represents a set of directed links. Each link e is a directed edge (Cr, Cd), which represents an
event that is sent by component Cr and received by component Cd.

• T is a set of tracked time stamps for links. Each link has a pair of time stamps (to, ti), where
to indicates the outgoing time stamp of the event at the sender Cr, and ti indicates the incoming
time stamp of the event at the receiver Cd.

• O is a set of pairs (Ep, order), each of which represents the order in which an event occurs in
the scenario.

An event-based functional scenario Sq is a sequence of interaction events between components in a


system. Sq can be represented as a path in a functional scenario diagram. Thus, Sq can be denoted as Ei1,
…., Eim, where Ei1 is the starting event of Sq, and Eim is the ending event of Sq.

Event-Based Transition Model: An Example

Transition

Performance Evaluation - Metrics

 Performance metrics

o Call request process time


o Event interaction latency

 Throughput metrics (component and system level)

o Call processing throughput

o Call load throughput rate

 Availability metrics (component and system level)

 Reliability metrics (component and system level)

 Scalability metrics

 Utilization metrics

Commonly used performance metrics: (for components/systems)

 Functional or process speed metrics

 User response time metric

 Communication speed metric

 Transaction speed metric

 Latency metric

Performance metrics for Portal V.5 products:

 Call process time metric

 Call process speed metric

 Event latency metric

Performance Metric – Call Process Time

Component Process Time Metric for call requests Req(media type):

Req(media type) = { req1, …, reqn }

Max-Process-TimeSq (Ck, Req(media type)) = Maxi { Process-TimeSq (Ck, reqi) }   (i = 1, …, n)

Min-Process-TimeSq (Ck, Req(media type)) = Mini { Process-TimeSq (Ck, reqi) }   (i = 1, …, n)

Avg-Process-TimeSq (Ck, Req(media type)) = [Σi Process-TimeSq (Ck, reqi)] / n   (i = 1, …, n)

System Call Process Time Metric for a specific media load:

Req(media type) = { req1, …, reqn }

Max-Process-Time (S, Req(media type)) = Maxi { Process-Time (S, reqi) }   (i = 1, …, n)

Min-Process-Time (S, Req(media type)) = Mini { Process-Time (S, reqi) }   (i = 1, …, n)

Avg-Process-Time (S, Req(media type)) = [Σi Process-Time (S, reqi)] / n   (i = 1, …, n)
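The Max/Min/Avg process-time metrics above reduce to simple aggregations over the per-request times; a sketch with made-up request data:

```python
# Illustrative per-request process times (ms); names and values are ours.
process_times = {"req1": 120, "req2": 95, "req3": 140}

max_t = max(process_times.values())                      # Max-Process-Time
min_t = min(process_times.values())                      # Min-Process-Time
avg_t = sum(process_times.values()) / len(process_times) # Avg-Process-Time

print(max_t, min_t, round(avg_t, 1))  # 140 95 118.3
```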

Performance Metric – Latency

Latency metrics are used to measure the delays of:

- Messages, transactions, tasks, …..

We can define an event message latency metric for Portal V.5 to measure the delay of event messages
between components:

Event-LatencySq (Ei, reqj) = Ei's incoming timestamp – Ei's outgoing timestamp = Ti - To

We can compute the Max, Min, and Average of latency.
Throughput Metrics

Objective: To measure the call processing capacity of a system.

The four types of throughput metrics are defined:

 System processing throughput for a special type of media requests

 System processing throughput for all types of media requests

 System-load throughput rate for a special type of media requests

 System-load throughput rate for all types of media requests


Concerns:

 Maximum of the throughput

 Minimum of the throughput

 Average of the throughput
Throughput Metric - Component Process Throughput

Definition: In a given functional scenario, the component-process throughput for a special type of
media requests Req(media-type) in a given performance test period Tp refers to the total number of
its outgoing events which are generated by processing its incoming events for these media requests.

Component-Process-ThroughputTp (Req(media type)) = total number of outgoing Ek

Definition: In a given functional scenario, the component-process throughput rate for a special
type of media requests Req(media-type) in a given performance test period Tp refers to the ratio of
the total number of its outgoing events (which are generated by processing its incoming events for
these media requests) to the total number of its incoming events.

Component-Process-Throughput-RateTp (Req(media type)) = total number of outgoing Ek+1 / total
number of incoming Ek

Availability Metrics

Availability metrics are defined to evaluate the availability of components and systems in providing their
specified functions and services to system users and customers.
Several availability metrics are defined here:

 Component-level

 Component Availability Metrics

 Component Service Availability Metrics

 System-level

 System Availability Metric

 System-Service Availability Metric

Definition: The component availability of a component Ck in a system during a time period T refers to the
ratio of the total available time of the component to the total time T, including both available and
unavailable time.

Component Availability Metric:

Component-AvailabilityT (Ck)

= available-time(Ck) / (available-time(Ck) + unavailable-time(Ck))

= available-time(Ck) / |T|

 Where available-time(Ck) represents the available time of Ck during T, and unavailable-time(Ck)
includes the time when the component cannot provide the specified functions and services.

Application: to measure the availability of components.
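The availability ratio above is straightforward to compute; a sketch with illustrative times (in seconds over a test period):

```python
def component_availability(available_time, unavailable_time):
    """Component-Availability = available / (available + unavailable),
    per the metric defined above."""
    return available_time / (available_time + unavailable_time)

# 60 s of downtime in a one-hour test period:
print(round(component_availability(3540, 60), 4))  # 0.9833
```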

Availability Metrics - System Availability

Definition: For a system S, its system availability during a given test period Tp refers to the ratio of its
total available time to provide all specified function sets for its users in all given media types to the total
test time |Tp|.

Let us use System-AvailabilityTp (S) to denote the system availability for system S. S = {C1, …, Cm}
consists of m components. To help engineers evaluate system availability, we define the following
metric by considering the single-failure criterion:

System-AvailabilityTp (S)

= available-time(S) / (available-time(S) + unavailable-time(S))

= available-time(S) / |Tp|

available-time(S) = Mink { available-time(Ck) }   k = 1, …, m

Where available-time(S) refers to the system uptime in Tp, and unavailable-time(S) represents the
system downtime in Tp.

System Service Availability Model

Event-Based Transition Path:


Scalability Metrics

The major applications of scalability metrics:

 To check the boundary and threshold of a system and its components.

 To check the speed-up of a system and its components when more hardware machines and
application servers are added into a system.

 To evaluate the throughput improvement of a system (or its components) when more hardware
machines and application servers are added into a system.

The scalability metrics defined here:

 Speed-up

 Throughput Improvement

Load Boundary and Threshold

Definition: The throughput improvement for a system (or a component cluster) for a fixed load Req(S) in
a given time Tp, denoted as Throughput-Improvement(A,B) refers to the throughput rate increase from
configuration setting A to configuration setting B. The detailed metric can be defined below:
Throughput-Improvement (A,B)

= (system load throughput under B – system load throughput under A) / |Req(S)|

Application: to measure the improvement of system throughput

Speed-Up

Definition: The process speed-up for a system (or a component cluster) under a fixed load Req(S) in a
given time Tp, denoted as Speed-up(A,B), refers to the process speed increase from configuration setting
A to configuration setting B. The detailed metric can be defined below:

Speed-up (A,B)

= (process speed under B – process speed under A)/ process speed under A

Application: to measure the improvement of the system process speed
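The Speed-up(A, B) metric above is a relative increase between two configuration settings; a sketch with illustrative numbers:

```python
def speed_up(speed_a, speed_b):
    """Speed-up(A, B) = (process speed under B - process speed under A)
    / process speed under A, per the definition above."""
    return (speed_b - speed_a) / speed_a

# e.g. 200 calls/s under configuration A, 260 calls/s under B:
print(speed_up(200, 260))  # 0.3, i.e. a 30% speed-up
```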


5. SUPERSCALAR AND VECTOR PROCESSORS

Pipelining is a method to realize overlapped parallelism, in the proposed solution of a problem on a
digital computer, in an economical way. To understand the concept of pipelining, we first need to
understand the concept of assembly lines in an automated production plant, where items are
assembled from separate parts (stages) and the output of one stage becomes the input to the next.
Taking the analogy of assembly lines, pipelining is the method to introduce temporal parallelism in
computer operations. The assembly line is the pipeline, and the separate parts of the assembly line
are the different stages through which the operands of an operation are passed.
To introduce pipelining in a processor P, the following steps must be followed:
• Sub-divide the input process into a sequence of subtasks. These subtasks make up the stages of the
pipeline, which are also known as segments.
• Each stage Si of the pipeline performs some operation on a distinct set of operands according to
its subtask.
• When stage Si has completed its operation, the results are passed to the next stage Si+1 for the
next operation.
• Stage Si receives a new set of inputs from the previous stage Si-1.
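The steps above can be sketched as a linear pipeline. The stage operations here are purely illustrative (not any particular processor's stages); the cycle-count function shows the temporal parallelism: n operand sets flow through k stages in k + (n - 1) cycles, versus k * n without pipelining.

```python
stages = [
    lambda x: x + 1,   # stage S1 (illustrative operation)
    lambda x: x * 2,   # stage S2
    lambda x: x - 3,   # stage S3
]

def run_pipeline(inputs):
    """Pass each operand set through S1, S2, S3 in order."""
    out = []
    for x in inputs:
        for stage in stages:
            x = stage(x)   # output of one stage is the input to the next
        out.append(x)
    return out

def pipelined_cycles(n_inputs, n_stages):
    """Cycles needed when a new operand set enters the pipeline every cycle."""
    return n_stages + (n_inputs - 1)

print(run_pipeline([1, 2, 3]))   # [1, 3, 5]
print(pipelined_cycles(8, 3))    # 10 cycles, vs 24 unpipelined
```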