Parallelism in Computer Architecture
Prepared by: Dr. J. Vinothkumar
Table of Contents

S.No  Contents
1     Aim & Objective
2     Prerequisite
4     Parallelism Theory
6     Reference
4. Parallelism
4.1 Introduction
Parallel Processing
Parallel processing can be described as a class of techniques that enables a system to perform simultaneous data-processing tasks in order to increase the computational speed of the computer system.
A parallel processing system can carry out simultaneous data-processing to achieve faster execution time.
For instance, while an instruction is being processed in the ALU component of the CPU, the next instruction can be read from memory.
The primary purpose of parallel processing is to enhance the computer's processing capability and increase its throughput.
A parallel processing system can be achieved by having a multiplicity of functional units that perform identical or different operations simultaneously.
The data can be distributed among the various multiple functional units.
The following diagram shows one possible way of separating the execution unit into eight functional units operating in parallel.
The operation performed in each functional unit is indicated in each block of the diagram:
The adder and integer multiplier perform arithmetic operations on integer numbers.
The floating-point operations are separated into three circuits operating in parallel.
The logic, shift, and increment operations can be performed concurrently on different data.
All units are independent of each other, so one number can be shifted while another number is being incremented.
Parallel computers can be roughly classified according to the level at which the hardware supports parallelism, with multi-core and multi-processor computers having multiple processing elements within a single machine.
In some cases parallelism is transparent to the programmer, such as in bit-level or instruction-level parallelism.
But explicitly parallel algorithms, particularly those that use concurrency, are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common.
Communication and synchronization between the different subtasks are typically some of the greatest obstacles to getting optimal parallel program performance.
Types of Parallelism:
1. Bit-level parallelism: This form of parallel computing is based on increasing the processor's word size. It reduces the number of instructions that the system must execute in order to perform a task on large-sized data.
Example: Consider a scenario where an 8-bit processor must compute the sum of two 16-bit integers. It must first sum the 8 lower-order bits, then add the 8 higher-order bits, thus requiring two instructions to perform the operation. A 16-bit processor can perform the operation with just one instruction.
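As a rough illustration in C, the sketch below mimics what an 8-bit ALU must do to add two 16-bit integers: two 8-bit additions plus an explicit carry (the function name add16_via_8bit is illustrative, not from the text):

#include <stdint.h>
#include <stdio.h>

/* Sketch: adding two 16-bit integers using only 8-bit arithmetic,
   as an 8-bit processor would have to. */
uint16_t add16_via_8bit(uint16_t a, uint16_t b) {
    uint8_t lo = (uint8_t)a + (uint8_t)b;           /* step 1: low-order bytes   */
    uint8_t carry = lo < (uint8_t)a;                /* carry out of the low byte */
    uint8_t hi = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry;  /* step 2 */
    return ((uint16_t)hi << 8) | lo;
}

int main(void) {
    /* One 16-bit addition takes two 8-bit steps: 1000 + 2345 = 3345. */
    printf("%u\n", add16_via_8bit(1000, 2345));
    return 0;
}

A 16-bit ALU performs the same addition as a single operation, which is exactly the saving that bit-level parallelism provides.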
2. Instruction-level parallelism: Although a program is written as a sequential instruction stream, its instructions can be re-ordered and grouped by the processor so that they are executed concurrently without affecting the result of the program. This is called instruction-level parallelism.
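As a hypothetical illustration, consider the C fragment below: the first two statements are independent of each other, so the hardware (or compiler) may reorder them or execute them in the same cycle, while the last statement must wait for both:

/* Illustrative only: a and d have no data dependence on each other. */
int ilp_demo(int b, int c, int e, int f) {
    int a = b + c;   /* independent of the next statement     */
    int d = e - f;   /* may issue concurrently with a = b + c */
    return a * d;    /* true dependence: must wait for a and d */
}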
3. Task parallelism: Task parallelism employs the decomposition of a task into subtasks and then allocates each of the subtasks for execution. The processors perform execution of the subtasks concurrently.
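A minimal sketch of task parallelism in C with POSIX threads, assuming a POSIX system; the decomposition of an array sum into two half_sum subtasks, and all names, are illustrative:

#include <pthread.h>
#include <stdio.h>

#define N 8
static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};

struct range { int lo, hi, sum; };

/* One subtask: sum one half of the array. */
static void *half_sum(void *arg) {
    struct range *r = arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++)
        r->sum += data[i];
    return NULL;
}

int main(void) {
    struct range left = {0, N / 2, 0}, right = {N / 2, N, 0};
    pthread_t t1, t2;
    pthread_create(&t1, NULL, half_sum, &left);   /* subtask 1 */
    pthread_create(&t2, NULL, half_sum, &right);  /* subtask 2, runs concurrently */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("total = %d\n", left.sum + right.sum); /* prints 36 */
    return 0;
}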
4. Data-level parallelism (DLP): Instructions from a single stream operate concurrently on several data elements. This form is limited by non-regular data manipulation patterns and by memory bandwidth.
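For example, the regular C loop below applies one operation to every element of an array; a vectorizing compiler can map this single instruction stream onto SIMD hardware so that several elements are processed per instruction (the conventional saxpy kernel, used here only as an illustration):

/* Data-level parallelism: the same operation on every element.
   A non-regular access pattern inside the loop would defeat this. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}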
Architectural Trends
When multiple operations are executed in parallel, the number of cycles needed to execute the program is reduced.
However, resources are needed to support each of the concurrent activities.
Resources are also needed to allocate local storage.
The best performance is achieved by an intermediate action plan that uses resources to exploit both a degree of parallelism and a degree of locality.
Generally, the history of computer architecture has been divided into four generations based on the following basic technologies:
Vacuum tubes
Transistors
Integrated circuits
VLSI
Until 1985, developments were dominated by growth in bit-level parallelism:
4-bit microprocessors were followed by 8-bit, 16-bit, and so on.
To reduce the number of cycles needed to perform a full 32-bit operation, the width of the data path was doubled. Later on, 64-bit operations were introduced.
Growth in instruction-level parallelism dominated the mid-80s to mid-90s.
The RISC approach showed that it was simple to pipeline the steps of instruction processing so that, on average, an instruction is executed in almost every cycle.
Growth in compiler technology has made instruction pipelines more productive.
In the mid-80s, microprocessor-based computers consisted of:
An integer processing unit
A floating-point unit
A cache controller
SRAMs for the cache data
Tag storage
As chip capacity increased, all these components were merged into a single chip.
Thus, a single chip consisted of separate hardware for integer arithmetic, floating-point operations, memory operations, and branch operations.
Other than pipelining individual instructions, such a processor fetches multiple instructions at a time and sends them in parallel to different functional units whenever possible.
This type of instruction-level parallelism is called superscalar execution.
FLYNN'S CLASSIFICATION
Flynn's taxonomy is a classification of parallel computer architectures based on the number of concurrent instruction streams (single or multiple) and data streams (single or multiple) available in the architecture.
The four categories in Flynn's taxonomy are the following:
1. (SISD) single instruction, single data
2. (SIMD) single instruction, multiple data
3. (MISD) multiple instruction, single data
4. (MIMD) multiple instruction, multiple data
Instruction stream: the sequence of instructions as executed by the machine.
Data stream: a sequence of data, including input and partial or temporary results, called for by the instruction stream.
Instructions are decoded by the control unit, which then sends them to the processing units for execution.
The data stream flows between the processors and memory bidirectionally.
SISD
An SISD computing system is a uniprocessor machine which is capable of executing a single instruction operating on a single data stream.
SIMD
• An SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but operating on different data streams.
Machines based on the SIMD model are well suited to scientific computing, since it involves lots of vector and matrix operations.
So that the information can be passed to all the processing elements (PEs), the organized data elements of a vector can be divided into multiple sets (N sets for an N-PE system), and each PE can process one data set.
A dominant representative SIMD system is Cray's vector processing machine.
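As a concrete sketch of the "same instruction, multiple data" idea, the C fragment below uses x86 SSE intrinsics, where one _mm_add_ps instruction adds four floats at once. This assumes an SSE-capable x86 processor and is offered only as a modern analogue, not as how Cray machines were programmed:

#include <immintrin.h>   /* x86 SSE intrinsics */

/* One SIMD instruction performs four additions. */
void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);             /* load 4 floats from a  */
    __m128 vb = _mm_loadu_ps(b);             /* load 4 floats from b  */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* 4 sums, 1 instruction */
}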
MISD
An MISD computing system is a multiprocessor machine capable of executing different instructions on different PEs, with all of them operating on the same data set.
The system performs different operations on the same data set. Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially.
MIMD
An MIMD system is a multiprocessor machine which is capable of executing multiple instructions on multiple data sets.
Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using this model are capable of handling any kind of application.
Unlike SIMD and MISD machines, PEs in MIMD machines work asynchronously.
MIMD machines are broadly categorized into
shared-memory MIMD and
distributed-memory MIMD
based on the way PEs are coupled to the main memory.
In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory and they all have access to it. Communication between PEs in this model takes place through the shared memory; a modification of the data stored in the global memory by one PE is visible to all other PEs. Dominant representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM's SMP (Symmetric Multi-Processing) machines.
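The visibility property described above can be sketched in C11: one thread (standing in for a PE) publishes a value in the shared global memory, and another spins until the modification becomes visible. This is an illustrative user-level analogue, not a description of any particular machine:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int ready = 0;   /* lives in memory shared by all threads */
int payload = 0;

static void *producer(void *arg) {
    payload = 42;                                            /* modify shared data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* publish it         */
    return NULL;
}

static void *consumer(void *arg) {
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                       /* wait until the write is visible */
    printf("saw payload = %d\n", payload);      /* prints 42 */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}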
VECTOR ARCHITECTURES
A multithreaded CPU is not a parallel architecture, strictly speaking; multithreading is obtained through a single CPU, but it allows a programmer to design and develop applications as a set of programs that can virtually execute in parallel: namely, threads.
Multithreading is a solution to avoid wasting clock cycles while missing data is fetched: the CPU manages several peer threads concurrently, and if one thread gets blocked, the CPU can execute instructions of another thread, thus keeping the functional units busy.
Each thread must have a private program counter and a set of private registers, separate from other threads.
In a traditional scalar processor, the basic data type is an n-bit word.
The architecture often exposes a register file of words, and the instruction set is composed of instructions that operate on individual words.
In a vector architecture, there is support for a vector data type, where a vector is a collection of VL n-bit words (VL is the vector length).
There may also be a vector register file, which was a key innovation of the Cray architecture; previously, vector machines operated on vectors stored in main memory.
Figures 1 and 2 illustrate the difference between vector and scalar data types, and the operations that can be performed on them.
Vector load/store instructions provide the ability to do strided and scatter/gather memory accesses, which take data elements distributed throughout memory and pack them into sequential vectors/streams placed in vector/stream registers.
This promotes data locality.
It results in less data pollution, since only useful data is loaded from the memory system.
It provides latency tolerance because there can be many simultaneous outstanding memory accesses.
Vector instructions such as VLD and VST provide this capability.
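In software terms, a strided or gather load packs scattered memory elements into a dense vector register. The C sketch below is only a scalar emulation of what such hardware instructions do in a single operation; VL and the function names are illustrative:

enum { VL = 8 };   /* assumed vector length */

/* Strided load: every stride-th element, e.g. a column of a matrix. */
void strided_load(double vreg[VL], const double *mem, int stride) {
    for (int i = 0; i < VL; i++)
        vreg[i] = mem[i * stride];
}

/* Gather load: arbitrary indexed elements packed into a dense vector. */
void gather_load(double vreg[VL], const double *mem, const int idx[VL]) {
    for (int i = 0; i < VL; i++)
        vreg[i] = mem[idx[i]];
}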
HARDWARE MULTITHREADING
Multithreading
• A mechanism by which the instruction stream is divided into several smaller streams (threads) that can be executed in parallel is called multithreading.
Hardware Multithreading
• Increasing the utilization of a processor by switching to another thread when one thread is stalled is known as hardware multithreading.
Thread
• A thread includes the program counter, the register state, and the stack. It is a lightweight process; whereas threads commonly share a single address space, processes don't.
Thread Switch
• The act of switching processor control from one thread to another within the same process. It is much less costly than a process switch.
Process
• A process includes one or more threads, the address space, and the operating system state. Hence, a process switch usually invokes the operating system, but a thread switch does not.
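The address-space distinction can be demonstrated on a POSIX system: a thread's write to a global variable is visible to the thread that created it, while a forked process modifies only its own copy. A minimal illustrative C sketch:

#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int shared = 0;   /* one copy per address space */

static void *bump(void *arg) { shared++; return NULL; }

int main(void) {
    pthread_t t;                       /* threads share the address space */
    pthread_create(&t, NULL, bump, NULL);
    pthread_join(t, NULL);
    printf("after thread:  shared = %d\n", shared);   /* 1 */

    if (fork() == 0) {                 /* a process gets a private copy */
        shared++;
        _exit(0);
    }
    wait(NULL);
    printf("after process: shared = %d\n", shared);   /* still 1 */
    return 0;
}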
Types of Multi-threading
1. Fine-grained Multithreading
2. Coarse-grained Multithreading
3. Simultaneous Multithreading
Coarse-grained Multithreading
A version of hardware multithreading that implies switching between threads only after significant events, such as a last-level cache miss.
• This choice relieves the need to have thread switching be extremely fast and is much less likely to slow down the execution of an individual thread, since instructions from other threads will only be issued when a thread encounters a costly stall.
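A toy software model of this switch-on-costly-stall policy (purely illustrative; a real core implements the policy in hardware, and the miss pattern below is invented):

#include <stdio.h>

enum { NTHREADS = 2, CYCLES = 8 };

int main(void) {
    int llc_miss[NTHREADS][CYCLES] = {  /* 1 = last-level cache miss  */
        {0, 0, 1, 0, 0, 0, 0, 0},       /* thread 0 misses in cycle 2 */
        {0, 0, 0, 0, 0, 1, 0, 0},       /* thread 1 misses in cycle 5 */
    };
    int current = 0;
    for (int c = 0; c < CYCLES; c++) {
        if (llc_miss[current][c])                /* significant event...   */
            current = (current + 1) % NTHREADS;  /* ...then switch threads */
        printf("cycle %d: issue from thread %d\n", c, current);
    }
    return 0;
}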
Advantage
• Thread switching does not need to be extremely fast.
• It does not slow down the execution of an individual thread.
Disadvantage
• It is hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs.
• Since the CPU issues instructions from one thread, when a stall occurs the pipeline must be emptied.
• The new thread must fill the pipeline before instructions can complete.
• Due to this start-up overhead, coarse-grained multithreading is much more useful for reducing the penalty of high-cost stalls, where the pipeline refill time is negligible compared to the stall time.
Fine-grained Multithreading
• A version of hardware multithreading that implies switching between threads after every instruction, resulting in interleaved execution of multiple threads. It switches from one thread to another at each clock cycle.
• This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that clock cycle.
To make fine-grained multithreading practical, the processor must be able to switch threads on every clock cycle.
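A toy model of this per-cycle round-robin selection (illustrative only; the stall pattern is invented):

#include <stdio.h>

enum { NTHREADS = 3, CYCLES = 6 };

int main(void) {
    int stalled[NTHREADS][CYCLES] = {   /* 1 = thread stalled that cycle */
        {0, 1, 0, 0, 1, 0},
        {0, 0, 0, 1, 0, 0},
        {1, 0, 0, 0, 0, 0},
    };
    int next = 0;
    for (int c = 0; c < CYCLES; c++) {
        int issued = 0;
        /* pick the next ready thread, skipping stalled ones */
        for (int t = 0; t < NTHREADS && !issued; t++) {
            int cand = (next + t) % NTHREADS;
            if (!stalled[cand][c]) {
                printf("cycle %d: issue from thread %d\n", c, cand);
                next = (cand + 1) % NTHREADS;
                issued = 1;
            }
        }
        if (!issued)
            printf("cycle %d: all threads stalled\n", c);
    }
    return 0;
}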
Advantage
• Vertical waste is eliminated.
• Pipeline hazards cannot arise.
• Zero switching overhead.
• Ability to hide latency within a thread, i.e., it can hide the throughput losses that arise from both short and long stalls.
• Instructions from other threads can be executed when one thread stalls.
• High execution efficiency.
• Potentially less complex than alternative high-performance processors.
Disadvantage
• Clock cycles are wasted if a thread has little work to execute.
• It needs a lot of threads to be effective.
• It is more expensive than coarse-grained multithreading.
• It slows down the execution of the individual threads, since a thread that is ready to execute without stalls will be delayed by instructions from other threads.
Simultaneous multithreading (SMT)
• It is a variation on hardware multithreading that uses the resources of a multiple-issue, dynamically scheduled pipelined processor to exploit thread-level parallelism at the same time it exploits instruction-level parallelism.
• The key insight that motivates SMT is that multiple-issue processors often have more functional-unit parallelism available than most single threads can effectively use.
Since SMT relies on the existing dynamic mechanisms, it does not switch resources every cycle.
• Instead, SMT is always executing instructions from multiple threads, leaving it to the hardware to associate instruction slots and renamed registers with their proper threads.
Advantage
• It has the ability to boost utilization by dynamically scheduling functional units among multiple threads.
• It increases hardware design flexibility.
• It produces better performance and adds resources in a fine-grained manner.
Disadvantage
It cannot improve performance if any of the shared resources are the limiting bottlenecks for the performance.
Uniform Memory Access (UMA) multiprocessors
• Physical memory is uniformly shared by all processors, with equal access time to all words.
• Processors may have local cache memories. Peripherals are also shared in some fashion.
• UMA architecture models are of two types:
Symmetric:
• All processors have equal access to all peripheral devices. All processors are identical.
Asymmetric:
• One processor (the master) executes the operating system; other processors may be of different types and may be dedicated to special tasks.
Non Uniform Memory Access (NUMA) multiprocessors
• In shared-memory multiprocessor systems, local memories can be connected with every processor. The collection of all local memories forms the global memory being shared.
• In this way, global memory is distributed to all the processors. In this case, access to a local memory is uniform for its corresponding processor, as it is attached to that local memory.
• But if a reference is to the local memory of some other remote processor, then the access is not uniform.
• It depends on the location of the memory. Thus, all memory words are not accessed uniformly. All local memories form a global address space accessible by all processors.
• Programming NUMAs is harder, but NUMAs can scale to larger sizes and have lower latency to local memory.
• Memory is common to all the processors. Processors easily communicate by means of shared variables.
• These systems differ in how the memory and peripheral resources are shared or distributed.
• The access time varies with the location of the memory word.
CLUSTER SYSTEM
Clustered systems are similar to parallel systems as they both have multiple CPUs.
However, a major difference is that clustered systems are created by two or more individual computer systems merged together.
Basically, they are independent computer systems with a common storage, and the systems work together.
Each node in a clustered system contains the cluster software. This software monitors the cluster system and makes sure it is working as required. If any one of the nodes in the clustered system fails, the rest of the nodes take control of its storage and resources and try to restart it.
• In this case, all local memories are private and are accessible only to the local processors.
• This is why such machines are called no-remote-memory-access (NORMA) machines.