COA Complete Notes
concepts.
Accumulator-based Architecture: In this design, the CPU has a dedicated register called the accumulator,
which is used for storing intermediate results during computations. The accumulator serves as the
primary working register for arithmetic and logical operations.
Memory Organization: The CPU interacts with memory to fetch instructions and data. The memory is
typically organized into addressable units, such as bytes or words. The CPU uses memory addresses to
access specific locations in memory for reading or writing data.
Instruction Set: The CPU's instruction set defines the operations that the CPU can perform. It includes
instructions for arithmetic operations (addition, subtraction, etc.), logical operations (AND, OR, etc.),
data movement (load, store), and control flow (branching, jumping). The instructions are encoded in
binary format and executed by the CPU.
Control Unit: The control unit is responsible for coordinating the execution of instructions. It decodes
the instructions, determines the required operations, and controls the flow of data between different
components of the CPU. It ensures that instructions are executed in the correct sequence.
Arithmetic and Logic Unit (ALU): The ALU performs arithmetic and logical operations on data stored in
the accumulator and other registers. It can perform operations such as addition, subtraction, AND, OR,
and shift operations. The ALU generates the result of the operation and stores it back in the
accumulator.
Registers: In addition to the accumulator, the CPU may have other registers for storing data and
intermediate results. These registers can be used for temporary storage, addressing calculations,
or holding operands for arithmetic operations.
Data Path: The data path connects the different components of the CPU, allowing the flow of data
between them. It includes buses for transferring data and control signals between the registers, ALU,
and memory.
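The components listed above can be tied together in a minimal sketch of an accumulator machine. The opcode names (LOAD, STORE, ADD, SUB, AND, JUMPZ, HALT) and the dictionary used as memory are assumptions made for illustration; they do not correspond to any particular processor.

```python
def run_accumulator_program(program, memory):
    """Execute a list of (opcode, operand) pairs on a one-accumulator machine."""
    acc = 0   # the accumulator: implicit operand and destination of every operation
    pc = 0    # program counter: index of the next instruction to execute
    while pc < len(program):
        opcode, operand = program[pc]   # fetch and decode (the control unit's job)
        pc += 1
        if opcode == "LOAD":            # memory -> accumulator
            acc = memory[operand]
        elif opcode == "STORE":         # accumulator -> memory
            memory[operand] = acc
        elif opcode == "ADD":           # ALU operations leave their result in acc
            acc += memory[operand]
        elif opcode == "SUB":
            acc -= memory[operand]
        elif opcode == "AND":
            acc &= memory[operand]
        elif opcode == "JUMPZ":         # control flow: branch if acc is zero
            if acc == 0:
                pc = operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


# Compute memory[2] = memory[0] + memory[1].
memory = {0: 7, 1: 5, 2: 0}
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
run_accumulator_program(program, memory)
```

Every arithmetic instruction names only one memory operand because the accumulator is always the other operand and the destination.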
A typical CPU with general register organization consists of several components. One of the main
components is the register file, which is a small and fast memory block that holds general-
purpose registers. These registers can hold either memory addresses or data operands.
The register file is designed with access circuitry that allows data to be read from or written into any
register. It also enables two registers to be read at the same time, making their contents available at
separate outputs.
In addition to the general-purpose registers, there are also special registers such as the program counter
(PC) and the status register. The program counter holds the memory address of the next instruction to
be fetched and executed, while the status register holds condition code flags and other control bits.
The CPU also includes an ALU (Arithmetic Logic Unit) that performs arithmetic and logical operations
on data. The CPU fetches instructions from memory, decodes them, and reads registers from the register
file as needed. The ALU then performs the required operation, and the result is written back to the
destination register, if necessary.
Overall, the general register organization in a typical CPU allows for efficient data storage
and manipulation, enabling the execution of instructions in a sequential manner.
Cache memory is a small and fast memory that is placed between the processor and the main
memory. Its purpose is to improve the performance of the memory system by reducing the time it
takes for the processor to access instructions and data.
Cache memory takes advantage of the principle of locality of reference: over any short period, a program
tends to access a small portion of its instructions and data repeatedly. The cache stores copies of these
frequently accessed instructions and data, allowing the processor to retrieve them quickly without
having to access the slower main memory.
Cache memory operates based on two types of locality: temporal locality and spatial locality. Temporal
locality refers to the idea that recently accessed instructions and data are likely to be accessed again
soon. Spatial locality suggests that instructions and data located close to recently accessed ones are
also likely to be accessed soon.
In a computer system without a cache memory, the communication between the processor and the
main memory is straightforward. The processor sends requests to the memory to read or write data
using memory addresses. When a read request is sent, the memory retrieves the requested data and
sends it back to the processor. Similarly, when a write request is sent, the memory stores the data at
the specified memory address.
Without a cache, every memory access requires direct communication between the processor and the
main memory. This means that the processor has to wait for the memory to respond before it can
proceed with its operations. The memory access time becomes a significant bottleneck in the system, as
the processor's speed is often much faster than the memory's access time.
Overall, without a cache, the processor and memory communicate directly through memory requests
and responses. The processor waits for the memory to retrieve or store data, leading to potential
delays in the execution of instructions.
Cache memory plays a crucial role in improving the performance of a computer system by reducing the
time taken to access data from the main memory. The processor communicates with the memory
through a processor-memory interface, which manages the transfer of data between the main memory
and the processor.
Cache Hierarchy
The memory hierarchy consists of multiple levels of cache, with each level having different access times
and sizes. The primary cache, also known as the level 1 (L1) cache, is located on the processor chip and
provides fast access to frequently used instructions and data. A larger, slower level 2 (L2) cache is
placed between the primary cache and the main memory. Some systems may even have a level 3 (L3)
cache. The main memory, which is larger but slower than the cache memories, is implemented using
dynamic memory components.
When the processor issues a read or write request, the cache control circuitry determines whether the
requested data is present in the cache. If the data is found in the cache, it is referred to as a cache hit,
and the data can be accessed quickly. However, if the data is not present in the cache, it is referred to
as a cache miss. In the case of a cache miss, the block of memory words containing the requested data
is transferred from the main memory to the cache.
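The hit and miss behavior described above can be sketched with a small simulation. The direct-mapped placement rule, the line count, and the block size are assumptions chosen to keep the example short; the notes do not specify a particular cache organization at this point.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=4):
    """Return (hits, misses) for a sequence of word addresses."""
    lines = [None] * num_lines   # each entry holds the tag of the block cached there
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size   # which memory block contains this word
        index = block % num_lines    # the single cache line this block may occupy
        tag = block // num_lines     # distinguishes blocks that share a line
        if lines[index] == tag:
            hits += 1                # cache hit: the word is already present
        else:
            misses += 1              # cache miss: bring in the whole block
            lines[index] = tag
    return hits, misses


hits, misses = simulate_direct_mapped([0, 1, 2, 3, 0, 1, 16, 0])
print(hits, misses)
```

In this trace the first access to address 0 misses and loads words 0 to 3, so the next accesses hit. Address 16 maps to the same line and displaces that block, which makes the final access to address 0 miss again.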
Cache Coherence
An architecture extension refers to the addition of new features or capabilities to an existing computer
architecture. It involves expanding the instruction set and functionality of the processor to support
specific tasks or requirements. Architecture extensions can include enhancements such as support for
multimedia operations, vector floating-point operations, or specialized instructions for specific
applications. These extensions aim to improve the performance and efficiency of the processor by
providing additional instructions and capabilities tailored to specific needs.
• Multipurpose register set for storing data and addresses (register file)
• Program control stack
  - The stack pointer keeps track of the stack's entry point
  - A part of external memory is used as push-down stack memory
Functional units are the main components of a computer system that perform specific tasks. A computer
consists of five functionally independent main parts: input, memory, arithmetic and logic, output, and
control units.
Input Unit: The input unit accepts coded information from human operators or other computers over
digital communication lines. It receives data from devices like keyboards and stores it in the
computer's memory.
Memory Unit: The memory unit stores data and instructions that are currently being used or will be
used by the computer. It consists of semiconductor storage cells, each storing one bit, which are
organized into fixed-size groups called words. Each word can be accessed quickly using its address.
Arithmetic and Logic Unit (ALU): The ALU performs arithmetic operations (such as addition, subtraction,
multiplication, and division) and logical operations (such as comparisons) on operands that are brought
into the processor from memory. It uses registers to hold these operands temporarily.
Output Unit: The output unit sends processed results from the computer to the outside world. Examples
include printers and graphic displays. Some devices, such as touch-screen displays, provide both output
and input functions.
Control Unit: The control unit coordinates the operations of all the other units. It sends control signals
to initiate and control input/output transfers, and it ensures that the instructions are executed in the
correct sequence.
An instruction set architecture (ISA) defines the set of instructions that a computer processor can
execute. It specifies the operations that can be performed and the operands involved in those
operations. Instructions can be written using English words or, in assembly language, using mnemonics.
Different processors may use different mnemonics for the same operation. There are two main approaches
to designing instruction sets: Reduced Instruction Set Computers (RISC) and Complex Instruction Set
Computers (CISC). In a RISC instruction set, each instruction fits in a single word and memory is
accessed only through load and store instructions (a load/store architecture). A CISC instruction set
can have more complex instructions that may span multiple words and may operate directly on operands
in memory. Both approaches rely on addressing modes to specify where operands are located.
Addressing modes are a set of rules and techniques used by processors to access data operands
in memory. They determine how the processor calculates the effective address of an operand.
Immediate Mode: In this mode, the operand is contained within the instruction itself. It can be a signed
8-bit or 32-bit number, with the length specified by a bit in the OP code of the instruction. Immediate
mode is used when the operand value is known at compile time.
Direct Mode: Also known as Absolute mode. The memory address of the operand is given by a 32-bit
value in the instruction. Direct mode is used when the operand is located at a specific memory address.
Register Mode: In this mode, the operand is contained within one of the eight general-purpose registers
specified in the instruction. Register mode is used when the operand is stored in a register.
Register Indirect Mode: The operand is accessed indirectly through a register. The effective address of
the operand is obtained by dereferencing the contents of the register. Register indirect mode is used
when the operand is stored in memory and the address is stored in a register.
Index Mode: This mode provides flexibility in accessing operands, particularly in dealing with lists and arrays.
The effective address of the operand is generated by adding a constant value to the contents of a
register. Index mode is used when the operand is located at an address calculated by adding an offset to
the contents of a register.
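The five modes above differ only in how the operand is located, which a short sketch can make concrete. The function name, the mode strings, and the dictionaries standing in for registers and memory are assumptions for illustration and do not model any specific instruction encoding.

```python
def fetch_operand(mode, value, registers, memory):
    """Return the operand selected by an addressing mode.

    `value` is the field encoded in the instruction: a constant, an address,
    a register name, or a (register name, offset) pair for index mode.
    """
    if mode == "immediate":           # the operand is the instruction field itself
        return value
    if mode == "direct":              # the field is the operand's memory address
        return memory[value]
    if mode == "register":            # the field names the register holding the operand
        return registers[value]
    if mode == "register_indirect":   # the register holds the operand's address
        return memory[registers[value]]
    if mode == "index":               # effective address = register contents + offset
        reg, offset = value
        return memory[registers[reg] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")


registers = {"R1": 100, "R2": 42}
memory = {100: 11, 104: 22, 200: 33}

# Walk a two-element list that starts at the address held in R1.
first = fetch_operand("register_indirect", "R1", registers, memory)
second = fetch_operand("index", ("R1", 4), registers, memory)
```

Index mode suits lists and arrays because the register can hold the base address while the constant offset selects an element.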
In many computer systems, I/O devices are accessed as if they were memory locations.
Each I/O device is assigned specific addressable locations in the processor's address space. These
locations are implemented as bit storage circuits organized in the form of registers. This arrangement
is known as memory-mapped I/O and is used in most computers.
With memory-mapped I/O, any machine instruction that can access memory can be used to transfer
data to or from an I/O device. For example, a load instruction can read data from an input device's
register and load it into a processor register. Similarly, a store instruction can send the contents of a
processor register to an output device's register.
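A toy model can show how ordinary load and store operations reach device registers. The class name and the two register addresses (0x4000 and 0x4004) are hypothetical, chosen only for this sketch.

```python
class AddressSpace:
    """A toy address space in which two addresses are device registers."""

    KEYBOARD_DATA = 0x4000   # hypothetical address of an input device register
    DISPLAY_DATA = 0x4004    # hypothetical address of an output device register

    def __init__(self, pending_input=""):
        self.ram = {}                                   # ordinary memory locations
        self.keys = [ord(ch) for ch in pending_input]   # characters waiting at the input device
        self.displayed = []                             # characters the display has received

    def load(self, address):
        """Load: the same operation reads RAM or the input device register."""
        if address == self.KEYBOARD_DATA:
            return self.keys.pop(0) if self.keys else 0
        return self.ram.get(address, 0)

    def store(self, address, value):
        """Store: the same operation writes RAM or the output device register."""
        if address == self.DISPLAY_DATA:
            self.displayed.append(chr(value))
        else:
            self.ram[address] = value


space = AddressSpace(pending_input="Hi")
# Echo loop: load from the input register, then store to the output register.
for _ in range(2):
    space.store(AddressSpace.DISPLAY_DATA, space.load(AddressSpace.KEYBOARD_DATA))
print("".join(space.displayed))   # prints: Hi
```

No special I/O instructions appear in the echo loop; only the address decides whether an access goes to memory or to a device.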
• The interconnection network consists of circuits needed to transfer information between
the processor, the memory unit, and a number of I/O devices.
• Load and Store instructions use addressing modes to generate effective addresses that identify the
desired locations.
• Each I/O device must appear to the processor as consisting of some addressable locations, just like
the memory.
• The I/O devices and the memory share the same address space; this arrangement is called
memory-mapped I/O.
• In the simplest case, all instructions and data for a particular program are specified in a single
source file from which the assembler generates an object program.
• Each program may contain references to external names, which are address labels defined in other
source files.
• A utility program called the linker is used to combine the contents of separate object files into
one object program.
• The linker needs the relative positions of address labels defined in the source files.
• The linker uses the information in each object file and the known sizes of machine language
programs to build a memory map of the final combined object file.
• A source file in a high-level language is prepared by the programmer and stored on the disk.
• The compiler generates assembly-language instructions and directives, and writes them into an output
file.
• It is often convenient to partition a high-level source program into multiple files, grouping subroutines
together based on related tasks.
• For each source file, the compiler generates an assembly-language file, then invokes the assembler to
generate an object file.
• A benefit of programming in a high-level language is that the compiler automates many of the
tedious tasks that a programmer has to do when programming in assembly language.
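The linker's memory-map step described in the points above can be sketched as follows. The tuple layout used for an object file, the file names, and the label names are hypothetical; real object-file formats carry much more information.

```python
def build_memory_map(object_files, base_address=0):
    """Assign absolute addresses to the labels of several object files.

    Each object file is (name, size_in_bytes, {label: offset_within_file}).
    Files are placed one after another, starting at base_address.
    """
    memory_map = {}
    next_free = base_address
    for name, size, labels in object_files:
        for label, offset in labels.items():
            if label in memory_map:
                raise ValueError(f"duplicate label {label!r} in {name}")
            memory_map[label] = next_free + offset   # relative position -> absolute address
        next_free += size                            # the next file follows this one
    return memory_map


objects = [
    ("main.o", 40, {"MAIN": 0, "LOOP": 12}),
    ("sort.o", 24, {"SORT": 0, "DONE": 20}),
]
memory_map = build_memory_map(objects, base_address=1000)
```

The sketch needs only the two inputs the text names: the relative positions of the labels and the known sizes of the machine-language programs. A reference to the external name SORT from main.o can then be resolved by looking it up in the resulting map.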
Compiler optimization refers to the techniques used by a compiler to improve the performance of the
generated assembly-language program. When a compiler translates a high-level language program into
assembly language, it may not produce the most efficient program in terms of execution time or size.
Compiler optimization techniques, such as reordering instructions and applying optimizations
specifically designed for loops, can significantly improve the performance of the program.
One common optimization technique used by compilers is to recognize that certain variables used in
loops can be maintained in registers, eliminating the need for load and store instructions within the
loop. By minimizing the number of instructions executed within a loop, the overall execution time of
the program can be reduced.
• Improved performance can be achieved if the compiler uses techniques such as reordering
the instructions produced from a straightforward approach.
• Because much of the execution time of a program is spent in loops, compilers may apply
optimizations that are particularly effective for loops.
A debugger is a utility program that helps programmers identify and fix errors in their programs. It
allows the programmer to stop the execution of an object program at specific points of interest and
examine the contents of processor registers and memory locations. This enables the programmer
to compare computed values with expected results and identify programming errors. The debugger
provides two important facilities: trace mode and breakpoints.
Trace Mode
When a processor is operating in trace mode, an interrupt occurs after the execution of every
instruction. This interrupt allows the debugger to assume execution control and enables the user to
enter commands for examining the contents of registers and memory locations. The trace-mode
interrupt is automatically disabled when the debugger routine is entered and re-enabled upon return
to the object program.
Breakpoints
Breakpoints provide a similar interrupt-based debugging facility. The object program being debugged is
interrupted only at specific points indicated by the programmer. For example, a breakpoint can be set to
determine whether a particular subroutine in the object program is ever reached. When the breakpoint
is encountered, the debugger is activated through an interrupt, allowing the programmer to examine
the state of processing at that point. The advantage of using a breakpoint is that execution proceeds at
full speed until the breakpoint is encountered.
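The difference between the two facilities can be illustrated with a toy interpreter. The program format (a list of register and increment pairs) is hypothetical, and stopping before the marked instruction executes is an assumption of this sketch rather than something the notes state.

```python
def run_with_debugger(program, breakpoints=(), trace=False):
    """Run a toy program and record each point where the debugger gains control.

    program is a list of (register, increment) steps; each step adds the
    increment to the named register. Returns (registers, stops), where stops
    holds (instruction index, snapshot of registers) for every interruption.
    """
    registers = {}
    stops = []
    for pc, (reg, inc) in enumerate(program):
        if pc in breakpoints:
            # Breakpoint: interrupt only at the marked instruction,
            # before it executes (an assumption of this sketch).
            stops.append((pc, dict(registers)))
        registers[reg] = registers.get(reg, 0) + inc
        if trace:
            # Trace mode: interrupt after every instruction.
            stops.append((pc, dict(registers)))
    return registers, stops


program = [("R0", 1), ("R1", 5), ("R0", 2)]
_, traced = run_with_debugger(program, trace=True)         # one stop per instruction
_, stopped = run_with_debugger(program, breakpoints={2})   # a single stop
```

With tracing enabled the debugger is entered three times, while the breakpoint run is interrupted once and otherwise proceeds without stops.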
Storage Capacity: Memory devices have a wide range of storage capacities, ranging from a few kilobytes
to several terabytes. The capacity of a memory device determines how much data it can store.
Access Modes: Memory devices can be accessed in different modes, such as random access or
sequential access. Random access allows for direct access to any location in the memory, while
sequential access requires accessing data in a specific order.
Access Time: The access time of a memory device refers to the time it takes to retrieve or store data.
It is an important measure of the speed of the memory unit. Faster access times result in quicker data
retrieval and storage.
Permanence of Storage: Memory devices can be categorized as volatile or non-volatile. Volatile memory
loses its stored data when power is removed, while non-volatile memory retains data even when power
is turned off.
Cycle Time: The cycle time of a memory device is the minimum time delay required between two
successive memory operations. It is usually slightly longer than the access time and depends on
the implementation details of the memory unit.
Data Transfer Rate: The data transfer rate of a memory device refers to the speed at which data can
be read from or written to the memory. It is measured in bytes per second and determines how
quickly data can be transferred.
Physical Characteristics: Memory devices come in various physical forms, such as integrated circuits,
magnetic disks, or optical disks. The physical characteristics of a memory device include its size,
weight, and form factor, which determine its compatibility and usability in different systems.
The memory and the processor are connected through the processor-memory interface. This interface
manages the transfer of data between the main memory and the processor. When a word is to be read
from the memory, the interface sends the address of that word to the memory along with a Read
control signal. The interface waits for the word to be retrieved and then transfers it to the appropriate
processor register. Similarly, when a word is to be written into memory, the interface transfers both
the address and the word to the memory along with a Write control signal.
The connection between the processor and the main memory consists of address, data, and control
lines. The processor uses the address lines to specify the memory location involved in a data transfer
operation, and uses the data lines to transfer the data. The control lines carry the command indicating
a Read or a Write operation and whether a byte or a word is to be transferred. The control lines also
provide the necessary timing information and are used by the memory to indicate when it has
completed the requested operation.
The memory controller circuit plays a crucial role in the connection between the memory and the
processor. It accepts a complete address and the Read/Write signal from the processor, under the
control of a Request signal indicating a memory access operation. The controller forwards the
Read/Write signals and the row and column portions of the address to the memory. It also generates
the RAS (Row Address Strobe) and CAS (Column Address Strobe) signals with the appropriate timing. The
RAS and CAS signals are used to latch the high-order and low-order address bits, respectively, in the
memory chips.
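The address split performed by the memory controller can be sketched in a few lines. The function name and the bit widths in the example are assumptions; actual widths depend on the memory chip.

```python
def split_dram_address(address, row_bits, col_bits):
    """Split a full address into the row and column parts sent to the memory."""
    if not 0 <= address < (1 << (row_bits + col_bits)):
        raise ValueError("address does not fit in row_bits + col_bits")
    row = address >> col_bits                  # high-order bits, latched by RAS
    column = address & ((1 << col_bits) - 1)   # low-order bits, latched by CAS
    return row, column


row, column = split_dram_address(0xA7, row_bits=4, col_bits=4)
```

For the 8-bit address 0xA7 with four row bits and four column bits, the row part is 0xA and the column part is 0x7.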
Volatile Memory: Both static and dynamic RAM chips are examples of volatile memory. They retain
information only while power is turned on. This type of memory is commonly used in computer
systems for storing active portions of programs in the main memory.
Nonvolatile Memory: Nonvolatile memories are memory devices that retain the stored information even
when power is turned off. They are used in applications where it is necessary to store data permanently,
such as in embedded systems. Nonvolatile memories require a special writing process to store
information and can be read in the same way as volatile memories.
Read-Only Memory (ROM): ROM is a type of nonvolatile memory that is primarily used for reading
stored data. It is commonly used in embedded applications where the software needs to be stored
permanently. ROM requires a special writing process to store information and is designed for reading
rather than writing.
Cache Memory: Cache memory is a smaller, faster RAM unit that is used to hold sections of a
program and associated data that are currently being executed. It is tightly coupled with the
processor and is used to facilitate high instruction execution rates. Cache memory is typically located
on the same integrated-circuit chip as the processor.
Virtual Memory: Virtual memory is a technique that allows only the active portions of a program to
be stored in the main memory, while the rest is stored on a secondary storage device. This technique
increases the apparent size of the main memory and allows for efficient memory management.
Memory Hierarchy: The memory hierarchy refers to the organization of different types of memory units
in a computer system. It includes registers, cache memory, main memory, and secondary storage
devices such as magnetic disks. The hierarchy is designed to provide a balance between speed, size, and
cost-effectiveness in memory storage.
Volatility: RAM is volatile memory, which means it loses its data when the power is turned off or
the computer is restarted.
Purpose: RAM is used for temporarily storing data that is actively being processed by the CPU. It
provides high-speed access for quick read and write operations, making it ideal for running applications
and multitasking.
Read/Write Access: RAM allows both read and write operations. Data can be read from RAM by the
CPU, and new data can be written to RAM during computer operation.
Data Persistence: Data in RAM is not persistent and is lost when the power is turned off. It is meant
for short-term data storage.
Volatility: ROM is non-volatile memory, meaning it retains its data even when the power is turned off.
Purpose: ROM is used to store firmware or software that is essential for the computer's operation. This
includes the BIOS in a computer and critical software in other devices.
Read/Write Access: ROM is typically read-only and is not designed for write operations. The data
stored in ROM is permanent and cannot be easily modified by end-users.
Data Persistence: Data in ROM is persistent and remains intact even when the power is removed. It is
designed for long-term storage of essential software and firmware.
Memory chips are typically organized in the form of an array, with each cell capable of
storing one bit of information. The cells are arranged in rows and columns. Each row of
cells forms a memory word, and all cells of a row are connected to a common line called
the word line. The cells in each column are connected to a Sense/Write circuit through
bit lines, and the Sense/Write circuits are in turn connected to the data input/output
lines of the chip.
The same cells can be organized in different configurations. For example, 1024 cells can
be organized as a 128 × 8 memory or as a 1K × 1 memory. In the 1K × 1 configuration, a
10-bit address is required, but there is only one data line, resulting in a total of 15
external connections (10 address lines, 1 data line, 2 control lines, and 2 lines for
power and ground). The 10-bit address is divided into two groups of 5 bits each to form
the row and column addresses for the cell array. A row address selects a row of 32 cells,
all of which are accessed in parallel. However, only one of these cells is connected to
the external data line, based on the column address.
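The connection count and the row size quoted above can be checked with a little arithmetic for a chip of 1024 one-bit cells. The sketch assumes two control lines (Read/Write and chip select) and two supply lines (power and ground); the function names are illustrative only.

```python
def external_connections(num_words, bits_per_word, control_lines=2, supply_lines=2):
    """Count the external connections of a memory chip.

    Assumes two control lines and two supply lines unless told otherwise.
    """
    address_lines = (num_words - 1).bit_length()   # bits needed to address every word
    return address_lines + bits_per_word + control_lines + supply_lines


def cells_per_row(address_bits, row_bits):
    """Number of cells selected by one row address in a square-ish cell array."""
    column_bits = address_bits - row_bits
    return 2 ** column_bits


connections = external_connections(1024, 1)            # 10 + 1 + 2 + 2
row_width = cells_per_row(address_bits=10, row_bits=5)  # 2 ** 5
```

With a 10-bit address split evenly, each row address selects 32 cells and the 5-bit column address picks the one cell that reaches the data line.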
During a Read operation, the Sense/Write circuits sense the information stored in the
selected cells and place it on the output data lines. During a Write operation, the
Sense/Write circuits receive input data and store it in the cells of the selected word.
Commercially available memory chips contain a much larger number of memory cells
and have different configurations and organization depending on their specific
design and purpose.
SRAM (Static Random Access Memory)
SRAM is a type of memory that can retain its state as long as power is applied. It
consists of circuits that form a latch using two inverters cross-connected. The latch is
connected to two bit lines through transistors. When the word line is at ground level,
the transistors are turned off, and the latch retains its state.
To read the state of an SRAM cell, the word line is activated, closing switches T1 and T2.
If the cell is in state 1, the signal on bit line b is high, and the signal on bit line b′
(the complement of b) is low. The opposite is true if the cell is in state 0. The
Sense/Write circuit at the end of the bit lines monitors their state and sets the
corresponding output accordingly.
During a write operation, the Sense/Write circuit drives bit lines b and b′, instead of
sensing their state. It places the appropriate value on bit line b and its complement
on b′ and activates the word line. This forces the cell into the corresponding state,
which it retains when the word line is deactivated.
DRAM (Dynamic Random Access Memory)
Read Operation
During a read operation, the transistor in a selected DRAM cell is turned on. A sense
amplifier connected to the bit line detects whether the charge stored in the capacitor
is above or below a threshold value. If the charge is above the threshold, the sense
amplifier drives the bit line to the full voltage, representing the logic value 1. If the
charge is below the threshold, the sense amplifier pulls the bit line to ground level to
discharge the capacitor fully. The contents of the addressed location are then
transferred from the storage cell unit to a data buffer and from there to the data bus.
Write Operation
During a write operation, the word to be stored is transferred from the data bus to the
selected location in the storage unit. The charge on the capacitor is adjusted accordingly
based on the logic value of the data being written.
Read-Only Memory, or ROM, is a type of memory where information can be written into
it only once at the time of manufacture. It is a non-volatile memory, meaning that the
stored data is retained even when the power is turned off. ROM is primarily used for
storing permanent or unchangeable data, such as firmware or software instructions that
are essential for the operation of a device. Unlike other types of memory, ROM can only
be read, and the stored data cannot be modified or erased. ROM cells store a logic value
of 0 or 1 based on the connection of the transistor to ground. To read the state of a
ROM cell, the word line is activated to close the transistor switch, and the bit line
indicates the stored value.
Programmable ROM (PROM)
Programmable ROM (PROM) is a type of ROM that allows the user to load data into it.
This programmability is achieved by inserting a fuse at a specific point. The user can burn
out the fuses at the required locations using high-current pulses to insert 1s at those
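locations. PROMs provide flexibility and convenience that are not available with regular
ROMs. Because the cost of preparing the masks needed for storing a particular
information pattern makes ROMs cost-effective only in large volumes, PROMs, which the
user can program directly, are a more convenient and less expensive alternative when
only a small number of chips is needed.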
Erasable Reprogrammable ROM (EPROM)
Erasable reprogrammable ROM, usually called EPROM, is a type of memory that allows
stored data to be erased and new data to be written into it. It uses a special
transistor to store and retain information. EPROMs provide considerable flexibility
during the development phase of digital systems: they can retain stored information
for a long time and can be used in place of ROMs or PROMs while software is being
developed.
Electrically Erasable PROM (EEPROM)
EEPROM, also known as electrically erasable PROM, is a type of memory chip that can
be programmed, erased, and reprogrammed electrically. Unlike EPROM, which
requires physical removal from the circuit for reprogramming, EEPROM allows for
convenient and flexible data modification without the need for physical intervention.
One advantage of EEPROM is its ability to selectively erase and rewrite specific portions of
the chip's contents. This feature provides greater flexibility and efficiency compared to
EPROM, where the entire chip needs to be erased when reprogramming is required.
However, EEPROMs do have a disadvantage in that they require different voltages for
erasing, writing, and reading the stored data. This complexity in voltage requirements
can increase circuit complexity.
Flash memory, used in flash cards and flash drives, is based on a similar technology and
is widely used in electronic devices for data storage and modification. Unlike EEPROM,
flash memory erases cells in blocks rather than individually.
Direct memory access (DMA) is an alternative approach used to transfer blocks of data directly between the
main memory and I/O devices, such as disks. It involves a special control unit called a
DMA controller, which manages the transfer without continuous intervention by the
processor. The DMA controller performs functions that would normally be carried out
by the processor when accessing the main memory. Although the DMA controller
transfers data without intervention by the processor, its operation must be under the
control of a program executed by the processor, usually an operating system routine.
The DMA controller contains two registers for storing the starting address and the word
count. During a transfer, the controller itself increments the memory address and keeps
track of the word count, tasks that would otherwise require program instructions. The
DMA controller sets the Done flag to 1 when it has completed transferring a block of
data and is ready to receive another command. If interrupts are enabled in the
controller, it also raises an interrupt after completing the transfer. In a typical
arrangement, a disk controller that controls two disks also has DMA capability and
provides two DMA channels. To start a DMA transfer of a block of data from the main
memory to one of the disks, an OS routine writes the address and word count
information into the registers of the disk controller.
DMA offers the advantage of transferring data directly between the main memory and
I/O devices without continuous intervention by the processor. This reduces the burden
on the processor and allows for faster data transfer. By contrast, with program-
controlled transfers several program instructions must be executed, involving many
memory accesses, for each data word transferred. The costs of DMA are that its
operation must still be set up and controlled by a program executed by the processor,
and that it requires a DMA controller with registers for storing the starting address
and word count.
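The division of work between the processor and the DMA controller can be sketched with a toy model. The class and method names are hypothetical, a Python list stands in for main memory, and interrupt delivery is left out.

```python
class DMAController:
    """Toy DMA controller with a starting-address and a word-count register."""

    def __init__(self, memory):
        self.memory = memory      # list standing in for main memory
        self.start_address = 0    # register written by the OS routine
        self.word_count = 0       # register written by the OS routine
        self.done = False         # Done flag, set when the block has been moved

    def start(self, start_address, word_count):
        """The processor's part: program the two registers and clear Done."""
        self.start_address = start_address
        self.word_count = word_count
        self.done = False

    def transfer_to_device(self, device_buffer):
        """The controller's part: move the block with no processor instructions."""
        address = self.start_address
        remaining = self.word_count
        while remaining > 0:
            device_buffer.append(self.memory[address])
            address += 1      # the controller, not the program, steps the address
            remaining -= 1    # and counts down the words still to be moved
        self.done = True      # ready for another command


main_memory = list(range(100, 110))
dma = DMAController(main_memory)
disk_buffer = []
dma.start(start_address=3, word_count=4)
dma.transfer_to_device(disk_buffer)
```

After the call, the four words starting at address 3 are in the device buffer and the Done flag is set, which is the point at which a real controller could raise an interrupt.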
Memory Hierarchy
Main Memory
The main memory is a large memory implemented using dynamic memory components.
It provides relatively fast access to data and instructions during program execution.
Disk Storage
Disk devices, such as hard drives, provide a very large amount of inexpensive memory.
They are commonly used as secondary storage in computer systems.
Processor Cache
The processor cache is a small amount of memory that is implemented directly on the
processor chip. It is divided into multiple levels, with the primary cache (L1 cache) being
the closest to the processor and having the fastest access time. The secondary cache (L2
cache) is placed between the primary cache and the main memory.
Cache Memory
The cache memory can hold only a limited number of memory blocks at any given time, a
number that is small compared to the total number of blocks in the main memory. It is
used to temporarily store frequently accessed data and instructions, reducing the
need to access the slower main memory. The cache's replacement algorithm
determines which blocks should be removed from the cache when it is full.
Cache memory is a small and very fast memory that sits between the processor and the
main memory. It takes advantage of the locality of reference property of computer
programs, where instructions in localized areas of the program are executed repeatedly.
When the processor issues a read request, a block of memory words containing the
specified location is transferred into the cache. The cache memory can store a limited
number of blocks compared to the main memory. The correspondence between the
main memory blocks and those in the cache is determined by a mapping function.
When the cache is full and a memory word that is not in the cache is referenced, the
cache control hardware must decide which block should be removed to make space for
the new block. This decision is made based on the cache's replacement algorithm. The
processor does not need to know explicitly about the cache's existence, as the cache
control circuitry determines whether the requested word is in the cache. If a read or
write operation is performed on a word that is in the cache, it is called a cache hit. If the
word is not in the cache and needs to be brought in from the main memory, it is called
a cache miss.
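The hit and miss behaviour described above can be illustrated with a small simulation. This is only a sketch: it assumes a direct-mapped mapping function (memory block j goes to cache line j mod the number of lines), which makes the replacement decision trivial. The class name, line count, and block size are illustrative assumptions, not values from these notes.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks which block each line holds."""

    def __init__(self, num_lines=4, block_size=4):
        self.num_lines = num_lines        # assumed number of cache lines
        self.block_size = block_size      # assumed words per block
        self.lines = [None] * num_lines   # block number currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a miss (the block is then loaded)."""
        block = address // self.block_size   # memory block containing the word
        line = block % self.num_lines        # mapping function: block -> line
        if self.lines[line] == block:
            self.hits += 1
            return True
        self.misses += 1
        self.lines[line] = block             # replace whatever block was there
        return False


cache = DirectMappedCache()
for addr in [0, 1, 2, 3, 0, 1, 16, 0]:
    cache.access(addr)
print(cache.hits, cache.misses)
```

The first four accesses fall in the same block, so only the first one misses, which is the locality of reference effect. Address 16 maps to the same line as address 0 and evicts it, so the final access to address 0 misses again.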
Performance Considerations
Performance considerations in computer systems are crucial for efficient and fast
execution of instructions. Several factors affect performance, including cache
utilization, hit rate, miss penalty, and prefetching.
Cache Utilization
The hit rate represents the fraction of successful cache accesses, while the miss rate
indicates the fraction of unsuccessful accesses. When a cache miss occurs, additional
time is required to retrieve the data from a slower memory unit, resulting in a
performance penalty.
Write Buffer
Prefetching
Prefetching is a technique used to avoid processor stalls by loading data into the cache
before they are needed. It can be done through special prefetch instructions inserted by
the programmer or compiler, or through hardware circuitry that predicts memory access
patterns.
Lockup-Free Cache
Most processor chips have at least one level 1 (L1) cache, with separate caches for
instructions and data. High-performance processors often have two levels of caches
including a larger level 2 (L2) cache. The average access time experienced by the
processor is determined by the hit rates and access times of these caches.
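One common textbook model for this average access time, assuming two cache levels, is t_avg = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M. The symbols are assumptions of this sketch: h1 and h2 are the L1 and L2 hit rates, C1 is the L1 access time, C2 is the penalty for bringing data from L2, and M is the penalty for going to main memory. The numbers used below are made up for illustration.

```python
def average_access_time(h1, h2, c1, c2, m):
    """Average access time seen by the processor for a two-level cache.

    h1, h2: hit rates of the L1 and L2 caches (fractions between 0 and 1)
    c1:     time to access data in L1
    c2:     miss penalty to bring data from L2 into L1
    m:      miss penalty to bring data from main memory
    """
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m


# Illustrative (assumed) values, in clock cycles.
t = average_access_time(h1=0.95, h2=0.80, c1=1, c2=10, m=100)
print(round(t, 2))
```

With these assumed values the result is about 2.35 cycles, which shows how strongly even a small miss rate is weighted by the main memory penalty.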
Memory device characteristics, such as storage capacity, access modes, access time,
permanence of storage, cycle time, and data transfer rate, also impact system
performance. These factors determine the efficiency and speed of memory operations
in a computer system.
Virtual Memory
VM Address Translation
In a virtual memory system, programs and data are divided into fixed-length units called
pages. These pages consist of blocks of words that occupy contiguous locations in the
main memory. The main memory is divided into page frames, each capable of holding
one page. The virtual addresses generated by the processor for instruction fetch or
operand load/store operations are translated into physical addresses using a page table.
The page table contains information about the main memory location of each page, and
the starting address of the page table is stored in a page table base register. Each entry
in the page table has a validity bit that indicates whether the page is actually loaded in
the main memory.
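The translation step can be sketched as follows. This is a minimal illustration under stated assumptions: a 4096-byte page size, a page table modelled as a Python dictionary, and a hypothetical PageFault exception standing in for the transfer of control to the operating system.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the referenced page is not in main memory (hypothetical name)."""


def translate(virtual_address, page_table):
    """Translate a virtual address into a physical address using a page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None or not entry["valid"]:
        # Validity bit is clear: the page must first be brought in from disk.
        raise PageFault(page_number)
    # Physical address = start of the page frame + offset within the page.
    return entry["frame"] * PAGE_SIZE + offset


page_table = {
    0: {"valid": True, "frame": 5},      # virtual page 0 lives in page frame 5
    1: {"valid": False, "frame": None},  # virtual page 1 is still on disk
}
physical = translate(0x0123, page_table)
print(hex(physical))
```

The offset within the page is unchanged by translation; only the page number is replaced by a frame number.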
Page Faults
When a program generates an access request to a page that is not in the main memory,
a page fault occurs. In such cases, the entire page needs to be brought from the disk
into the memory before access can proceed. The processing of the program is
interrupted, and control is transferred to the operating system. If a modified page is
being replaced, it needs to be written back to the disk before being removed from the
main memory. Because disk access times are long, writing small amounts of data to the
disk on every update would be inefficient, so virtual memory systems use a write-back
policy rather than write-through.
Secondary Storage
Secondary storage refers to the non-volatile storage devices used in computer systems
to store data for long-term use. It is used to supplement the primary memory (RAM)
and provides a larger storage capacity.
One type of secondary storage is the magnetic hard disk. It consists of one or more disk
platters mounted on a spindle and placed in a drive that rotates them at a constant
speed. The magnetized surfaces of the disks store digital information, which can be read
or written by read/write heads that move radially across the disks. The disk system also
includes a disk controller that controls the operation of the system.
Optical Disks
Another type of secondary storage is optical disks, such as CDs. These disks use optical
technology to store data. Laser light is focused on the surface of the disk, and the
indentations on the disk reflect the light toward a photodetector, which detects the
stored binary patterns. The laser emits a coherent light beam that is sharply focused on
the disk surface. Optical disks are commonly used for storing audio and digital data.
CD Technology:
CD technology is based on the use of optical technology, which takes advantage of the ability of laser
light to be focused on a small spot. The surface of the CD is programmed with indentations that reflect
the focused laser beam towards a photodetector, which detects the stored binary patterns. The laser
emits a coherent light beam that is sharply focused on the surface of the disk. This coherent light
consists of synchronized waves with the same wavelength. The CD is made up of a bottom layer of
transparent polycarbonate plastic, which serves as a clear glass base. The surface of this plastic is
programmed with pits to store data, while the unindented parts are called lands. A thin layer of
reflecting aluminum material is placed on top of the programmed disk to enhance data detection. The
total thickness of the CD is 1.2 mm, with most of it contributed by the polycarbonate plastic. The laser
source and the photodetector are positioned below the polycarbonate plastic layer, where they read the
stored binary patterns.
Fixed-Point Numbers
Fixed-point numbers are a type of numerical representation that allows for a limited
range of values and has relatively simple hardware requirements. These numbers are
derived from the ordinary representation of numbers, where the digits to the left of the
radix point represent the whole-number part and the digits to the right represent the
fraction part. In this positional notation, each digit has a fixed weight determined by
its position. Fixed-point numbers can be represented in binary or decimal form, with
binary numbers being more commonly used. Decimal numbers often use Binary Coded
Decimal (BCD) representation.
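The two ideas in this paragraph, positional weights and BCD, can be sketched in a few lines. The function names are illustrative assumptions; the BCD helper uses the usual 4-bits-per-decimal-digit encoding.

```python
def fixed_point_value(bits):
    """Value of an unsigned binary fixed-point string such as '101.11'."""
    whole, _, frac = bits.partition(".")
    value = int(whole, 2) if whole else 0
    # Digits right of the radix point have weights 1/2, 1/4, 1/8, ...
    for position, bit in enumerate(frac, start=1):
        value += int(bit) * 2 ** -position
    return value


def to_bcd(n):
    """Encode a non-negative integer in BCD, one 4-bit group per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(fixed_point_value("101.11"))  # 4 + 1 + 1/2 + 1/4
print(to_bcd(259))
```

Note that BCD encodes each decimal digit separately, so 259 takes 12 bits in BCD even though its pure binary form needs only 9.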
Floating-Point Numbers
Floating-point numbers are a type of numerical representation that allows for a much
larger range of values compared to fixed-point numbers. They are commonly used in
computers and are typically represented in binary form. A floating-point number consists
of three components: the mantissa (M), the exponent (E), and the base (B). The
mantissa represents the significant digits of the number, the exponent determines the
scale of the number, and the base specifies the number system being used. However,
implementing floating-point numbers can require costly processing hardware or lengthy
software implementations.
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a widely used technical
standard for representing and performing computations with real numbers on
computers. It was established in 1985 by the Institute of Electrical and Electronics
Engineers (IEEE). The standard addresses various problems found in the differing
floating-point implementations that preceded it, making floating-point code more
reliable and portable.
The standard defines arithmetic formats, which include finite numbers, signed zeros,
subnormal numbers, infinities, and special "not a number" values (NaNs). It also specifies
interchange formats, which are efficient and compact encodings used to exchange
floating-point data. Rounding rules are provided to ensure consistent behavior during
arithmetic and conversions. Additionally, the standard covers various operations, such as
arithmetic and trigonometric functions, and exception handling for exceptional
conditions like division by zero or overflow.
In the 32-bit single-precision floating-point representation, the sign bit (S) is the most
significant bit, indicating whether the number is positive or negative. The next 8 bits
hold the exponent (E), stored with a bias of 127, and the remaining 23 bits hold the
fraction part of the mantissa (M); for normalized numbers the leading 1 of the mantissa
is implied rather than stored. This format allows for a larger range of values but may
require costly hardware or lengthy software implementations.
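The field layout can be checked with a short sketch that splits a single-precision value into its sign, exponent, and fraction fields using Python's standard struct module. The helper names are assumptions, and the reconstruction covers normalized numbers only, using the IEEE 754 exponent bias of 127 and implied leading 1.

```python
import struct


def decode_single(x):
    """Split a float, stored as IEEE 754 single precision, into (S, E, M) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits of mantissa fraction
    return sign, exponent, fraction


def value_of(sign, exponent, fraction):
    """Rebuild the value of a normalized number from its three fields."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


fields = decode_single(-6.25)
print(fields, value_of(*fields))
```

Here -6.25 is -1.5625 x 2^2, so the stored exponent is 2 + 127 = 129 and the fraction field encodes 0.5625.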
Generic Model of an I/O Module
The I/O module acts as a bridge between the processor and the external environment,
allowing for the transfer of control, status, and data between the computer and
peripheral devices. It provides a means for the processor to communicate with external
devices in a coordinated and efficient manner.
The module performs various functions, including control and timing, processor
communication, device communication, data buffering, and error detection. It
coordinates the flow of traffic between internal resources (such as main memory and the
system bus) and external devices, ensuring smooth data transfer.
The I/O module communicates with the processor through command decoding,
accepting commands sent as signals on the control bus. It also exchanges data with the
processor over the data bus. The module may hide the details of timing, formats, and
electromechanics of external devices, allowing the processor to interact with them
through simple read and write commands.
The complexity of an I/O module can vary, with some modules taking on most of the
detailed processing burden and presenting a high-level interface to the processor
(referred to as an I/O channel or I/O processor), while others require detailed control
and are referred to as I/O controllers or device controllers. The module's design and
capabilities depend on the specific requirements of the system and the devices it
controls.
External devices are devices that are connected to a computer to exchange data with the
external environment. External devices can be broadly classified into three categories
based on their purpose and functionality:
3. Communication Devices
Communication devices enable the computer system to exchange data with remote
devices. These devices can communicate with human-readable devices, machine-
readable devices, or even other computers. They facilitate data transfer between the
computer system and external devices over communication lines.
I/O Modules: Overview and Functions
I/O modules play a crucial role in the input/output (I/O) operations of a computer
system. These modules serve as an interface between the processor and memory on one
side and the external devices on the other side. They facilitate the exchange of data,
control signals, and status information between the computer and the external
environment.
The major functions or requirements of an I/O module can be categorized into the
following areas:
1. Control and Timing: I/O modules coordinate the flow of traffic between
the internal resources (such as main memory and the system bus) and the
external devices. They ensure that data transfer occurs in a controlled and
timely manner.
2. Processor Communication: I/O modules receive commands from the
processor and communicate with it via control and data buses. They
decode the commands and exchange data between the processor and the
module.
3. Device Communication: I/O modules establish communication with the
external devices. They send commands, receive status information, and
exchange data with the devices.
4. Data Buffering: I/O modules have data registers that buffer the data being
transferred between the processor and the external devices. This buffering
helps in managing the speed mismatch between the processor/memory
and the peripherals.
5. Error Detection: I/O modules are responsible for detecting and reporting
errors. They can detect mechanical and electrical malfunctions reported by
the devices, as well as unintentional changes in the transmitted data.
I/O modules have evolved over time to handle increasing complexity and improve
system performance. They have transitioned from simple controllers to more advanced
processors capable of executing I/O programs. This evolution has led to the
development of I/O channels and I/O processors.
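I/O channels are I/O modules that can execute programs and perform most of the I/O
tasks without CPU intervention. They allow the CPU to specify a sequence of I/O
activities and to be interrupted only when the entire sequence has been performed.
The control of a data transfer from an external device to the processor might involve
the following sequence of steps:
1. Processor Interrogation: The processor checks the status of the attached device
by interrogating the I/O module.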
2. Device Status Retrieval: The I/O module returns the status of the device to the
processor.
3. Transfer Request: If the device is operational and ready to transmit, the
processor requests the transfer of data by sending a command to the I/O
module.
4. Data Acquisition: The I/O module obtains a unit of data (e.g., 8 or 16 bits) from
the external device.
5. Data Transfer: The acquired data is then transferred from the I/O module to the
processor.
Programmed I/O
Programmed I/O is one of the three techniques used for input/output (I/O)
operations, alongside interrupt-driven I/O and direct memory access (DMA). In this
technique, data is exchanged between the processor and the I/O module. The
processor executes a program that gives it direct control over the I/O operation,
including sensing device status, sending read or write commands, and transferring
the data. However, the processor must wait until the I/O operation is complete,
which wastes processor time when the processor is faster than the I/O module.
I/O Commands
To execute an I/O-related instruction, the processor issues an address that
specifies the particular I/O module and external device, along with an I/O
command. There are four types of I/O commands that an I/O module may
receive: control, test, read, and write. The control command is used to activate
a peripheral and tell it what to do, while the test command is used to check
status conditions of the I/O module and its peripherals. The read command causes
the I/O module to obtain an item of data from the peripheral and place it in an
internal buffer, from which the processor collects it over the data bus. The write
command causes the I/O module to take an item of data from the data bus and
transmit it to the peripheral.
I/O Instructions
With programmed I/O, there is a close correspondence between the I/O-related
instructions fetched by the processor from memory and the I/O commands
issued to the I/O module. The instructions are easily mapped into I/O commands,
and there is often a simple one-to-one relationship. The form of the instruction
depends on how external devices are addressed. Each I/O device connected
through I/O modules to the system is given a unique identifier or address. When
the processor issues an I/O command, the command contains the address of the
desired device. Each I/O module interprets the address lines to determine if the
command is for itself.
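The busy-wait behaviour of programmed I/O can be sketched with a simulated device. Everything here is hypothetical: SimulatedIOModule, its is_ready and read methods, and the three-poll delay are stand-ins for a real module's status check and read command.

```python
class SimulatedIOModule:
    """Hypothetical I/O module that needs a few status checks before data is ready."""

    def __init__(self, data, delay=3):
        self._data = list(data)
        self._delay = delay          # assumed number of "not ready" polls per word
        self._countdown = delay

    def is_ready(self):
        """Status check: report whether a unit of data is ready."""
        if self._countdown > 0:
            self._countdown -= 1
            return False
        return True

    def read(self):
        """Read command: hand one unit of data to the processor."""
        self._countdown = self._delay
        return self._data.pop(0)


def programmed_input(module, count):
    """Read `count` words, with the processor polling until each one is ready."""
    received, polls = [], 0
    for _ in range(count):
        while True:                  # processor does nothing useful while waiting
            polls += 1
            if module.is_ready():
                break
        received.append(module.read())
    return received, polls


words, polls = programmed_input(SimulatedIOModule(b"OK"), 2)
print(words, polls)
```

The poll count grows with the device delay, which is the processor time that programmed I/O wastes.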
I/O Commands
I/O commands are used by the processor to communicate with I/O modules and control
peripheral devices. There are four types of I/O commands: control, test, read, and write.
Interrupt-Driven I/O
In interrupt-driven I/O, when the processor issues a READ command for input, it goes
on to perform other tasks. At the end of each instruction cycle, the processor checks for
interrupts. When an interrupt from the I/O module occurs, the processor saves the
context of the current program and processes the interrupt. It reads the data from the
I/O module and stores it in memory before resuming the execution of the program it
was working on.
Interrupt-driven I/O allows the processor to execute other instructions while an I/O
operation is in progress. When the I/O module is ready to be serviced, it sends an
interrupt request signal to the processor. The processor suspends the current program,
branches off to the interrupt handler program for that specific I/O device, and resumes
the original execution after servicing the device.
To identify devices for interrupt-driven I/O, various techniques are used, including
multiple interrupt lines, software poll, daisy chain, and bus arbitration. These techniques
help manage the interrupt signals between the processor and the I/O modules
efficiently.
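The interrupt check at the end of each instruction cycle can be sketched as below. This is a simplified model with assumed names: the program is a list of instruction labels, and `pending` says after which instruction the I/O module has a data word ready.

```python
def run_with_interrupts(program, pending):
    """Execute `program`, servicing an interrupt after any instruction index
    listed in `pending` (a dict mapping index -> data word from the I/O module)."""
    trace, memory = [], []
    for pc, instruction in enumerate(program):
        trace.append(f"execute {instruction}")
        # Interrupts are checked only at the end of an instruction cycle.
        if pc in pending:
            trace.append("save context")
            memory.append(pending[pc])   # handler reads the word and stores it
            trace.append("restore context")
    return trace, memory


trace, memory = run_with_interrupts(["i0", "i1", "i2"], {1: 0x41})
for step in trace:
    print(step)
```

Unlike the polling approach, the processor keeps executing instructions until the device signals that it is ready.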
Direct Memory Access (DMA)
In DMA, a special module called a DMA module controls the exchange of data between
main memory and an I/O module. The CPU initiates the transfer by sending a request to
the DMA module, specifying the starting location in memory, the number of words to
be transferred, and the address of the I/O device involved. The DMA module then
transfers the entire block of data directly to or from memory, without the involvement of
the CPU.
Once the transfer is complete, the DMA module sends an interrupt signal to the CPU.
However, it is important to note that the CPU is only involved at the beginning and end
of the transfer, and it is not interrupted during the transfer itself. This makes DMA more
efficient than interrupt-driven or programmed I/O for multiple-word data transfers.
DMA can be configured in different ways, such as sharing the same system bus with
other modules or using a separate I/O bus. The Intel 8237A DMA controller is an
example of a DMA controller that interfaces with processors and DRAM memory to
provide DMA capabilities.
I/O Channels
I/O channels are an extension of the DMA (Direct Memory Access) concept. They have
the ability to execute I/O instructions, which gives them complete control over I/O
operations. In a computer system with I/O channels, the CPU does not execute I/O
instructions. Instead, these instructions are stored in main memory to be executed by a
special-purpose processor in the I/O channel itself. The CPU initiates an I/O transfer by
instructing the I/O channel to execute a program in memory. The program specifies the
device or devices, the area or areas of memory for storage, priority, and actions to be
taken for certain error conditions.
There are two types of I/O channels commonly used: selector channels and multiplexor
channels. A selector channel controls multiple high-speed devices and is dedicated to
the transfer of data with one of those devices at any given time. On the other hand, a
multiplexor channel can handle I/O with multiple devices at the same time. For
low-speed devices, a byte multiplexor accepts or transmits characters as fast as possible
to multiple devices; for high-speed devices, a block multiplexor interleaves blocks of
data from several devices.
I/O channels serve as an interface between the CPU and I/O controllers. They allow for
efficient control of I/O operations, relieving the CPU of the details of I/O operations and
improving overall system performance.
I/O processors are specialized modules that enhance the functionality of I/O operations
in a computer system. They have their own local memory and are capable of executing
I/O programs independently, without CPU intervention. This allows the CPU to specify a
sequence of I/O activities and be interrupted only when the entire sequence has been
performed.
With the introduction of I/O processors, a major change occurs in the computer system
architecture. The I/O module, which is enhanced to become a processor, can control a
large set of I/O devices with minimal CPU involvement. This architecture is commonly
used to control communication with interactive terminals.
I/O processors relieve the CPU of I/O-related tasks, improving overall system
performance. They execute I/O instructions stored in main memory and have complete
control over I/O operations. The CPU initiates an I/O transfer by instructing the I/O
processor to execute a program in memory, specifying the devices and memory areas
involved.
I/O Channel Architecture
The I/O channel architecture involves the use of controllers or I/O modules to handle
devices. These controllers are similar to the I/O modules themselves. The I/O channel
acts as a replacement for the CPU in controlling these I/O controllers. For low-speed
devices, a byte multiplexor is used to accept or transmit characters as quickly as possible
to multiple devices.
FireWire
FireWire is a high-speed external interface that provides a standardized way for the host
system to interact with peripheral devices over a serial bus. It is designed to meet the
I/O demands of personal computers, workstations, and servers, offering advantages
such as high speed, low cost, and ease of implementation.
One of the key features of FireWire is its three-layer protocol stack. The physical layer
defines the permissible transmission media and the electrical and signaling
characteristics of each. The link layer describes the transmission of data in packets, while
the transaction layer defines a request-response protocol that hides the lower-layer
details from applications.
FireWire uses a daisy-chain configuration, allowing up to 63 devices to be connected off
a single port. It also supports hot plugging, which means peripherals can be connected
and disconnected without powering down or reconfiguring the system. Automatic
configuration eliminates the need for manual device ID settings or concerns about
relative device positions.
The physical layer of FireWire specifies various transmission media and connectors, with
data rates ranging from 25 to 3200 Mbps. It also provides an arbitration service to
ensure that only one device transmits data at a time. The arbitration can be based on a
tree-structured arrangement of nodes on the FireWire bus, with a root node acting as a
central arbiter.
FireWire is widely used not only in computer systems but also in consumer electronics
products like digital cameras, DVD players/recorders, and televisions. Its serial
transmission approach offers advantages over parallel interfaces, such as SCSI, by
reducing the number of wires and simplifying synchronization between them.
InfiniBand Architecture
InfiniBand is an interface that enables servers, remote storage, and other network
devices to be attached in a central fabric of switches and links. It provides a switch-
based architecture that can connect up to 64,000 servers, storage systems, and
networking devices. The key elements of the InfiniBand architecture include the host
channel adapter (HCA), target channel adapter (TCA), InfiniBand switch, links, subnet,
and router.
The HCA is a single interface that links a server to an InfiniBand switch. It attaches to the
server at a memory controller, which has access to the system bus and controls traffic
between the processor and memory, as well as between the HCA and memory. The HCA
uses direct-memory access (DMA) to read and write memory.
A TCA is used to connect storage systems, routers, and other peripheral devices to an
InfiniBand switch. It allows these devices to communicate with the switch and other
devices in the network.
InfiniBand Switch
A link refers to the connection between a switch and a channel adapter or between two
switches. A subnet consists of one or more interconnected switches and the links that
connect other devices to those switches. Subnets allow administrators to confine
broadcast and multicast transmissions within the subnet.
Router
DMA Controller
A DMA (Direct Memory Access) controller is a hardware device that allows data to be
transferred between peripheral devices and memory without the involvement of the
CPU. It provides a more efficient way of transferring data compared to interrupt-driven
or programmed I/O.
When a peripheral device, such as a disk controller, needs to transfer a block of data to
or from memory, it requests the service of the DMA controller by pulling the DREQ
(DMA request) line high. The DMA controller then signals the CPU through its HOLD pin
by putting a high on its HRQ (hold request) line, indicating that it needs to use the
buses.
The CPU finishes the current bus cycle and responds to the DMA request by putting a
high on its HLDA (hold acknowledge) line, allowing the DMA controller to use the buses.
The DMA controller then activates the DACK (DMA acknowledge) line, indicating to the
peripheral device that it can start transferring the data.
The DMA controller transfers the data one word at a time directly to or from memory,
without involving the CPU. It decrements the counter and increments the address
pointer after each transfer until the count reaches zero and the task is finished. Once the
DMA controller completes its job, it deactivates the HRQ line, signaling the CPU that it
can regain control over the buses.
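The counter and address-pointer bookkeeping of a block transfer can be sketched as follows. This models only the transfer loop, not the bus handshake, and memory is represented by a plain Python list as an assumption of the sketch.

```python
def dma_transfer(memory, start_address, device_words):
    """Copy words from a device into memory the way a DMA controller would:
    one word per transfer, incrementing the address and decrementing the count."""
    address = start_address          # address register
    count = len(device_words)        # word count register
    source = iter(device_words)
    while count > 0:
        memory[address] = next(source)   # word goes straight into memory
        address += 1
        count -= 1
    return address, count            # count == 0 signals that the task is finished


memory = [0] * 8
end_address, remaining = dma_transfer(memory, 2, [7, 8, 9])
print(memory, end_address, remaining)
```

The processor only supplies the starting address and word count; the loop itself runs without it.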
The Intel 8237A DMA controller, which is a commonly used DMA controller, is known as
a fly-by DMA controller. This means that the data being moved from one location to
another does not pass through the DMA chip and is not stored in the DMA chip. The
DMA controller can only transfer data between an I/O port and a memory address, not
between two I/O ports or two memory locations.
The Intel 8237A DMA controller interfaces with the 80x86 family of processors and
DRAM memory to provide DMA capabilities. It contains four DMA channels that can be
programmed independently, and any one of the channels can be active at any moment.
The 8237A DMA controller has control/command registers that can be used to program
and control DMA operations. These registers include the command register, which is
used to control the operation of the DMA, and the status register, which indicates the
status of the DMA channels.
MICRO-OPERATIONS
Micro-operations are the smaller units that make up an instruction cycle in the
execution of a program. Each instruction cycle consists of a sequence of micro-
operations, with one machine instruction per cycle. These micro-operations involve a
series of steps that utilize the processor registers.
The execution of a program involves the sequential execution of instructions, with each
instruction being executed during an instruction cycle. Each instruction cycle is
composed of shorter sub cycles, such as fetch, indirect, execute, and interrupt. These sub
cycles further consist of one or more micro-operations.
The fetch, indirect, and interrupt cycles are simple and predictable, with each involving a
fixed sequence of micro-operations that are repeated each time around. The indirect
cycle is always followed by the execute cycle, while the interrupt cycle is always followed
by the fetch cycle. For the fetch and execute cycles, the next cycle depends on the state
of the system.
1. Fetch Cycle: The fetch cycle is the first phase of the instruction cycle. It occurs at
the beginning of each instruction cycle and involves fetching an instruction from
memory. The memory address register (MAR) specifies the address in memory for
the read operation, and the instruction register (IR) holds the last instruction
fetched.
2. Indirect Cycle: If an instruction specifies an indirect address, an indirect cycle
must precede the execute cycle. During the indirect cycle, the address field of the
instruction is transferred to the memory address register (MAR). This address is
then used to fetch the address of the operand. Finally, the address field of the
instruction register (IR) is updated from the memory buffer register (MBR),
converting it from an indirect address to a direct address.
3. Execute Cycle: The execute cycle is where the actual execution of the instruction
takes place. It follows the fetch cycle and, when present, the indirect cycle. The
execute cycle performs the operations specified by the instruction, so its sequence of
micro-operations depends on the opcode rather than being the same every time.
4. Interrupt Cycle: The interrupt cycle occurs when an interrupt signal is received
by the processor. It interrupts the normal instruction cycle and handles the
interrupt request. After the interrupt cycle, the processor resumes the instruction
cycle by going back to the fetch cycle.
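The fetch and indirect cycles can be written out as register transfers. The sketch below follows the usual textbook sequences (MAR <- PC; MBR <- memory, PC <- PC + 1; IR <- MBR for fetch). The 16-bit word with a 12-bit address field, the dictionary-based registers, and the sample memory contents are assumptions made for illustration.

```python
ADDRESS_BITS = 12                       # assumed width of the address field
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def fetch(cpu, memory):
    """Fetch cycle as three time steps of micro-operations."""
    cpu["MAR"] = cpu["PC"]              # t1: MAR <- PC
    cpu["MBR"] = memory[cpu["MAR"]]     # t2: MBR <- memory[MAR]
    cpu["PC"] += 1                      #     PC <- PC + 1
    cpu["IR"] = cpu["MBR"]              # t3: IR <- MBR


def indirect(cpu, memory):
    """Indirect cycle: replace the address field of IR with the operand address."""
    cpu["MAR"] = cpu["IR"] & ADDRESS_MASK                 # t1: MAR <- IR(address)
    cpu["MBR"] = memory[cpu["MAR"]]                       # t2: MBR <- memory[MAR]
    cpu["IR"] = (cpu["IR"] & ~ADDRESS_MASK) | (cpu["MBR"] & ADDRESS_MASK)  # t3


memory = {0x010: 0x3020, 0x020: 0x0045}  # instruction at 0x010 points to 0x020
cpu = {"PC": 0x010, "MAR": 0, "MBR": 0, "IR": 0}
fetch(cpu, memory)
indirect(cpu, memory)
print(hex(cpu["IR"]), hex(cpu["PC"]))
```

After the indirect cycle the opcode bits of IR are unchanged and the address field holds 0x045, the direct address of the operand.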
INSTRUCTION CYCLE
The control unit of the processor is responsible for performing certain functions. These
functional requirements serve as the basis for designing and implementing the control
unit. By breaking down the processor's operation into elementary micro-operations, we
can precisely define what the control unit needs to make happen. This three-step
process involves defining the basic elements of the processor, describing the micro-
operations it performs, and determining the functions the control unit must carry out to
execute these micro-operations.
The control of the processor involves the management and coordination of its basic
functional elements. These elements include the ALU (Arithmetic Logic Unit), registers,
internal data paths, external data paths, and the control unit.
Basic Functional Elements of the Processor
1. ALU: The ALU performs arithmetic and logic operations using input from
registers.
2. Registers: Registers store data and instructions that are used by the processor.
3. Internal Data Paths: These paths allow data to be transferred between different
components within the processor.
4. External Data Paths: These paths enable data transfer between the processor and
external devices.
5. Control Unit: The control unit is responsible for sequencing and executing micro-
operations in the proper sequence based on the program being executed.
The execution of a program involves operations that utilize these processor elements.
These operations can be categorized into four types: transferring data from one register
to another, transferring data from a register to an external interface, transferring
data from an external interface to a register, and performing arithmetic or logic
operations using registers as input and output.
By defining the basic elements of the processor and describing the micro-operations it
performs, we can determine the functions that the control unit must perform to cause
these micro-operations to be executed. The control unit performs two main tasks:
sequencing, which involves stepping through micro-operations in the correct order
based on the program, and execution, which involves performing each micro-operation.
Sequencing: The control unit causes the processor to step through the micro-operations
in the proper sequence, based on the program being executed.
Execution: The control unit causes each micro-operation to be performed.
The instruction register is used by the control unit to determine the opcode and
perform different actions based on the instruction. This can be achieved using a
decoder, which takes an encoded input and produces a single output. A decoder
typically has n binary inputs and 2^n binary outputs, with each input pattern activating a
unique output.
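The decoder behaviour described above can be sketched as a function from n input bits to 2^n output bits, exactly one of which is 1. The function name and the most-significant-bit-first input order are assumptions of this sketch.

```python
def decode(inputs):
    """n-to-2^n decoder: `inputs` is a list of bits, most significant first."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit       # interpret the inputs as a binary number
    # Exactly one output line is active: the one selected by the input pattern.
    return [1 if line == index else 0 for line in range(2 ** len(inputs))]


print(decode([1, 0]))
```

In a control unit, the inputs would be the opcode bits of the instruction register, so each opcode activates its own output line.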
The clock portion of the control unit generates a repetitive sequence of pulses, which is
useful for measuring the duration of micro-operations. Overall, a hardwired
implementation involves the internal logic of the control unit producing output control
signals based on its input signals.
Hardwired implementation Control Unit Inputs:
The key inputs of a hardwired implementation control unit are the instruction register,
the clock, flags, and control bus signals. The instruction register is used to decode the
opcode and perform different actions for different instructions. The clock issues a
repetitive sequence of pulses, which is useful for measuring the duration of micro-
operations.
The internal logic of the control unit in a hardwired implementation produces output
control signals based on its input signals. It functions as a state machine circuit,
transforming input logic signals into a set of output logic signals, which are the control
signals. Each control signal is derived from a Boolean expression as a function of the
input signals.
Microinstruction Formats
There are different types of pipelines, such as arithmetic pipelines and instruction
pipelines. Arithmetic pipelines handle the stages of an arithmetic operation, while
instruction pipelines handle the stages of instruction fetch and execution.
Types of Pipelining
1. Arithmetic Pipeline: This type of pipeline is used for handling different stages of
an arithmetic operation. It is commonly used for floating-point operations and
multiplication of fixed-point numbers. The four stages of floating-point addition
and subtraction in an arithmetic pipeline are: comparing the exponents, aligning
the mantissas, adding or subtracting the mantissas, and normalizing the result.
2. Instruction Pipeline: This type of pipeline is used for handling different stages of
instruction fetch and execution. It allows the fetch, decode, and execute phases of
the instruction cycle to overlap. Instructions are read from memory while previous
instructions are being executed in other segments of the pipeline.
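The four stages of floating-point addition can be sketched as sequential steps. The code below assumes a decimal (mantissa, exponent) representation with a fractional mantissa, chosen only to keep the arithmetic readable; the name fp_add is hypothetical. In hardware, each commented step would be a separate pipeline segment working on a different pair of operands.

```python
from fractions import Fraction


def fp_add(x, y):
    """Add two numbers given as (mantissa, exponent), value = mantissa * 10**exponent."""
    (mx, ex), (my, ey) = x, y

    # Stage 1: compare the exponents.
    diff = ex - ey

    # Stage 2: align the mantissa of the operand with the smaller exponent.
    if diff >= 0:
        my = my / Fraction(10) ** diff
        e = ex
    else:
        mx = mx / Fraction(10) ** (-diff)
        e = ey

    # Stage 3: add the mantissas (subtraction is addition of a negative mantissa).
    m = mx + my

    # Stage 4: normalize the result so that 0.1 <= |m| < 1.
    if m == 0:
        return (Fraction(0), 0)
    while abs(m) >= 1:
        m /= 10
        e += 1
    while abs(m) < Fraction(1, 10):
        m *= 10
        e -= 1
    return (m, e)
```

For example, adding 0.9504 x 10^3 and 0.8200 x 10^2 aligns the second operand to 0.0820 x 10^3, adds to get 1.0324 x 10^3, and normalizes to 0.10324 x 10^4.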
Pipelining Organization
The pipeline organization typically includes stages such as instruction fetch, decode,
execute, memory access, and write back. Each stage processes a different instruction,
and interstage buffers are used to transfer data between stages. The pipeline allows for
the overlapping of instruction execution, improving overall system performance.
In the pipeline organization, interstage buffers play a crucial role in transferring data
between stages. For example, buffer B1 feeds the decode stage with a newly-fetched
instruction, while buffer B2 feeds the compute stage with operands and other relevant
data. Buffer B3 holds the result of the ALU operation, and buffer B4 feeds the write stage
with data to be written into the register file.
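The overlap between stages can be made concrete with a small timing table. This is a sketch of an ideal pipeline with no stalls; the stage names and the function schedule are assumptions for illustration. In each cycle, the interstage buffers pass every instruction on to the next stage.

```python
STAGES = ["Fetch", "Decode", "Compute", "Memory", "Write"]


def schedule(n_instructions):
    """Return {cycle: {stage: instruction_number}} for an ideal, stall-free pipeline."""
    table = {}
    for i in range(n_instructions):
        # Instruction i+1 enters Fetch in cycle i+1 and advances one stage per cycle.
        for s, stage in enumerate(STAGES):
            table.setdefault(i + 1 + s, {})[stage] = i + 1
    return table


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions: stages + n - 1."""
    return len(STAGES) + n_instructions - 1
```

Under these assumptions, four instructions finish in 8 cycles instead of the 20 cycles they would need if each one ran through all five stages before the next started.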
Pipelining Issues
There are two types of memory-related stalls that can cause delays in the pipeline. The
first type occurs when an instruction or data item is not found in the cache, resulting
in a cache miss. This can significantly increase the number of cycles required for
memory access.
The second type of memory-related stall occurs when there is a data dependency
involving a Load instruction: the instruction that immediately follows the Load needs
the loaded data before it is available, which causes a one-cycle stall. The compiler can
eliminate this stall by reordering instructions to insert a useful instruction between
the Load instruction and the instruction that depends on the data read from the memory.
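The effect of reordering around a Load can be sketched with a simple stall counter. The model below assumes that only an instruction placed directly after a Load and reading its destination register stalls, and then for one cycle. The instruction tuples and register names are made up for the example.

```python
def load_use_stalls(program):
    """Count one-cycle stalls caused by an instruction using a Load result
    in the very next slot. Each instruction is (op, dest, sources)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "Load" and dest in cur[2]:
            stalls += 1
    return stalls


original = [
    ("Load", "R2", ("R3",)),
    ("Subtract", "R9", ("R2",)),     # depends on the loaded value
    ("Add", "R5", ("R6", "R7")),     # independent of the Load
]

# The compiler moves the independent Add between the Load and its consumer.
reordered = [original[0], original[2], original[1]]
```

Here load_use_stalls(original) reports one stall and load_use_stalls(reordered) reports none, since the independent Add fills the slot after the Load.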
Overall, memory delays can have a significant impact on the performance of a pipeline,
and optimizing memory accesses is crucial for improving the overall efficiency and
throughput of the pipeline.
Branch Delay
Branch delays refer to the impact of branch instructions on the execution sequence in a
pipelined processor. When a branch instruction is encountered, it needs to be executed
to determine whether and where to branch. This introduces a delay in the pipeline,
known as the branch penalty.
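Unconditional branches typically have a two-cycle delay as a branch penalty. Because
branch instructions are relatively frequent, this penalty is significant: in a pipeline
that otherwise completes one instruction per cycle, if about 20 percent of executed
instructions are branches, two extra cycles per branch add 0.4 cycles per instruction,
so the execution time of a program can increase by as much as 40 percent. To reduce the
branch penalty, the branch target address needs to be computed earlier in the pipeline.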
Conditional branches, on the other hand, require the branch condition to be tested as
early as possible to limit the branch penalty. By moving the branch decision to the
Decode stage, a common branch penalty of only one cycle can be achieved for all
branch instructions.
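The cost of the branch penalty can be estimated with a short calculation. The sketch assumes an ideal pipeline that otherwise completes one instruction per cycle; the 20 percent branch frequency used in the usage note is an illustrative figure, not a measurement.

```python
def relative_execution_time(branch_fraction, penalty_cycles):
    """Average cycles per instruction relative to an ideal one-per-cycle pipeline.

    branch_fraction: fraction of executed instructions that are branches.
    penalty_cycles: cycles lost for each branch.
    """
    return 1.0 + branch_fraction * penalty_cycles


def percent_increase(branch_fraction, penalty_cycles):
    """Increase in execution time, in percent, caused by the branch penalty."""
    return 100.0 * (relative_execution_time(branch_fraction, penalty_cycles) - 1.0)
```

With a branch fraction of 0.2, a two-cycle penalty gives an increase of about 40 percent, and moving the branch decision to the Decode stage (a one-cycle penalty) halves that.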
Performance Evaluation in Pipelining
1. Single Instruction, Single Data (SISD): In this type, a single processor executes
a single instruction stream to operate on data stored in a single memory.
Uniprocessors fall into this category.
2. Multiple Instruction, Single Data (MISD): This type involves a sequence of data
transmitted to a set of processors, each executing a different instruction
sequence. However, this structure is not commercially implemented.
3. Single Instruction, Multiple Data (SIMD): SIMD processing involves a single
machine instruction controlling the simultaneous execution of multiple
processing elements on a lockstep basis. Each processing element has its own
associated data memory, allowing each instruction to be executed on a different
set of data by different processors. Vector and array processors fall into this
category.
4. Multiple Instruction, Multiple Data (MIMD): MIMD systems consist of a set of
processors simultaneously executing different instruction sequences on different
data sets. Symmetric multiprocessors (SMPs), clusters, and NUMA systems fall
into this category.
These parallel processor systems enable efficient and concurrent processing of tasks,
leading to improved performance and scalability in various real-world applications.
Symmetric Multiprocessors
In SMPs, each processor may have its own private main memory and I/O channels in
addition to the shared resources. The time-shared bus is a common mechanism used to
construct SMP systems. The structure and interfaces of SMPs are similar to those of
single-processor systems that use a bus interconnection.
SMPs provide a scalable and efficient solution for parallel processing tasks. They allow
for the simultaneous execution of independent tasks by multiple processors. The shared
memory in SMPs enables tasks running on different processors to access shared
variables using the same addresses.
A vectorizing compiler can recognize loops that apply the same operation to successive
array elements and generate vector instructions for them, as long as the loops are not
too complex. The compiler must identify that the computation in each pass through the
loop is independent of the other passes, and that the same operation can be performed
simultaneously on multiple elements. For simplicity, it is assumed that the number of
passes is evenly divisible by the vector length.
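The transformation a vectorizing compiler performs can be sketched by comparing a scalar loop with a version that processes one vector-length chunk per pass. The function names and the default vector length of 4 are assumptions for illustration; each chunk operation stands in for a single vector instruction.

```python
def scalar_add(a, b):
    """One element per pass; every pass is independent of the others."""
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] + b[i]
    return c


def vector_add(a, b, vector_length=4):
    """One chunk of vector_length elements per pass."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    if len(a) % vector_length != 0:
        # Mirrors the simplifying assumption that the number of passes
        # is evenly divisible by the vector length.
        raise ValueError("length must be a multiple of the vector length")
    c = [0] * len(a)
    for i in range(0, len(a), vector_length):
        # Models a vector load, vector add, and vector store on one chunk.
        chunk_a = a[i:i + vector_length]
        chunk_b = b[i:i + vector_length]
        c[i:i + vector_length] = [x + y for x, y in zip(chunk_a, chunk_b)]
    return c
```

Both functions produce the same result, but the second needs only len(a) / vector_length passes.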
Shared-memory multiprocessors
There are two types of shared-memory multiprocessors: Uniform Memory Access (UMA)
and Non-Uniform Memory Access (NUMA). In a UMA multiprocessor, all accesses from
the processors to the memory modules have the same network latency. On the other
hand, in a NUMA multiprocessor, accessing local memory has lower latency compared
to accessing remote memory, which requires passing through the network.
Cache Coherence
Cache coherence refers to the issue of maintaining a consistent view of shared data in
multiple caches in a shared-memory multiprocessor system. In such systems, each
variable in a program has a unique address location in the memory, which can be
accessed by any processor. However, copies of shared data may reside in several caches,
leading to the possibility of inconsistent values.
When a processor writes to a shared variable in its own cache, all other caches that
contain a copy of that variable will have the old, incorrect value. To ensure cache
coherence, these other caches must be informed of the change so that they can either
update their copy to the new value or invalidate it.
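The two options, updating or invalidating the other copies, can be sketched with caches modelled as dictionaries. The function name write and the policy argument are hypothetical, and writing the new value straight to memory is an assumption made to keep the model simple.

```python
def write(caches, memory, writer, addr, value, policy="invalidate"):
    """Processor `writer` writes `value` to `addr`.

    caches: list of dicts (address -> value), one per processor.
    policy: "update" refreshes other copies, "invalidate" discards them.
    """
    if policy not in ("update", "invalidate"):
        raise ValueError("policy must be 'update' or 'invalidate'")
    caches[writer][addr] = value
    memory[addr] = value  # assumption: the write also goes to memory
    for i, cache in enumerate(caches):
        if i == writer or addr not in cache:
            continue
        if policy == "update":
            cache[addr] = value   # other copy receives the new value
        else:
            del cache[addr]       # other copy is discarded; next read misses
```

Either way, no cache is left holding the old value after the write completes.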
Cache coherence can be maintained through different protocols, such as the write-through
protocol and the write-back protocol. The write-through protocol writes each new value
through to the memory and either updates or invalidates the copies in other caches,
while the write-back protocol is based on the concept of ownership of a block of data in
the memory.
Overall, cache coherence is essential for ensuring a consistent view of shared data in a
shared-memory multiprocessor system, allowing for correct and reliable execution of
parallel programs.
Write-Through Protocol
Write-Back Protocol
The write-back protocol is another cache coherence protocol that is based on the
concept of ownership of a block of data in memory. Initially, the memory is the owner of
all blocks, and when a processor reads a block, it places a copy in its cache. If a
processor wants to write to a block in its cache, it must first become the exclusive owner
of that block. When another processor wants to read a modified block, the request is
forwarded to the current owner, and the data is sent to the requesting processor by the
current owner. The data is also sent to the appropriate memory module, which
reacquires ownership and updates the block. This protocol ensures coherence by
managing ownership and updating of modified blocks.
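The ownership rules above can be sketched as a small simulation. The class name OwnershipSystem, the one-value-per-address model, and the use of invalidation to obtain exclusive ownership are assumptions made for the example; the text does not specify how exclusivity is obtained.

```python
class OwnershipSystem:
    """Toy model of ownership-based write-back coherence (one value per block)."""

    def __init__(self, n_processors, memory):
        self.memory = dict(memory)
        self.caches = [dict() for _ in range(n_processors)]
        self.owner = {}  # addr -> processor index; absent means memory is the owner

    def read(self, p, addr):
        if addr in self.caches[p]:
            return self.caches[p][addr]
        current = self.owner.get(addr)
        if current is not None:
            # The owner supplies the data; memory is updated and reacquires ownership.
            value = self.caches[current][addr]
            self.memory[addr] = value
            del self.owner[addr]
        else:
            value = self.memory[addr]
        self.caches[p][addr] = value
        return value

    def write(self, p, addr, value):
        # Become the exclusive owner first (assumed: by invalidating other copies).
        for i, cache in enumerate(self.caches):
            if i != p:
                cache.pop(addr, None)
        self.owner[addr] = p
        self.caches[p][addr] = value  # memory is not updated at this point
```

After a write, memory still holds the old value. It is refreshed only when another processor reads the block and the owner supplies the data.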