Computer Architecture Notes
Computer architecture is a set of rules and methods that describe the functionality, organization,
and implementation of computer systems.
Computer Architecture
Computer Architecture is a blueprint for design and implementation of a computer system. It
provides the functional details and behaviour of a computer system and comes before
computer organization. Computer architecture deals with 'What to do?'
Computer Organization
Computer Organization is how operational parts of a computer system are linked together. It
implements the provided computer architecture. Computer organization deals with 'How to
do?'
The system performs different operations on the same data set. Machines built
using the MISD model are not useful for most applications; a few machines
have been built, but none of them are available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –
An MIMD system is a multiprocessor machine capable of executing
multiple instructions on multiple data sets. Each PE in the MIMD model has
separate instruction and data streams; therefore machines built using this
model are suited to any kind of application. Unlike SIMD and MISD
machines, PEs in MIMD machines work asynchronously.
Computer Registers
Registers are a type of computer memory used to quickly accept, store, and transfer data
and instructions that are being used immediately by the CPU. The registers used by the CPU
are often called processor registers.
A processor register may hold an instruction, a storage address, or any data (such as a bit
sequence or individual characters).
The computer needs processor registers for manipulating data and a register for holding a
memory address. The register holding the memory location is used to calculate the address
of the next instruction after the execution of the current instruction is completed.
Following is the list of some of the most common registers used in a basic computer:
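Memory: 4096 x 16 (4096 words of 16 bits each), with 12-bit word addresses
000000000000, 000000000001, 000000000010, ...
Instruction format (16 bits): I (1 bit, 0 or 1) | Opcode (3 bits) | Operand (12 bits)
Register transfer examples: R2 <- R1, R3 <- R1 + R2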
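As a rough sketch of the 16-bit instruction layout (a 1-bit I field, a 3-bit opcode and a 12-bit operand), the fields can be separated with shifts and masks. The function name decode, the sample word, and the choice of placing I in the most significant bit are assumptions made for this illustration.

```python
def decode(word):
    """Split a 16-bit instruction word into (I, opcode, address).

    Assumed layout: bit 15 = I, bits 14-12 = opcode,
    bits 11-0 = operand address (enough to address 4096 words).
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    i_bit = (word >> 15) & 0x1      # 1 bit
    opcode = (word >> 12) & 0x7     # 3 bits
    address = word & 0xFFF          # 12 bits
    return i_bit, opcode, address


# Hypothetical word: I = 1, opcode = 010, address = 000000000010
word = 0b1_010_000000000010
print(decode(word))  # (1, 2, 2)
```

A 12-bit address field matches the 4096-word memory, since 2^12 = 4096.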
In a computer system, data transfer takes place between processor registers and memory
and between processor registers and input-output systems. These data transfers can be
represented by the standard notations given below:
o Input-output registers are represented by names such as DATA IN, DATA OUT and so
on.
Micro-operations :
The operations executed on data stored in registers are called micro-operations.
They are detailed low-level instructions used in some designs to implement
complex machine instructions.
Register Transfer :
The transfer of information from one register to another, represented in
symbolic form by means of a replacement operator, is called a register transfer.
Replacement Operator :
In the statement R2 <- R1, <- acts as the replacement operator. This statement defines the
transfer of the content of register R1 into register R2.
4. A 16-bit register PC is divided into 2 parts: bits 0 to 7 hold the lower
byte of the 16-bit address and bits 8 to 15 hold the higher byte of the 16-bit
address, as shown in (d).
Symbol                Description                      Examples
Letters and numbers   Denotes a register               MAR, R1, R2
( )                   Denotes a part of a register     R1(8-bit), R1(0-7)
<-                    Denotes transfer of information  R1 <- R2, R2 <- R1
:                     Denotes conditional operations   P : R2 <- R1 (if P = 1)
The operations performed on the data stored in registers are referred to as
register transfer operations.
The contents of R1 are copied into R2 without affecting the contents of R1. It is an
unconditional type of transfer operation.
2. Conditional Transfer –
Register Transfer
The term Register Transfer refers to the availability of hardware logic circuits that can
perform a given micro-operation and transfer the result of the operation to the same or
another register.
Most of the standard notations used for specifying operations on various registers are stated
below.
o R1 (Processor Register).
o We can also indicate individual bits by placing them in parentheses. For instance, PC
(8-15), R2 (5), etc.
o Data transfer from one register to another is represented in symbolic form
by means of the replacement operator. For instance, the following statement denotes a
transfer of the data of register R1 into register R2.
R2 ← R1
o Typically, we want the transfer to occur only under a predetermined
control condition. This can be shown by the following if-then statement:
If (P=1) then (R2 ← R1); here P is a control signal generated in the control section.
o It is more convenient to specify a control function (P) by separating the control
variables from the register transfer operation. For instance, the following statement
defines the data transfer operation under a specific control function (P).
P: R2 ← R1
The following image shows the block diagram that depicts the transfer of data from R1 to
R2.
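The conditional transfer P: R2 ← R1 can be modelled in a few lines. This is only a sketch: the dictionary of registers and the function name conditional_transfer are assumptions of the example, not part of the notation.

```python
def conditional_transfer(registers, p, dest, src):
    """Model P: dest <- src. The copy happens only when control signal p is 1."""
    if p == 1:
        registers[dest] = registers[src]   # the source register keeps its content
    return registers


regs = {"R1": 0b1011, "R2": 0b0000}
conditional_transfer(regs, 0, "R2", "R1")   # P = 0: no transfer takes place
after_p0 = dict(regs)
conditional_transfer(regs, 1, "R2", "R1")   # P = 1: R2 receives a copy of R1
print(after_p0, regs)
```

Note that R1 is unchanged in both cases; a register transfer copies the source rather than moving it.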
Arithmetic Micro-operations
In general, arithmetic micro-operations deal with the operations performed on numeric
data stored in the registers.
1. Addition
2. Subtraction
3. Increment
4. Decrement
5. Transfer/Load, etc.
The following table shows the symbolic representation of various Arithmetic Micro-
operations.
Binary Adder
The Add micro-operation requires registers that can hold the data and the digital
components that can perform the arithmetic addition.
A Binary Adder is a digital circuit that performs the arithmetic sum of two binary numbers
of any length.
A Binary Adder is constructed using full-adder circuits connected in series, with the output
carry from one full-adder connected to the input carry of the next full-adder.
The following block diagram shows the interconnections of four full-adder circuits to provide
a 4-bit binary adder.
o The augend bits (A) and the addend bits (B) are designated by subscript numbers
from right to left, with subscript '0' denoting the low-order bit.
o The carry inputs run from C0 to C3, connected in a chain through the full-adders. C4
is the resultant output carry generated by the last full-adder circuit.
o The output carry from each full-adder is connected to the input carry of the
next-higher-order full-adder.
o The sum outputs (S0 to S3) generate the required arithmetic sum of the augend and
addend bits.
o The n data bits for the A and B inputs come from different source registers. For
instance, data bits for the A input come from source register R1 and data bits
for the B input come from source register R2.
o The arithmetic sum of the data inputs of A and B can be transferred to a third register
or to one of the source registers (R1 or R2).
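The chain of full-adders described above can be sketched in software as follows. The function names and the low-order-bit-first list representation are choices made for this illustration.

```python
def full_adder(a, b, carry_in):
    """One full-adder stage: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a_bits, b_bits, c0=0):
    """Add two equal-length bit lists given low-order bit first (index 0 = A0).

    Returns (sum_bits, final_carry); for 4-bit inputs the final carry is C4.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("A and B must have the same number of bits")
    carry = c0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples into the next stage
        sum_bits.append(s)
    return sum_bits, carry


# 4-bit example: A = 1011 (11), B = 0110 (6), written low-order bit first.
s, c4 = ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0])
print(s, c4)  # [1, 0, 0, 0] 1, i.e. 1 0001 = 17
```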
Binary Adder-Subtractor
The Arithmetic micro-operations like addition and subtraction can be combined into one
common circuit by including an exclusive-OR gate with each full adder.
The block diagram for a 4-bit adder-subtractor circuit can be represented as:
o When the mode input (M) is at low logic, i.e. '0', the circuit acts as an adder, and
when the mode input is at high logic, i.e. '1', the circuit acts as a subtractor.
o Each exclusive-OR gate receives the mode input M and one of the B inputs.
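A minimal sketch of the adder-subtractor follows. It assumes the usual arrangement in which M also drives the first carry input (C0 = M), so that M = 1 computes A plus the 1's complement of B plus 1; that wiring comes from the standard design rather than from the text above, and the function name add_sub is hypothetical.

```python
def add_sub(a, b, m, n_bits=4):
    """n-bit adder-subtractor: m = 0 gives A + B, m = 1 gives A - B.

    Each B bit passes through an XOR gate with M, and M is assumed to feed
    the first carry input, so m = 1 computes A + (NOT B) + 1.
    Returns (result, carry_out).
    """
    carry = m                                  # assumed: C0 = M
    result = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ m               # XOR gate on each B input
        s = a_i ^ b_i ^ carry
        carry = (a_i & b_i) | (carry & (a_i ^ b_i))
        result |= s << i
    return result, carry


print(add_sub(7, 5, 0))  # (12, 0): 0111 + 0101 = 1100
print(add_sub(7, 5, 1))  # (2, 1):  0111 - 0101 = 0010
```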
Logic Microoperations:
Logic Microoperations specify binary operations performed for strings of bits in registers. These
operations consider each bit of the register separately and treat them as binary variables.
Example: R3 ← R1 ⊕ R2
R1        1010
R2      ⊕ 1100
R3        0110
In this table, each of the 16 columns F0 through F15 represents a truth table of one possible Boolean
function for the two variables x and y.
Note that the functions are determined from the 16 binary combinations that can be assigned to F.
The 16 Boolean functions of two variables x and y are expressed in algebraic form in the first column
of Table below.
The 16 logic microoperations are derived from these functions by replacing variable x by the binary
content of register A and variable y by the binary content of register B.
It is important to realize that the Boolean functions listed in the first column of Table below represent
a relationship between two binary variables x and y.
The logic microoperations listed in the second column represent a relationship between the binary
content of two registers A and B.
Each bit of the register is treated as a binary variable and the microoperation is performed on the
string of bits stored in the registers.
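To illustrate how one of the 16 functions is applied to every bit position of two registers, here is a small sketch. The numbering is an assumption: the 4-bit value k is read as the truth-table column for (x, y) = (0,0), (0,1), (1,0), (1,1) from most to least significant bit, which makes F1 the AND, F6 the XOR and F7 the OR function.

```python
def apply_function(k, a, b, n_bits=4):
    """Apply Boolean function F_k (k = 0..15) bitwise to registers A and B."""
    result = 0
    for i in range(n_bits):
        x = (a >> i) & 1            # bit i of register A plays the role of x
        y = (b >> i) & 1            # bit i of register B plays the role of y
        row = 2 * x + y             # truth-table row for this (x, y) pair
        bit = (k >> (3 - row)) & 1  # look up the function value in column k
        result |= bit << i
    return result


a, b = 0b1010, 0b1100
print(f"{apply_function(1, a, b):04b}")  # 1000 (AND)
print(f"{apply_function(6, a, b):04b}")  # 0110 (XOR)
print(f"{apply_function(7, a, b):04b}")  # 1110 (OR)
```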
Hardware Implementation
The hardware implementation of logic microoperations requires that logic gates be inserted
for each bit or pair of bits in the registers to perform the required logic function.
Although there are 16 logic microoperations, most computers use only four: AND, OR,
XOR (exclusive-OR), and complement, from which all others can be derived.
Figure below shows one stage of a circuit that generates the four basic logic
microoperations.
It consists of four gates and a multiplexer. Each of the four logic operations is generated
through a gate that performs the required logic.
The outputs of the gates are applied to the data inputs of the multiplexer. The two selection
inputs S1 and S0 choose one of the data inputs of the multiplexer and direct its value to the
output.
The diagram shows one typical stage with subscript i. For a logic circuit with n bits, the
diagram must be repeated n times for i = 0, 1, 2, ... , n - 1.
The selection variables are applied to all stages. The function table in Fig. below lists the
logic microoperations obtained for each combination of the selection variables.
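The gate-plus-multiplexer stage can be sketched as below. The function table itself is not reproduced in these notes, so the mapping S1 S0 = 00 for AND, 01 for OR, 10 for XOR and 11 for complement of A is an assumption of this example, as are the function names.

```python
def logic_stage(a_i, b_i, s1, s0):
    """One bit-slice: four gates feed a 4-to-1 multiplexer selected by S1 S0."""
    gate_outputs = [
        a_i & b_i,      # MUX input 0: AND
        a_i | b_i,      # MUX input 1: OR
        a_i ^ b_i,      # MUX input 2: XOR
        1 - a_i,        # MUX input 3: complement of A
    ]
    return gate_outputs[2 * s1 + s0]


def logic_unit(a, b, s1, s0, n_bits=4):
    """Repeat the stage for i = 0 .. n-1; the select lines go to every stage."""
    result = 0
    for i in range(n_bits):
        result |= logic_stage((a >> i) & 1, (b >> i) & 1, s1, s0) << i
    return result


print(f"{logic_unit(0b1010, 0b1100, 1, 0):04b}")  # 0110 (XOR under the assumed table)
```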
* V Imp Bus and Memory Transfers
A digital system is composed of many registers, and paths must be provided to transfer information
from one register to another. The number of wires connecting all of the registers will be excessive if
separate lines are used between each register and all other registers in the system.
A bus structure, on the other hand, is more efficient for transferring information between registers in
a multi-register configuration system.
A bus consists of a set of common lines, one for each bit of a register, through which binary
information is transferred one register at a time. Control signals determine which register is selected by the
bus during a particular register transfer.
The following block diagram shows a bus system for four registers. It is constructed with the help of
four 4 x 1 multiplexers, each having four data inputs (0 through 3) and two selection inputs (S1 and
S0).
Labels are used to make the input-output configuration of the bus system easier to follow.
For instance, output 1 of register A is connected to input 0 of
MUX1.
The two selection lines S1 and S0 are connected to the selection inputs of all four multiplexers. The
selection lines choose the four bits of one register and transfer them onto the four-line common bus.
When both of the select lines are at low logic, i.e. S1S0 = 00, the 0 data inputs of all four multiplexers
are selected and applied to the outputs that form the bus. This, in turn, causes the bus lines to receive
the content of register A, since the outputs of this register are connected to the 0 data inputs of the
multiplexers.
Similarly, when S1S0 = 01, register B is selected, and the bus lines will receive the content provided
by register B.
The following function table shows the register that is selected by the bus for each of the four
possible binary values of the Selection lines.
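A small model of the multiplexer-based bus is sketched below. The text states that S1S0 = 00 selects register A and 01 selects register B; the assumption here is that 10 and 11 select two further registers, called C and D. The function name bus_select and the register contents are hypothetical.

```python
def bus_select(registers, s1, s0, n_bits=4):
    """Return the bus value for a four-register, multiplexer-based common bus.

    registers: [A, B, C, D] as integers. MUX k handles bit k of every register,
    and data input j of each MUX comes from register j.
    """
    selected = 2 * s1 + s0
    bus = 0
    for k in range(n_bits):                       # one 4 x 1 MUX per bus line
        mux_inputs = [(r >> k) & 1 for r in registers]
        bus |= mux_inputs[selected] << k
    return bus


regs = [0b0001, 0b0010, 0b0100, 0b1000]   # hypothetical contents of A, B, C, D
print(f"{bus_select(regs, 0, 0):04b}")    # 0001: register A
print(f"{bus_select(regs, 0, 1):04b}")    # 0010: register B
```

With n registers of k bits each, this arrangement needs k multiplexers of size n x 1 instead of dedicated wires between every pair of registers.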
Unit - 2
Machine Language
Machine language, or machine code, is a low-level language comprised of binary digits (ones and
zeros). High-level languages, such as C and C++, must be compiled into machine language before the
code is run on a computer.
Since computers are digital devices, they only recognize binary data. Every program, video, image,
and character of text is represented in binary. This binary data, or machine code, is processed
as input by the CPU. The resulting output is sent to the operating system or an application, which
displays the data visually. For example, the ASCII value for the letter "A" is 01000001 in machine
code, but this data is displayed as "A" on the screen. An image may have thousands or even millions
of binary values that determine the color of each pixel.
While machine code is comprised of 1s and 0s, different processor architectures use different
machine code. For example, a PowerPC processor, which has a RISC architecture, requires different
code than an Intel x86 processor, which has a CISC architecture. A compiler must compile high-
level source code for the correct processor architecture in order for a program to run correctly.
Machine language and assembly language are both low-level languages, but machine code is below
assembly in the hierarchy of computer languages.
Assembly language includes human-readable commands, such as mov, add, and sub, while machine
language does not contain any words or even letters. Some developers manually write assembly
language to optimize a program, but they do not write machine code. Only developers who write
software compilers need to worry about machine language.
Assembly Language
While assembly languages differ between processor architectures, they often include
similar instructions and operators. Below are some examples of instructions supported
by x86 processors.
The following assembly language can be used to add the numbers 3 and 4:
Writing assembly language is a tedious process since each operation must be performed at
a very basic level. While it may not be necessary to use assembly code to create a
computer program, learning assembly language is often part of a Computer
Science curriculum since it provides useful insight into the way processors work.
ADD AH, BH      ; Add the content of the BH register into the AH register
AND MASK1, 128  ; Perform AND operation on the variable MASK1 and 128
ADD MARKS, 10   ; Add 10 to the variable MARKS
The Hello World Program in Assembly
The following assembly language code displays the string
'Hello, world!' on the screen:
section .text
    global main         ;must be declared global so the linker can find the entry point
main:
    mov edx, len        ;message length
    mov ecx, msg        ;message to write
    mov ebx, 1          ;file descriptor (stdout)
    mov eax, 4          ;system call number (sys_write)
    int 0x80            ;call kernel
    mov ebx, 0          ;exit status 0 (success)
    mov eax, 1          ;system call number (sys_exit)
    int 0x80            ;call kernel
section .data
msg db 'Hello, world!', 0xa    ;our dear string
len equ $ - msg                ;length of our dear string
When the above code is assembled, linked, and executed, it produces the following result:
Hello, world!
Assembly - Arithmetic Instructions
The INC instruction is used for incrementing an operand by one. It works on a single operand that
can be either in a register or in memory.
Syntax
The INC instruction has the following syntax −
INC destination
The operand destination could be an 8-bit, 16-bit or 32-bit operand.
Example
INC EBX ; Increments 32-bit register
INC DL ; Increments 8-bit register
INC [count] ; Increments the count variable
The DEC instruction is used for decrementing an operand by one. It works on a single operand that
can be either in a register or in memory.
Syntax
The DEC instruction has the following syntax −
DEC destination
The operand destination could be an 8-bit, 16-bit or 32-bit operand.
The ADD and SUB instructions are used for performing simple addition/subtraction of binary data
in byte, word and doubleword size, i.e., for adding or subtracting 8-bit, 16-bit or 32-bit operands,
respectively.
Syntax
The ADD and SUB instructions have the following syntax −
ADD/SUB destination, source
The ADD/SUB instruction can take place between −
Register to register
Memory to register
Register to memory
Register to constant data
Memory to constant data
However, like other instructions, memory-to-memory operations are not possible using ADD/SUB
instructions. An ADD or SUB operation sets or clears the overflow and carry flags.
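How an ADD sets the carry and overflow flags can be illustrated for 8-bit operands. This is a sketch, not a model of any particular processor's full flag register; the function name add8 is hypothetical and both inputs are assumed to be in the range 0 to 255.

```python
def add8(dest, src):
    """8-bit ADD: returns (result, carry_flag, overflow_flag)."""
    total = dest + src
    result = total & 0xFF
    carry = int(total > 0xFF)       # the unsigned result did not fit in 8 bits
    # Signed overflow: both operands have a sign bit that differs from the result's.
    overflow = int(((dest ^ result) & (src ^ result) & 0x80) != 0)
    return result, carry, overflow


print(add8(0x7F, 0x01))  # (128, 0, 1): 127 + 1 leaves the signed 8-bit range
print(add8(0xFF, 0x01))  # (0, 1, 0):   255 + 1 wraps around, carry is set
```

The carry flag matters when the operands are treated as unsigned numbers, the overflow flag when they are treated as signed numbers.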
Example
The following example asks the user for two digits, stores the digits in the EAX and EBX
registers, respectively, adds the values, stores the result in a memory location 'res' and finally displays
the result.
SYS_EXIT equ 1
SYS_READ equ 3
SYS_WRITE equ 4
STDIN equ 0
STDOUT equ 1
segment .data
segment .bss
num1 resb 2
num2 resb 2
res resb 1
section .text
global _start ;must be declared for using gcc
; moving the first number to eax register and second number to ebx
; and subtracting ascii '0' to convert it into a decimal number
OR instruction format: OR operand1, operand2
The first operand in all the cases could be either in register or in memory. The second operand could
be either in register/memory or an immediate (constant) value. However, memory-to-memory
operations are not possible. These instructions compare or match bits of the operands and set the CF,
OF, PF, SF and ZF flags.
*V V Imp Subroutine
A small program that is written and stored separately and can be called whenever required.
A subroutine is a sequence of program instructions that performs a specific task, packaged as a unit.
This unit can then be used in programs wherever that particular task should be performed.
Subroutines may be defined within programs, or separately in libraries that can be used by many
programs. In different programming languages, a subroutine may be called
a routine, subprogram, function, method, or procedure.
FIGURE 1: A main program and its set of subroutines (the main program calls two
subroutines; each of these may then call other subroutines).
Instruction format:
R 16 bit
int a;
a = 5;
int b = 10;
int c = a + b;
A program residing in the memory unit of a computer consists of a sequence of instructions. These
instructions are executed by the processor by going through a cycle for each instruction.
1. Fetch Stage: The next instruction is fetched from the memory address that is currently stored
in the program counter and stored into the instruction register. At the end of the fetch
operation, the PC points to the next instruction that will be read at the next cycle.
2. Decode Stage: During this stage, the encoded instruction presented in the instruction register
is interpreted by the decoder.
o Read the effective address: In the case of a memory instruction (direct or indirect), the
execution phase takes place during the next clock pulse. If the instruction has an indirect
address, the effective address is read from main memory, and any required data is fetched
from main memory to be processed and then placed into data registers (clock pulse: T3).
If the instruction is direct, nothing is done during this clock pulse. If this is an I/O
instruction or a register instruction, the operation is performed during this clock pulse.
3. Execute Stage: The control unit of the CPU passes the decoded information as a sequence of
control signals to the relevant functional units of the CPU to perform the actions required by
the instruction, such as reading values from registers, passing them to the ALU to perform
mathematical or logic functions on them, and writing the result back to a register. If the ALU
is involved, it sends a condition signal back to the CU. The result generated by the operation
is stored in the main memory or sent to an output device. Based on the feedback from the
ALU, the PC may be updated to a different address from which the next instruction will be
fetched.
4. Repeat Cycle
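The cycle above can be sketched as a loop over a toy accumulator machine. The opcode names (LOAD, ADD, STORE, HALT) and the tuple encoding are hypothetical and not taken from any real instruction set; the program computes c = a + b for a = 5 and b = 10.

```python
def run(memory):
    """Run a tiny accumulator machine until HALT; returns the final memory."""
    pc, ac = 0, 0
    while True:
        ir = memory[pc]          # fetch: read the instruction the PC points to
        pc += 1                  # the PC now points to the next instruction
        opcode, address = ir     # decode: split into operation and operand address
        if opcode == "LOAD":     # execute
            ac = memory[address]
        elif opcode == "ADD":
            ac += memory[address]
        elif opcode == "STORE":
            memory[address] = ac
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Instructions occupy addresses 0-3; the data words a, b, c occupy 4-6.
program = [
    ("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0),
    5, 10, 0,
]
print(run(program)[6])  # 15
```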
The operation field of an instruction specifies the operation to be performed. This operation will be
executed on some data which is stored in computer registers or the main memory. The way any
operand is selected during the program execution is dependent on the addressing mode of the
instruction. The purpose of using addressing modes is as follows:
Implied Mode
In implied addressing, the operand is specified implicitly in the definition of the instruction, so
no operand field is needed. Zero-address instructions are designed with the
implied addressing mode.
Example: INC AC
Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate mode instruction has an
operand field rather than the address field.
For example: ADD 7, which says Add 7 to contents of accumulator. 7 is the operand here.
Register Mode
In this mode, the operand is stored in a register that resides in the CPU. The instruction
holds the address of the register where the operand is stored.
Register Indirect Mode
In this mode, the instruction specifies the register whose contents give the address of the operand
in memory. Thus, the register contains the address of the operand rather than the operand itself.
Direct Addressing Mode
For example: ADD R1, 4000 - here 4000 is the effective address of the operand.
Indirect Addressing Mode
In this mode, the address field of the instruction gives the address where the effective address is stored in
memory. This slows down execution, as multiple memory lookups are needed to find the
operand.
Stack Addressing Mode
In this mode, the operand is at the top of the stack. For example: ADD pops the top two
items from the stack, adds them, and then pushes the result onto the top of the stack.
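The difference between the modes comes down to how many lookups separate the instruction field from the operand. The sketch below makes that explicit; the mode labels, register names and memory contents are invented for the example.

```python
def fetch_operand(mode, field, registers, memory):
    """Return the operand selected by an addressing mode.

    `field` is the operand/address field of the instruction.
    """
    if mode == "immediate":          # the operand is in the instruction itself
        return field
    if mode == "register":           # field names a CPU register holding the operand
        return registers[field]
    if mode == "register_indirect":  # the register holds the operand's memory address
        return memory[registers[field]]
    if mode == "direct":             # field is the effective address
        return memory[field]
    if mode == "indirect":           # field points to where the effective address is stored
        return memory[memory[field]]
    raise ValueError(f"unknown mode {mode!r}")


registers = {"R1": 4000}
memory = {4000: 4001, 4001: 99}
for mode, field in [("immediate", 7), ("register", "R1"),
                    ("register_indirect", "R1"), ("direct", 4000),
                    ("indirect", 4000)]:
    print(mode, fetch_operand(mode, field, registers, memory))
```

The indirect case performs two memory reads, which is the extra cost mentioned in the description of that mode.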
Examples of RISC processors are SUN's SPARC, PowerPC, Microchip PIC processors, RISC-V.
Advantages of RISC Processor
1. The RISC processor performs better because its instruction set is simple and limited.
2. It requires fewer transistors, which makes it cheaper to design.
3. RISC allows the instruction to use free space on a microprocessor because of its simplicity.
4. A RISC processor is simpler than a CISC processor because of its simple and quick design, and
most instructions complete in one clock cycle.
Disadvantages of RISC Processor
1. The RISC processor's performance may vary according to the code executed because subsequent
instructions may depend on the previous instruction for their execution in a cycle.
2. Programmers and compilers often use complex instructions.
3. RISC processors require very fast memory to hold their instructions, which calls for a large
cache memory to respond to instructions in a short time.
CISC Processor
CISC stands for Complex Instruction Set Computer; Intel's x86 family is a well-known example. It has a large
collection of instructions that range from simple to very complex and specialized at the
assembly language level, and the complex ones take a long time to execute. The CISC approach
reduces the number of instructions per program and ignores the number of cycles per
instruction. It emphasizes building complex instructions directly in hardware, on the premise that
hardware is faster than software. However, CISC chips are relatively slow compared to RISC chips,
although they use fewer instructions than RISC. Examples of CISC processors are VAX, AMD, Intel x86 and the
System/360.
The CISC architecture helps reduce program size by embedding multiple operations in each
program instruction, which makes the CISC processor more complex. CISC
computers were designed to decrease memory costs: large programs require a large
memory space, and a large memory is expensive, so shorter programs make the
machine cheaper.
Advantages of CISC Processor
1. The compiler requires little effort to translate high-level programs or statements into
assembly or machine language for CISC processors.
2. The code length is quite short, which minimizes the memory requirement.
3. Storing the instructions of a CISC program requires very little RAM.
4. A single instruction carries out several low-level tasks.
5. CISC creates a process to manage power usage that adjusts clock speed and voltage.
6. It uses fewer instructions than RISC to perform the same task.
Disadvantages of CISC Processor
1. CISC chips are slower than RISC chips per instruction cycle.
2. The performance of the machine decreases due to the slower clock speed.
3. Pipelining is complicated to implement in a CISC processor.
4. CISC chips require more transistors than RISC designs.
5. A typical program uses only about 20% of the available instructions.
RISC | CISC
RISC has simple instruction decoding. | CISC has complex instruction decoding.
Pipelining is simple in RISC. | Pipelining is difficult in CISC.
It uses a limited number of instructions that require less time to execute. | It uses a large number of instructions that require more time to execute.
It uses LOAD and STORE as independent instructions in a program's register-to-register interaction. | It uses LOAD and STORE instructions in a program's memory-to-memory interaction.
RISC has more transistors on memory registers. | CISC has transistors to store complex instructions.
The execution time of RISC is very short. | The execution time of CISC is longer.
RISC architecture can be used with high-end applications like telecommunication, image processing, video processing, etc. | CISC architecture can be used with low-end applications like home automation, security systems, etc.
A program written for RISC architecture tends to take more space in memory. | A program written for CISC architecture tends to take less space in memory.
Examples of RISC: ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC and the SPARC. | Examples of CISC: VAX, Motorola 68000 family, System/360, AMD and the Intel x86 CPUs.
This Memory Hierarchy Design is divided into 2 main types:
1. External Memory or Secondary Memory –
Comprises magnetic disk, optical disk and magnetic tape, i.e. peripheral
storage devices which are accessible by the processor via an I/O module.
Auxiliary memory access time is generally 1000 times that of the main memory,
hence it is at the bottom of the hierarchy.
The cache memory is used to store program data which is currently being executed in
the CPU. The approximate access time ratio between cache memory and main memory is
about 1 to 7-10.
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and
cache memory is called main memory. It is the central storage unit of the computer
system. It is a large and fast memory used to store data during computer operations.
Main memory is made up of RAM and ROM, with RAM integrated circuit chips holding the
major share.
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For
example: Magnetic disks and tapes are commonly used auxiliary devices. Other
devices used as auxiliary memory are magnetic drums, magnetic bubble memory and
optical disks.
It is not directly accessible to the CPU, and is accessed using the Input/Output channels.
Cache Memory
The data or contents of the main memory that are used again and again by the CPU are
stored in the cache memory so that the data can be accessed in a shorter time.
Whenever the CPU needs to access memory, it first checks the cache memory. If the
data is not found in cache memory, the CPU moves on to the main memory. It also
transfers a block of recent data into the cache and keeps deleting old data in the
cache to accommodate the new data.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called hit ratio.
When the CPU refers to memory and finds the word in cache it is said to produce a hit.
If the word is not found in cache but is in main memory, it counts as a miss.
The ratio of the number of hits to the total CPU references to memory is called hit ratio.
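As a worked illustration of hit ratio = hits / (hits + misses), the sketch below replays a list of memory references through a small cache. The least-recently-used replacement policy, the function name hit_ratio and the reference trace are all assumptions of the example.

```python
from collections import OrderedDict


def hit_ratio(references, cache_size):
    """Replay memory references through a small cache and return hits / total."""
    cache = OrderedDict()
    hits = 0
    for address in references:
        if address in cache:
            hits += 1
            cache.move_to_end(address)        # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)     # evict the least recently used word
            cache[address] = True
    return hits / len(references) if references else 0.0


trace = [1, 2, 1, 3, 1, 2, 4, 1]   # hypothetical word addresses
print(hit_ratio(trace, cache_size=2))  # 0.25
```

Here 2 of the 8 references are found in the cache, so the hit ratio is 0.25; a larger cache or a trace with more reuse would raise it.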
Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which
each bit position can be compared. The content is compared in each bit cell, which
allows very fast table lookup. Since the entire chip can be compared, contents are
stored at random locations without regard to an addressing scheme. These chips have less storage
capacity than regular memory chips.
Associative Mapping
Direct Mapping
Set Associative Mapping
Associative Mapping
The associative memory stores both address and data. The 15-bit address is shown as a 5-digit
octal number and the 12-bit data word as a 4-digit octal number. A CPU address of
15 bits is placed in the argument register and the associative memory is searched for
a matching address.
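A minimal software model of associative mapping is sketched below. Real associative memory compares all stored addresses in parallel; the loop here only imitates that behaviour. The class name AssociativeCache and the sample octal values are hypothetical.

```python
class AssociativeCache:
    """Fully associative cache: each entry stores a 15-bit address and 12-bit data."""

    def __init__(self):
        self.entries = []   # list of (address, data) pairs

    def store(self, address, data):
        if not 0 <= address < 2**15 or not 0 <= data < 2**12:
            raise ValueError("address must fit in 15 bits and data in 12 bits")
        # Replace any existing entry for the same address.
        self.entries = [(a, d) for a, d in self.entries if a != address]
        self.entries.append((address, data))

    def lookup(self, cpu_address):
        """Compare the argument address with every stored address."""
        for address, data in self.entries:   # hardware does this comparison in parallel
            if address == cpu_address:
                return data                  # hit
        return None                          # miss: the word must come from main memory


cache = AssociativeCache()
cache.store(0o01000, 0o3450)   # 5-digit octal address, 4-digit octal data
cache.store(0o02777, 0o6710)
print(cache.lookup(0o02777) == 0o6710, cache.lookup(0o22345))  # True None
```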
The above two modes of data transfer are not useful for transferring a large block
of data. The DMA controller, however, completes this task at a faster rate and is
effective for transferring large data blocks.
1. Burst Mode: Here, once the DMA controller gains control of the system bus,
it releases the bus only after the data transfer is complete. Until then,
the CPU has to wait for the system bus.
2. Cycle Stealing Mode: In this mode, the DMA controller forces the CPU to stop
its operation and relinquish control of the bus to the DMA
controller for a short time. After the transfer of every byte, the DMA
controller releases the bus and then requests it again. In this
way, the DMA controller steals a clock cycle for transferring every byte.
3. Transparent Mode: Here, the DMA controller takes control of the system bus
only if the processor does not require it.
The DMA controller can be a separate unit that is shared by various I/O
devices, or it can be part of the I/O device interface.