
COA Important Questions


 Unit 1

 von Neumann vs Harvard


 CPU performance sums
 Amdahl’s law derive and sums
 MIPS theory and sums

 UNIT 2
 Multiplication
 Division
 Instructions fetch from memory
 Register organization
 CPU structure
 RISC vs CISC

 Von Neumann vs Harvard


 RISC vs CISC
 State and derive Amdahl's law
 MIPS theory
 Register organization
 CPU structure
 Instructions fetch from memory
 ISA structure
 evolution of x86

 Multiplication
 Division
 CPU performance sums
 Amdahl’s law sums
 MIPS sums

SOLVE https://drive.google.com/drive/folders/1Lb8Lvirsto04N3FniTGcfl8-YC_JQopU

https://drive.google.com/drive/folders/1UQyBmLkp3t3nOd4rZsdM5_SN3oi3zf05

Von Neumann Architecture vs Harvard Architecture


Memory Structure
  Von Neumann: Single memory for both data and instructions (program code); also called the stored-program concept.
  Harvard: Separate memories for data and instructions.
Interference
  Von Neumann: Interference between data and program is possible.
  Harvard: No interference between data and program.
Security
  Von Neumann: Program and data are not protected, since data and program can interfere.
  Harvard: Program and data are protected, due to no interference.
Instruction and Data Flow
  Von Neumann: Instructions and data share the same bus (bottleneck possible).
  Harvard: Separate buses for instructions and data (no bottleneck).
Memory Access Speed
  Von Neumann: Slower, due to shared memory access.
  Harvard: Faster, since data and instructions can be fetched simultaneously.
Complexity
  Von Neumann: Simpler design, as there is only one memory system.
  Harvard: More complex, due to separate memory and bus systems.
Cost
  Von Neumann: Generally less expensive to implement.
  Harvard: More expensive, due to additional hardware for separation.
Flexibility
  Von Neumann: More flexible, as memory can be dynamically allocated between data and instructions.
  Harvard: Less flexible, as memory allocation is fixed between data and instructions.
Usage
  Von Neumann: Commonly used in general-purpose computers (laptops, desktops).
  Harvard: Commonly used in specialized systems like embedded systems, DSPs, and microcontrollers.

RISC (Reduced Instruction Set Computer) vs CISC (Complex Instruction Set Computer)


Instruction Set
  RISC: Small, simple set of instructions.
  CISC: Large, complex set of instructions.
Instruction Length
  RISC: Fixed length (usually the same size for all instructions).
  CISC: Variable length (different sizes for different instructions).
Instruction Complexity
  RISC: Simple instructions that execute in a single cycle.
  CISC: Complex instructions that can perform multiple operations.
Addressing Modes
  RISC: Few addressing modes (3 to 5).
  CISC: Many addressing modes (12 to 24).
Pipelining
  RISC: Easily supports pipelining due to uniform instructions.
  CISC: Difficult to pipeline due to variable instruction lengths.
Memory Access
  RISC: Load/store architecture (only load/store instructions access memory).
  CISC: Many instructions can directly access memory.
Registers
  RISC: Large number of general-purpose registers.
  CISC: Fewer general-purpose registers.
Code Size
  RISC: Larger code size, as more instructions are required for a task.
  CISC: Smaller code size, as complex instructions can perform multiple tasks.
Improvement in Performance
  RISC: Frequency can be increased to improve performance.
  CISC: Frequency cannot be increased to improve performance, as it is limited by the slowest unit.
Examples of Processors
  RISC: ARM, MIPS, PowerPC.
  CISC: Intel x86, VAX, IBM System/360.

Amdahl's law quantifies the performance improvement of an entire system
when only a fraction of the system is improved.


The overall performance improvement gained by optimizing a single part of a
system is limited by the fraction of time that the improved part is actually
used.

Let F be the fraction of execution time affected by the improvement and S the
speedup of that fraction (total execution time normalized to 1).

Execution time before improvement: (1 - F) + F = 1
Execution time after improvement: (1 - F) + F/S

Speedup = 1 / ((1 - F) + F/S)
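As a quick sanity check, the speedup formula can be evaluated numerically (a minimal sketch; the function name is my own):

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Amdahl's law: f = fraction of execution time improved,
    s = speedup of that fraction."""
    return 1.0 / ((1.0 - f) + f / s)

# Example: 40% of a program is sped up 5x.
# 1 / (0.6 + 0.4/5) = 1 / 0.68
print(round(amdahl_speedup(0.4, 5), 3))  # 1.471
```

Note that even with s made arbitrarily large, the speedup is capped at 1/(1 - f), which is the point of the law.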

MIPS (Million Instructions Per Second) is a measure of a computer's processor speed. It refers to how many
millions of instructions a processor can execute in one second. It's often used as a performance metric to
compare processors based on how fast they execute instructions.

 Instruction execution speed: MIPS represents the rate at which a processor executes instructions. A
higher MIPS value generally indicates faster performance.
 Simple comparison metric: It provides a simple way to compare processors, but it doesn't give a
complete picture of overall system performance because it only counts instruction execution, without
considering the complexity of different instructions.
 CPU-dependent: MIPS is highly dependent on the Instruction Set Architecture (ISA) of the
processor and the efficiency of the instructions being executed.

We need to satisfy the following conditions if we want to use MIPS to compare two microprocessors:

1. Compilers must be same


2. Program must be same (IC must be same)
3. ISA must be same
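The MIPS rating follows directly from clock rate and CPI, since MIPS = clock rate / (CPI × 10^6). A small sketch (the helper name is illustrative):

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS rating = instructions per second / 10^6."""
    return clock_rate_hz / (cpi * 1e6)

# A 5 MHz machine with CPI = 5 rates at 1 MIPS;
# a 2 GHz machine with CPI = 2 rates at 1000 MIPS.
print(mips(5e6, 5.0), mips(2e9, 2.0))  # 1.0 1000.0
```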
In a computer system, buses are essential components used for communication between various parts of the system,
such as the CPU, memory, and input/output (I/O) devices. They serve as pathways that carry data, addresses, and
control signals, enabling components to work together in harmony.

Data Bus:

 Purpose: Carries data between the CPU, memory, and I/O devices.

Address Bus:

 Purpose: Carries the memory addresses from the CPU to memory or I/O devices.

Control Bus:

 Purpose: Carries control signals from the CPU to other components, determining how the system functions.
General Purpose Registers

 AX
— Accumulator Register
— Preferred register in arithmetic, logic, and data transfer instructions because it generates the
shortest machine language code
— Must be used in multiplication and division operations
— Must also be used in I/O operations
 BX
— Base Register
— Also serves as an address register
 CX
— Count register
— Used as a loop counter
— Used in shift and rotate operations
 DX
— Data register
— Used in multiplication and division
— Also used in I/O operations
Pointers and indexes

 BP: base pointer, points within the stack segment


 SP: stack pointer, points to the top of the stack segment
 SI (source index) and DI (destination index) are used to index into memory
 All 16 bits wide; low/high bytes are not separately accessible
 Used as memory pointers
 Example: MOV AH, [SI]
 Moves the byte stored in the memory location whose address is contained in register SI into register AH
 IP is not under direct control of the programmer

IAS Structure
X86 Evolution

 8080
 First General Purpose Microprocessor
 8-bit data path
 8086
 16 bit
 Instruction cache
 8088
 Used in first IBM PC
 80286
 16MB memory instead of 1MB
 80386
 First processor to support multitasking
 Translated 32-bit virtual address to 32-bit physical address
 80486
 Co-processor for math
 Pentium
 Used super scalar technique as multiple instructions started executing in parallel
 Pentium Pro
 32 bit virtual to 36-bit physical
 Pentium II
 Able to process audio, video and graphics efficiently
 Pentium 4
 48 bit to 48 bit
 Core
 First with 2 processors in single chip
 Core 2
 64 bit
 Core 2 Quad
 4 processors in single chip

1. Computer Architecture

 How to integrate components to build a computer system to achieve desired results


 Definition: It refers to the design of a computer system's overall structure, including the instruction set,
data formats, hardware-software interaction, and how the system performs high-level tasks.
 Focus: It focuses on the functionality and capabilities of the computer, including performance,
efficiency, and scalability.
 Concerned with:
o Instruction set architecture (ISA)
o Memory addressing modes
o Data types and structures
o Input/output mechanisms
 Example: A decision on whether a system uses 32-bit or 64-bit architecture, RISC (Reduced Instruction
Set Computer) vs. CISC (Complex Instruction Set Computer).
2. Computer Organization

 Actual implementation of computer in hardware.


 Definition: It deals with the operational aspects of a computer system and how hardware components
are interconnected and controlled to execute instructions defined by the architecture.
 Focus: It focuses on the implementation of architectural features and how the system is physically
structured.
 Concerned with:
o CPU design (ALU, registers, control unit)
o Data paths
o Memory hierarchy (cache, RAM, registers)
o How instructions are executed, fetched, and decoded
 Example: How a CPU uses its ALU to execute operations, or how the memory is accessed during
program execution.

1. Hardware (Technology):

 Impact on Clock Time: Advances in hardware technology, such as smaller transistor sizes and better
fabrication techniques, can reduce the time it takes to complete a single clock cycle. Faster transistors
mean quicker switching, which lowers the clock cycle time, allowing for higher clock speeds.

2. CPU Organization:

 Impact on Clock Time: A more complex organization can lengthen the clock cycle, but improved
parallelism or pipelining can often offset this and lead to better overall performance.
 Impact on CPI: The organization of the CPU can have a significant impact on CPI. Techniques like
pipelining, superscalar execution, and out-of-order execution can reduce the number of cycles per
instruction by allowing multiple instructions to be processed simultaneously.

3. Instruction Set Architecture (ISA):

 Impact on CPI: A RISC architecture typically has a lower CPI because its instructions are designed to
execute in a fixed number of cycles (often one cycle). In contrast, CISC instructions may take multiple
cycles to complete, leading to a higher CPI.
 Impact on Instruction Count: A CISC architecture might reduce the instruction count since complex
operations can be performed in a single instruction. On the other hand, RISC architectures may require
more instructions to perform the same task, increasing the instruction count.

4. Compiler:
 Impact on CPI: A well-optimized compiler can reduce CPI by generating efficient machine code that
takes advantage of the CPU’s architecture, such as minimizing cache misses, pipeline stalls, and making
efficient use of hardware resources.
 Impact on Instruction Count: The compiler has a direct impact on instruction count. A good compiler
can optimize the code, reducing the number of instructions required to perform a task through
techniques like loop unrolling, inlining functions, and eliminating unnecessary instructions.

5. Program (Workload or Code):

 Impact on CPI: Programs with frequent cache misses, branch mispredictions, or complex dependencies
between instructions can increase CPI. For example, memory-bound programs or programs with poor
locality of reference (requiring frequent memory accesses) will result in higher CPI due to delays in
fetching data.
 Impact on Instruction Count: Different algorithms or program designs can lead to vastly different
instruction counts. A well-optimized program can reduce the number of instructions required to
complete a task, while inefficient code can unnecessarily increase instruction count.
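All five factors above act through the basic performance equation, CPU time = (Instruction Count × CPI) / Clock Rate. A small illustrative sketch (all numbers made up) showing how each term moves the total:

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Basic performance equation: time = IC * CPI / f."""
    return instruction_count * cpi / clock_rate_hz

base = cpu_time(1e9, 2.0, 2e9)              # baseline: 1.0 s
better_compiler = cpu_time(0.8e9, 2.0, 2e9) # fewer instructions -> 0.8 s
pipelined = cpu_time(1e9, 1.25, 2e9)        # lower CPI -> 0.625 s
faster_clock = cpu_time(1e9, 2.0, 2.5e9)    # higher frequency -> 0.8 s
print(base, better_compiler, pipelined, faster_clock)
```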

MSE22

1.

Given Data:

 P1: Clock rate = 3 GHz, CPI = 1.5


 P2: Clock rate = 2.5 GHz, CPI = 1.0
 P3: Clock rate = 4.0 GHz, CPI = 2.2

(a) Which processor has the highest performance expressed in instructions per second?

The performance in terms of instructions per second can be calculated using the formula:

Instructions per second = Clock Rate / CPI

For P1:
Instructions per second = (3 × 10^9 cycles/second) / 1.5 = 2 × 10^9 instructions/second
For P2:
Instructions per second = (2.5 × 10^9 cycles/second) / 1.0 = 2.5 × 10^9 instructions/second
For P3:
Instructions per second = (4 × 10^9 cycles/second) / 2.2 ≈ 1.818 × 10^9 instructions/second

Conclusion for (a):

 P2 has the highest performance with 2.5 billion instructions per second.

(b) If the processors each execute a program in 10 seconds, find the number of cycles and the number of
instructions.

The number of cycles can be calculated using:

Number of cycles = Clock Rate × Execution Time

The number of instructions can be calculated using:

Number of instructions = Number of cycles / CPI

For P1:

 Number of cycles: 3 × 10^9 × 10 = 30 × 10^9 cycles
 Number of instructions: (30 × 10^9) / 1.5 = 20 × 10^9 instructions

For P2:

 Number of cycles: 2.5 × 10^9 × 10 = 25 × 10^9 cycles
 Number of instructions: (25 × 10^9) / 1.0 = 25 × 10^9 instructions

For P3:

 Number of cycles: 4 × 10^9 × 10 = 40 × 10^9 cycles
 Number of instructions: (40 × 10^9) / 2.2 ≈ 18.18 × 10^9 instructions

Conclusion for (b):

 P1: 30 billion cycles, 20 billion instructions.


 P2: 25 billion cycles, 25 billion instructions.
 P3: 40 billion cycles, ~18.18 billion instructions.

(c) We are trying to reduce the execution time by 30% but this leads to a 20% increase in the CPI. What
clock rate should we have to get this time reduction?

Given:

 Target reduction in execution time = 30%, so new execution time = 70% of original time.
New Execution Time = 0.7 × 10 seconds = 7 seconds
 CPI increases by 20%, so new CPI = 1.2 × original CPI.

We need to find the required clock rate using the equation:

Execution Time = (CPI × Number of Instructions) / Clock Rate

Rearranging for Clock Rate:

Clock Rate = (CPI × Number of Instructions) / Execution Time

Using the number of instructions calculated for each processor, we calculate the required clock rate for each
case.

For P1:

 Original CPI = 1.5, so new CPI = 1.2 × 1.5 = 1.8.


 Number of instructions = 20 billion (from part b).
 New Execution Time = 7 seconds.

Clock Rate = (1.8 × 20 × 10^9) / 7 ≈ 5.14 × 10^9 Hz = 5.14 GHz

For P2:

 Original CPI = 1.0, so new CPI = 1.2 × 1.0 = 1.2.


 Number of instructions = 25 billion.
 New Execution Time = 7 seconds.

Clock Rate = (1.2 × 25 × 10^9) / 7 ≈ 4.29 × 10^9 Hz = 4.29 GHz

For P3:

 Original CPI = 2.2, so new CPI = 1.2 × 2.2 = 2.64.


 Number of instructions = 18.18 billion.
 New Execution Time = 7 seconds.

Clock Rate = (2.64 × 18.18 × 10^9) / 7 ≈ 6.86 × 10^9 Hz = 6.86 GHz

Conclusion for (c):

 P1 needs a clock rate of 5.14 GHz.


 P2 needs a clock rate of 4.29 GHz.
 P3 needs a clock rate of 6.86 GHz.
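The arithmetic for parts (a)-(c) can be checked with a short script (a sketch; the variable names are my own):

```python
# (clock rate in Hz, CPI) for each processor, from the problem statement.
procs = {"P1": (3e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4e9, 2.2)}
results = {}
for name, (clock, cpi) in procs.items():
    ips = clock / cpi                    # (a) instructions per second
    cycles = clock * 10                  # (b) cycles executed in 10 s
    instrs = cycles / cpi                # (b) instructions executed in 10 s
    new_clock = 1.2 * cpi * instrs / 7   # (c) clock for a 7 s run at 1.2x CPI
    results[name] = (ips, cycles, instrs, new_clock)
    print(name, ips, cycles, instrs, round(new_clock / 1e9, 2))
```

The printed new clock rates round to 5.14, 4.29, and 6.86 GHz, matching the hand calculation.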

3.

In a computer system, an instruction goes through a series of states from the moment it is fetched from memory
until it is executed. The sequence of these states can be represented using a Finite State Machine (FSM),
which helps visualize the flow of control and data in a processor’s datapath during instruction execution.

States of Instruction Execution:

The execution of an instruction can generally be divided into the following states:

1. Instruction Fetch (IF):


o The processor fetches the instruction from memory using the Program Counter (PC) and loads it
into the Instruction Register (IR).
o Data Flow:
 PC -> Memory -> Instruction Register (IR)
 Increment PC for the next instruction.
2. Instruction Decode (ID):
o The instruction in the IR is decoded to determine what action needs to be performed. Operands
(if any) are identified, and control signals are generated.
o Data Flow:
 Control unit decodes IR and generates appropriate control signals.
 Registers are accessed to read the operands.
3. Operand Fetch (OF):
o In some cases, the operands may be fetched from memory or registers. If needed, memory
addresses are calculated using registers or immediate values.
o Data Flow:
 Register or Memory -> Operand registers (if applicable).
4. Execution (EX):
o The actual operation specified by the instruction is performed. This could be an arithmetic
operation, a logic operation, or an address calculation.
o Data Flow:
 ALU performs the necessary operations, such as addition, subtraction, or logic
operations.
Operands -> ALU -> Result

5. Memory Access (MEM) (only if required):
o For instructions that involve memory operations (e.g., load/store), the calculated memory address
is used to either read data from or write data to memory.
o Data Flow:
 Load: Memory -> Register
 Store: Register -> Memory
6. Write Back (WB):
o The result of the operation (either from memory or the ALU) is written back to the destination
register.
o Data Flow:
 Result -> Destination Register
7. Instruction Commit:
o The instruction's result is committed, updating the program state if needed. The instruction
completes its execution.

+-------------------+
| | (Start)
| Instruction Fetch |
+-------------------+
|
V
+-------------------+
| |
| Instruction Decode |
+-------------------+
|
V
+-------------------+ --> Memory Operation? -------> No
| | |
| Operand Fetch | V
+-------------------+ +-------------------+
| | |
V | Memory Access |
+-------------------+ +-------------------+
| | |
| Execution | V
+-------------------+ +-------------------+
| | |
V | Write Back |
+-------------------+ +-------------------+
| | |
| Write Back/Commit | <------------------------------+
+-------------------+
|
(End)
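The state sequence above can also be sketched as a transition table (a hypothetical representation; the `mem_op` flag marking load/store instructions is my own):

```python
def next_state(state: str, instr: dict) -> str:
    """One step of the instruction-execution FSM sketched above."""
    transitions = {
        "IF": "ID",                                  # fetch -> decode
        "ID": "OF",                                  # decode -> operand fetch
        "OF": "EX",                                  # operand fetch -> execute
        "EX": "MEM" if instr.get("mem_op") else "WB",  # memory access only if needed
        "MEM": "WB",                                 # memory -> write back
        "WB": "IF",                                  # commit, then fetch next instruction
    }
    return transitions[state]

# An ADD with register operands (no memory operand) skips the MEM state.
path, state = [], "IF"
for _ in range(5):
    path.append(state)
    state = next_state(state, {"mem_op": False})
print(path)  # ['IF', 'ID', 'OF', 'EX', 'WB']
```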

4.
+--------------------+
| Instruction |
| Register (IR) |
+--------------------+
|
V
+------------------+---------------------+
| | |
V V V
+--------------+ +----------------+ +--------------+
| Program | | Memory Address | | Memory Data |
| Counter (PC)| | Register (MAR) | | Register (MDR)|
+--------------+ +----------------+ +--------------+
| | |
V V |
+--------------+ +--------------------+ |
| ALU |<----| Memory Unit |<-------+
+--------------+ +--------------------+
| |
V |
+--------------+ |
| General-Purpose |<-------------------+
| Registers (R1, |
| R2, etc.) |
+--------------+

(a) Conceptual Difference and Necessity of Visible Register (PC) and Hidden Register (MAR)

 Program Counter (PC):


o Visible: The PC is a visible register because it holds the address of the next instruction to be
executed. It's used in the instruction-fetch phase and is visible to the programmer as it directly
influences program execution.
o Necessity: The PC is essential because it controls the sequence of instruction execution by
keeping track of which instruction to fetch next.
 Memory Address Register (MAR):
o Hidden: The MAR is considered a hidden register because it is not directly accessible by the
programmer. It temporarily holds the memory address that is being accessed for a load/store
operation.
o Necessity: The MAR is necessary for proper memory operations. It holds the memory address
for fetching/storing data, allowing the CPU to interact with memory. Without the MAR, memory
accesses would be inefficient and hard to manage, especially with multiple data-fetching steps in
complex operations.

(b) Control Sequence for ADD R2, LOCA (Direct Addressing Mode)

In this instruction (ADD R2, LOCA), the content at the memory location LOCA is added to the contents of
register R2, and the result is stored back into R2. The instruction uses direct addressing mode, meaning the
operand is located directly at the specified memory address (LOCA).

Control Sequence for ADD R2, LOCA (Direct Addressing Mode)

1. Fetch the instruction:


o MAR ← PC
(Load the memory address from the Program Counter (PC) into the Memory Address Register
(MAR)).
o MDR ← Mem[MAR]
(Read the instruction from memory into the Memory Data Register (MDR)).
o IR ← MDR
(Load the instruction from the MDR into the Instruction Register (IR)).
o PC ← PC + 4
(Increment the PC to point to the next instruction).
2. Decode the instruction:
o MAR ← IR[Operand]
(Load the memory address of the operand from the instruction into the MAR).
3. Fetch the operand:
o MDR ← Mem[MAR]
(Fetch the operand (content at LOCA) from memory into the MDR).
4. Execute the addition:
o R2 ← R2 + MDR
(Add the contents of R2 and MDR, and store the result in R2).
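The register-transfer steps above can be mimicked with a toy memory and register file (illustrative only; the addresses and values are made up, and memory is treated as a simple dictionary):

```python
# Toy machine state: instruction at address 100, operand LOCA at address 200.
mem = {100: ("ADD", "R2", 200), 200: 7}
regs = {"PC": 100, "R2": 5}

# 1. Fetch the instruction
MAR = regs["PC"]          # MAR <- PC
MDR = mem[MAR]            # MDR <- Mem[MAR]
IR = MDR                  # IR  <- MDR
regs["PC"] += 4           # PC  <- PC + 4

# 2. Decode: take the operand address (LOCA) from the instruction
MAR = IR[2]               # MAR <- IR[Operand]

# 3. Fetch the operand
MDR = mem[MAR]            # MDR <- Mem[MAR]

# 4. Execute the addition
regs["R2"] = regs["R2"] + MDR   # R2 <- R2 + MDR
print(regs["R2"])  # 12
```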

Given:

 VAX11/780:
o Frequency: 5 MHz
o Performance: 1 MIPS
o CPU Time: 12x seconds (for some unknown x)

 IBM RS/6000:
o Frequency: 25 MHz
o Performance: 18 MIPS
o CPU Time: x seconds

We are tasked with answering two questions based on this data:

(a) What is the relative size of the instruction count of the machine code for this benchmark program
running on the two machines?

 The number of instructions executed (Instruction Count) is calculated as:

Instruction Count = (MIPS × 10^6) × CPU Time

Let the instruction count of VAX11/780 be I_VAX and the instruction count of IBM
RS/6000 be I_IBM.

For VAX11/780:

I_VAX = (1 × 10^6) × (12x) = 12x × 10^6 instructions

For IBM RS/6000:

I_IBM = (18 × 10^6) × x = 18x × 10^6 instructions

Now, the relative size of the instruction count is the ratio I_VAX / I_IBM:

I_VAX / I_IBM = (12x × 10^6) / (18x × 10^6) = 12/18 = 2/3

Thus, the relative size of the instruction count of VAX11/780 to IBM RS/6000 is 2:3.

(b) What are the CPI values for the two machines?

The CPI (Cycles Per Instruction) can be calculated using the formula:

CPI = Clock Cycles / Instruction Count

The number of clock cycles is given by:

Clock Cycles = CPU Time × Clock Frequency
For VAX11/780:
Clock Cycles (VAX) = (12x) × 5 × 10^6 = 60x × 10^6

Since the instruction count for VAX is 12x × 10^6:

CPI (VAX) = (60x × 10^6) / (12x × 10^6) = 5
For IBM RS/6000:
Clock Cycles (IBM) = x × 25 × 10^6 = 25x × 10^6

Since the instruction count for IBM is 18x × 10^6:

CPI (IBM) = (25x × 10^6) / (18x × 10^6) = 25/18 ≈ 1.39

Conclusion:

 The relative instruction count is 2:3 (VAX to IBM).


 The CPI for VAX11/780 is 5.
 The CPI for IBM RS/6000 is approximately 1.39.
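Because the unknown x cancels in both CPI ratios, the results can be checked with any positive value of x (illustrative sketch):

```python
x = 3.0  # arbitrary positive benchmark-time parameter; it cancels out below

vax_cycles = (12 * x) * 5e6   # 12x seconds at 5 MHz
vax_instrs = 1e6 * (12 * x)   # 1 MIPS for 12x seconds
ibm_cycles = x * 25e6         # x seconds at 25 MHz
ibm_instrs = 18e6 * x         # 18 MIPS for x seconds

# CPI = cycles / instructions
print(vax_cycles / vax_instrs, round(ibm_cycles / ibm_instrs, 2))  # 5.0 1.39
```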
