Python Programming Answer Key
PART – B
(5 x 13 = 65 Marks)
Q.No    Answer    Scheme of Evaluation
11. a. Explain in detail about Control flow structures.
ANSWER: [3 Marks]
Definition: A control structure (or flow of control) is a block of programming that
analyses variables and chooses a direction in which to go based on given parameters.
[5 Marks]
Selection
if
if...else
Repetition
while loop
Arithmetic operators
Assignment operators
Comparison operators
Logical operators
Identity operators
Membership operators
Bitwise operators
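A short Python sketch (with illustrative values) exercising the selection and repetition constructs and several of the operator categories listed above:

    numbers = [4, 7, 10, 15]
    total = 0
    i = 0
    while i < len(numbers):          # repetition: while loop
        n = numbers[i]
        if n % 2 == 0:               # selection: if...else (% is an arithmetic operator)
            total += n               # += is an assignment operator
        else:
            pass                     # odd numbers are skipped
        i += 1

    print(total > 10 and total in range(100))  # comparison, logical and membership operators
    print(total is None)                       # identity operator
    print(total & 0b1111)                      # bitwise operator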
12. a. Explain String Module
The concept of the various string module facilities should be explained with an example. [13 Marks]
The string module provides:
String constants
string.ascii_letters
string.ascii_lowercase
string.ascii_uppercase
string.digits
string.hexdigits
string.octdigits
string.punctuation
string.printable
string.whitespace
String Format
String Template
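A minimal Python sketch showing the string constants, str.format, and string.Template in use (names and values are illustrative):

    import string

    print(string.ascii_lowercase)        # 'abcdefghijklmnopqrstuvwxyz'
    print(string.digits)                 # '0123456789'
    print("a" in string.hexdigits)       # True

    # String Format
    print("{0} scored {1} marks".format("Asha", 13))

    # String Template: $-based substitution
    t = string.Template("$name scored $marks marks")
    print(t.substitute(name="Asha", marks=13))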
ii) Explain in detail the hardwired implementation of a control unit. (6)
Hardwired Control Unit.
Hardwired control units are generally faster than micro-programmed designs. In
hardwired control, all the control signals required inside the CPU can be generated
using a state counter and a PLA circuit. [2 MARKS]
HARDWIRED CONTROL UNIT
It is implemented as logic circuits (gates, flip-flops, decoders, etc.) in the hardware.
This organization is very complicated if we have a large control unit.
In this organization, if the design has to be modified or changed, it requires changes
in the wiring among the various components.
Thus the modification of all the combinational circuits may be very difficult.
ADVANTAGES:
A hardwired control unit is fast because control signals are generated by
combinational circuits.
[2 MARKS]
The delay in the generation of control signals depends upon the number of gates.
DISADVANTAGES:
The more control signals the CPU requires, the more complex the design of the
control unit becomes.
Modifications to the control signals are very difficult: they require rearranging
the wires in the hardware circuit.
It is difficult to correct mistakes in the original design or to add new features to an
existing design of the control unit.
A hardwired control unit consists of:
Instruction Register.
Number of Control Logic Gates.
Two Decoders.
4-bit Sequence Counter.
[2 MARKS]
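A toy Python sketch of the idea (signal and opcode names are hypothetical; a real unit is built from logic gates, not software): decoded opcode lines and timing lines from the sequence counter are combined, AND-gate style, into control signals.

    def decode(value, width):
        # One-hot decoder: a 1 at position `value`, 0 elsewhere
        out = [0] * (2 ** width)
        out[value] = 1
        return out

    def control_signals(opcode, timing_step):
        # Combine decoded opcode and timing lines, as AND gates would
        op = decode(opcode, 3)        # 3-to-8 operation decoder
        t = decode(timing_step, 2)    # 2-to-4 timing decoder fed by the counter
        return {"memory_read": op[1] & t[1],
                "load_accumulator": op[1] & t[2]}

    # The 4-bit sequence counter would step through timing states T0, T1, ...
    for step in range(4):
        print("T%d" % step, control_signals(opcode=1, timing_step=step))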
12. b. What is a control unit in a processor? What is its role during instruction
execution? (6)
i) Definition [2 MARKS]
A control unit coordinates how data moves around a CPU. The control unit (CU) is
a component of a computer's central processing unit (CPU) that directs the operation of
the processor. It tells the computer's memory, arithmetic/logic unit and input and output
devices how to respond to a program's instructions.
Functions of the control unit
The Control unit (CU) is digital circuitry contained within the processor that coordinates
the sequence of data movements into, out of, and between a processor's many sub-units.
The result of these routed data movements through various digital circuits (sub-units)
within the processor produces the manipulated data expected by a software instruction
(loaded earlier, likely from memory). It controls (conducts) data flow inside the
processor and additionally provides several external control signals to the rest of the
computer to further direct data and instructions to/from processor external destinations.
More precisely, the Control Unit (CU) is generally a sizable collection of complex
digital circuitry interconnecting and directing the many execution units (i.e. ALU, data
buffers, registers) contained within a CPU. The CU is normally the first CPU unit to
accept from an externally stored computer program a single instruction (based on the
CPU's instruction set). The CU then decodes this individual instruction into several
sequential steps (fetching addresses/data from registers/memory, managing execution
[i.e. data sent to the ALU or I/O], and storing the resulting data back into
registers/memory) that control and coordinate the CPU's inner workings to properly
manipulate the data. [4 MARKS]
The design of these sequential steps is based on the needs of each instruction and can
vary in the number of steps, the order of execution, and which units are enabled.
Thus, by only using a program of set instructions in memory, the CU will configure all
the CPU's data flows as needed to manipulate the data correctly between instructions.
This results in a computer that can run a complete program with no human
intervention to make hardware changes between instructions (as had to be done when
using only punch cards for computations, before stored-program computers with
CUs were invented).
These detailed steps from the CU dictate which of the CPU's interconnecting
hardware control signals to enable/disable and which CPU units are selected/de-selected,
and the units' proper order of execution as required by the instruction's operation to
produce the desired manipulated data. Additionally, the CU's orderly hardware
coordination properly sequences these control signals and then configures the many
hardware units comprising the CPU, directing how data should also be moved, changed,
and stored outside the CPU (i.e. memory) according to the instruction's objective.
Depending on the type of instruction entering the CU, the order and number of
sequential steps produced by the CU could vary the selection and configuration of which
parts of the CPU's hardware are utilized to achieve the instruction's objective (mainly
moving, storing, and modifying data within the CPU).
ii) Draw the block diagram of a micro-programmed control unit that is capable of
executing micro-instructions with next address field. (7)
Definition [2 MARKS]
Micro-programmed control unit:
A control unit whose binary control variables are stored in memory is called a
micro-programmed control unit.
Micro instruction:
A symbolic micro program can be translated into its binary equivalent by means of an
assembler.
Each symbolic microinstruction is divided into five fields: label, micro operations, CD,
BR, and AD.
[3 MARKS]
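A toy Python sketch of next-address sequencing driven by the CD/BR/AD fields named above (the micro-operations, addresses and condition codes are hypothetical placeholders):

    control_memory = {
        0: {"uops": "READ",   "CD": "U", "BR": "NEXT", "AD": None},  # unconditional, fall through
        1: {"uops": "DECODE", "CD": "U", "BR": "JMP",  "AD": 5},     # jump to address 5
        5: {"uops": "EXEC",   "CD": "Z", "BR": "JMP",  "AD": 0},     # jump only if zero flag set
        6: {"uops": "HALT",   "CD": "U", "BR": "NEXT", "AD": None},
    }

    car = 0            # control address register
    zero_flag = False
    for _ in range(4):
        mi = control_memory[car]
        print(car, mi["uops"])
        if mi["uops"] == "HALT":
            break
        # Next-address logic: the branch field plus the condition select
        # either the AD field or CAR + 1 as the next microinstruction address.
        take = (mi["CD"] == "U") or (mi["CD"] == "Z" and zero_flag)
        car = mi["AD"] if (mi["BR"] == "JMP" and take) else car + 1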
[Block diagram of a micro-programmed control unit with a next-address field] [2 MARKS]
13. a. Suppose you want to achieve a speed-up of 90 with 100 processors.
i) What percentage of the original computation can be sequential? (6)
[2 MARKS]
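The worked steps, using Amdahl's law (the standard approach for this question), with n = 100 processors and s the sequential fraction:
Speedup = 1 / (s + (1 - s)/n)
90 = 1 / (s + (1 - s)/100)
s + (1 - s)/100 = 1/90
100s + 1 - s = 100/90
99s = 10/9 - 1 = 1/9
s = 1/891 ≈ 0.00112
So at most about 0.11% of the original computation can be sequential.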
ii) What are major Hazards in a pipeline? Explain data Hazard and methods to
minimize it with examples. (7)
[2 MARKS]
Data hazards occur when instructions that exhibit data dependence modify data
in different stages of a pipeline.
[3 MARKS]
There are three situations in which a data hazard can occur:
read after write (RAW), a true dependency;
write after read (WAR), an anti-dependency;
write after write (WAW), an output dependency.
There are several main solutions and algorithms used to resolve data hazards:
insert a pipeline bubble whenever a read after write (RAW) dependency is
encountered, guaranteed to increase latency, or
use out-of-order execution to potentially prevent the need for pipeline bubbles
use operand forwarding to use data from later stages in the pipeline
In the case of out-of-order execution, the algorithm used can be:
scoreboarding, in which case a pipeline bubble is needed only when there is no
functional unit available
Example:
To write the value 3 to register 1 (which already contains a 6), and then
add 7 to register 1 and store the result in register 2, i.e.:
[2 MARKS]
i0: R1 = 6
i1: R1 = 3
i2: R2 = R1 + 7 = 10
Following execution, register 2 should contain the value 10. However, if i1 (write 3 to
register 1) does not fully exit the pipeline before i2 starts executing, it means that R1
does not contain the value 3 when i2 performs its addition. In such an event, i2 adds 7 to
the old value of register 1 (6), and so register 2 contains 13 instead, i.e.:
i0: R1 = 6
i2: R2 = R1 + 7 = 13
i1: R1 = 3
This error occurs because i2 reads Register 1 before i1 has committed/stored the result
of its write operation to Register 1. So when i2 is reading the contents of Register 1,
register 1 still contains 6, not 3.
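A small Python simulation of this example (register names as in the example above; the ordering of the instruction list stands in for pipeline timing):

    def run(program):
        # Execute (dest, src, imm) triples in the given order; a `src`
        # of None means the immediate is written directly.
        regs = {"R1": 0, "R2": 0}
        for dest, src, imm in program:
            regs[dest] = (regs[src] if src else 0) + imm
        return regs

    # In-order completion: i0, i1, i2 -> R2 == 10
    print(run([("R1", None, 6), ("R1", None, 3), ("R2", "R1", 7)]))
    # RAW hazard: i2 reads R1 before i1's write commits -> R2 == 13
    print(run([("R1", None, 6), ("R2", "R1", 7), ("R1", None, 3)]))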
13. b. i) Distinguish between an arithmetic pipeline and an instruction pipeline. What are
the important features of each? (5)
Arithmetic pipelines differ from instruction pipelines in some important ways. They are
generally synchronous. This means that each stage executes in a fixed number of clock
cycles. In a synchronous pipeline, moreover, no buffering between stages is provided. [2 MARKS]
Each stage must be ready to accept the data passed from a previous stage when that data
is produced.
Another important difference is that an arithmetic pipeline may be nonlinear. The
"stages" in this type of pipeline are associated with key processing components such as
adders, shifters, etc. Instead of a steady progression through a fixed sequence of stages, a
task in a nonlinear pipeline may use more than one stage at a time, or may return to the
same stage at several points in processing.
Some functions of the arithmetic logic unit of a processor can be pipelined to maximize
performance. An arithmetic pipeline is used for implementing complex arithmetic
functions like floating-point addition, multiplication, and division. These functions can
be decomposed into consecutive sub functions. The floating-point addition can be
divided into three stages: mantissas alignment, mantissas addition, and result
normalization.
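A Python sketch of those three stages on toy (mantissa, exponent) pairs, using base-10 mantissas for readability (a hardware pipeline would run the stages concurrently on successive operand pairs):

    def fp_add(a, b):
        (ma, ea), (mb, eb) = a, b
        # Stage 1: mantissa alignment - shift the smaller exponent's mantissa
        if ea < eb:
            ma, ea = ma / 10 ** (eb - ea), eb
        else:
            mb, eb = mb / 10 ** (ea - eb), ea
        # Stage 2: mantissa addition
        m, e = ma + mb, ea
        # Stage 3: result normalization - restore mantissa to [0.1, 1)
        while abs(m) >= 1:
            m, e = m / 10, e + 1
        return (m, e)

    print(fp_add((0.5, 2), (0.4, 1)))  # 0.5e2 + 0.4e1 = 54 -> (0.54, 2)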
Features of an arithmetic pipeline:
Fixed arithmetic pipeline: each stage performs only its specific function; it does not
have to be capable of performing the task of any other stage. An individual stage
might be an adder or multiplier or other hardware to perform some arithmetic
function. [3 MARKS]
It is not very useful unless the exact function performed by the pipeline is
required; otherwise the CPU cannot use the fixed arithmetic pipeline.
Configurable arithmetic pipeline: better suited, as it uses multiplexers at its inputs.
The control unit of the CPU sets the select signals of the multiplexers to control
the flow of data (i.e. the pipeline is configurable).
Vectored arithmetic unit: a CPU may include a vectored arithmetic unit, which
contains multiple functional units (for addition, multiplication, shifting,
division, etc.) to carry out different arithmetic operations in parallel.
Used to implement floating point operations, multiplication of fixed point
numbers and similar computations encountered in scientific operations.
Although arithmetic pipelines can perform many iterations of the same operation
in parallel, they cannot perform different operations simultaneously.
Features of an instruction pipeline:
An instruction pipeline reads consecutive instructions from memory while
previously fetched instructions are being executed in other segments, overlapping
the fetch and execute phases.
Every instruction passes through the same linear sequence of stages, typically
fetch, decode, operand fetch, execute, and write-back.
Its throughput is limited by pipeline hazards: branch (control) hazards, data
hazards, and resource conflicts.
ii) Explain the concept of superscalar operation? What are its performance
considerations? (8)
Definition [2 MARKS]
A superscalar processor usually sustains an execution rate in excess of one instruction
per machine cycle. Therefore, a superscalar processor can be envisioned as having multiple
parallel pipelines, each of which is processing instructions simultaneously from a single
instruction thread.
While a superscalar CPU is typically also pipelined, superscalar and pipelining
execution are considered different performance enhancement techniques. The former
executes multiple instructions in parallel by using multiple execution units, whereas the
latter executes multiple instructions in the same execution unit in parallel by dividing the
execution unit into different phases.
The superscalar technique is traditionally associated with several identifying
characteristics (within a given CPU):
Instructions are issued from a sequential instruction stream [4 MARKS]
The CPU dynamically checks for data dependencies between instructions at run
time (versus software checking at compile time)
The CPU can execute multiple instructions per clock cycle
Superscalar CPU design emphasizes improving the instruction dispatcher accuracy, and
allowing it to keep the multiple execution units in use at all times. This has become
increasingly important as the number of units has increased. While early superscalar
CPUs would have two ALUs and a single FPU, a later design such as the PowerPC
970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective
at keeping all of these units fed with instructions, the performance of the system will be
no better than that of a simpler, cheaper design.
A superscalar processor usually sustains an execution rate in excess of one instruction
per machine cycle. But merely processing multiple instructions concurrently does not
make an architecture superscalar, since pipelined, multiprocessor or multi-
core architectures also achieve that, but with different methods.
In a superscalar CPU the dispatcher reads instructions from memory and decides which
ones can be run in parallel, dispatching each to one of the several execution units
contained inside a single CPU. Therefore, a superscalar processor can be envisioned
having multiple parallel pipelines, each of which is processing instructions
simultaneously from a single instruction thread.
performance considerations
Superscalar processors issue more than one instruction per clock cycle. Unlike VLIW
processors, they check for resource conflicts on the fly to determine what combinations
of instructions can be issued at each step. [2 MARKS]
Superscalar architectures dominate desktop and server architectures. Superscalar
processors are not as common in the embedded world as in the desktop/server world.
Embedded computing architectures are more likely to be judged by metrics such as
operations per watt rather than raw performance.
14. a. i) What is cache memory? How can the performance of cache memory be improved? (5)
Definition [2 MARKS]
Cache memory is a small-sized type of volatile computer memory that provides high-
speed data access to a processor and stores frequently used computer programs,
applications and data. It is the fastest memory in a computer, and is typically integrated
onto the motherboard and directly embedded in the processor or main random access
memory (RAM).
Cache memory provides faster data storage and access by storing instances of programs
and data routinely accessed by the processor. Thus, when a processor requests data that
already has an instance in the cache memory, it does not need to go to the main memory
or the hard disk to fetch the data.
Cache memory can be primary or secondary cache memory, with primary cache memory
directly integrated into (or closest to) the processor. In addition to hardware-based cache,
cache memory also can be a disk cache, where a reserved portion on a disk stores and
provides access to frequently accessed data/applications from the disk.
Improving cache performance
Three kinds of cache misses:
• Compulsory misses – first time data is accessed [3 MARKS]
• Capacity misses – working set larger than cache size
• Conflict misses – one set fills up while there is room in other sets
Ways to improve performance:
• Reduce miss rate
• Reduce miss penalty
• Reduce hit time
ii) Explain mapping functions in the cache memory to determine how memory blocks
are placed in cache. Support your answer with the help of suitable examples. (8)
Cache memory mapping techniques:
[2 MARKS]
Direct Mapping
Associative Mapping
Set – Associative Mapping
Direct Mapping:
The direct mapping concept is: if the ith block of main memory has to be placed at
the jth block of cache memory, then the mapping is defined as:
j = i % (number of blocks in cache memory)
Let's see this with the help of an example. [2 MARKS]
Suppose there are 4096 blocks in primary memory and 128 blocks in the cache
memory. Then, if I want to map the 0th block of main memory into the cache memory,
I apply the above formula and get:
0 % 128 = 0
So, the 0th block of main memory is mapped to the 0th block of cache memory. Here, 128
is the total number of blocks in the cache memory.
1 % 128 = 1
2 % 128 = 2
Similarly, the 1st block of main memory will be mapped to the 1st block of cache, then
2nd block to 2nd block of the cache and so on.
So, this is how direct mapping in the cache memory is done. The sketch below
illustrates the direct mapping process.
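A short Python sketch of the j = i % 128 mapping from the example:

    CACHE_BLOCKS = 128  # cache size from the example (main memory: 4096 blocks)

    def direct_map(i):
        # Main-memory block i lands in cache block j = i % 128
        return i % CACHE_BLOCKS

    for i in (0, 1, 2, 128, 129):
        print("main block %4d -> cache block %3d" % (i, direct_map(i)))
    # Blocks 0 and 128 collide on cache block 0 - the conflict miss that
    # the associative mapping described next avoids.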
Associative Mapping:
In the direct cache memory mapping technique, every block of main memory was
directly mapped to a fixed block of the cache memory. So, the major drawback was the
high conflict miss rate: we had to replace a cache memory block even when [2 MARKS]
other blocks in the cache memory were empty.
Suppose I have already loaded the 0th block of main memory to the 0th block of cache
memory using the direct mapping technique. Now consider that the next block I need
is 128. Even if blocks 1, 2, 3, ... of cache memory are all empty, I still have to map
the 128th block of main memory to the 0th block of cache memory since
128 % 128 = 0
Therefore, I have to replace the previously loaded 0th block of main memory with
the 128th block. That was the reason for the high conflict miss: I have to
replace a cache block even if the other cache blocks are empty. To overcome
this drawback of the direct mapping technique, the concept of the associative mapping
technique was introduced.
The idea of the associative mapping technique is to avoid the high conflict miss: any
block of main memory can be placed anywhere in the cache memory. The associative
mapping technique is the fastest and most flexible mapping technique.
14. b. i) What is virtual memory? Explain the steps involved in virtual memory address
translation. (7)
Definition [2 MARKS]
Virtual memory is a memory management capability of an operating
system (OS) that uses hardware and software to allow a computer to compensate for
physical memory shortages by temporarily transferring data from random access
memory (RAM) to disk storage.
● A process always uses virtual addresses.
● The Memory Management Unit (MMU), part of the CPU, is the hardware that does
address translation. It caches recently used translations in a Translation Lookaside
Buffer (TLB), a page table cache.
● The page tables are stored in the OS's virtual address space.
● The page tables are (at best) present in main memory – one main memory reference
per address translation!
● To translate a virtual memory address, the MMU has to read the relevant page table
entry out of memory.
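A minimal Python sketch of the translation step, assuming hypothetical 4 KB pages and a toy single-level page table:

    PAGE_SIZE = 4096  # bytes per page (assumed for illustration)

    page_table = {0: 5, 1: 2, 2: 7}  # virtual page number -> physical frame

    def translate(vaddr):
        # Split the virtual address into page number and offset
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in page_table:
            raise LookupError("page fault")   # the OS would load the page
        return page_table[vpn] * PAGE_SIZE + offset

    print(hex(translate(0x1234)))  # VPN 1 -> frame 2: prints 0x2234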
ii) Explain the DRAM memory technology with its basic organization. (6)
Definition [2 MARKS]
A DRAM (dynamic RAM) with an on-chip cache, called the cache DRAM, has been
proposed and fabricated. It is a hierarchical RAM containing a 1-Mb DRAM for the
main memory and an 8-kb SRAM (static RAM) for cache memory. It uses a 1.2-µm
CMOS technology.
Dynamic random-access memory (DRAM) is a type of random access
semiconductor memory that stores each bit of data in a separate tiny capacitor within
an integrated circuit. The capacitor can either be charged or discharged; these two states
are taken to represent the two values of a bit, conventionally called 0 and 1. The electric
charge on the capacitors slowly leaks off, so without intervention the data on the chip
would soon be lost. To prevent this, DRAM requires an external memory refresh circuit
which periodically rewrites the data in the capacitors, restoring them to their original
charge. This refresh process is the defining characteristic of dynamic random-access
memory, in contrast to static random-access memory (SRAM) which does not require
data to be refreshed. Unlike flash memory, DRAM is volatile memory (vs. non-volatile
memory), since it loses its data quickly when power is removed. However, DRAM does
exhibit limited data remanence.
DRAM is widely used in digital electronics where low-cost and high-capacity memory is
required. One of the largest applications for DRAM is the main memory (colloquially
called the "RAM") in modern computers and graphics cards (where the "main memory"
is called the graphics memory). It is also used in many portable devices and video
game consoles. [4 MARKS]
In contrast, SRAM, which is faster and more expensive than DRAM, is
typically used where speed is of greater concern than cost and size, such as the cache
memories in processors.
Due to its need of a system to perform refreshing, DRAM has more complicated
circuitry and timing requirements than SRAM, but it is much more widely used. The
advantage of DRAM is the structural simplicity of its memory cells: only
one transistor and a capacitor are required per bit, compared to four or six transistors in
SRAM. This allows DRAM to reach very high densities, making DRAM much cheaper
per bit. The transistors and capacitors used are extremely small; billions can fit on a
single memory chip. Due to the dynamic nature of its memory cells, DRAM consumes
relatively large amounts of power, with different ways for managing the power
consumption.
15. a. i) What do you understand by an interrupt? Explain the steps through which the
processor handles an interrupt. (7)
An interrupt is a signal to the processor, emitted by hardware or software, indicating
an event that needs immediate attention. Hardware interrupts are used by devices to
communicate that they require attention from the operating system. [2 MARKS]
Interrupts and Exceptions
• An interrupt is a change in the program-defined flow of execution. [2 MARKS]
• When an interrupt occurs, the hardware executes the instructions at a specified address
instead of following the normal program flow.
• User programs are interrupted all the time.
Types of Interrupts
• External – Generated by an I/O device.
• Internal – Exception within a program.
[3 MARKS]
• Program Generated – Used to transfer control to the operating system.
External Interrupts
• I/O devices tell the CPU that an I/O request has completed by sending an interrupt
signal to the processor.
• I/O errors may also generate an interrupt.
• Most computers have a timer which interrupts the CPU every so many milliseconds.
Internal Interrupts
• When the hardware detects that the program is doing something wrong, it will usually
generate an interrupt:
– Arithmetic error – Invalid instruction
– Addressing error – Hardware malfunction
– Page fault – Debugging
• A Page Fault interrupt is not the result of a program error, but it does require the
operating system to get control.
• Internal interrupts are sometimes called exceptions.
Program Generated Interrupts
• Most computers have an instruction that generates an internal interrupt.
• Program generated interrupts are a means for user programs to call a function of the
operating system.
• Some systems refer to these interrupts as a Supervisor Call or SVC.
ii) Differentiate between synchronous and Asynchronous communication. Where is
each of these suitable? (6)
One of the major differences is that in Synchronous Transmission the sender
and receiver must have synchronized clocks before data transmission,
[2 MARKS]
whereas Asynchronous Transmission does not require a shared clock; instead it frames
each character with start and stop bits (and often a parity bit) before transmission.
Synchronous transmission suits high-speed, steady streams of data such as block
transfers between computers; asynchronous transmission suits low-speed, intermittent
sources such as keyboards and serial terminals.
[4 MARKS]
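A Python sketch of asynchronous-style character framing (a hypothetical 8-data-bit, even-parity, one-stop-bit format):

    def frame(byte):
        bits = [(byte >> i) & 1 for i in range(8)]  # data bits, LSB first
        parity = sum(bits) % 2                      # even parity bit
        return [0] + bits + [parity] + [1]          # start, data, parity, stop

    print(frame(0x41))  # the framing for ASCII 'A'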
15. b. i) What do you understand by a DMA mode data transfer? Give at least one example
of a data transfer situation where DMA mode would be advantageous. (6)
A DMA controller can directly access memory and is used to transfer data from
one memory location to another, or from an I/O device to memory and vice versa.
[2 MARKS]
A DMA controller manages several DMA channels, each of which can be programmed
to perform a sequence of these DMA transfers.
I/O Interface (Interrupt and DMA Mode)
The method that is used to transfer information between internal storage and
external I/O devices is known as I/O interface. The CPU is interfaced using special
communication links by the peripherals connected to any computer system. These
communication links are used to resolve the differences between CPU and peripheral.
There exists special hardware components between CPU and peripherals to supervise
and synchronize all the input and output transfers that are called interface units.
Direct Memory Access: The data transfer between a fast storage medium such as a
magnetic disk and the memory unit is limited by the speed of the CPU. Thus we can
allow the peripherals to directly communicate with each other using the memory buses,
removing the intervention of the CPU. This type of data transfer technique is known as
DMA or direct memory access. [2 MARKS]
During DMA the CPU is idle and it has no control over the memory buses. The
DMA controller takes over the buses to manage the transfer directly between the I/O
devices and the memory unit.
Bus Request: It is used by the DMA controller to request the CPU to relinquish the
control of the buses.
Bus Grant: It is activated by the CPU to inform the external DMA controller that the
buses are in the high impedance state and the requesting DMA can take control of the
buses. Once the DMA has taken control of the buses it transfers the data. This
transfer can take place in many ways. [2 MARKS]
[2 MARKS]
PART – C
(1 x 15 = 15 Marks)
16. a. i) The access time of a cache memory is 100 ns and that of main memory is
1000 ns. It is estimated that 80% of the memory requests are for read and the
remaining 20% are for write. The hit ratio for read access only is 0.9. A write-
through procedure is used.
a. What is the average access time of the system considering only
memory read cycles? (2)
b. What is the average access time of the system for both read and write
requests? (3)
c. What is the hit ratio taking into consideration the write cycles? (3)
ANSWER:
a. What is the average access time of the system considering only memory read
cycles?
Average access time for memory read in the system is calculated using formula:
[2 MARKS]
average_access_time_read = hit_ratio x cache_access_time + (1 - hit_ratio) x
main_memory_access_time
so in this case:
average_access_time_read = 0.9 x 100ns + (1 - 0.9) x 1000ns = 90ns + 100ns
= 190ns.
b. What is the average access time of the system for both read and write
requests?
If we take into account both read and write accesses then we have to sum the averages
for read and write.
Read average would take those 80% of overall requests and the average read access
time of 190ns we calculated in a) to get 0.8 x 190.
[3 MARKS]
Write average would take those 20% of overall requests and the main memory
access time of 1000ns to get 0.2 x 1000ns.
Summed together we get: 0.8 x 190ns + 0.2 x 1000ns = 152ns + 200ns = 352ns.
c. What is the hit ratio taking into consideration the write cycles?
With a write-through policy every write goes to main memory, so write requests never
count as cache hits; only the read fraction contributes to the overall hit ratio.
[3 MARKS]
So we have hit_ratio = read_requests_percentage x hit_ratio_read
= 0.8 x 0.9 = 0.72.
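A quick Python check of (a)-(c) using the figures above:

    t_cache, t_main = 100, 1000          # access times in ns
    read_frac, hit_read = 0.8, 0.9

    t_read = hit_read * t_cache + (1 - hit_read) * t_main   # (a) 190.0 ns
    t_all = read_frac * t_read + (1 - read_frac) * t_main   # (b) 352.0 ns
    hit_overall = read_frac * hit_read                      # (c) 0.72

    print(t_read, t_all, hit_overall)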
ii) Assume we have a computer where CPI is 1.0 when all memory accesses hit in
the cache. The only data accesses are loads and stores, and these total 50% of
the instructions. If the miss penalty is 25 cycles and the miss rate is 2%, how
much faster would the computer be if all instructions were cache hits? (7)
ANSWER:
CPU_time = (CPU_clock_cycles + memory_stall_cycles) x clock_cycle_time
         = (IC x CPI + memory_stall_cycles) x clock_cycle_time
When all instructions are hits: [7 MARKS]
CPU_time_ideal = (IC x 1.0 + 0) x clock_cycle_time = IC x clock_cycle_time
In reality:
memory_stall_cycles = IC x (memory_accesses / instruction) x miss_rate x miss_penalty
                    = IC x (1 + 0.5) x 0.02 x 25 = IC x 0.75
CPU_time_cache = (IC x 1.0 + IC x 0.75) x clock_cycle_time
               = 1.75 x IC x clock_cycle_time
Thus the computer with all cache hits is 1.75 / 1.0 = 1.75 times faster.
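The same figure in two lines of Python:

    stalls_per_inst = (1 + 0.5) * 0.02 * 25   # accesses/inst x miss rate x miss penalty
    print(1.0 + stalls_per_inst)              # real CPI = 1.75, so 1.75x slower than ideal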
16. b. i) Consider a program of 15,000 instructions executed by a linear pipeline
processor with a clock rate of 25 MHz. The instruction pipeline has 5 stages and one
instruction is issued per clock cycle. Calculate the speed-up ratio, efficiency and
throughput of this pipelined processor. (6)
[6 MARKS]
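The worked figures, using the standard linear-pipeline relations with n = 15,000 instructions, k = 5 stages, and clock rate f = 25 MHz:
Speed-up: S = n x k / (k + n - 1) = 15000 x 5 / (5 + 15000 - 1) = 75000 / 15004 ≈ 4.998
Efficiency: E = S / k = 4.998 / 5 ≈ 0.9997, i.e. about 99.97%
Throughput: H = n x f / (k + n - 1) = E x f ≈ 0.9997 x 25 MHz ≈ 24.99 MIPS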
ii) A pipelined processor uses the delayed branch technique. You are asked to
recommend one of the two possibilities for the design of this processor. In the
first possibility, the processor has 4-stage pipeline and one delay slot, and in the
second possibility, it has a 6-stage pipeline with 2-delay slots. Compare the
performance of these two alternatives, taking only the branch penalty into
account. Assume that 20% of the instructions are branch instructions and that
an optimizing compiler has an 80% success rate in filling the single slot delay.
For the second alternative, the compiler is able to fill the second slot 25% of the
time. (9)
ANSWER:
T4-> CPI for 4 stage pipeline, T6-> CPI for 6 stage pipeline [9 MARKS]
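A sketch of the computation (assuming both designs run at the same clock rate, so the effective CPI alone decides performance, and that each unfilled delay slot costs one stall cycle):
T4 = 1 + 0.2 x (1 - 0.8) x 1 = 1 + 0.04 = 1.04
T6 = 1 + 0.2 x [(1 - 0.8) x 1 + (1 - 0.25) x 1] = 1 + 0.2 x 0.95 = 1.19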
Since T4 = 1.04 < T6 = 1.19, the machine with the 4-stage pipeline and one delay slot is
clearly faster than the machine with the 6-stage pipeline and two delay slots.