Superscaling in Computer Architecture

International Islamic University Chittagong
6/5/2021
Computer Architecture
Assignment | Date: June 5, 2021
What is Super Scaling in Computer Architecture?
Superscaling is the parallel execution of multiple independent instructions in
separate pipelines at the same time. In computer architecture, a processor built
using this method is called a superscalar processor, and its architecture is
called a superscalar architecture.
History:
Seymour Cray’s CDC 6600, introduced in 1964, is often mentioned as the first
superscalar design. IBM announced the System/360 Model 91 in 1964 as a
competitor to the CDC 6600; when it shipped in 1967, it was another superscalar
mainframe. The Motorola MC88100, the Intel i960CA, and the AMD 29000-series
29050 were the first commercial single-chip superscalar microprocessors. The P5
Pentium was the first superscalar x86 processor. The Nx586, P6 Pentium Pro, and
AMD K5 were among the first designs to decode x86 instructions asynchronously
into dynamic, microcode-like micro-op sequences before executing them on a
superscalar microarchitecture.
The superscalar architecture was designed to improve the performance of the
operations by executing them simultaneously in multiple independent pipelines.
This technique, however, increases the complexity of the hardware design.
Superscalar Architecture:
Superscalar processor design generally refers to a set of techniques that enable
the central processing unit (CPU) of a computer to attain a throughput of over
one instruction per cycle while executing a single sequential program. It is an
advanced pipelining technique.
Figure-1: General organization of a superscalar processor

The organization shown in Figure-1 supports the parallel execution of two
integer operations, two floating-point operations, and one memory operation.
Operational Block Diagram:

Example: A superscalar processor (Pentium II) with five functional units:
In the architecture shown above:

• It is possible to have multiple Arithmetic Logic Units (ALUs) in stage S4.
• Most of the functional units take longer than one clock cycle to execute
in stage S4.
Ability to contain multiple pipelines:

A superscalar processor consists of multiple independent pipelines. Each
pipeline consists of several stages, so each pipeline can hold multiple
instructions at a time, one per stage. Independent instructions proceed through
their pipelines without waiting on one another.
Example: Consider the following architecture (as in Pentium).

• A single instruction fetch unit fetches pairs of instructions together and
puts each one into its own pipeline, complete with its own ALU for parallel
operation.
• The main pipeline (U-pipeline) could execute an arbitrary Pentium
instruction.
• The V-pipeline could execute only simple integer instructions (and also one
simple floating-point instruction).
• If the instructions in a pair were not simple enough or were incompatible,
only the first one was executed (in the U-pipeline). The second one was then
held and paired with the instruction following it.
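The pairing rule described above can be sketched in a few lines of Python. This is a simplified model, not the actual Pentium pairing logic: the `Instr` class, the `simple` flag, and the `can_pair` rule (both instructions simple, and the second neither reads nor overwrites the first one's destination) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    simple: bool        # hypothetical flag: is the instruction simple enough to pair?
    dest: str = ""      # register written
    srcs: tuple = ()    # registers read

def can_pair(u, v):
    """Assumed rule: both simple, and v does not depend on u's result."""
    if not (u.simple and v.simple):
        return False
    if u.dest and (u.dest in v.srcs or u.dest == v.dest):
        return False
    return True

def issue(program):
    """Return one (U-pipeline, V-pipeline) tuple per clock cycle."""
    cycles = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append((program[i].name, program[i + 1].name))
            i += 2
        else:
            # Only the first instruction issues; the next one is held
            # and considered again as the first of the next pair.
            cycles.append((program[i].name, None))
            i += 1
    return cycles

program = [
    Instr("I1", True, "eax", ("ebx",)),
    Instr("I2", True, "ecx", ("edx",)),   # independent of I1: pairs with it
    Instr("I3", False, "esi", ("edi",)),  # not simple: issues alone in U
    Instr("I4", True, "eax", ("ecx",)),
    Instr("I5", True, "ebx", ("eax",)),   # reads I4's result: cannot pair
]

for cycle, (u, v) in enumerate(issue(program), start=1):
    print(cycle, u, v)
```

In this made-up program only I1 and I2 issue together, so the five instructions take four issue cycles instead of five.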
Although a superscalar CPU is typically also pipelined, pipelining and superscalar
architecture are considered different performance enhancement techniques.
Superscalar Processor:
A superscalar processor dispatches multiple instructions during a single clock
cycle. The superscalar architecture was first implemented in RISC processors,
which use short and simple instructions to perform calculations. This
architecture can also be called “second-generation RISC.”
Because of their superscalar abilities, RISC processors have typically performed
better than CISC processors running at the same clock speed. CPU clock speeds
are measured in megahertz (MHz), that is, millions of clock cycles per second.
However, most CISC-based processors (such as the Intel Pentium) now include
some RISC architecture as well, which enables them to execute instructions in
parallel. Nearly all processors developed after 1998 are superscalar processors.
The processor or compiler in a superscalar architecture determines whether an
instruction depends on the output of earlier instructions in the sequence or
whether it can be executed independently. The CPU hardware verifies data
dependencies between instructions dynamically at run time, and the processor
also schedules instructions dynamically at run time. Superscalar architectures
therefore have mechanisms for fetching multiple instructions, determining
dependencies between instructions, and executing instructions in parallel.
Instruction Flow in Superscalar Architecture:
Fig: Instruction flow in superscalar architecture
Instruction-level Parallelism:
The superscalar processor implements a form of parallelism called
instruction-level parallelism (ILP) within a single processor. ILP is the degree
to which the instructions of a program can be executed in parallel, that is, a
measure of how many of the instructions in a computer program can be executed
simultaneously.
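One common way to put a number on ILP is to divide the instruction count by the length of the longest dependency chain. The sketch below assumes every instruction takes one cycle and that hardware resources are unlimited; the `ilp` function and the six-instruction `program` are hypothetical, made up for illustration.

```python
def ilp(deps):
    """deps maps each instruction to the earlier instructions it depends on.

    Entries must be listed in program order. Returns the instruction count
    divided by the critical-path length (unit latency assumed).
    """
    depth = {}
    for instr, preds in deps.items():  # dicts keep insertion (program) order
        # An instruction can start one step after its latest predecessor.
        depth[instr] = 1 + max((depth[p] for p in preds), default=0)
    critical_path = max(depth.values())
    return len(deps) / critical_path

program = {
    "I1": [],
    "I2": [],
    "I3": ["I1", "I2"],
    "I4": [],
    "I5": ["I4"],
    "I6": ["I3", "I5"],
}

print(ilp(program))  # 6 instructions / critical path of 3 = 2.0
```

An ILP of 2.0 means that, on average, two instructions of this program could execute per cycle if nothing but data dependencies limited them.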
Pipelining in Superscalar Architecture:
• A superscalar processor consists of multiple independent pipelines.
• Each pipeline consists of several stages, so each pipeline can hold
multiple instructions at a time.
• A pipelined processor may process each instruction in four steps, typically
fetch, decode, execute, and write-back.
Example:
Fig: A superscalar pipeline with degree = 2.

Execution Performance of Pipelining, Super Pipelining & Superscalar Pipelining:
Fig: Superscalar performance comparison with pipeline and super-pipeline
In the case of the base pipeline:

• It can issue one instruction per clock cycle.
• It performs one pipeline stage per clock cycle.
• Although several instructions are in flight simultaneously, only one
instruction is in its execution stage at any one time.
• Total time needed to execute six instructions: 9 clock cycles.
In the case of the super-pipeline:

• A super-pipelined processor can perform two pipeline stages per clock
cycle.
• Each stage is divided into two non-overlapping parts, each executing in
half a clock cycle.
• Total time needed to execute six instructions: 6.5 clock cycles.
In the case of the superscalar pipeline:

• A superscalar pipeline is capable of executing two instances of each
stage in parallel.
• Total time needed to execute six instructions: 6 clock cycles.
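The three totals above are consistent with the usual idealized cycle-count formulas for a pipeline with no stalls. The sketch below assumes a four-stage pipeline (k = 4), which matches the nine-cycle base case, and a degree of 2 for both the super-pipeline and the superscalar pipeline; the function names are hypothetical.

```python
import math

def base_pipeline_cycles(n, k):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + (n - 1)

def superpipeline_cycles(n, k, m):
    """A new instruction is issued every 1/m of a base clock cycle."""
    return k + (n - 1) / m

def superscalar_cycles(n, k, m):
    """m instructions are issued together every clock cycle."""
    return k + math.ceil(n / m) - 1

n, k, m = 6, 4, 2  # six instructions; k = 4 stages is an assumption
print(base_pipeline_cycles(n, k))     # 9
print(superpipeline_cycles(n, k, m))  # 6.5
print(superscalar_cycles(n, k, m))    # 6
```

These formulas ignore dependencies and resource conflicts, so real programs take longer than these best-case figures.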
Limitations of Superscalar:
1. True data dependency:
Example: I1 : ADD r1, r2
         I2 : MOV r3, r1
In this case, the MOV in I2 depends on the result that I1 writes to r1. That is,
the second instruction (I2) needs data produced by the first instruction (I1),
so the two cannot execute in parallel.
2. Procedure dependency:
• Situation 1: Instructions after a branch cannot be executed in parallel
with instructions before the branch; this holds up multiple pipelines.
• Situation 2: With variable-length instructions, the first instruction
must be partially decoded for the first pipeline before the second
instruction can be fetched for the second pipeline.
3. Resource Conflict:
• Resource conflicts occur if two or more instructions compete for the same
resource (register, memory, functional unit) at the same time. They are
similar to the structural hazards discussed with pipelines.
• By introducing several parallel pipelined units, superscalar architectures
try to reduce some of the possible resource conflicts.
4. Output dependency:
• Output dependency occurs when two instructions write their results to
the same location.
• If the two writes complete out of order, a later instruction that relies
on that location may read the wrong value.
5. Antidependency:
• Antidependency is the reverse of true data dependency.
• True data dependency: instruction-2 depends on data from instruction-1.
Antidependency: instruction-1 depends on data that could be
destroyed by instruction-2.
Example:
I1 : R3= R3+ R5
I2 : R4= R3+1
I3 : R3= R5+1
I4 : R7= R3+ R4
Instruction I3 cannot complete before I2 has started and read its operands,
because I2 needs the value in R3 and I3 changes R3.
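The dependencies in the four-instruction example can be found mechanically by comparing the registers each instruction reads and writes. The sketch below is a minimal illustration, not a model of real issue hardware; the `find_dependencies` function and the (name, destination, sources) tuple layout are assumptions made for this example.

```python
def find_dependencies(program):
    """program: list of (name, dest, srcs) tuples in program order.

    Returns (earlier, later, kind, register) tuples, where kind is
    'true', 'anti', or 'output'.
    """
    found = []
    for j, (name_j, dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            name_i, dest_i, srcs_i = program[i]
            # Registers written by instructions strictly between i and j.
            between = {d for _, d, _ in program[i + 1:j]}
            # True data dependency: j reads what i wrote, and no
            # instruction in between has overwritten that register.
            if dest_i in srcs_j and dest_i not in between:
                found.append((name_i, name_j, "true", dest_i))
            # Antidependency: j overwrites a register that i reads.
            if dest_j in srcs_i:
                found.append((name_i, name_j, "anti", dest_j))
            # Output dependency: i and j write the same register.
            if dest_j == dest_i:
                found.append((name_i, name_j, "output", dest_j))
    return found

program = [
    ("I1", "R3", ("R3", "R5")),  # R3 = R3 + R5
    ("I2", "R4", ("R3",)),       # R4 = R3 + 1
    ("I3", "R3", ("R5",)),       # R3 = R5 + 1
    ("I4", "R7", ("R3", "R4")),  # R7 = R3 + R4
]

for dep in find_dependencies(program):
    print(dep)
```

Among the results are the antidependency between I2 and I3 on R3 described above and an output dependency between I1 and I3, since both write R3.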
Comparison of True Data Dependency, Procedural Dependency, and Resource Conflict:
Fig: effect of dependencies
