DLD Module 7 Print
• An arithmetic-logic unit, or ALU, performs many different arithmetic
and logic operations. The ALU is the “heart” of a processor; you could
say that everything else in the CPU is there to support the ALU.
• Here’s the plan:
– We’ll show an arithmetic unit first, by building off ideas from the
adder-subtractor circuit.
– Then we’ll talk about logic operations a bit, and build a logic unit.
– Finally, we put these pieces together using multiplexers.
• We use some examples from the textbook, but things are re-labeled
and treated a little differently.

• The basic four-bit adder always computes S = A + B + CI.
• But by changing what goes into the adder inputs A, B and CI, we can
change the adder output S.
• This is also what we did to build the combined adder-subtractor circuit.

It’s the adder-subtractor again!
• Here the signal Sub and some XOR gates alter the adder inputs.
– When Sub = 0, the adder inputs A, B, CI are Y, X, 0, so the adder
produces G = X + Y + 0, or just X + Y.
– When Sub = 1, the adder inputs are Y’, X and 1, so the adder output
is G = X + Y’ + 1, or the two’s complement operation X - Y.

The multi-talented adder
• So we have one adder performing two separate functions.
• “Sub” acts like a function select input which determines whether the
circuit performs addition or subtraction.
• Circuit-wise, all “Sub” does is modify the adder’s inputs A and CI.

Modifying the adder inputs
• By following the same approach, we can use an adder to compute other
functions as well.
• We just have to figure out which functions we want, and then put the
right circuitry into the “Input Logic” box.
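
The adder-subtractor behaviour (Sub XORed into each bit of Y, and Sub fed into CI) can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself; the function names adder4 and add_sub are made up for this sketch.

```python
def adder4(a, b, ci):
    """4-bit adder: returns (S, carry out) for S = A + B + CI."""
    total = (a & 0xF) + (b & 0xF) + (ci & 1)
    return total & 0xF, total >> 4


def add_sub(x, y, sub):
    """Adder-subtractor: Sub selects X + Y (0) or X - Y (1)."""
    a = y ^ (0xF if sub else 0x0)   # XOR gates: A = Y or A = Y'
    g, _cout = adder4(a, x, sub)    # B = X, CI = Sub
    return g


print(format(add_sub(0b0110, 0b0011, 0), "04b"))  # 6 + 3
print(format(add_sub(0b0110, 0b0011, 1), "04b"))  # 6 - 3
```

Results are truncated to four bits, so a negative difference shows up as its two’s complement pattern.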

Some more possible functions
• We already saw how to set adder inputs A, B and CI to compute either
X + Y or X - Y.
• How can we produce the increment function G = X + 1?
One way: Set A = 0000, B = X, and CI = 1
• How about decrement: G = X - 1?
A = 1111 (-1), B = X, CI = 0
• How about transfer: G = X? (This can be useful.)
A = 0000, B = X, CI = 0

The role of CI
• The transfer and increment operations have the same A and B inputs,
and differ only in the CI input.
• In general we can get additional functions (not all of them useful) by
using both CI = 0 and CI = 1.
• Another example:
– Two’s-complement subtraction is obtained by setting A = Y’, B = X,
and CI = 1, so G = X + Y’ + 1.
– If we keep A = Y’ and B = X, but set CI to 0, we get G = X + Y’. This
turns out to be a ones’ complement subtraction operation.

Table of arithmetic functions
• Here are some of the different possible arithmetic operations.
• We’ll need some way to specify which function we’re interested in, so
we’ve randomly assigned a selection code to each operation.

S2 S1 S0   Arithmetic operation
0  0  0    X (transfer)
0  0  1    X + 1 (increment)
0  1  0    X + Y (add)
0  1  1    X + Y + 1
1  0  0    X + Y’ (1C subtraction)
1  0  1    X + Y’ + 1 (2C subtraction)
1  1  0    X – 1 (decrement)
1  1  1    X (transfer)
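
As a quick check of the adder input settings listed for transfer, increment, decrement and the two subtractions, here is a small Python sketch. The helper adder4 and the sample values of X and Y are assumptions chosen for illustration.

```python
def adder4(a, b, ci):
    """4-bit adder: S = A + B + CI, truncated to four bits."""
    return (a + b + ci) & 0xF


x, y = 0b0101, 0b0011          # sample operands (X = 5, Y = 3)
not_y = ~y & 0xF               # Y' on four bits

results = {
    "transfer X": adder4(0b0000, x, 0),
    "increment X + 1": adder4(0b0000, x, 1),
    "decrement X - 1": adder4(0b1111, x, 0),
    "1C subtraction X + Y'": adder4(not_y, x, 0),
    "2C subtraction X + Y' + 1": adder4(not_y, x, 1),
}
for name, g in results.items():
    print(f"{name:28s} G = {g:04b}")
```

In every case B = X; only A and CI change.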

Mapping the table to an adder
• This second table shows what the adder’s inputs should be for each of
our eight desired arithmetic operations.
– Adder input CI is always the same as selection code bit S0.
– B is always set to X.
– A depends only on S2 and S1.
• These equations depend on both the desired operations and the
assignment of selection codes.

Selection code   Desired arithmetic operation    Required adder inputs
S2 S1 S0         G (A + B + CI)                  A      B   CI
0  0  0          X (transfer)                    0000   X   0
0  0  1          X + 1 (increment)               0000   X   1
0  1  0          X + Y (add)                     Y      X   0
0  1  1          X + Y + 1                       Y      X   1
1  0  0          X + Y’ (1C subtraction)         Y’     X   0
1  0  1          X + Y’ + 1 (2C subtraction)     Y’     X   1
1  1  0          X – 1 (decrement)               1111   X   0
1  1  1          X (transfer)                    1111   X   1

Building the input logic
• All we need to do is compute the adder input A, given the arithmetic
unit input Y and the function select code S (actually just S2 and S1).
• Here is an abbreviated truth table:

S2 S1   A
0  0    0000
0  1    Y
1  0    Y’
1  1    1111

• We want to pick one of these four possible values for A, depending on
S2 and S1.

Primitive gate-based input logic
• We could build this circuit using primitive gates.
• If we want to use K-maps for simplification, then we should first
expand out the abbreviated truth table.
– The Y that appears in the output column (A) is actually an input.
– We make that explicit in the expanded table below.
• Remember A and Y are each 4 bits long!

S2 S1 Yi   Ai
0  0  0    0
0  0  1    0
0  1  0    0
0  1  1    1
1  0  0    1
1  0  1    0
1  1  0    1
1  1  1    1
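
Expanding the abbreviated truth table (0000, Y, Y’, 1111) into a per-bit table with Yi as an explicit input is mechanical, and can be sketched as below. The function name a_bit is hypothetical; the sketch only restates the table, it does not simplify it.

```python
def a_bit(s2, s1, yi):
    """One bit of adder input A, read off the abbreviated truth table."""
    choices = {
        (0, 0): 0,        # A = 0000
        (0, 1): yi,       # A = Y
        (1, 0): 1 - yi,   # A = Y'
        (1, 1): 1,        # A = 1111
    }
    return choices[(s2, s1)]


print("S2 S1 Yi  Ai")
expanded = []
for s2 in (0, 1):
    for s1 in (0, 1):
        for yi in (0, 1):
            ai = a_bit(s2, s1, yi)
            expanded.append(ai)
            print(f"{s2}  {s1}  {yi}   {ai}")
```

The same logic is repeated once per bit, since A and Y are each four bits wide.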

Primitive gate implementation
• From the truth table, we can find an MSP:

K-map for Ai (columns are S1 Yi):
          S1Yi = 00   01   11   10
S2 = 0           0    0    1    0
S2 = 1           1    0    1    1

Ai = S2Yi’ + S1Yi

• Again, we have to repeat this once for each bit Y3-Y0, connecting to
the adder inputs A3-A0.
• This completes our arithmetic unit.

Bitwise operations
• Most computers also support logical operations like AND, OR and NOT,
but extended to multi-bit words instead of just single bits.
• To apply a logical operation to two words X and Y, apply the operation
on each pair of bits Xi and Yi:

      1011          1011           1011
AND   1110     OR   1110     XOR   1110
      1010          1111           0101

• We’ve already seen this informally in two’s-complement arithmetic,
when we talked about “complementing” all the bits in a number.

Bitwise operations in programming
• Languages like C, C++, Java and HDLs provide bitwise logical operations:
& (AND) | (OR) ^ (XOR) ~ (NOT)
• These operations treat each integer as a bunch of individual bits:
13 & 25 = 9 because 01101 & 11001 = 01001
• They are not the same as the operators &&, || and !, which treat each
integer as a single logical value (0 is false, everything else is true):
13 && 25 = 1 because true && true = true
• Bitwise operators are often used in programs to set a bunch of Boolean
options, or flags, with one argument.
• Easy to represent sets of fixed universe size with bits:
– 1: is member, 0 not a member. Unions: OR, Intersections: AND
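
Because the MSP Ai = S2Yi’ + S1Yi is applied identically to every bit, bitwise operators can evaluate all four bits of A at once. The sketch below assumes 4-bit words and uses a hypothetical helper input_logic; it also shows the bitwise versus logical distinction with the values 13 and 25.

```python
def input_logic(y, s2, s1):
    """A = S2·Y' + S1·Y, evaluated on all four bits at once."""
    m2 = 0xF if s2 else 0x0    # S2 copied to every bit position
    m1 = 0xF if s1 else 0x0    # S1 copied to every bit position
    return (m2 & ~y & 0xF) | (m1 & y & 0xF)


y = 0b1010
expected = {(0, 0): 0b0000, (0, 1): y, (1, 0): ~y & 0xF, (1, 1): 0b1111}
for (s2, s1), a in expected.items():
    assert input_logic(y, s2, s1) == a   # matches the abbreviated table

bitwise = 13 & 25                  # acts on each pair of bits
logical = bool(13) and bool(25)    # acts on whole truth values
print(bitwise, logical)
```

Python spells the logical operators and, or, not rather than &&, || and !, but the bitwise operators &, |, ^ and ~ are written the same way as in C.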

Bitwise operations in networking
• IP addresses are actually 32-bit binary numbers, and bitwise operations
can be used to find network information.
• For example, you can bitwise-AND an address 192.168.10.43 with a
“subnet mask” to find the “network address,” or which network the
machine is connected to.

  192.168. 10. 43 = 11000000.10101000.00001010.00101011
& 255.255.255.224 = 11111111.11111111.11111111.11100000
  192.168. 10. 32 = 11000000.10101000.00001010.00100000

• You can use bitwise-OR to generate a “broadcast address,” for sending
data to all machines on the local network.

  192.168. 10. 43 = 11000000.10101000.00001010.00101011
|   0.  0.  0. 31 = 00000000.00000000.00000000.00011111
  192.168. 10. 63 = 11000000.10101000.00001010.00111111

Defining a logic unit
• A logic unit supports different logical functions on two multi-bit
inputs X and Y, producing an output G.
• This abbreviated table shows four possible functions and assigns a
selection code S to each.

S1 S0   Output
0  0    Gi = XiYi
0  1    Gi = Xi + Yi
1  0    Gi = Xi ⊕ Yi
1  1    Gi = Xi’

• We’ll just use multiplexers and some primitive gates to implement this.
• Again, we need one multiplexer for each bit of X and Y.

Our simple logic unit
• Inputs:
– X (4 bits)
– Y (4 bits)
– S (2 bits)
• Outputs:
– G (4 bits)
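
The subnet-mask and broadcast-address calculations from the networking example can be reproduced with plain integer operations. This is a minimal sketch; the helper names to_int and to_dotted are invented here, and the host mask 0.0.0.31 is obtained by complementing the subnet mask.

```python
def to_int(dotted):
    """Pack a dotted-quad IPv4 address into one 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Unpack a 32-bit integer back into dotted-quad form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


addr = to_int("192.168.10.43")
mask = to_int("255.255.255.224")

network = addr & mask                    # bitwise AND with the subnet mask
broadcast = addr | (~mask & 0xFFFFFFFF)  # bitwise OR with 0.0.0.31

print(to_dotted(network))
print(to_dotted(broadcast))
```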

Combining the arithmetic and logic units
• Now we have two pieces of the puzzle:
– An arithmetic unit that can compute eight functions on 4-bit inputs.
– A logic unit that can perform four functions on 4-bit inputs.
• We can combine these together into a single circuit, an arithmetic-logic
unit (ALU).

Our ALU function table
• This table shows a sample function table for an ALU.
• All of the arithmetic operations have S3=0, and all of the logical
operations have S3=1.
• These are the same functions we saw when we built our arithmetic
and logic units a few minutes ago.
• Since our ALU only has 4 logical …

S3 S2 S1 S0   Operation
0  0  0  0    G = X
0  0  0  1    G = X + 1
0  0  1  0    G = X + Y
0  0  1  1    G = X + Y + 1
0  1  0  0    G = X + Y’
0  1  0  1    G = X + Y’ + 1
0  1  1  0    G = X – 1
…

A complete ALU circuit
• The / and 4 on a line indicate that it’s actually four lines.
• Cout should be ignored when logic operations are performed (when S3=1).
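
Putting the two units together, a behavioural sketch of the 4-bit ALU might look like the following. The logical rows of the function table are cut off in this handout, so the sketch assumes that when S3 = 1 the bits S1 S0 pick the logic function exactly as in the logic unit table and S2 is ignored; the function names are also hypothetical.

```python
MASK = 0xF  # 4-bit words


def arithmetic_unit(x, y, s2, s1, s0):
    """Returns (G, Cout) with A chosen by S2 S1, B = X and CI = S0."""
    a = {(0, 0): 0x0, (0, 1): y, (1, 0): ~y & MASK, (1, 1): 0xF}[(s2, s1)]
    total = a + x + s0
    return total & MASK, total >> 4


def logic_unit(x, y, s1, s0):
    """Bitwise AND, OR, XOR or NOT X, selected by S1 S0."""
    g = {(0, 0): x & y, (0, 1): x | y, (1, 0): x ^ y, (1, 1): ~x}[(s1, s0)]
    return g & MASK


def alu(x, y, s3, s2, s1, s0):
    """S3 = 0 selects the arithmetic unit, S3 = 1 the logic unit."""
    if s3 == 0:
        return arithmetic_unit(x, y, s2, s1, s0)
    # Assumption: S2 is a don't-care here; Cout is ignored (None).
    return logic_unit(x, y, s1, s0), None


print(alu(0b0101, 0b0011, 0, 0, 1, 0))   # G = X + Y
print(alu(0b1011, 0b1110, 1, 0, 1, 0))   # G = X XOR Y
```

In hardware the selection is done by a multiplexer controlled by S3; the if statement plays that role here.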

What’s in a CPU?
[Diagram: ALU and Registers]

Register transfers
[Diagram: ALU and Registers]

Register transfer language review (from Chapter 8)
• Two-character names denote registers, such as R0, R1, DR, or SA.
• Arrows indicate data transfers. To copy the contents of the source
register R2 into the destination register R1 in one clock cycle:
R1 ← R2

Register transfer operations (cont’d)
• We can apply arithmetic operations to registers.
R1 ← R2 + R3
R3 ← R1 - 1
• Logical operations are applied bitwise. AND and OR are denoted with
special symbols, to prevent confusion with arithmetic operations.
R2 ← R1 ∧ R2   bitwise AND
R3 ← R0 ∨ R1   bitwise OR
• … modified, and we assume that the shift input is just 0.
R2 ← sl R1   left shift
R2 ← sr R1   right shift

Block symbols for registers
[Diagram: register R0 with a Load input, an n-bit data input and an n-bit data output]
• We’ll use this block diagram to represent an n-bit register.
• There is a data input and a load input.
– When Load = 1, the data input is stored into the register.
• The register’s contents are always available on the output lines,
regardless of the Load input.
• The clock signal is not shown because it would make the diagram messy.
• Remember that the input and output lines are actually n bits wide!

Register file
• Modern processors have a number of registers grouped together in a
register file.
• Much like words stored in a RAM, individual registers are identified
by an address.
• Here is a block symbol for a 2^k x n register file.
– There are 2^k registers, so register addresses are k bits long.
– The data inputs and outputs are n bits wide.
[Diagram: Register File block with inputs D data (n bits), WR (Write), DA (D address, k bits), AA (A address, k bits), BA (B address, k bits) and outputs A data, B data (n bits each)]

Accessing the register file
• You can read two registers at once by supplying the AA and BA inputs.
The data appears on the A and B outputs.
• You can write to a register by using the DA and D inputs, and setting
WR = 1.
• These are registers so there must be a clock signal, even though we
usually don’t show it in diagrams.
– We can read from the …
[Diagram: Register File block with D data, WR, DA, AA, BA, A data, B data]

What’s inside the register file
• Here’s a 4 x n register file. (We’ll assume a 4 x n register file for all
our examples.)

Explaining the register file
• The 2-to-4 decoder selects one of the four registers for writing. If
WR = 1, the decoder will be enabled and one of the Load signals will
be active.
• The n-bit 4-to-1 muxes select the two register file outputs A and B,
based on the inputs AA and BA.
• We need to be able to read two registers at once because most
arithmetic operations require two operands.
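
A behavioural model of the 4 x n register file, with the decoder and muxes replaced by list indexing, might look like this. The class name, the method names and the choice n = 8 are assumptions made for the sketch.

```python
class RegisterFile:
    """4 x n register file: two read ports (A, B) and one write port (D)."""

    def __init__(self, n=8):
        self.mask = (1 << n) - 1       # keep stored values to n bits
        self.regs = [0, 0, 0, 0]

    def read(self, aa, ba):
        """The two 4-to-1 muxes: AA and BA select the A and B outputs."""
        return self.regs[aa], self.regs[ba]

    def clock(self, d, da, wr):
        """One clock edge: the 2-to-4 decoder, enabled by WR, raises one Load."""
        loads = [wr == 1 and da == i for i in range(4)]
        for i, load in enumerate(loads):
            if load:
                self.regs[i] = d & self.mask


rf = RegisterFile()
rf.clock(d=5, da=0b10, wr=1)   # write 5 into R2
rf.clock(d=9, da=0b10, wr=0)   # WR = 0, so nothing changes
print(rf.read(aa=0b10, ba=0b00))
```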

…
– A and B are two n-bit numeric inputs.
– FS is an m-bit function select code, which picks one of 2^m functions.
– The n-bit result is called F.
– Several status bits provide more information about the output F:
• V = 1 in case of signed overflow.
…

…
– The ALU computes a result, which is saved back to the registers.
• WR, DA, AA, BA and FS are control signals. Their values determine the
exact actions taken by the datapath: which registers are used and for
what operation.
[Diagram: Register File (D data, WR, DA, AA, BA) with A data and B data feeding the ALU]

… taken from the book.
• The function select code FS is 5 bits long, but there are only 15
different functions here.
• We use an alternative notation for AND and OR to avoid confusion with
arithmetic operations.

FS      Operation
00001   F = A + 1
00010   F = A + B
00011   F = A + B + 1
00100   F = A + B’
00101   F = A + B’ + 1
00110   F = A – 1
00111   F = A
…

An example computation
• Let’s look at the proper control signals for the operation below:
R0 ← R1 + R3
• Set AA = 01 and BA = 11. This causes the contents of R1 to appear at
A data, and the contents of R3 to appear at B data.
• Set the ALU’s function select input FS = 00010 (A + B).
• Set DA = 00 and WR = 1. On the next …
[Diagram: datapath with WR = 1, DA = 00, AA = 01, BA = 11]

Two questions
• Four registers isn’t a lot. What if we need more storage?
• Who exactly decides which registers are read and written and which
ALU function is executed?

We can access RAM also
• Here’s a way to connect RAM into our existing datapath.
• To write to RAM, we must give an address and a data value.
• These will come from the registers. We connect A data to the memory’s
ADRS input, and B data to the memory’s DATA input.
• Set MW = 1 to write to the RAM. (It’s called MW to …
[Diagram: Register File and ALU with a RAM; A data drives ADRS, B data drives DATA, MW drives WR, CS tied to +5V]
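
The example computation R0 ← R1 + R3 can be traced with a small model of the register-only datapath. This is a sketch under assumptions: 8-bit registers, only a few rows of the FS table, and made-up initial register contents; the names alu and step are hypothetical.

```python
def alu(fs, a, b, n=8):
    """A few rows of the ALU function table, keyed by the 5-bit FS code."""
    ops = {
        0b00001: a + 1,
        0b00010: a + b,
        0b00011: a + b + 1,
        0b00110: a - 1,
        0b00111: a,
    }
    return ops[fs] & ((1 << n) - 1)


def step(regs, wr, da, aa, ba, fs):
    """One clock cycle: read two registers, run the ALU, optionally write back."""
    a, b = regs[aa], regs[ba]
    f = alu(fs, a, b)
    if wr:
        regs[da] = f
    return f


regs = [0, 10, 0, 32]                    # R0..R3, sample contents
step(regs, wr=1, da=0b00, aa=0b01, ba=0b11, fs=0b00010)
print(regs)
```

The five keyword arguments are exactly the control signals named in the example: WR, DA, AA, BA and FS.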

Reading from RAM
• To read from RAM, A data must supply the address.
• Set MW = 0 for reading.
• The incoming data will be sent to the register file for storage.
• This means that the register file’s D data input could come from either
the ALU output or the RAM.
• A mux MD selects the source for the register file.
– When MD = 0, the ALU output can be stored in the register file.
– When MD = 1, the RAM output is sent to the register file instead.
[Diagram: datapath with RAM and mux MD feeding D data; MW = 0]

Notes about this setup
• We now have a way to copy data between our register file and the RAM.
• Notice that there’s no way for the ALU to directly access the memory;
RAM contents must go through the register file first.
• Here the size of the memory is limited by the size of the …
• For simplicity we’ll assume the RAM is at least as fast as the CPU
clock. (This is definitely not the case in real processors these days.)

Memory transfer notation
• In our transfer language, the contents at random access memory
address X are denoted M[X]. For example:
– The first word in RAM is M[0].
– If register R1 contains an address, then M[R1] are the contents of
that address.
• The M[ ] notation is like a pointer dereference operation in C or C++.

Example sequence of operations
• Here is a simple series of register transfer instructions:
R3 ← M[R0]
R3 ← R3 + 1
M[R0] ← R3
• This just increments the contents at address R0 in RAM.
– Again, our ALU only operates on registers, so the RAM contents
must first be loaded into a register, and then saved back to RAM.
– R0 is the first register in our register file. We’ll assume it contains
a valid memory address.
• How would these instructions execute in our datapath?

R3 ← M[R0]
• AA should be set to 00, to read register R0.
• The value in R0 will be sent to the RAM address input, so M[R0]
appears as the RAM output OUT.
• MD must be 1, so the RAM output goes to the register file.
• To store something into R3, we’ll need to set DA = 11 and WR = 1.
• MW should be 0, so nothing is accidentally changed in RAM.
• Here, we did not use the ALU (FS) or the second register file
output (BA).
[Diagram: datapath with WR = 1, DA = 11, AA = 00, MW = 0, MD = 1]

R3 ← R3 + 1
• AA = 11, so R3 is read from the register file and sent to the ALU’s
A input.
• FS needs to be 00001 for the operation A + 1. Then, R3 + 1 appears as
the ALU output F.
• If MD is set to 0, this output will go back to the register file.
• To write to R3, we need to make DA = 11 and WR = 1.
• Again, MW should be 0 so the RAM isn’t inadvertently changed.
• We didn’t use BA.
[Diagram: datapath with WR = 1, DA = 11, AA = 11, FS = 00001, MW = 0, MD = 0]
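
The three-instruction sequence R3 ← M[R0], R3 ← R3 + 1, M[R0] ← R3 can be traced with a small model of the datapath plus RAM. This is a sketch under assumptions: 8-bit values, a dictionary standing in for the RAM, and sample contents for R0 and memory. The control signals for the third step are not listed in this handout, so they are inferred here from the RAM-write description (A data to ADRS, B data to DATA, MW = 1).

```python
def alu(fs, a, b):
    """Subset of the ALU function table; results are kept to 8 bits."""
    if fs == 0b00001:
        return (a + 1) & 0xFF
    if fs == 0b00010:
        return (a + b) & 0xFF
    return a & 0xFF            # treat anything else as F = A


def cycle(regs, ram, wr, da, aa, ba, fs, mw, md):
    """One clock cycle of the datapath with RAM attached."""
    a, b = regs[aa], regs[ba]
    f = alu(fs, a, b)
    out = ram.get(a, 0)        # RAM is addressed by A data
    if mw:
        ram[a] = b             # B data is written to the RAM
    if wr:
        regs[da] = out if md else f   # mux MD picks RAM output or ALU output


regs = [4, 0, 0, 0]            # R0 holds the address 4 (sample value)
ram = {4: 41}                  # M[4] = 41 (sample value)

cycle(regs, ram, wr=1, da=3, aa=0, ba=0, fs=0b00111, mw=0, md=1)  # R3 <- M[R0]
cycle(regs, ram, wr=1, da=3, aa=3, ba=0, fs=0b00001, mw=0, md=0)  # R3 <- R3 + 1
cycle(regs, ram, wr=0, da=0, aa=0, ba=3, fs=0b00111, mw=1, md=0)  # M[R0] <- R3
print(regs, ram)
```

Signals that a step does not use (BA and FS in the first step, for instance) are given arbitrary values, since the datapath ignores their effect.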

Summary
• The datapath is the part of a processor where computation is done.
– The basic components are an ALU, a register file and some RAM.
– The ALU does all of the computations, while the register file and
RAM provide storage for the ALU’s operands and results.
• Various control signals in the datapath govern its behavior.
• Next, we’ll see how programmers can give commands to the processor,
and how those commands are translated into control signals for the
datapath.

HW
1. Design a 4 bit arithmetic circuit, with two selection variables S1
and S0, that generates the arithmetic operations below. Draw
the logic diagram for a single bit stage. (Q 10-4)

S1S0   Cin = 0                Cin = 1
00     F = A + B (add)        F = A + B + 1
01     F = A (transfer)       F = A + 1 (increment)
10     F = B’ (complement)    F = B’ + 1 (negate)
11     F = A + B’             F = A + B’ + 1 (subtract)

2. A computer has a 32 bit instruction word broken into fields as
follows: opcode, 6 bits; two register fields, 6 bits each; and one
immediate operand/register field, 14 bits.
(a) What is the maximum number of operations that can be specified?
(b) How many registers can be addressed?
(c) What is the range of unsigned immediate operands that can
be provided? (Q 10-13)

Control units
• We introduced the basic structure of a control unit, and translated
assembly instructions into a binary representation.
• The last piece of the processor is a control unit to convert these binary
instructions into datapath signals.
• At the end we’ll have a complete example processor!

Datapath review
• Set WR = 1 to write one of the registers.
• DA is the register to save to.
• AA and BA select the source registers.
• MB chooses a register or a constant operand.
• FS selects an ALU operation.
• MW = 1 to write to memory.
• MD selects between the ALU result and the RAM output.
• V, C, N and Z are status bits.
[Diagram: register file, Mux B (MB), ALU (FS; status V, C, N, Z), data RAM (MW) and Mux D (MD)]

Block diagram of a processor
[Diagram: Program feeds the Control Unit; control signals go from the Control Unit to the Datapath; status signals return from the Datapath to the Control Unit]
• The control unit connects programs with the datapath.
– It converts program instructions into control words for the
datapath, including signals WR, DA, AA, BA, MB, FS, MW, MD.
– It executes program instructions in the correct sequence.
– It generates the “constant” input for the datapath.
• The datapath also sends information back to the control unit. For
instance, the ALU status bits V, C, N, Z can be inspected by branch
instructions to alter a program’s control flow.

Where does the program go?
• We’ll use a Harvard architecture, which includes two memory units.
– An instruction memory holds the program.
– A separate data memory is used for computations.
– The advantage is that we can read an instruction and load or store
data in the same clock cycle.
• For simplicity, our diagrams do not show any WR or DATA inputs to the
instruction memory.
[Diagram: Instruction RAM (ADRS, OUT) and Data RAM (ADRS, DATA, MW, OUT)]
• Caches in modern CPUs often feature a Harvard architecture like this.
• However, there is usually a single main memory that holds both program
instructions and data, in a Von Neumann architecture.

Program counter
• A program counter or PC addresses the instruction memory, to keep
track of the instruction currently being executed.
• On each clock cycle, the counter does one of two things.
– If Load = 0, the PC increments, so the next instruction in memory
will be executed.
– If Load = 1, the PC is updated with Data, which represents some
address specified in a jump or branch instruction.
[Diagram: PC with Data and Load inputs, driving the ADRS input of the Instruction RAM]

Instruction decoder
• The instruction decoder is a combinational circuit that takes a machine
language instruction and produces the matching control signals for the
datapath.
• These signals tell the datapath which registers or memory locations to
access, and what ALU operations to perform.

Jumps and branches
• Finally, the branch control unit decides what the PC’s next value
should be.
– For jumps, the PC should be loaded with the target address specified
in the instruction.
[Diagram: Branch Control with status inputs V, C, N, Z, driving the PC]
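
The program counter’s two behaviours (increment when Load = 0, take the Data input when Load = 1) reduce to a one-line next-state function. The sketch below is illustrative only; next_pc and the sample Load/Data sequence are made up, and no word size is modelled.

```python
def next_pc(pc, load, data):
    """Value the PC takes on the next clock cycle."""
    return data if load else pc + 1


pc = 0
trace = []
# Two sequential instructions, then a jump to address 18, then one more step.
for load, data in [(0, 0), (0, 0), (1, 18), (0, 0)]:
    pc = next_pc(pc, load, data)
    trace.append(pc)
print(trace)
```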

Instruction Formats
Bit positions:         15–9     8–6                         5–3                     2–0
(a) Register:          Opcode   Destination register (DR)   Source register A (SA)  Source register B (SB)
(b) Immediate:         Opcode   Destination register (DR)   Source register A (SA)  Operand (OP)
(c) Jump and Branch:   Opcode   Address (AD) (Left)         Source register A (SA)  Address (AD) (Right)
• The three formats are: Register, Immediate, and Jump and Branch
• All formats contain an Opcode field in bits 9 through 15.
• The Opcode specifies the operation to be performed

Register format
Bit positions:   15–9     8–6                         5–3                      2–0
                 Opcode   Destination Register (DR)   Source Register A (SA)   Source Register B (SB)
• An example register-format instruction:
ADD R1, R2, R3
• Our binary representation for these instructions will include:
– A 7-bit opcode field, specifying the operation (e.g., ADD).
– A 3-bit destination register, DR.
– Two 3-bit source registers, SA and SB.

Immediate format
Bit positions:   15–9     8–6                         5–3                      2–0
                 Opcode   Destination Register (DR)   Source Register A (SA)   Operand (OP)
• An example immediate-format instruction:
ADD R1, R2, #3
• Immediate-format instructions will consist of:
– A 7-bit instruction opcode.
– A 3-bit destination register, DR.
– A 3-bit source register, SA.
– A 3-bit constant operand, OP.
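
Packing the fields of a register-format instruction into a 16-bit word is a matter of shifts and ORs. The sketch below encodes ADD R1, R2, R3 using the opcode 0000010 that the instruction table in this module lists for ADD; the function names encode and decode are hypothetical. The immediate-format ADD is not encoded because its opcode is not given here.

```python
def encode(opcode, dr, sa, sb):
    """Pack opcode (bits 15-9), DR (8-6), SA (5-3) and SB/OP (2-0)."""
    if not 0 <= opcode < 128:
        raise ValueError("opcode must fit in 7 bits")
    if not all(0 <= field < 8 for field in (dr, sa, sb)):
        raise ValueError("register and operand fields must fit in 3 bits")
    return (opcode << 9) | (dr << 6) | (sa << 3) | sb


def decode(word):
    """Split a 16-bit instruction word back into its four fields."""
    return (word >> 9) & 0x7F, (word >> 6) & 0x7, (word >> 3) & 0x7, word & 0x7


ADD = 0b0000010
word = encode(ADD, dr=1, sa=2, sb=3)     # ADD R1, R2, R3
print(format(word, "016b"))
print(decode(word))
```

Because the three formats share the same field boundaries, the same decode function can pull the fields out of any of them.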

…
• Two example jump and branch instructions:
BZ R3, -24
JMP 18
• Jump and branch format instructions include:
– A 7-bit instruction opcode.
– A 3-bit source register SA for branch conditions.
– A 6-bit address field, AD, for storing jump or branch offsets.
• Our branch instructions support only one source register. Other types
of branches can be simulated from these basic ones.

…
– Register format instructions require two source registers.
– Immediate format instructions have one source register and one
constant operand.
– Jump and branch format instructions need one source register and
one constant address.
• Even though there are three different instruction formats, it is best to
make their binary representations as similar as possible.
– This will make the control unit hardware simpler.
– For simplicity, all of our instructions are 16 bits long.

Instruction    Opcode    Mnemonic   Format     Description                  Status bits
Move A         0000000   MOVA       RD,RA      R[DR] ← R[SA]                N, Z
Increment      0000001   INC        RD,RA      R[DR] ← R[SA] + 1            N, Z
Add            0000010   ADD        RD,RA,RB   R[DR] ← R[SA] + R[SB]        N, Z
Subtract       0000101   SUB        RD,RA,RB   R[DR] ← R[SA] - R[SB]        N, Z
Decrement      0000110   DEC        RD,RA      R[DR] ← R[SA] - 1            N, Z
AND            0001000   AND        RD,RA,RB   R[DR] ← R[SA] ∧ R[SB]        N, Z
OR             0001001   OR         RD,RA,RB   R[DR] ← R[SA] ∨ R[SB]        N, Z
Exclusive OR   0001010   XOR        RD,RA,RB   R[DR] ← R[SA] ⊕ R[SB]        N, Z
NOT            0001011   NOT        RD,RA      R[DR] ← R[SA]’               N, Z
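
In the jump and branch format the 6-bit address field AD is split around SA: its left half sits in bits 8–6 and its right half in bits 2–0. The sketch below shows one way the offsets from the examples (-24 and 18) could be split and rejoined. It assumes the offset is stored as a 6-bit two’s complement number, which the handout does not state explicitly; the function names are hypothetical.

```python
def split_ad(offset):
    """Split a signed offset into (AD left, AD right), three bits each."""
    if not -32 <= offset <= 31:
        raise ValueError("offset must fit in 6 bits")
    ad = offset & 0x3F             # 6-bit two's complement pattern (assumed)
    return ad >> 3, ad & 0x7


def join_ad(left, right):
    """Rebuild the signed offset from the two 3-bit halves."""
    ad = (left << 3) | right
    return ad - 64 if ad & 0x20 else ad   # sign-extend from bit 5


for offset in (-24, 18):
    left, right = split_ad(offset)
    print(offset, format(left, "03b"), format(right, "03b"), join_ad(left, right))
```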
Summary
• We saw an outline of the control unit hardware.
– The program counter points into a special instruction memory, which
contains a machine language program.
– An instruction decoder looks at each instruction and generates the
correct control signals for the datapath and a branching unit.
– The branch control unit handles instruction sequencing.
• The control unit implementation depends on both the instruction set
architecture and the datapath.
– Careful selection of opcodes and instruction formats can make the
control unit simpler.
• We now have a whole processor! This is the culmination of everything
we did this semester, starting from primitive gates.