Verilog RTL Design
5 Register-Transfer Level (RTL) Design
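[Figure 5.1 diagrams: (a) FSM that holds X = 1 for 3 cycles; (b) HLSM with
Inputs: B; Outputs: X; Register: Cnt(2). The first state sets X=0 and Cnt=2; the
second state returns to the first when Cnt=0, giving 3 cycles with X = 1.]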
Figure 5.1 Two state machine description types: (a) FSM, (b) high-level state machine
(HLSM).
held high. The first state initializes Cnt to 2. After a button press has been
detected, the second state holds the output high while comparing Cnt to 0 and also
decrementing Cnt. The net result is that the output will be held high for three
clock cycles. Initializing Cnt to 511 (and also declaring Cnt to be a 9-bit register
rather than just 2-bits) would result in holding the output high for 512 cycles.
Describing an HLSM in Verilog can be achieved using a straightforward
approach similar to the approach for describing an FSM. Similar to the approach
for an FSM, the approach for an HLSM considers the target architecture consist-
ing of a combinational logic part and a register part, as shown in Figure 5.2. The
earlier-introduced FSM register part consisted only of a state register. The HLSM
register part consists of a state register, and of any explicitly declared registers.
The figure shows the architecture for the laser timer example system, which has
one explicitly declared register, Cnt.
Figure 5.2 Target architecture for an HLSM, consisting of a combinational logic part, and
a register part.
5.1 High-Level State Machine (HLSM) Behavior
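`timescale 1 ns/1 ns

// Module header and variable declarations restored; they are missing from the
// extracted listing, and the module name LaserTimer is assumed
module LaserTimer(B, X, Clk, Rst);

   input B;
   output reg X;
   input Clk, Rst;

   parameter S_Off = 0,
             S_On  = 1;

   reg [0:0] State, StateNext;
   reg [1:0] Cnt, CntNext;

   // CombLogic
   always @(State, Cnt, B) begin
   end

   // Regs
   always @(posedge Clk) begin
   end
endmodule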
Figure 5.3 Code template for describing the laser timer's HLSM in Verilog.
// CombLogic
always @(State, Cnt, B) begin
   case (State)
      S_Off: begin
         X <= 0;
         CntNext <= 2;
         if (B == 0)
            StateNext <= S_Off;
         else
            StateNext <= S_On;
      end
      S_On: begin
         X <= 1;
         CntNext <= Cnt - 1;
         if (Cnt == 0)
            StateNext <= S_Off;
         else
            StateNext <= S_On;
      end
   endcase
end
Note: Writes are to "next" variable, reads are from "current" variable. See
target architecture to understand why.
Note from the architecture shown in Figure 5.2 that for an explicitly declared
register, the combinational logic reads the current variable (Cnt) but writes the
next variable (CntNext) for the Cnt register. This dichotomy explains why the
action "Cnt <= Cnt - 1" of the HLSM in Figure 5.1(b) is described in the
procedure of Figure 5.4 as "CntNext <= Cnt - 1;". Reading is from Cnt, while
writing is to CntNext. When describing HLSMs in an HDL, care must be taken to
ensure that reads are from current variables and writes are to next variables for
explicitly declared registers.
The transitions in the procedure in Figure 5.4, when reading from an explicitly
declared register, must read from the current variable and not the next
variable, again based on the target architecture of Figure 5.2. Thus, the
transition that detects Cnt=0 appears as "Cnt == 0".
Figure 5.5 shows the procedure for the register part of the HLSM. The register
part actually describes two registers-the state register (involving variables State
and StateNext), and the Cnt register (involving variables Cnt and CntNext). When
the clock is rising and reset is not asserted, the procedure updates each current
variable with the corresponding next variable. If instead reset were asserted, the
procedure sets each current variable to an initial value. Note that the procedure
resets Cnt to an initial value (0), even though such reset behavior is not strictly
necessary for correct functioning of the HLSM, because the HLSM will initialize
Cnt to 2 in its first state. Such reset behavior was included to follow the modeling
guidelines described in Chapter 3, where it was stated that all registers should
have defined reset behavior.
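The listing for the register procedure did not survive in this copy, so the following is a minimal sketch reconstructed from the description above rather than the original figure. It combines the combinational procedure with a register procedure that resets State and Cnt, and otherwise copies each next variable into its current variable. The module name LaserTimerHLSM is an assumption made for illustration.

```verilog
`timescale 1 ns/1 ns

// Sketch of the laser timer HLSM using two procedures.
// The module name LaserTimerHLSM is assumed for illustration.
module LaserTimerHLSM(B, X, Clk, Rst);
   input B;
   output reg X;
   input Clk, Rst;

   parameter S_Off = 0,
             S_On  = 1;

   reg [0:0] State, StateNext;
   reg [1:0] Cnt, CntNext;

   // CombLogic: reads State, Cnt, B; writes X, StateNext, CntNext
   always @(State, Cnt, B) begin
      case (State)
         S_Off: begin
            X <= 0;
            CntNext <= 2;
            if (B == 0)
               StateNext <= S_Off;
            else
               StateNext <= S_On;
         end
         S_On: begin
            X <= 1;
            CntNext <= Cnt - 1;
            if (Cnt == 0)
               StateNext <= S_Off;
            else
               StateNext <= S_On;
         end
      endcase
   end

   // Regs: on reset, load initial values; otherwise current <= next
   always @(posedge Clk) begin
      if (Rst == 1) begin
         State <= S_Off;
         Cnt <= 0;
      end
      else begin
         State <= StateNext;
         Cnt <= CntNext;
      end
   end
endmodule
```

With a 20 ns clock and a one-cycle pulse on B after reset, this sketch holds X at 1 for three clock cycles, which matches the behavior described for the HLSM.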
Figure 5.6 provides simulation waveforms generated when using the laser
timer example testbench introduced in Chapter 3. Note first that the
three-cycle-high behavior is identical to the FSM behavior from Chapter 3. The
waveforms show two internal variables, Cnt and State. Note how the system enters
state S_On on the first rising clock after B becomes 1, causing Cnt to be
initialized to 2. Cnt is then decremented on each rising clock while in state
S_On. After Cnt reaches 0, State changes to S_Off. Note that Cnt also was
decremented at that time, causing Cnt to wrap around from 0 to 3
("00" - 1 = "11"), but that value of 3 was never used, because state S_Off sets
Cnt to 2 again.
Examining the reset behavior of the system is useful. At the beginning of
simulation, Cnt is unknown. At the first rising clock, Cnt is reset to 0 by the
description's explicit reset behavior. At the next rising clock, Cnt is set to 2
by state S_Off.
[Figure 5.6 waveforms: B, X, Clk, Rst, Cnt, and State; X is 1 for 3 cycles.]
5.2 TOP-DOWN DESIGN: HLSM TO
CONTROLLER AND DATAPATH
Recall from Chapters 2 and 3 that top-down design involves first capturing and
simulating the behavior of a system, and then transforming that behavior to
structure and simulating again. Top-down design divides the design problem into two
steps. The first step is to get the behavior right (freed from the complexity of
designing structure), and the second step is to derive a structural implementation
for that behavior. Dividing into two steps can make the design process proceed
more smoothly than trying to directly capture structure. The two-step approach
also enables the use of automated tools that automatically convert the behavior to
structure.
At the register-transfer level, top-down design involves converting a high-
level state machine (HLSM) to a structural design consisting of a controller and a
datapath, as shown in Figure 5.7. The datapath carries out the arithmetic opera-
tions involved in the HLSM's actions and conditions. The controller sequences
those operations in the datapath. The controller itself will be an implementation of
an FSM.
The first step in converting an HLSM to a controller and datapath is typically
to design a datapath that is capable of implementing all the arithmetic operations
of the HLSM. Figure 5.8 shows a datapath capable of implementing the arithmetic
operations of the laser timer HLSM.
The datapath includes a register Cnt, a decrementer to compute Cnt-1, and a
comparator to detect when Cnt is 0. Those components are connected to enable
the operations needed by the HLSM, with a mux in front of the Cnt register to
account for the fact that Cnt can be loaded from two different sources. The
datapath provides clear names for the input and output control signals of the
datapath (Cnt_Sel, Cnt_Eq_0, and Cnt_Ld).
[Figure 5.7 diagram: the laser timer HLSM (Inputs: B; Outputs: X; Register:
Cnt(2)) converted to a Controller and a Datapath.]
Figure 5.7 RTL top-down design converts an HLSM to a controller and datapath.
[Figure 5.8 diagram: the laser timer HLSM alongside its Datapath, clocked by Clk.]
Figure 5.8 Deriving a datapath for the arithmetic operations for the laser timer HLSM.
After creating a datapath, the next step is to derive a controller by replacing
the HLSM with an FSM having the same state and transition structure, but replac-
ing arithmetic operations by Boolean operations that use the datapath input and
output control signals to carry out the desired arithmetic operations inside the
datapath. Such an FSM is shown in Figure 5.9. The FSM does not contain an
explicit register declaration for Cnt, as that register now appears in the datapath.
Likewise, any writes of that register have been replaced by writes to datapath con-
trol signals that configure the input to the register and enable a register load. For
example, the assignment "Cnt = 2" in state Off has been replaced by the Boolean
actions "Cnt_Sel=1", which configures the datapath mux to pass "10" (2) through
the mux, and "Cnt_Ld=1", which enables loading of the Cnt register.
Proceeding with top-down design requires describing Figure 5.9's controller
and datapath in Verilog. One option would be to create a module for the controller
and a module for the datapath, and then instantiate and connect a controller
module and a datapath module in another higher-level module. However, a
simpler approach describes the controller and datapath as procedures within a single
module. The simpler approach will now be discussed.
[Figure 5.9 diagram: the HLSM (Inputs: B; Outputs: X; Register: Cnt(2)) is
replaced by a controller FSM with Inputs: B, Cnt_Eq_0; Outputs: X, Cnt_Sel,
Cnt_Ld. State Off performs X=0, Cnt_Sel=1, Cnt_Ld=1. The controller connects to
the datapath, and both are clocked by Clk.]
Figure 5.9 Deriving a controller by replacing the HLSM with an FSM that uses the
datapath to carry out arithmetic operations.
Figure 5.10 Partitioning a datapath into a combinational logic part, and a register part.
// Shared variables
reg Cnt_Eq_0, Cnt_Sel, Cnt_Ld;
// Controller variables
reg [0:0] State, StateNext;
// Datapath variables
reg [1:0] Cnt, CntNext;

// Datapath Procedures //
// DP CombLogic
always @(Cnt_Sel, Cnt) begin
   if (Cnt_Sel==1)
      CntNext <= 2;
   else
      CntNext <= Cnt - 1;
   // Comparator output. This assignment and the closing "end" are missing
   // from the extracted listing and are restored from the datapath description.
   Cnt_Eq_0 <= (Cnt==0)?1:0;
end

// DP Regs
always @(posedge Clk) begin
   if (Rst == 1)
      Cnt <= 0;
   else if (Cnt_Ld==1)
      Cnt <= CntNext;
end

// Controller Procedures //
// Ctrl CombLogic
always @(State, Cnt_Eq_0, B) begin
   case (State)
      S_Off: begin
         X <= 0; Cnt_Sel <= 1; Cnt_Ld <= 1;
         if (B == 0)
            StateNext <= S_Off;
         else
            StateNext <= S_On;
      end
      S_On: begin
         X <= 1; Cnt_Sel <= 0; Cnt_Ld <= 1;
         if (Cnt_Eq_0 == 1)
            StateNext <= S_Off;
         else
            StateNext <= S_On;
      end
   endcase
end

// Ctrl Regs
always @(posedge Clk) begin
   if (Rst == 1) begin
      State <= S_Off;
   end
   else begin
      State <= StateNext;
   end
end
// Module header and port declarations restored; they are missing from the
// extracted listing, and the module name LaserTimer is assumed
module LaserTimer(B, X, Clk, Rst);
   input B;
   output reg X;
   input Clk, Rst;

   parameter S_Off = 0,
             S_On  = 1;

   // Shared variables
   reg Cnt_Eq_0, Cnt_Sel, Cnt_Ld;
   // Controller variables
   reg [0:0] State, StateNext;
   // Datapath variables
   reg [1:0] Cnt, CntNext;

   // Datapath Procedures //
   // DP CombLogic
   always @(Cnt_Sel, Cnt) begin
   end

   // DP Regs
   always @(posedge Clk) begin
   end

   // Controller Procedures //
   // Ctrl CombLogic
   always @(State, Cnt_Eq_0, B) begin
   end

   // Ctrl Regs
   always @(posedge Clk) begin
   end
endmodule
Figure 5.13 Controller and datapath descriptions, consisting of two procedures for the
datapath, and two for the controller.
5.3 Describing a State Machine using One Procedure
[Figure 5.14 code listings (a) and (b): not recoverable from the extracted text.]
Figure 5.14 Alternative approaches for describing a high-level state machine: (a) two-
procedure description, (b) one-procedure description.
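Because the one-procedure listing is not recoverable here, the following is a hedged sketch of what a one-procedure description of the laser timer HLSM might look like. It is not the original figure. A single procedure sensitive only to the rising clock edge writes State, Cnt, and X directly, so there are no StateNext or CntNext variables. The module name LaserTimerOneProc is an assumption.

```verilog
`timescale 1 ns/1 ns

// Sketch of a one-procedure (fully synchronous) laser timer HLSM.
// The module name LaserTimerOneProc is assumed for illustration.
module LaserTimerOneProc(B, X, Clk, Rst);
   input B;
   output reg X;
   input Clk, Rst;

   parameter S_Off = 0,
             S_On  = 1;

   reg [0:0] State;
   reg [1:0] Cnt;

   // Single procedure: state, register, and output updates all occur
   // on rising clock edges
   always @(posedge Clk) begin
      if (Rst == 1) begin
         State <= S_Off;
         Cnt <= 0;
         X <= 0;
      end
      else begin
         case (State)
            S_Off: begin
               X <= 0;
               Cnt <= 2;
               if (B == 0)
                  State <= S_Off;
               else
                  State <= S_On;
            end
            S_On: begin
               X <= 1;
               Cnt <= Cnt - 1;
               if (Cnt == 0)
                  State <= S_Off;
               else
                  State <= S_On;
            end
         endcase
      end
   end
endmodule
```

In this sketch X is itself a registered value, so it rises one clock edge after State becomes S_On. X still stays high for three cycles, but the whole pulse is shifted one cycle later than in a two-procedure description.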
parameter S_Off = 0,
S_On = 1;
[Figure 5.16 waveforms: (a) two-procedure description and (b) one-procedure
description; the recoverable signal names are Clk, Rst, and State.]
Figure 5.16 Timing differences between different descriptions: (a) In the two-
procedure description, a change in B appearing sufficiently before a rising clock edge
sets up the next state according to the new value of B, (b) In the one-procedure
synchronous description, B's value is only checked on rising clock edges, and thus a
change in B is not noticed until the next rising edge, meaning the new value of B
doesn't impact the next state until two rising clock edges after B changes.
[SIMUL] DELAY CONTROL ON RIGHT SIDE OF ASSIGNMENT STATEMENTS
Real components do not compute their outputs instantly after the components'
inputs change. Instead, real components have delay: after inputs change, the
correct outputs do not appear until some time later. For example, suppose the
comparator in the datapath of Figure 5.9 has a delay of 7 ns. In order to obtain
a more accurate RTL simulation of the controller and datapath in that figure, a
description could be extended to include such delays. Figure 5.17 shows how the
description of Figure 5.11 could be extended to include a 7 ns delay for the
comparator component, by using delay control.
Delay control was introduced in Chapter 2 for delaying the execution of a
statement, achieved by prepending the delay control to a statement, as in the
statement: "#10 Y <= 0;". However, delay control can also be inserted on the
right side of an assignment statement, such as: "Y <= #10 0;". The prepended
form delays execution of the statement that follows by 10 time units. In
contrast, in the right side form, the statement is not delayed but instead
executes immediately. Upon executing, however, the update of Y will be scheduled
to occur 10 time units in the future.
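The difference between the two forms can be observed with a small experiment. The following sketch is illustrative only: the module DelayDemo and the variables Y1, Y2, Marker1, and Marker2 are hypothetical names, not part of the laser timer description. Each marker is set by the statement that follows the delayed assignment, which shows whether the procedure itself was held up.

```verilog
`timescale 1 ns/1 ns

// Illustrative sketch; all names here are hypothetical.
module DelayDemo();
   reg Y1, Y2;
   reg Marker1, Marker2;

   // Prepended delay control: the statement itself waits 10 time units,
   // so the following statement is not reached until time 10.
   initial begin
      Y1 = 0; Marker1 = 0;
      #10 Y1 <= 1;
      Marker1 = 1;
   end

   // Right-side delay control: the statement executes at time 0 and only
   // the update of Y2 is scheduled for time 10, so the following
   // statement is reached at time 0.
   initial begin
      Y2 = 0; Marker2 = 0;
      Y2 <= #10 1;
      Marker2 = 1;
   end

   // Observe the variables before and after time 10
   initial begin
      #5  $display("t=%0d Y1=%b Marker1=%b Y2=%b Marker2=%b",
                   $time, Y1, Marker1, Y2, Marker2);
      #10 $display("t=%0d Y1=%b Marker1=%b Y2=%b Marker2=%b",
                   $time, Y1, Marker1, Y2, Marker2);
   end
endmodule
```

At time 5, Marker2 is already 1 while Marker1 is still 0, and both Y1 and Y2 are still 0. At time 15 all four variables are 1. Both assignments update their targets at time 10; what differs is whether the procedure waits before moving on.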
In Figure 5.17, the assignment to Cnt_Eq_0, which models the datapath's
comparator, has been extended with delay control indicating a 7 ns delay. To
more fully model the datapath component delays, delay controls would also be
added to the two CntNext assignments, modeling the delay of the mux and the
decrementer.
5.4 Improving Timing Realism
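`timescale 1 ns/1 ns
// DP CombLogic
always @(Cnt_Sel, Cnt) begin
   if (Cnt_Sel==1)
      CntNext <= 2;
   else
      CntNext <= Cnt - 1;
   // Comparator with a 7 ns delay. The extracted listing is truncated here;
   // this line and the closing "end" are restored from the text's description.
   Cnt_Eq_0 <= #7 (Cnt==0)?1:0;
end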
[Figure 5.18 waveforms: panels (a) and (b); the recoverable signal names are
Clk, Rst, Cnt, Cnt_Eq_0, State, B, and X.]
Figure 5.18 Simulation results: (a) without comparator delay, (b) with 7 ns delay.
[Figure 5.19: (a) block diagram of the SAD system, with two 256-byte arrays,
input Go, and 32-bit output SAD_Out; (b) algorithm, not recoverable from the
extracted text.]
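`timescale 1 ns/1 ns

// Module header, declarations, and the ABS function are missing from the
// extracted listing; they are restored here following the later SAD listing.
// The module name SAD is assumed.
module SAD(Go, SAD_Out);
   input Go;
   output reg [31:0] SAD_Out;

   reg [7:0] A [0:255];
   reg [7:0] B [0:255];
   integer Sum;
   integer I;

   function integer ABS;
      input integer IntVal;
      begin
         ABS = (IntVal>=0)?IntVal:-IntVal;
      end
   endfunction

   // Initialize Arrays
   initial $readmemh("MemA.txt", A);
   initial $readmemh("MemB.txt", B);

   always begin
      if (!(Go==1)) begin
         @(Go==1);
      end
      Sum = 0;
      for (I=0; I<=255; I=I+1) begin
         Sum = Sum + ABS(A[I] - B[I]);
      end
      #50;
      SAD_Out <= Sum;
   end
endmodule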
5.5 Algorithmic-Level Behavior
numbers (rather than hex numbers) into an array. Note that functions are called as
the sole statement of an initial procedure. The syntax may look unusual; using a
begin-end block may make the calls look more familiar:
initial begin
   $readmemh("MemA.txt", A);
end
The main procedure describing the algorithm is the always procedure at the
bottom of Figure 5.20. The procedure's contents look very similar to the algorithm
in Figure 5.19(b). The always procedure begins by waiting for Go to become 1. It
does so by checking if Go is already 1, and if not, using the event control
"@(Go==1);" Previous event controls involved a single event, such as "@(X)" or
"@(posedge Clk)", or involved a list of events, such as "@(X, Y)". In Figure 5.20,
however, the event control uses the expression "@(Go==1)". That event control
does not test whether Go is currently 1; it waits for the value of the
expression Go==1 to change. The word "change" is critical in the previous
sentence. If Go is already 1 when the event control statement is reached during
execution, the procedure still suspends at that event control. The procedure
stays suspended at that statement until the value of the expression changes, so
it cannot proceed while Go simply remains at 1. This behavior of an
event control is somewhat counterintuitive, as many designers make the mistake
of believing that if Go was 1 when reaching the statement, execution will simply
proceed to the next statement without the procedure suspending. The description
therefore uses an if statement to achieve the desired behavior of the procedure pro-
ceeding to compute the SAD if Go is already 1.
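The effect of the if statement can be seen in a small experiment that holds Go at 1 throughout. This is an illustrative sketch only: the module EventDemo and the flags DoneBare and DoneGuarded are hypothetical names, not part of the SAD description. One procedure uses the bare event control and the other uses the guarded form.

```verilog
`timescale 1 ns/1 ns

// Illustrative sketch; all names here are hypothetical.
module EventDemo();
   reg Go;
   reg DoneBare, DoneGuarded;

   // Go is set to 1 at time 0 and never changes
   initial begin
      Go = 1;
   end

   // Bare event control: reached at time 1, when Go is already 1.
   // The procedure suspends because the expression does not change.
   initial begin
      DoneBare = 0;
      #1;
      @(Go==1);
      DoneBare = 1;
   end

   // Guarded form: the event control is skipped when Go is already 1,
   // so execution proceeds immediately.
   initial begin
      DoneGuarded = 0;
      #1;
      if (!(Go==1)) begin
         @(Go==1);
      end
      DoneGuarded = 1;
   end

   initial begin
      #10 $display("DoneBare=%b DoneGuarded=%b", DoneBare, DoneGuarded);
   end
endmodule
```

At time 10 the guarded procedure has finished (DoneGuarded is 1), while the bare one is still suspended (DoneBare is 0). Verilog also provides the level-sensitive statement "wait (Go==1);", which proceeds at once when the condition is already true and could serve as an alternative to the if-guarded event control.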
The delay control "#50;" at the end of the procedure creates some delay
(albeit a rather short one) between the time that Go becomes 1 and the time that
the computed SAD value appears at the output. Including a delay at the end of the
procedure also prevents an infinite simulation loop in which the procedure repeat-
edly executes without ever suspending if Go were always kept at 1.
Figure 5.21 provides a simple test vector procedure for the algorithm-level
SAD description. The procedure pulses Go_s, waits for some time, and then
checks that the computed SAD equals 4 (we happened to define the memory con-
tents such that exactly four elements differed in A and B by one each). Simulation
waveforms are shown in Figure 5.22. Note that the SAD output is initially
unknown, due to the output not having been explicitly set to some value when the
SAD procedure first executed. A better description would set the output to some
value, likely during a reset.
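`timescale 1 ns/1 ns
module Testbench();
   reg Go_s;
   wire [31:0] SAD_Out_s;

   // Instantiation restored (missing from the extracted listing); the module
   // name SAD and the instance name CompToTest are assumed
   SAD CompToTest(Go_s, SAD_Out_s);

   // Vector Procedure
   initial begin
      Go_s <= 0;
      #10 Go_s <= 1;
      #10 Go_s <= 0;
      #60 if (SAD_Out_s != 4) begin
         $display("SAD failed -- should equal 4");
      end
   end
endmodule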
Figure 5.21 Simple testbench vector procedure for the SAD algorithmic-level
description.
[Figure 5.22 waveforms: Go and the SAD output.]
5.6 TOP-DOWN DESIGN: CONVERTING
ALGORITHMIC-LEVEL BEHAVIOR TO RTL
Once satisfied that an algorithm is correct, a designer may wish to proceed to con-
vert the algorithm-level behavior to an RTL description, as a means of moving
towards an implementation of the system. The algorithm of Figure 5.20 can be
recaptured as the HLSM shown in Figure 5.23. The HLSM can then be described
as shown in Figure 5.24.
The HLSM can be tested using a testbench similar to that in Figure 5.21,
except that the testbench should wait longer than just 60 ns for the SAD output to
appear. By counting the number of HLSM states that must be visited to compute
the SAD, one can determine a waiting time of (256*2+3) * (20 ns) = 10,300 ns,
where 20 ns is the clock period. Figure 5.25 shows a simple testbench for the HLSM.
Figure 5.26 provides the waveforms resulting from simulating the testbench,
showing several internal variables to better demonstrate the HLSM's behavior
during simulation.
[Figure 5.23 diagram: HLSM for the SAD system; recoverable labels: !Go;
SAD_Reg = Sum.]
`timescale 1 ns/1 ns

// Module header restored (missing from the extracted listing); the module
// name SAD and the port order are assumed
module SAD(Go, SAD_Out, Clk, Rst);

   input Go;
   output [31:0] SAD_Out;
   input Clk, Rst;

   parameter S0 = 0, S1 = 1,
             S2 = 2, S3 = 3,
             S4 = 4;

   reg [7:0] A [0:255];
   reg [7:0] B [0:255];
   reg [2:0] State;
   integer Sum, SAD_Reg;
   integer I;

   function integer ABS;
      input integer IntVal;
      begin
         ABS = (IntVal>=0)?IntVal:-IntVal;
      end
   endfunction

   // Initialize Arrays
   initial $readmemh("MemA.txt", A);
   initial $readmemh("MemB.txt", B);

   // High-level state machine
   always @(posedge Clk) begin
      if (Rst==1) begin
         State <= S0;
         Sum <= 0;
         SAD_Reg <= 0;
         I <= 0;
      end
      else begin
         case (State)
            S0: begin
               if (Go==1)
                  State <= S1;
               else
                  State <= S0;
            end
            S1: begin
               Sum <= 0;
               I <= 0;
               State <= S2;
            end
            S2: begin
               if (!(I==255))
                  State <= S3;
               else
                  State <= S4;
            end
            S3: begin
               Sum <= Sum + ABS(A[I]-B[I]);
               I <= I + 1;
               State <= S2;
            end
            S4: begin
               SAD_Reg <= Sum;
               State <= S0;
            end
         endcase
      end
   end

   // Output connection restored; without it SAD_Out would not be driven
   assign SAD_Out = SAD_Reg;
endmodule
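module Testbench();
   reg Go_s;
   reg Clk_s, Rst_s;
   wire [31:0] SAD_Out_s;

   // Instantiation restored (missing from the extracted listing); the module
   // name SAD, the instance name CompToTest, and the port order are assumed
   SAD CompToTest(Go_s, SAD_Out_s, Clk_s, Rst_s);

   // Clock Procedure
   always begin
      Clk_s <= 0;
      #10;
      Clk_s <= 1;
      #10;
   end // Note: Procedure repeats

   // Vector Procedure
   initial begin
      Rst_s <= 1;
      Go_s <= 0;
      @(posedge Clk_s);
      Rst_s <= 0;
      Go_s <= 1;
      @(posedge Clk_s);
      Go_s <= 0;
      #((256*2+3)*20) if (SAD_Out_s != 4) begin
         $display("SAD failed -- should equal 4");
      end
   end
endmodule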
Figure 5.25 Simple testbench vector procedure for the SAD HLSM description.
[Figure 5.26 waveforms: Go, SAD_Out, and several internal variables; the marked
time is 10,290 ns.]
[SIMUL] SIMULATION SPEED
Higher-level descriptions not only have the advantage of requiring less time to
create them and thus of being more suitable for initial system behavior definition,
but also have the advantage of faster simulation. For example, the algorithmic-
level description of the sum of absolute differences system may simulate faster
than an HLSM description of that system, which in turn may simulate faster than a
lower-level description like a gate-level description. When the system's Go input
becomes 1, the algorithmic-level description resumes a procedure that then
executes a for loop that involves only a few thousand calculations in total to
compute the output result. In contrast, the HLSM description would require tens
of thousands of calculations by the simulator, which must suspend and resume the HLSM
procedure nearly one thousand times in order to simulate the clock-controlled
state machine of that procedure, performing dozens of calculations each time the
procedure resumes in order to compute the current output values and the next
state. For a small testbench, the simulation speed difference may not be notice-
able. However, for a large testbench, or for a system comprised of hundreds or
thousands of sub-systems, the simulation speed difference may become quite sig-
nificant. It is not unusual for system simulations to run for hours. Thus, a 10 times
difference in simulation speed may mean the difference between a 10-20 minute
simulation versus a 2-3 hour simulation; a 100 times slower simulation could take
days. Thus, high-level descriptions are favored early in the design process, when
system behavior is being defined and refined. Low-level descriptions are neces-
sary to achieve an implementation. High-level descriptions are also useful when
integrating components in a large system, to see if those components interact
properly from a behavioral perspective; the fast simulation speed allows for
testing of a large variety of component interaction scenarios. Lower-level
descriptions would be more suitable to verify detailed timing correctness of such large
systems, but their slower simulation speed allows for only relatively few scenarios
to be examined. For example, a system consisting of several microprocessor com-
ponents described at a high level might be able to simulate minutes of micropro-
cessor execution in a few hours, but might only be able to simulate a few seconds
of microprocessor execution if described at a low level.
We previously showed how state machines could be modeled using two pro-
cedures where one procedure was combinational, or using one procedure where
that procedure was sensitive only to a clock input. The latter will simulate much
faster than the former, due to fewer procedure resumes and suspends. The differ-
ence may not be noticeable for small systems, but if a system contains hundreds or
thousands of state machines, the difference can become quite significant.
5.7 MEMORY
Desired storage may initially be described merely using variables declared within
a system's module, as was done in the SAD example of the previous section and
as illustrated in Figure 5.27(a). However, refining the description towards an
implementation may mean creating a description with that storage described as a
separate memory component, as in Figure 5.27(b).
Describing memory separately begins by creating a new module representing
a memory. A description for a simple read-only memory (ROM) appears in Figure
5.28. The memory description has an input port for the address, and an output port
for the data. In this case, the bitwidths of those ports are the same, but they
can differ depending on the memory size and width. The description
declares an array named Memory to store the memory's contents. The description
uses a continuous assignment statement to always output the Memory data ele-
ment corresponding to the current address input value.
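The ROM listing itself is not recoverable in this copy, so the following is a minimal sketch written from the description above rather than the original figure. The array name Memory comes from the text; the module name SADMem, the port names Addr and Data, and the 256 x 8 size are assumptions chosen to match the SAD example.

```verilog
`timescale 1 ns/1 ns

// Sketch of a simple 256 x 8 read-only memory.
// Module and port names are assumed for illustration.
module SADMem(Addr, Data);
   input [7:0] Addr;     // address input
   output [7:0] Data;    // data output

   // Storage for the memory's contents; expected to be initialized from
   // outside the module, for example by a testbench using $readmemh
   reg [7:0] Memory [0:255];

   // Continuous assignment: Data always reflects the element selected
   // by the current address
   assign Data = Memory[Addr];
endmodule
```

Because the module contains no procedure that writes Memory, it behaves as a combinational read-only component whose contents are set once by the surrounding testbench.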
[Figure 5.27 diagrams: (a) SAD with internal storage, input Go, and output
SAD_Out; (b) SAD with storage moved to a separate 256-byte memory component.]
Figure 5.27 A more accurate system description may create memory as a separate
component.
[Figure 5.28: description of a simple read-only memory (ROM); apart from the
`timescale 1 ns/1 ns directive, the code listing is not recoverable.]
Figure 5.29 shows the HLSM for the SAD example, modified to access the
external memory components. Rather than simply access A and B values, the
HLSM must now set its address outputs A_Addr and B_Addr, and then use the
returned data inputs A_Data and B_Data. Furthermore, because the HLSM is
fully synchronous due to modeling it using a single procedure sensitive only to the
clock input, the HLSM requires an extra state, S3a. Although the A and B memo-
ries will be combinational components, the extra state is necessary because the
A_Data and B_Data inputs will only be sampled on clock edges due to the fully
synchronous HLSM model being used.
[Figure 5.29 diagram; recoverable labels: !Go; Sum = Sum + ABS(A_Data - B_Data);
I = I + 1; SAD_Reg = Sum.]
Figure 5.29 SAD HLSM modified to access the external memory components.
`timescale 1 ns/1 ns
input Go;
input [7:0] A_Data, B_Data;
output reg [7:0] A_Addr, B_Addr;
output [31:0] SAD_Out;
input Clk, Rst;
parameter S0 = 0, S1 = 1,
          S2 = 2, S3 = 3, S3a = 4,
          S4 = 5;
reg [2:0] State;
reg [31:0] Sum, SAD_Reg;
integer I;
Figure 5.30 SAD HLSM description with separate memory (part 1).
Figure 5.30 shows the first part of a description of the new SAD HLSM that
uses memory components. The module declaration now includes the ports for
interfacing with the external memory components. Furthermore, the description
now defines a function ABSDiff that computes the absolute value of the difference
of two vectors.
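The body of ABSDiff does not appear in the extracted listing. A minimal sketch of such a function, assuming 8-bit operands and an if-else implementation as the later discussion suggests, is shown below. The wrapper module ABSDiffDemo, its ports, and the argument names ValA and ValB are hypothetical and exist only so that the function can be exercised on its own.

```verilog
`timescale 1 ns/1 ns

// Hypothetical wrapper module so the function can be simulated standalone.
module ABSDiffDemo(InA, InB, Diff);
   input [7:0] InA, InB;
   output [7:0] Diff;

   // Absolute value of the difference of two 8-bit vectors. Subtracting
   // the smaller value from the larger avoids unsigned wraparound.
   function [7:0] ABSDiff;
      input [7:0] ValA;
      input [7:0] ValB;
      begin
         if (ValA > ValB) ABSDiff = ValA - ValB;
         else ABSDiff = ValB - ValA;
      end
   endfunction

   assign Diff = ABSDiff(InA, InB);
endmodule
```

Comparing before subtracting keeps the result in 8 bits without needing signed arithmetic, which differs from the earlier ABS function that took the absolute value of an integer difference.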
Figure 5.31 shows the second part of the new HLSM description. The part
shown illustrates the new state S3a, corresponding to the HLSM of Figure 5.29.
Notice how the accesses to items A and B are now completely through address and
data ports. Also notice that state S3 now calls function ABSDiff rather than ABS. A
synthesis tool would replace that function call by the contents of the function
itself. In other words, the description could have included ABSDiff's if-else
statement directly in state S3. However, using the function leads to improved
readability of the HLSM. (We point out that we violated our own guideline of
always using a begin-end block for an if statement. We did so merely so that the
code figure would fit in this textbook).
always @(posedge Clk) begin
   if (Rst==1) begin
      A_Addr <= 0;
      B_Addr <= 0;
      State <= S0;
      Sum <= 0;
      SAD_Reg <= 0;
      I <= 0;
   end
   else begin
      case (State)
         S0: begin
            if (Go==1)
               State <= S1;
            else
               State <= S0;
         end
         S1: begin
            Sum <= 0;
            I <= 0;
Figure 5.31 SAD HLSM description with separate memory (part 2).
module Testbench();
   reg Go_s;
   wire [7:0] A_Addr_s, B_Addr_s;
   wire [7:0] A_Data_s, B_Data_s;
   reg Clk_s, Rst_s;
   wire [31:0] SAD_Out_s;

   // Restored from the text: the clock period of 20 ns is a named parameter
   parameter ClkPeriod = 20;

   ... // Module instantiations (SAD component, SADMemA, SADMemB) not shown

   // Clock Procedure
   always begin
      Clk_s <= 0; #(ClkPeriod/2);
      Clk_s <= 1; #(ClkPeriod/2);
   end

   // Initialize Arrays
   initial $readmemh("MemA.txt", SADMemA.Memory);
   initial $readmemh("MemB.txt", SADMemB.Memory);

   // Vector Procedure
   initial begin
      ... // Reset behavior not shown
      Go_s <= 1;
      @(posedge Clk_s);
      Go_s <= 0;
      #((256*3+3)*ClkPeriod) if (SAD_Out_s != 4) begin
         $display("SAD failed -- should equal 4");
      end
   end
endmodule
Figure 5.32 shows a testbench for the SAD system with separate memory
instances. The testbench declares the variables and nets needed for connecting
module instances together, and then instantiates and connects the instances. Note
that a testbench can instantiate multiple modules for testing, rather than instantiat-
ing just one module. To the extent possible, designers should always also test
modules separately, before testing them together in a single testbench.
The testbench then defines the clock, memory initialization, and vector
procedures. The memory initialization procedures again use the $readmemh
function to initialize arrays within the SADMemA and SADMemB memories by
specifying the arrays to be initialized as SADMemA.Memory and SADMemB.Memory.
The vector procedure is similar to the procedure of the previous SAD HLSM
without external memory components, except that the procedure must wait longer
for the SAD output to appear, due to the extra state in the HLSM. Thus, rather
than waiting (256*2+3) * (20 ns) = 10,300 ns, the procedure waits
(256*3+3) * (20 ns) = 15,420 ns.
[Figure 5.33 waveforms: A_Data, B_Data, State, SAD_Reg, and SAD_Out; the marked
time is 15390 ns.]
The testbench in Figure 5.32 has also been improved by declaring a parameter
named ClkPeriod to represent the clock's period of 20 ns, and using that parame-
ter in the clock and vector procedures, rather than hardcoding the 20 ns value
throughout. Declaring such a parameter enables a designer to easily change the
clock period just by changing one number, rather than having to change multiple
numbers scattered throughout the code (and possibly forgetting to change the
number in one place or changing a number when it should not have been
changed).
Finally, Figure 5.33 shows waveforms generated from the testbench. Wave-
forms for the address and data signals now appear. The final SAD output also
appears, although it appears later than in Figure 5.26, due to the extra state in the
HLSM. Generally, converting a design from higher levels to levels closer to an
implementation yields increasing timing accuracy.