
IEEE Asia Pacific Conference on Circuits and Systems '96

member 28- 21, 1996,


are-Software C
Dept. of Electrical and Electronic Engineering
Abstract
We propose a systematic method which synthesizes
the data path and control path of CPU
We use a graphical representation
re design space more broadly,
change the architecture of data path. The num-
ber of data transfer paths is reduced by replacing
1. INTRODUCTION
One of the fast time-to-market design solutions for
he embedded system. While the software
part is used for providing the behav
ibility of system, the hardware par
Application Specific
as been studied pop- lnstruction Processor
I P is a processor
CPU core of ASIP.
other is to synthesize the micro-architecture
optimized for the given application, which is
described with instructions. They have used initial
architectures whose data path has an almost fixed
connection topology. In those cases, since the
architectural flexibility is limited by the initial
topology, it is not easy to explore the design space
widely.
Hiroaki Kunieda
Dept. of Electrical and Electronic Engineering
Tokyo Institute of Technology
2-12-1, Ookayama, Meguro-ku,
Tel: +81-3-5734-257
: +81-3-5734-2842
unieda@ss.titech.ac.jp
Compared with the previous works, our approach is
more aggressive in pursuing high performance of the
ASIP. The instruction sequence is decomposed into
micro-operations (MOPs). They are scheduled at the
MOP level in order to achieve higher performance
with an optimized micro-architecture. To explore the
design space broadly, we try to transform the
architecture by the selection of synthesis
parameters. We
assumed a virtual machine as the initial architectural
template, in which there is no limit in th
of control path on the se1
time.
ed microprocessor. The
part and hardware part is done
quence. In the hardware synthesis part, the assembly
codes are translated into a graphical form, called
Register Transfer Graph (RTG), which is a graphical
representation of dat
ing between RTL compon
and the topology synthesis process begins, the
combinations of synthesis parameters are applied
selectively to enhance the performance or to reduce
the area. This results in the transformation of the
data path topology. The scheduling is performed at
the MOP level under the selected synthesis
parameters. Additionally, to reduce the connection
area cost, the data transfer paths are reduced by
replacing a path with its bypass route.
Fig. 1. : Synthesis Flow
306 T4-OB4. 0-7803-3702-6/96/$5.00 ©1996 IEEE
3. REGISTER TRANSFER GRAPH
Since an instruction is composed of a series of MOPs,
which are data transfer operations or data processing
operations between RTL components, it is possible
to consider an instruction as the ordered execution
of such RTL operations. In order to represent
operations in RTL, we introduce a new RTL-level graph
RTG(V, A), in which V is the set of RTL components
and A is the set of MOPs. Compared with a CDFG, the
RTG is more useful for predicting the usage of RTL
components and the connection topology between them.
This ease of prediction is important for evaluating
the usage of RTL components before resource
allocation is performed. The multiple executions of
an operation are denoted by the execution order set
$O_{i,j}$, simply called the order set. The elements
of the order set are the control step numbers at
which the operation has to be executed.
Initially, all operations are assumed to have a unit
propagation delay. The numbering of registers follows
the register assignment table. A functional operation
is represented by two incoming arcs with the same
execution order to a vertex. Each functional
operation is given a unique number. The RTG for the
LOAD instruction and its MOP definition are shown in
Fig. 2. The order sets are as follows:
$O_{1,3} = \{1\}$, $O_{1,1} = \{2\}$, $O_{2,3} = \{4\}$,
$O_{3,4} = \{2,5\}$, $O_{4,2} = \{3\}$, $O_{4,Rd} = \{6\}$.
MAR <- PC
MBR <- mem[MAR], PC <- PC + 1
IR <- MBR
MAR <- IR.addr
MBR <- mem[MAR]
Rd <- MBR
Fig. 2. : RTG of LOAD Instruction
Fig. 3. : RTG for Sample Instruction Sequence
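The order sets of the LOAD instruction can be reproduced mechanically from its MOP listing. The following Python sketch is only an illustration: the vertex numbering (1 = PC, 2 = IR, 3 = MAR, 4 = MBR) is inferred from the order sets quoted in the text rather than taken from the register assignment table, and the names `VERTEX`, `LOAD_MOPS` and `order_sets` are hypothetical.

```python
from collections import defaultdict

# Vertex numbers inferred from the order sets in the text (an assumption):
# 1 = PC, 2 = IR, 3 = MAR, 4 = MBR; the destination register keeps its name.
VERTEX = {"PC": 1, "IR": 2, "MAR": 3, "MBR": 4, "Rd": "Rd"}

# LOAD as a list of control steps; each step holds (source, destination) MOPs.
LOAD_MOPS = [
    [("PC", "MAR")],                 # MAR <- PC
    [("MAR", "MBR"), ("PC", "PC")],  # MBR <- mem[MAR], PC <- PC + 1
    [("MBR", "IR")],                 # IR <- MBR
    [("IR", "MAR")],                 # MAR <- IR.addr
    [("MAR", "MBR")],                # MBR <- mem[MAR]
    [("MBR", "Rd")],                 # Rd <- MBR
]

def order_sets(mops, first_step=1):
    """Return {(src_vertex, dst_vertex): set of control steps}."""
    sets = defaultdict(set)
    for step, parallel_mops in enumerate(mops, start=first_step):
        for src, dst in parallel_mops:
            sets[(VERTEX[src], VERTEX[dst])].add(step)
    return dict(sets)

rtg = order_sets(LOAD_MOPS)
for arc in sorted(rtg, key=str):
    print(arc, sorted(rtg[arc]))
```

Each key of `rtg` is an arc of the RTG and each value is its order set, so an arc that is used twice (here the memory read from MAR to MBR) simply carries two control step numbers.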
Following the execution order of the given
instruction sequence, the RTG of each instruction is
integrated into a representative RTG. When a
different kind of instruction is integrated, a new
vertex or a new arc may be added and the order sets
change. The order of each arc is updated sequentially
according to the execution sequence. Fig. 3 shows the
representative RTG of the example instruction
sequence for x = (w + x) - y.
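The integration step can be pictured as a merge of order sets in which the control steps of each instruction are shifted by the number of steps already consumed. The sketch below assumes strictly sequential execution (the initial RTG); the function name `integrate` and the two small instruction RTGs are hypothetical and are not the instructions of Fig. 3.

```python
def integrate(instruction_rtgs):
    """Merge per-instruction RTGs that execute one after another.

    Each element is (order_sets, n_steps), where order_sets maps an arc
    (src, dst) to the set of local control steps (1-based).
    Returns the merged order sets and the total number of control steps.
    """
    merged = {}
    offset = 0
    for order_sets, n_steps in instruction_rtgs:
        for arc, steps in order_sets.items():
            # A new arc is added on first use; otherwise its order set grows.
            merged.setdefault(arc, set()).update(s + offset for s in steps)
        offset += n_steps
    return merged, offset

# Two hypothetical two-step instructions sharing the arc PC -> MAR.
instr_a = ({("PC", "MAR"): {1}, ("MAR", "MBR"): {2}}, 2)
instr_b = ({("PC", "MAR"): {1}, ("MBR", "R1"): {2}}, 2)

merged, total_steps = integrate([instr_a, instr_b])
print(total_steps, merged)
```

The shared arc ends up with two entries in its order set, while the arc that only the second instruction uses appears as a new arc of the representative RTG.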
4. SELECTION OF SYNTHESIS PARAMETER
Depending on the selected synthesis parameters, the
RTG is modified to accommodate the parameters, and
in turn the resulting architecture changes. Each
parameter, or a combination of them, is applied to
the initial RTG, in which all MOPs are assumed to be
executed sequentially without any execution overlap
or component sharing. We use four synthesis
parameters: Resource Sharing (C1), Multiport Memory
(C2), Multicycled Operation (C3) and Pipelined
Operation (C4).
In this paper, these synthesis parameters are applied
additively, as shown in Table I. Multicycled
functional operation and functional pipelining are
alternatives, selected according to the application
or the objective function. Fig. 4 shows the RTG as
modified in each case. The elements of the order sets
and the connection topology are changed. In (b),
there are two pairs of MBR and MAR. One is for
instruction fetch (3, 4), the other is for data fetch
(3*, 4*). In (c), two operand registers (p1, p2) are
included for multicycled functional operation
(Case-III) or pipelined functional operation
(Case-IV).
5. LIST SCHEDULING WITH INSTRUCTION ORDER
In order to schedule MOPs to guarantee correct
execution of instructions, the dependencies between
Fig. 4. : Modified RTG under (a) Case-I (b) Case-II (c) Case-III, IV
TABLE I. COMBINATIONS OF SYNTHESIS PARAMETERS
Case   Parameters
III    C1 + C2 + C3
IV     C1 + C2 + C4
instructions or MOPs must be kept. There are two
kinds of dependencies in an instruction sequence:
inter-instruction and intra-instruction.
Inter-instruction dependency is the dependency
between instructions. Since the concurrent execution
of multiple instructions, as in a superscalar
processor, is not allowed in the current system, an
instruction can be executed only after all
instructions before the instruction are executed.
Hence,
struction implies the depe
ral dependency between them, called inter-MOP
dependency. The operation code field of an
instruction has the information on which kind of
operation is to be executed. Therefore, only after
the instruction is decoded properly is the type of
execution known.
The MOPs of exec
uled before the MOPs o
cle. We call this constraint the cycle boundary. We
use list scheduling to sch
tion into a control step. Instruction order is used
as the priority function which resolves resource
contention.
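A minimal sketch of list scheduling with instruction order as the priority function is given below. It assumes unit delay for every MOP and expects all dependencies (inter-instruction, inter-MOP and cycle boundary) to be supplied explicitly as predecessor names; the `Mop` record and the `list_schedule` function are hypothetical names, not the authors' implementation. The example deliberately leaves the MOP of the second instruction without predecessors so that a resource contention arises.

```python
from dataclasses import dataclass

@dataclass
class Mop:
    name: str
    instr: int            # position of the owning instruction in the sequence
    resources: frozenset  # RTL components and paths occupied by the MOP
    preds: tuple = ()     # names of MOPs that must finish in an earlier step

def list_schedule(mops):
    """Return {mop name: control step}, assuming unit delay per MOP."""
    step_of = {}
    pending = list(mops)
    step = 0
    while pending:
        step += 1
        # Ready MOPs: every predecessor was placed in an earlier step.
        ready = [m for m in pending
                 if all(step_of.get(p, step) < step for p in m.preds)]
        if not ready:
            raise ValueError("cyclic or missing dependency")
        # Priority function: earlier instruction first (the sort is stable).
        ready.sort(key=lambda m: m.instr)
        busy = set()
        for m in ready:
            if busy.isdisjoint(m.resources):
                step_of[m.name] = step
                busy |= m.resources
        pending = [m for m in pending if m.name not in step_of]
    return step_of

mops = [
    Mop("i1.move", 1, frozenset({"BUS"})),  # listed first on purpose
    Mop("i0.move", 0, frozenset({"BUS"})),
    Mop("i0.add", 0, frozenset({"ALU"}), ("i0.move",)),
]
schedule = list_schedule(mops)
print(schedule)
```

Both transfers want the bus in the first control step; the MOP of the earlier instruction wins and the other one is deferred by one step.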
Fig. 5. : Refined RTGs by Transfer Path Reduction
6. TRANSFER PATH REDUCTION
For the scheduled RTG, we apply a heuristic technique
to reduce the number of data transfer paths without
increasing the number of control steps. A data
transfer operation is a register-to-register
operation which directly transfers data without
modification. If we can find an alternative path for
a data transfer path in the RTG, the data t can be
replaced with alternative path (b , wh
results in the reduction of connection cost. The
rect connections between registers and the functional
units with bypass operation are used as the bypass
resources.
placement is performed in three st
nement and selection.
s to find out the candi
s to refine the cur-
path replacement.
The selection step is to select only candidates with
total number of control steps after
RTGs for each case are refined like sho
In (a), $a_{4,60}$, $a_{4,61}$ are removed and
$a_{4*,60}$, $a_{4*,61}$ are removed.
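The idea of replacing a direct transfer path with a bypass route can be sketched as follows. This is a simplified illustration under stated assumptions, not the three-step procedure of the paper: a bypass route is taken to be a two-hop route through a unit that can pass data unchanged within one control step, only routes whose two arcs already exist in the RTG are considered, and a replacement is accepted only when the bypass unit is idle in every control step of the transfer, so the number of control steps cannot grow. All names are hypothetical.

```python
def reduce_transfer_paths(order_sets, transfers, bypass_units):
    """Replace direct transfer arcs by two-hop bypass routes where possible.

    order_sets   : {(src, dst): set of control steps}
    transfers    : arcs that are pure register-to-register transfers
    bypass_units : vertices able to pass data through unchanged
    Returns the refined order sets and the list of removed arcs.
    """
    rtg = {arc: set(steps) for arc, steps in order_sets.items()}
    removed = []
    # Try the least used transfer arcs first.
    for u, v in sorted(transfers, key=lambda a: (len(rtg[a]), str(a))):
        steps = rtg[(u, v)]
        for w in sorted(bypass_units):
            hops = [(u, w), (w, v)]
            if not all(h in rtg for h in hops):
                continue  # no existing route through w
            # The unit w is busy whenever any arc touching it is active.
            w_busy = set().union(*(s for a, s in rtg.items() if w in a))
            if steps & w_busy:
                continue  # replacement would need an extra control step
            for h in hops:
                rtg[h] |= steps  # the route reuses the same control steps
            del rtg[(u, v)]
            removed.append((u, v))
            break
    return rtg, removed

rtg = {("R1", "R2"): {3}, ("R1", "ALU"): {1}, ("ALU", "R2"): {2}}
refined, removed = reduce_transfer_paths(rtg, [("R1", "R2")], {"ALU"})
print(removed, refined)
```

In this example the direct arc from R1 to R2 is dropped because the ALU is idle in control step 3, and the two existing arcs through the ALU take over that step.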
7. DATA/CONTROL PATH GENERATION
After the scheduling and the transfer path reduction
are completed, the vertices of the RTG are mapped
into RTL components and the arc
mapped into connection resources
can be implemented in bus-oriented
ed. Since the connection geomet
itly in connectivity graph, multiplexer-type
ath is derived straight-forw
When buses are used as the connection resources, the
occupancy of the connection resources has to be
carefully investigated. When more than one MOP is
executed simultaneously, all resources required to
execute those concurrent MOPs should be reserved so
as to avoid resource conflicts and data collisions.
Also, the operand paths and the result path of a
functional unit should be reserved during its
operation time. In the case of a memory access, both
the data path and the address path must be reserved
together in order to ensure correct memory access.
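The reservation rule can be checked with a simple occupancy table keyed by control step and connection resource. The sketch below is an assumption-laden illustration: the resource names (`BUS_A`, `ABUS`, and so on) and the function `find_conflicts` are hypothetical, and each MOP is described only by its start step, its duration in control steps and the resources it must hold for that whole duration.

```python
def find_conflicts(schedule, uses):
    """List (step, resource, [mops]) where a resource is claimed more than once.

    schedule : {mop name: (start step, duration in control steps)}
    uses     : {mop name: iterable of connection resources it must hold}
    """
    claims = {}
    for mop, (start, length) in schedule.items():
        # A resource stays reserved for the whole operation time.
        for step in range(start, start + length):
            for res in uses[mop]:
                claims.setdefault((step, res), []).append(mop)
    return [(step, res, sorted(mops))
            for (step, res), mops in sorted(claims.items())
            if len(mops) > 1]

schedule = {"mul": (1, 3), "move": (2, 1), "read": (4, 1)}
uses = {
    "mul": ["BUS_A", "BUS_B"],  # operand paths held for three steps
    "move": ["BUS_A"],          # register-to-register transfer
    "read": ["ABUS", "DBUS"],   # memory access: address and data path together
}
conflicts = find_conflicts(schedule, uses)
print(conflicts)
```

An empty result means that every bus carries at most one transfer per control step; here the transfer in step 2 collides with the operand path of the multicycled operation.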
The control path consists of a condition register, a
state register, a decoder and micro-instructions
stored in a PLA, ROM or wired logic.
Micro-instructions are generated from the scheduled
time table. A combination of MOPs is executed at
every control step. We define such a combination of
MOPs as an M-set. Among the M-sets, there are common
M-sets which are executed more than once. A common
M-set is stored only once for all the control steps
that use it and has a unique micro-instruction. The
decoder associates the current control step with the
micro-instruction which has to be executed. In order
to reduce the hardware, a new instruction set tuned
to the derived topology must be generated. Currently,
our system does not include such a procedure.
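The relation between control steps, M-sets and micro-instructions can be sketched as a deduplication of the scheduled time table. In the following illustration the time table, the MOP strings and the function name `build_microcode` are hypothetical; the only idea taken from the text is that an M-set executed in several control steps is stored once and that the decoder maps each control step to its micro-instruction.

```python
def build_microcode(time_table):
    """Build a micro-instruction store and a decoder table.

    time_table : list of MOP collections, one per control step
                 (index 0 corresponds to control step 1)
    Returns (rom, decoder): rom lists each distinct M-set once,
    decoder maps control step -> index into rom.
    """
    rom, index_of, decoder = [], {}, {}
    for step, mops in enumerate(time_table, start=1):
        m_set = frozenset(mops)
        if m_set not in index_of:
            index_of[m_set] = len(rom)  # first occurrence: new micro-instruction
            rom.append(m_set)
        decoder[step] = index_of[m_set]
    return rom, decoder

# Hypothetical time table: the same fetch sequence precedes two different moves.
fetch = [["MAR<-PC"], ["MBR<-mem[MAR]", "PC<-PC+1"], ["IR<-MBR"]]
time_table = fetch + [["R1<-MBR"]] + fetch + [["R2<-MBR"]]

rom, decoder = build_microcode(time_table)
print(len(time_table), "control steps,", len(rom), "M-sets")
print(decoder)
```

The three fetch steps are common M-sets, so eight control steps need only five micro-instructions in this example.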
8. EXPERIMENTAL RESULT
To verify the feasibility of the proposed method,
the basic block of diffeq, the differential equation
benchmark, is chosen as the example. Table II shows
the resulting component utilization of five cases.
We generate the data path and control path under 4
combinations of synthesis parameters. We assume the
delay of the multiplier is three times that of
addition. Mark t means the multicycled multiplier
with a propagation delay of 3 control cycles and
means the 4-stage pipelined multiplier, respectively.
Currently, the number of pipeline stages is fixed to
4 and the propagation delay of the multicycled
operator is 3 control steps. As future work, the
optimal number of pipeline stages and the propagation
delay of the multicycled operator will be determined
so that the given design goal can be satisfied. re,
which is the product of the number of control steps
and the maximum register-to-register delay, is
calculated under the assumption that the control
cycle time of the initial RTG is the nominal cycle
time. s, n, , w, are the number of control steps, the
number of M-sets and the width of the
micro-instruction.
Note that even if the number of functional components
and the number of storage components are the same,
the usage of connection resources differs according
to the connection topology and the implementation
method. This result indicates that the connection
geometry, as well as the utilization of storage units
and functional units, has to be considered at the
design evaluation step. Also, the width of the
micro-instruction varies with the implementation
method even within the same case. So, in order to
select a more practical solution, the effect of the
control path has to be considered together with the
data path.
9. CONCLUSION
We proposed a systematic method which synthesizes
the data path and control path of a CPU core for
hardware-software codesign. We first proposed a
graphical representation method to describe
instructions at the register transfer level. By using
the RTG, we can derive the topology of the data path
directly. In order to transform the architecture of
the data path, we applied synthesis parameters
selectively. As a result, we can explore the design
space more efficiently. The optimization of the data
path topology and the maximization of resource
utilization are considered simultaneously. By
replacing rarely used data transfer paths with their
bypass routes, the number of data transfer paths and
thus the connection cost are reduced. To select the
best among the candidate CPU cores, the data path
cost and the control path cost are considered
together.
10. ACKNOWLEDGEMENT
This work has been carried out as a project in the
CAD21 Research Body of Tokyo Institute of Technology.
We wish to thank all the members of CAD21 for their
suggestions and cooperation.