
Attia 2013


2013 10th International Multi-Conference on Systems, Signals & Devices (SSD)
Hammamet, Tunisia, March 18-21, 2013
SSD'13 1569695461

VCRBCM: A Low Latency Virtual Channel Router Architecture Based on Blocking Controller Manager

Brahim Attia, Nourddine Abid, Wissem Chouchen
Electronics and Microelectronics Laboratory
Faculty of Sciences of Monastir
Brahim.attia@topnet.tn

Abdelkrim Zitouni, Rached Tourki
Electronics and Microelectronics Laboratory
Faculty of Sciences of Monastir
rached.tourki@topnet.tn

Abstract-The network on chip (NoC) has been proposed as a solution for communication between IP cores in System on Chip (SoC) design. Many types of network on chip using different buffering strategies have been proposed in the open literature. The virtual channel buffering proposed by Dally improves the performance of the network at the cost of an area overhead. In this type of network, packets can become blocked and a mechanism to manage these packets is needed. For this purpose, a new low latency virtual channel router based on a blocking manager module is proposed. The blocking manager is incorporated in the router microarchitecture in a distributed fashion: each physical channel uses a blocking controller module to switch between virtual channels when a blocking problem occurs. The low latency is obtained by placing the routing function in parallel with the link controller to reduce the critical path. To validate our proposal, we present the functional validation of the blocking controller module and of a virtual channel router based on the blocking manager, and finally the simulation of the network with a scenario that involves the blocking problem.

Keywords-component; formatting; style; styling; insert (key words)

I. INTRODUCTION

Designers are developing ICs that integrate complex heterogeneous functional elements into a single chip, known as a System on a Chip (SoC). A SoC is composed of IP cores, communication architectures and wrapper interfaces to peripheral devices [1]. Usually, the interconnection architecture employs dedicated wires or shared busses. Dedicated wires are effective for systems with a small number of cores, but the number of wires around the core increases as the system complexity grows. Therefore, dedicated wires have poor reusability and flexibility. A shared bus is more scalable and reusable than dedicated wires. However, busses allow only one communication transaction at a time, all cores share the same communication bandwidth in the system, and scalability is limited to a few dozen cores [2]. In this context, by applying concepts from computer and telecom networks to embedded systems, a new interconnection structure named Network on Chip (NoC) is emerging [2] [3] [4].

The throughput of interconnection networks (Network on Chip) is limited to a fraction (typically 20% to 50%) of the network capacity due to the coupled allocation of resources [5]. Two main resources compose interconnection networks: buffers and channels. Typically, a single buffer is associated with each channel. However, a channel can be multiplexed into n virtual channels (VCs). VCs provide multiple buffers for each channel, increasing the resources allocated to each packet. The insertion of VCs also enables the use of special policies to allocate the physical channel bandwidth, allowing support for quality of service (QoS).

Many approaches have been proposed in the literature to improve the performance of a network on chip. This section reviews several relevant proposals on this topic. To begin, we present the notion of virtual channels and briefly study the recently proposed models. Virtual channels (VCs) are channels that share the same physical channel by using different buffers. This increases the cost in terms of area and power. However, VCs have many advantages:

Avoid deadlocks: The virtual channels are independent. Thus, when used with suitable routing techniques, they make it possible to break cycles in the resource dependency graph. In particular, VCs facilitate the use of adaptive routing algorithms.

Optimize link usage: Dividing the physical link into several virtual channels can make better use of the link. It also reduces the number of links needed.

Increase performance: [5] shows that the performance of the interconnection network can be improved by dividing a FIFO buffer among several virtual channels.

Provide different services: The virtual channels can be used to separate different types of traffic and to offer different levels of priority (QoS support).

We will see that the extra cost induced by the use of virtual channels can be compensated by the advantages they bring. The models already proposed derive from one another. Chien [6] proposes an architecture of a wormhole router and a model for the virtual channel router. The Chien model [6] has several significant shortcomings. First, it does not consider pipelining. Second, it assumes that the crossbar must provide a separate port for each virtual channel, which means that the latency of the crossbar and of its arbitration grows very quickly with the number of virtual channels. Duato [7] uses the Chien model [6] and considers a three-stage pipeline. The pipeline consists of a routing stage, where the address is decoded, a stage

978-1-4673-6457-7/13/$31.00 ©2013 IEEE


of exchange, which includes the crossbar, and a stage of allocation of the virtual channels. The work in [8] extended the Chien model [6] to a virtual cut-through router model by modifying its parameters and the number of buffers in the input ports. The work in [9] introduces a new microarchitecture for a virtual channel router, which reduces the latency of the router to that of a router using wormhole flow control. It is called the speculative virtual-channel router. Comparing the wormhole router with the speculative virtual-channel router, we note that the latter gives the same latency with a throughput improvement of more than 40%. The canonical architecture of the virtual channel router contains a number of modules that are difficult to pipeline because they contain state that depends on the outputs of other modules. These modules are called atomic modules (for example, the virtual channel allocator of the VC router). The input ports of these modules depend on the output ports of the other modules, and this dependence determines the critical path. In the architecture of the speculative virtual-channel router, by contrast, the allocators are separate, so a large number of wires connect the different allocators. In the speculative virtual-channel router, the VC allocator is independent of the switch (SW) allocator.

Congestion of the NoC reduces the performance of the SoC. This effect is particularly strong in networks on chip that use a single buffer in each input port. This type of network uses a router with a simplified design, but it prevents packets from sharing a physical channel at any given instant because packets compete for network resources. This problem is the main cause of increased network latency and decreased throughput as network traffic grows. In this kind of network, many packet blocking problems may occur. We assume the following scenario in a 4x4 mesh network, presented in figure 1. The IP core connected to router (1,0) wants to send packet number 1 to the IP core connected to router (3,3). The IP core connected to router (1,1) wants to send packet number 2 to the IP core connected to router (3,2). The IP core connected to router (2,2) wants to send packet number 3 to the IP core connected to router (3,2). Figure 1 illustrates the blocking problem. When the IP core (1,1) starts its transmission, packet number 2 uses the physical channel between router (1,1) and router (1,2) until the end of its transmission, and no other packet can use this link. In addition, packet number 1 reaches the output controller of the west input port of router (1,1) and blocks. Packet number 3 uses the physical channel between router (2,2) and router (3,2). Therefore packet number 2 reaches the output controller of the south input port of router (2,2) and blocks. At this moment, packet number 1 remains in the blocked state and the physical channel between router (1,1) and router (1,2) is unused. Finally, packet number 2 can be sent after the end of transmission of packet number 3, and packet number 1 can be sent after the end of transmission of packets number 2 and 3.

The packet shown in red must first be blocked because the physical link (between router (1,1) and router (1,2)) is occupied by the packet shown in green. It must then propagate once the green packet is blocked because the physical link (between router (2,2) and router (1,2)) is occupied by the brown packet. The red packet must then block because of the transmission of the pink packet. After the end of transmission of the brown packet, the green packet must continue to propagate along its path. Finally, after the end of transmission of the pink packet, the red packet must continue to propagate along its path.

Figure 1. Switch allocator of a non-speculative virtual channel

To avoid these types of problems, a new extension of the virtual channel router based on a blocking manager module is presented. The blocking manager is implemented in a distributed fashion, where each input port has its own blocking controller. The blocking controller generates the select signal of the multiplexer placed after the two output controllers, and it enables and disables the output controller of each virtual channel.

The paper is organized as follows. Section 2 presents the building blocks of the proposed virtual channel router based on the blocking manager module. Section 3 presents the blocking controller module. Section 4 presents the functional validation of the blocking controller, of the virtual channel router based on the blocking manager and of the associated network on chip, together with the performance evaluation of a network on chip based on the proposed router. Finally, we conclude in section 5 with remarks.

II. BUILDING BLOCKS OF VIRTUAL CHANNEL ROUTER BASED ON BLOCKING MANAGER

The virtual channels are implemented in all ports except the local port. Each port requires a module called the blocking controller for swapping between the virtual channels that request the same output port. The local input port is the same as the one proposed and described in [10]. The main building blocks of the generic router are: virtual channel allocator, switch allocator, crossbar, credit switching, and input and output ports. The router has 5 input ports and 5 output ports, supporting 2 virtual channels (VCs) per port. Each input port contains a multiplexer, a demultiplexer, and 2 virtual channels. Each virtual channel is composed of a link controller, a FIFO buffer, an output controller, and a routing function.

Multiplexer: Its role is to connect the physical channel with the requested virtual channel. Before the transmission of a packet, the emitter router must first compute the value of VCid (virtual channel identity), which gives the number or address of the virtual channel in the receiver router, and

concatenate the VCid with the Data signal coded on 32 bits. When the header of the packet arrives at the input port of the receiving router, the VCid is extracted from the header flit and used as the select signal of the multiplexer. This signal is then discarded and is not stored in the FIFO memory.

Link controller: The flow of messages across the physical channel between adjacent routers is implemented by the link controller. The link controllers on either side of a channel coordinate to transfer units of flow control. This block receives the flits of a packet from the output port of the adjacent router and stores them in the buffer of the virtual channel selected by the virtual channel identity (vci). The link controller FSM starts when BOP and REQ are asserted and the current credit is high. Its work is summarized in two principal tasks: to receive the packets sent by the output controller of the sender, and to write the received packets into the FIFO buffer of the virtual channel addressed by the vci field.

Output controller: The output controller acts as a bridge that connects the FIFO to the destination output port. Its tasks are to read data from the FIFO, to check whether EOP is high in each data word, and to determine the last flit of each packet. The EOP signal is asserted on the output pin when it is detected, and it is also connected to the routing function (RF) to indicate the end of the routing. In practice, EOP is asserted when one of the credit inputs and one of the grant inputs are also active. The output controllers provide transfer synchronization and manage the flow control of the packet with the next router. The interaction between the FIFO and the output controller is implemented with GALS techniques. The output controller reads the data stored in the FIFO and emits the flits of each packet to the output port selected by the routing function, after having tested the credit signal that indicates the availability of the next router. It uses as inputs 4 credit signals provided by the adjacent routers and 4 grant signals from the global switch arbiters.

Routing function: The main role of the routing function is to determine, from the destination address field of the header flit, the path that the packet must follow. Each routing function sends 4 requests to the global switch arbiter. If access to the output port is denied, the routing function must maintain its requests. The switch allocator manages access between the input controllers and the output ports by asserting the corresponding grant signals. Our routing module uses a deterministic routing algorithm called XY, which provides a minimal path between any two nodes. In this type of routing algorithm, the decision is independent of the state of the network. Under wormhole switching, the router can hold the information of only one packet at a time, so the input port cannot receive a new packet in the meantime.

Demultiplexer: Each packet sends a request for the use of the physical channel. This request is processed by the switch allocator, which sends the signal Sel_demux to select the virtual channel that wins access to the crossbar and hence to the physical channel.

Virtual channel allocator: In order for virtual-channel and switch allocation to take place, the routing function must first be evaluated to determine which virtual channel(s) at which output port(s) the packet may request. To ensure that this computation does not lie on the router's critical path, the computation may be performed in the previous router in preparation for use in the next, as in the work of [11]. The idea of [11], that the route may be calculated one step ahead of where it is required, was first employed by the SGI routing chip [12] and is known as look-ahead routing. In our work we place the routing function in parallel with the link controller, which reduces the critical path and the minimal latency of the virtual channel router [10].

Figure 2. Complexity of a virtual-channel allocator with deterministic routing functions. When the routing function returns virtual channels of a single physical channel (R → P), the virtual-channel allocator needs a first stage of v:1 arbiters for each input virtual channel, followed by a second stage of (pi × v):1 arbiters for each output virtual channel (pi is the number of input switch ports and po the number of output switch ports; p is the number of physical ports in the router; v is the number of virtual channels per physical channel).

The complexity of VC allocation depends on the range of the routing function. If the routing algorithm is deterministic, the routing function returns a single VC and the allocation process simply consists of a single arbiter for each output VC. As any of the input VCs may request any output VC, each arbiter must support P x V inputs. If the routing algorithm is adaptive, the routing function can return multiple output VCs restricted to a single physical channel, and an additional arbitration stage is required to reduce the number of requests from each input VC to one. The winning request at each virtual channel buffer then proceeds to the second stage as described above. The complexity of such a scheme is illustrated in Figure 2.

The routing function determines the output port and the VCs that may be requested prior to VC allocation. A VC which is free to be allocated is then selected by the first stage of arbitration. The result of this first stage of arbitration is a request for a single VC at a particular output port. This request is subsequently sent to the appropriate second stage arbiter.
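The two-stage (separable) virtual channel allocation described above can be sketched in software as follows. This is only an illustrative model, not the authors' hardware: the names `RoundRobinArbiter` and `allocate` are hypothetical, round-robin arbitration is assumed for both stages, and the first-stage arbiter is assumed to rotate only when its request also wins the second stage.

```python
from dataclasses import dataclass


@dataclass
class RoundRobinArbiter:
    """Grants one of n request lines, rotating priority after each grant."""
    n: int
    last: int = -1  # index of the most recent winner

    def pick(self, requests):
        """Return the index of the winning request, or None if none is set."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                return idx
        return None

    def commit(self, idx):
        """Make idx the lowest-priority input for the next arbitration."""
        self.last = idx


def allocate(candidates, free, stage1, stage2):
    """One allocation cycle (hypothetical helper).

    candidates[i]: output-VC ids that input VC i may request (routing result).
    free:          set of output-VC ids that are currently unallocated.
    stage1[i]:     arbiter over the positions of candidates[i].
    stage2[o]:     arbiter over input-VC ids, one per output VC o.
    Returns a dict {input_vc: granted_output_vc}.
    """
    # Stage 1: each input VC reduces its candidates to one free output VC.
    choice = {}
    for i, cands in candidates.items():
        k = stage1[i].pick([c in free for c in cands])
        if k is not None:
            choice[i] = (k, cands[k])

    # Stage 2: each free output VC grants one of the input VCs that chose it.
    grants = {}
    for o in sorted(free):
        reqs = [choice.get(i, (None, None))[1] == o
                for i in range(stage2[o].n)]
        winner = stage2[o].pick(reqs)
        if winner is not None:
            grants[winner] = o
            stage2[o].commit(winner)
            # Assumption: stage 1 only rotates when stage 2 also succeeds.
            stage1[winner].commit(choice[winner][0])
    return grants


# Three input VCs compete for the two VCs of one physical channel.
stage1 = {i: RoundRobinArbiter(2) for i in range(3)}
stage2 = {o: RoundRobinArbiter(3) for o in range(2)}
first = allocate({0: [0, 1], 1: [0, 1], 2: [0, 1]}, {0, 1}, stage1, stage2)
second = allocate({1: [0, 1], 2: [0, 1]}, {1}, stage1, stage2)
```

In this small run all three input VCs select output VC 0 in the first stage, so only one grant is issued in the first cycle even though output VC 1 is also free; the second free VC is allocated one cycle later.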

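Returning to the routing function of this section, deterministic XY routing can be sketched as below. The sketch assumes router identifiers of the form (row, column), with the column corrected first and rows increasing towards the north; this convention is inferred from the blocking scenario of the introduction and is not stated explicitly in the text. The names `xy_next_port` and `xy_path` are hypothetical.

```python
def xy_next_port(cur, dst):
    """Output port requested by a header flit at router cur for destination dst.

    Routers are identified as (row, column). The X offset (column) is
    corrected first, then the Y offset (row), independently of network state.
    """
    row, col = cur
    dst_row, dst_col = dst
    if dst_col > col:
        return "EAST"
    if dst_col < col:
        return "WEST"
    if dst_row > row:
        return "NORTH"
    if dst_row < row:
        return "SOUTH"
    return "LOCAL"


# Displacement in (row, column) for each non-local port (assumed orientation).
STEP = {"EAST": (0, 1), "WEST": (0, -1), "NORTH": (1, 0), "SOUTH": (-1, 0)}


def xy_path(src, dst):
    """List of routers visited from src to dst under XY routing."""
    path = [src]
    cur = src
    while cur != dst:
        d_row, d_col = STEP[xy_next_port(cur, dst)]
        cur = (cur[0] + d_row, cur[1] + d_col)
        path.append(cur)
    return path


packet1 = xy_path((1, 0), (3, 3))
packet2 = xy_path((1, 1), (3, 2))
```

Under these assumptions, packets 1 and 2 of the introductory scenario both need the link from router (1,1) to router (1,2), which is the contention that the blocking manager is designed to handle.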
While this two-stage allocation scheme does not guarantee to allocate all free VCs to potential waiting input VCs in a single cycle, there is no performance penalty, as only one flit may be sent per cycle on an output channel. The virtual channel allocator is composed of two stages. The first stage is composed of an 8/1 multiplexer, a 1/8 demultiplexer, one arbiter, and two AND gates, as presented in figure 3. The second stage is composed of V x (P - 1) + 1 arbiters: one arbiter for each virtual channel, and the last one for the local port of the router. The arbiter used for each virtual channel must support V x (P - 2) + 1 inputs. The arbiter used by the local port of the router must support V x (P - 1) inputs. In the most general case, where the routing function may return any of the P x V VCs, the number of inputs to the first stage of arbiters must be increased from V to P x V. In this case some performance degradation may be expected, as the scheme makes little effort to perform a good matching of requests to free output VCs. An attempt is made to allocate an unused VC to the new packet. A request is made for one of the virtual channels returned by the routing function.

Figure 3. First stage of the virtual channel allocator

Allocation involves arbitrating between all the packets requesting the same output VC. The complexity and latency of the virtual-channel allocator of a virtual-channel router depend on the range of the routing function. If the routing function returns a single virtual channel (R → V), the virtual-channel allocator only needs to arbitrate among the input virtual channels that are competing for the same output virtual channel. If the routing function is more general and returns any candidate virtual channels of a single physical channel (R → P), as in this work, each of the arbiters needs to arbitrate among v possible requests in the first stage of the separable allocation, before forwarding to the arbiters in the second stage.

Switch Allocator: Individual flits arbitrate for access to physical channels via the crossbar on each cycle. Arbitration may be performed in two stages [9]. The first reflects the sharing of a single crossbar port by V input virtual channels; this requires a V-input arbiter for each input port. The second stage must arbitrate between the winning requests from each input port (P inputs) for each output channel. The scheme is illustrated in Figures 4 and 5. The request for a particular output port is routed from the VC which wins the first stage of arbitration. In order to improve fairness, the state of the V-input arbiter is only updated if the request is also successful in the second stage of arbitration. We assume this organization wherever multiple stages of arbitration are present. This switch allocator organization may reduce the number of requests for different output ports in the first stage of arbitration, resulting in some wasted switch bandwidth.

Figure 4. First stage of the switch allocator

The switch allocator of a non-speculative virtual channel router, with a v:1 arbiter in the first stage determining which

virtual channel of an input port gets to bid for its output port, and a pi:1 arbiter for each output port in the second allocation stage. A separable switch allocator for a generic virtual-channel router has a first stage of v:1 arbiters for each input port and a second stage of pi:1 arbiters for each output port. For simplicity, figure 4 shows the outputs of the v:1 arbiters connected to the inputs of the pi:1 arbiters, while in reality the outputs of the v:1 arbiters select the output port request of the winning VC and forward it on to the second stage of pi:1 arbiters.

Arbiter: The first stage of the switch allocator contains 4 arbiters for the 4 ports north, east, west and south. The local port does not use an arbiter since it incorporates a single input block. Suppose that the two virtual channels of the north input port contain packets waiting for transmission to neighboring routers; the arbiter of the north output port will then receive two requests at its input. Using the round robin mechanism, the arbiter chooses which packet will be transmitted, that is, which virtual channel will be connected to the physical channel, by selecting the appropriate multiplexer. After one clock cycle, if the successN signal is equal to '1' the arbiter maintains its request; otherwise the arbiter chooses the second virtual channel.

Multiplexer: An input port can send packets to at most 4 output ports. Consequently, the output signal of the routing function that represents the request for an output port is coded on 2 bits. The task of the multiplexer is to select the request from the winning virtual channel and to code it on 4 bits (a single bit is set at a time). Each bit represents a single request from one input port to one output port. These signals are the inputs of the second stage of the switch allocator.

Transcoder: Since the local port has a single routing function, its request is transcoded directly without selection; the request signal for a virtual channel is used as an activation signal.

The architecture of the second stage is composed of 5 arbiters, one for each output port. Each arbiter uses the round robin mechanism to choose one port among the input ports requesting the same output port. To guarantee a stable and low latency virtual channel router, we have chosen that the output controller modules use the rising edge of the clock and that the arbiters of the second stage of the switch allocator use the falling edge. As a consequence, for one cycle of crossbar selection, the output controller waits half a cycle before the transmission of the first flit. To resolve this problem, each arbiter must send two selection signals to the crossbar. The first is generated on the falling edge and is used by the credit switching. The second is delayed by half a cycle, the delay time of the output controller, and is used to command the crossbar. Finally, the switch allocator sends the selection signals to the multiplexer of the first stage at each activation of the success signal, to be used to select the demultiplexer of the input port.

III. BLOCKING CONTROLLER MANAGER

To manage blocking, we designed a control module called the blocking controller. Blocking is based on the following concepts. If a packet is blocked in a router, the router must inform all ports used by this packet that the packet is blocked. The blocking information must be delivered to the routers used by this packet. If the header of a packet is routed from one router to the next, the crossed router can block this packet if it receives blocking information from the next router. Each input port uses only one blocking controller to specify which packet is blocked at every moment. Figure 6 illustrates the entity used by the blocking controller to manage two virtual channels. The inputs and outputs of this controller are:

Select: the virtual channel identity generated by the transmitter router. The information provided by this input is used to select the multiplexer of the input port.

BOP, EOP: these signals indicate the beginning and the end of the transmission of the current packet.

BlocageIN1, BlocageIN2: these inputs are driven by the receiver router and indicate the blocking state of the packets sent to the two virtual channels. BlocageIN1 indicates the status of the packet sent to the first virtual channel and BlocageIN2 indicates the status of the packet sent to the second virtual channel. This controller is synchronized with the router's clock.

Blocageout1, Blocageout2: this information is sent to the previous router to indicate whether the packet is blocked in the router or not.

Figure 5. Blocking controller module for each input physical link

The finite state machine (FSM) presented in figure 7 describes the functionality of the blocking controller. This FSM uses 8 states. In the first state, called idle, no packet is being sent to the input port, the two virtual channels are free and the two output signals are at 0. Upon the arrival of a new packet, the module detects the BOP and VCI signals. VCI indicates to which virtual channel the packet will be sent. The next state of the machine is Blocage1 if the packet is transmitted to the second virtual channel; otherwise the next state is Blocage2. When the current state is Blocage1 or Blocage2, the machine indicates to the previous router that packets to be routed to the first or second virtual channel, respectively, will be blocked. If the machine detects the end of the packet through the EOP signal, all virtual channels are free again and the controller returns to the idle state. Otherwise the module can detect the blocking of the packet through the signal BlocageIN1 or BlocageIN2, according to the destination of the packet. The next state can then be Blocage1Attente2 or Blocage2Attente1. In these states, the current packet is blocked and the machine waits for the arrival of the next packet from the other virtual channel. Even after the transition to one of these states, the FSM can detect the end of the first packet, signaled by EOP going high, and this is due to the delay caused by the switch allocator to drive

the grant signal of the output controller to '0'. If this is the case, the FSM must return to the idle state. Otherwise, if no new packet has arrived and the controller detects the release of the channel, with the signal BlocageIN1 or BlocageIN2 returning to '0', the FSM returns to the state Blocage1 or Blocage2. If instead a new packet arrives before the release of the first packet, the next state is Blocage1Envoie2 or Blocage2Envoie1, where one packet waits for the release of the channel and the other packet is routed to the virtual channel.

Figure 6. The finite state machine of the proposed blocking manager

If the current state is Blocage1Attente2 and a new packet arrives together with the release of the first packet, the machine switches to the state Blocage2Envoie1: the first packet is released and the second packet is blocked. By the same principle, if the current state is Blocage2Attente1, the machine switches to the state Blocage1Attente2 upon the arrival of a new packet together with the release of the first packet. If the current state is Blocage1Envoie2 or Blocage2Envoie1 and the blocking controller detects the end of the second packet and the release of the first packet, the next state is Blocage1 or Blocage2. If the blocking controller detects the end of the second packet while the blocking of the first packet is maintained, the next state is Blocage1Attente2 or Blocage2Attente1. This entity can also detect the blocking of the second packet; the next state is then Blocage1Blocage2.

IV. SIMULATION RESULTS AND PERFORMANCE STUDY

Functional validation of the blocking controller

To ensure the correct operation of the system, the output signals generated by the blocking controller of the receiver router must be linked to the input ports of the blocking controller associated with the transmitter router. It is thus necessary to forward each signal to the appropriate virtual channel. To resolve this problem, we concatenate the Blocageout1 and Blocageout2 signals with the Etat1 and Etat2 signals of the same virtual channel. When the routing function requests an output port, the associated multiplexer selects the state of the two virtual channels of this port. The arbiter then has at its input the state of the blocked packets which have been sent, or which will be transmitted, to the virtual channels of the requested port. The first arbiter of the first stage of the virtual channel allocator requests one physical channel and sends the blocking information of this channel to the blocking controller module. If the packet is blocked, the arbiter masks its request to the virtual channel without acting on the occupation state of this channel, to allow other packets to use the same physical channel. Note that the local port of the router does not use a blocking controller module.

Figure 7. Functional validation of the proposed blocking controller entity

To validate the blocking controller of the proposed network on chip, we present the simulation of the blocking controllers that produce blocking in the case presented in figure 8. The blocking controllers involved in this case are those of the west port of router (1,2), the south port of router (3,2), and the south port of router (2,3). When the green packet is received by the first virtual channel in the west port of router (1,2), the next state of the blocking controller connected to this port is Blocage2, and the controller activates the Blocageout2 signal to indicate the blocking of all packets that want to use the second virtual channel. At the same time, when the brown packet is received by the first virtual channel in the south port of router (3,2), the next state of the blocking controller connected to this port is Blocage2 for the same reason. After 6 clock cycles, the green packet reaches the south port of router (2,2) and requests the second virtual channel of the south port of router (3,2), where it is blocked. The next state of the blocking controller of the west port of router (1,2) is Blocage1Attente2. The first output signal, Blocageout1, is activated and the second output signal, Blocageout2, does not change. At this moment, the red packet can propagate. After 4 clock cycles, the red

packet crosses router (1,2) and the next state of the blocking controller connected to the west port is blocage1envoie2. The red packet is blocked when it reaches the west port of router (1,3) because its physical channel is used by the pink packet. Consequently, the green and red packets remain blocked for half a clock cycle. This case appears when the next state of the east port of router (1,2) becomes blocage1blocage2; the switch out of this state occurs at the end of the transmission of the brown packet. The green packet then continues its propagation and the blocking controller switches to the blocage2envoie1 state. At this moment, the red packet, which occupies the second virtual channel, is blocked and the green packet, which occupies the first virtual channel, is propagating. At the end of the transmission of the green packet and before the end of the transmission of the pink packet, the red packet stays in the blocking state and the blocking controller of the east port of router (1,2) switches to the blocage2attente1 state. After the end of the transmission of the pink packet, the red packet is released and the controller of the west port of router (1,2) passes to the state blocage1. Finally, at the end of the transmission of the red packet, the blocking controller switches to the idle state.

Virtual channel router functional validation

In order to represent the blocking theory in the network, we present the blocking management in router (1,2) in figure 9. It presents the simulation scenario that explains the permutation between virtual channels in the chosen scenario.

Figure 8. Functional validation of the proposed virtual channel router based on blocking controller manager

Router (1,2) starts the transmission of packet number 2 to the north output port. When the packet reaches the west input port, as indicated by the BOP signal, and after 4 clock cycles of routing the first flit of packet number 2 to the north output port, router (1,2) detects the blockage of packet 2. Router (1,1) detects the blockage of the packet in the next clock cycle. The latter then changes the virtual channel used in order to send packet number 1. The west input port is not used for two clock cycles, which is the time necessary to activate the permutation process. The transmission of packet number 1 is detected by the activation of the BOP signal. Router (1,2) routes packet number 1 to the east output port. At the time of the blocking of packet number 1, caused by the occupation of the physical channel by packet number 4, router (1,1) continues the transmission of packet number 1 because it detects the blocking later. After the detection of the blockage, two supplementary clock cycles are necessary for the permutation between packets 1 and 2. At the end of the transmission of packet number 2, packet number 1 remains blocked and router (1,1) does not send any data until the end of the transmission of packet number 4.

Network on chip functional validation

The proposed simulation scenario of the network is presented in figure 10. It presents the packets injected in the local ports of the routers and the routing of these packets to the local output port. The simulation results clearly show that all packets are routed without loss or duplication, with maximum occupation of the physical channels in the minimum possible time.

Performance study of the proposed network on chip

The router area and speed were estimated using the target device XC5VLX30 with package ff676 and speed grade -2 from Xilinx Corporation. The maximal frequency obtained is 162.359 MHz. The number of slice registers used is 319, the number of slice LUTs used is about 1060, and the number of fully used LUT-FF pairs is 282. The minimal router latency measured by simulation for our router is equal to 3 cycles, which is less than or equal to the minimal latency results published in the open literature [15]. Our evaluation metrics for analyzing the proposed router and its associated NoC in this paper are mainly latency and throughput. Transport latency is defined as the time (in clock cycles) that elapses between the occurrence of a message header injection into the network at the source node and the occurrence of a tail flit reception at the destination node [13]. Depending on the source/destination pair and the routing algorithm, each message may have a different latency. Therefore, for a given packet Pi, the latency Li is defined as:

$$L_i = \text{receiving time (tail flit of } P_i) - \text{sending time (header flit of } P_i) \qquad (3)$$

The average packet latency is used as a performance metric in our evaluation. Let F be the total number of packets reaching their destination and let Li be the latency of packet Pi, where i ranges from 1 to F.

$$L_{avg} = \frac{\sum_{i=1}^{F} L_i}{F}$$

We use the most frequently used traffic pattern for the 2D mesh topology, which is the complementary distribution [14]. In this distribution, each source node I sends a set of packets to the router J which has the complementary address. We compare the proposed virtual channel network on chip with a wormhole input queuing network on chip that uses credit-based flow control. We consider two 4x4 NoCs connected to 16 initiators and 16 receivers distributed alternately. Each traffic generator sends 100 packets of 64 flits to its associated target. Then we measure network performance by varying the injection rate. From the experimental results in figure 11 and figure 12, it is clear that for the same number of packets generated by each source and for the same packet size, the value of latency decreases

and the value of the throughput increases with the use of the virtual channels.

Figure 9. Functional validation of the proposed network on chip

This result is consistent with what we expected. The use of virtual channels makes it possible to decrease the time during which the physical channel is occupied by blocked packets. A slight reduction in the load at the saturation point, from 49.61% to 44.44%, appears because of the increase in the minimal latency. To improve this value, it is recommended to increase the number of virtual channels used in the router.

Figure 10. Average packet latency under complement traffic

Figure 11. Throughput under complement traffic

V. CONCLUSION

In this paper, a low latency virtual channel router based on a blocking controller manager has been presented. Functional validations of the proposed blocking controller, the virtual channel router and its associated network on chip have been carried out. The proposed router has been implemented on a Virtex-5 FPGA. A comparative study in terms of latency and throughput between the proposed network and an input queuing network demonstrates the benefit of the virtual channel mechanism.

REFERENCES

[1] Phi-Hung Pham, Junyoung Song, Jongsun Park, and Chulwoo Kim, "Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-On-Chip," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, pp. 173-177, January 2013.

[2] Wen-Chung Tsai, Ying-Cherng Lan, Yu-Hen Hu, and Sao-Jie Chen, "Networks on Chips: Structure and Design Methodologies," Journal of Electrical and Computer Engineering, vol. 2012, Article ID 509465, 15 pages, 2012. doi: 10.1155/2012/509465

[3] G. N. Khan and A. Tino, "Synthesis of NoC Interconnects for Multi-core Architectures," Complex, Intelligent and Software Intensive Systems (CISIS), 2012 Sixth International Conference on, pp. 432-437, 4-6 July 2012.

[4] D. Wang, C. Lo, J. Vasiljevic, N. Enright Jerger, and J. Steffan, "DART: A Programmable Architecture for NoC Simulation on FPGAs," Computers, IEEE Transactions on, vol. PP, no. 99, pp. 1, 2012.

[5] W. J. Dally, "Virtual-Channel Flow Control," in 17th International Symposium on Computer Architecture, 1990, pp. 60-68.

[6] Andrew A. Chien, "A Cost and Speed Model for k-ary n-cube Wormhole Routers," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 2, February 1998.

[7] Jose Duato and Pedro Lopez, "Performance Evaluation of Adaptive Routing Algorithms for k-ary n-cubes," in Proceedings of Parallel Computer Routing and Communication Workshop, May 1994.

[8] D. R. Miller and W. A. Najjar, "Empirical Evaluation of Deterministic and Adaptive Routing with Constant-Area Routers," in Proceedings of International Conference on Parallel Architectures and Compilation Techniques, San Francisco, November 1997.

[9] Li-Shiuan Peh and William J. Dally, "Flit-Reservation Flow Control," in Proceedings of 6th International Symposium on High-Performance Computer Architecture, Toulouse, January 2000.

[10] B. Attia, W. Chouchene, A. Zitouni, A. Nouredin and R. Tourki, "A modular router architecture design for network on chip," in SSD 2011, pp. 1-6.

[11] R. Mullins, A. West and S. Moore, "Low-Latency Virtual-Channel Routers for On-Chip Networks," in Proceedings of International Symposium on Computer Architecture, June 2004, pp. 188-197.

[12] M. Galles, "Scalable Pipelined Interconnect for Distributed Endpoint Routing: The SGI SPIDER Chip," in Proceedings of Hot Interconnects Symposium IV, 1996.

[13] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance evaluation and design trade-offs for network-on-chip interconnect architectures," IEEE Trans. Computers, vol. 54, no. 8, pp. 1025-1040, August 2005.

[14] J. Duato, S. Yalamanchili, and L. Ni, "Interconnection Networks: An Engineering Approach," Morgan Kaufmann Publishers, 2003.

[15] Erno Salminen, Ari Kulmala, and Timo D. Hämäläinen, "Survey of Network-on-chip Proposals," white paper, OCP-IP, March 2008.
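As a supplementary illustration of the latency metric used in the performance study, the per-packet latency Li of equation (3) and the average latency over the F delivered packets can be sketched as follows. This is only a minimal sketch: the `PacketRecord` type, its field names, and the sample timestamps are hypothetical and do not come from the simulation environment used in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PacketRecord:
    """Hypothetical per-packet timestamps, expressed in clock cycles."""
    header_sent: int    # cycle at which the header flit is injected at the source
    tail_received: int  # cycle at which the tail flit reaches the destination


def packet_latency(p: PacketRecord) -> int:
    # Equation (3): Li = receiving time (tail flit) - sending time (header flit)
    return p.tail_received - p.header_sent


def average_latency(packets: Sequence[PacketRecord]) -> float:
    # Average latency: sum of Li for i = 1..F, divided by F delivered packets
    if not packets:
        raise ValueError("no delivered packets")
    return sum(packet_latency(p) for p in packets) / len(packets)


# Illustrative timestamps only (not measured values)
records = [PacketRecord(0, 70), PacketRecord(10, 90), PacketRecord(20, 110)]
print(average_latency(records))
```

Only packets that actually reach their destination are counted, which matches the definition of F given in the text.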

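The complementary traffic pattern used in the performance study can likewise be illustrated with a short sketch. It assumes, as a simplification not stated in the paper, that router coordinates are numbered from 0 and that the complementary address is the bitwise complement of each coordinate, so that in a 4x4 mesh node (x, y) sends to node (3 - x, 3 - y). The function names are hypothetical.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]


def complement_destination(x: int, y: int, bits: int = 2) -> Node:
    # Assumption: each coordinate is encoded on `bits` bits and the
    # destination is obtained by inverting every bit of the source address.
    mask = (1 << bits) - 1
    return (~x & mask, ~y & mask)


def complement_traffic(bits: int = 2) -> Dict[Node, Node]:
    # Source -> destination map for a (2^bits x 2^bits) mesh; bits=2 gives 4x4.
    size = 1 << bits
    return {(x, y): complement_destination(x, y, bits)
            for x in range(size) for y in range(size)}


pairs = complement_traffic()
print(pairs[(0, 0)], pairs[(1, 2)])
```

Under this assumption every node has exactly one target and no two sources share a destination, so the 16 initiators and 16 receivers are paired one to one.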