Article
A Deep Reinforcement Learning Floorplanning Algorithm Based
on Sequence Pairs †
Shenglu Yu 1,2 , Shimin Du 1, * and Chang Yang 1,2
1 Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China;
2111082190@nbu.edu.cn (S.Y.); 2311170013@nbu.edu.cn (C.Y.)
2 College of Science & Technology, Ningbo University, Ningbo 315300, China
* Correspondence: dushimin@nbu.edu.cn
† This manuscript is an extended version of the conference paper titled Yu, S.; Du, S. VLSI Floorplanning
Algorithm Based on Reinforcement Learning with Obstacles. In Proceedings of the Biologically Inspired
Cognitive Architectures 2023—BICA 2023, Ningbo, China, 13–15 October 2023; Springer Nature: Cham,
Switzerland, 2023; pp. 1034–1043.
Abstract: In integrated circuit (IC) design, floorplanning is an important stage in obtaining the
floorplan of the circuit to be designed. Floorplanning determines the performance, size, yield, and
reliability of very large-scale integration (VLSI) circuits. The results obtained in this step are
necessary for the subsequent stages of chip design. From a computational perspective, VLSI
floorplanning is an NP-hard problem, making it difficult to solve efficiently with classical
optimization techniques. In this paper, we propose a deep reinforcement learning floorplanning
algorithm based on sequence pairs (SP) to address the placement problem. Reinforcement learning
utilizes an agent to explore the sequence-pair search space to find the optimal solution. Experimental
results on the international standard test circuit benchmarks, MCNC and GSRC, demonstrate
that the proposed deep reinforcement learning floorplanning algorithm based on sequence pairs can
produce a superior solution.
Keywords: VLSI; floorplanning; sequence pair; deep reinforcement learning; MCNC; GSRC
For floorplan graphs with slicing structures [8], binary trees are widely used, where leaves
correspond to blocks and internal nodes define the vertical or horizontal merge operations
of their respective descendants. For more general non-slicing floorplan representations, sev-
eral effective forms have been developed, including sequence pairs (SP) [9], the bounded
slicing grid (BSG) [10], O-trees [11], transitive closure graphs with packed sequences
(TCG-S) [12], and B*-trees [13]. Among these, the representation of block placement with
sequence pairs, which uses positive and negative sequences to represent the geometric
relationships between any two modules, has been extended in subsequent work to handle
obstacles [14], soft modules, rectilinear blocks, and analog floorplans [15–18]. The decod-
ing time complexity of the sequence pair representation is O(N²). To reduce the
decoding complexity, Tang et al. [19] utilized the longest common subsequence algorithm
to decrease it to O(N log N). Subsequently, Tang and Wong [20]
proposed an enhanced Fast Sequence Pair (FSP) algorithm, further reducing the decoding
time complexity to O(N log log N). Another category involves the study of floorplanning
algorithms. By employing suitable planar graph representations and/or efficient perturba-
tion methods, high-quality floorplans can be achieved through linear programming [21] or
some metaheuristic methods such as simulated annealing (SA) [22,23], genetic algorithms
(GA) [24,25], memetic algorithms (MA) [26], and ant colony optimization [27].
Despite decades of research on VLSI floorplanning problems, the existing studies
indicate that current EDA floorplan tools still struggle to achieve a floorplan close to opti-
mal. These tools continue to face numerous limitations, making it challenging to obtain
satisfactory design outcomes. Existing floorplan tools generally require long runtimes and
experienced experts to spend weeks designing integrated circuit floorplans. Furthermore,
these tools have limited scalability and often require a time-consuming redesign when
faced with new problems or different constraints. Reinforcement learning (RL) [28] pro-
vides a promising direction to address these challenges. Reinforcement learning possesses
autonomy and generalization capabilities, allowing the agent in reinforcement learning,
through interactions with the environment, to automatically extract knowledge about the
space it operates in. In addition to breakthroughs in gaming [29] and robot control [30],
reinforcement learning has been applied to solve combinatorial optimization problems.
Ref. [31] proposed deep reinforcement learning (DRL) for solving the Traveling Salesman
Problem (TSP). Moreover, significant progress has been made in the application of reinforce-
ment learning to task scheduling [32], vehicle routing problems [33], graph coloring [34],
and more. Recently, integrating reinforcement learning into electronic design automation
(EDA) has become a trend. For example, the Google team [35] formulated macro-module
placement as a reinforcement learning problem and trained an agent using reinforcement
learning algorithms to place macro-modules on chips. He et al. [36] utilized the Q-learning
algorithm to train an agent that selects the best neighboring solution at each search step.
Cheng et al. [37] introduced cooperative learning to address floorplan and routing problems
in chip design. Agnesina et al. [38] proposed a deep reinforcement learning method for
VLSI placement parameter optimization. Vashisht et al. [39] utilized iterative reinforcement
learning combined with simulated annealing to place modules. Xu et al. [40] employed
graph convolutional networks and reinforcement learning methods for floorplanning under
fixed-outline constraints.
This paper proposes a deep reinforcement learning-based floorplanning algorithm
utilizing sequence pairs for the floorplanning problem. The algorithm aims to optimize
the area and wirelength of the floorplan. To evaluate the effectiveness of our algorithm,
we conduct experiments on the internationally recognized benchmark circuits MCNC
and GSRC, comparing our approach with simulated annealing and the deep Q-learning
algorithm proposed by He et al. [36]. In terms of dead space on the MCNC benchmark
circuits, our algorithm outperforms simulated annealing and the literature [36] by an
average improvement of 2.7% and 1.1%, respectively. Additionally, concerning wirelength,
our algorithm shows an average improvement of 9.1% compared to simulated annealing.
On the GSRC benchmark circuits, our algorithm demonstrates an average improvement of
7.0% and 3.7% in dead space over simulated annealing and the literature [36], respectively.
Furthermore, for wirelength, our algorithm exhibits an average improvement of 8.8% over
simulated annealing. These results validate the superior performance and robustness of
our algorithm in handling ultra-large-scale circuit designs.
We employ the widely used Half-Perimeter Wirelength (HPWL) model [41] as the
method to estimate the total wirelength, which is defined as follows:
W = \sum_{i=1}^{m}\left(\max_{b_i, b_j \in n_i}\left|x_i - x_j\right| + \max_{b_i, b_j \in n_i}\left|y_i - y_j\right|\right) \quad (2)
Based on the optimization objective defined by the minimum rectangle area A and the
wirelength W, the formulation is as follows:
F = αA + βW (3)
Here, F is the cost of a feasible floorplan, defined as the weighted sum of the
total area A and the total wirelength W. The coefficients α and β are weight factors ranging
from 0 to 1.
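For illustration, the following Python sketch evaluates the HPWL of Equation (2) and the weighted cost of Equation (3) for an already-placed floorplan; the data structures (a dictionary of module coordinates and a list of nets) are illustrative assumptions rather than the implementation used in this paper.

from typing import Dict, List, Tuple

def hpwl(coords: Dict[str, Tuple[float, float]], nets: List[List[str]]) -> float:
    # Half-Perimeter Wirelength, Equation (2): for every net, add the width
    # and the height of the bounding box of the modules it connects.
    total = 0.0
    for net in nets:
        xs = [coords[m][0] for m in net]
        ys = [coords[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def cost(area: float, wirelength: float, alpha: float = 0.5, beta: float = 0.5) -> float:
    # Weighted objective of Equation (3): F = alpha * A + beta * W.
    return alpha * area + beta * wirelength

# Toy usage with made-up module coordinates and nets.
coords = {"b1": (2.0, 3.0), "b2": (5.0, 1.0), "b3": (4.0, 6.0)}
nets = [["b1", "b2"], ["b1", "b3"], ["b2", "b3"]]
print(cost(area=42.0, wirelength=hpwl(coords, nets)))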
order is inconsistent between Г+ and Г−. For example, in a given pair of sequences (Г+, Г−),
there are four possible positional relationships between any two modules, bi and bj:
(1) If bi is positioned before bj in Г+, i.e., <....bi....bj....>, and bi is also positioned before bj in Г−, i.e., <....bi....bj....>, it indicates that bi is located on the left side of bj.
(2) If bj is positioned before bi in Г+, i.e., <....bj....bi....>, and bj is also positioned before bi in Г−, i.e., <....bj....bi....>, it indicates that bi is located on the right side of bj.
(3) If bi is positioned before bj in Г+, i.e., <....bi....bj....>, and bj is positioned before bi in Г−, i.e., <....bj....bi....>, it indicates that bi is located above bj.
(4) If bj is positioned before bi in Г+, i.e., <....bj....bi....>, and bi is positioned before bj in Г−, i.e., <....bi....bj....>, it indicates that bi is located below bj.
As an example, Figure 1 shows an inclined grid representing the relative positions
between modules in a sequence pair (Г+, Г−) = (<4, 3, 1, 6, 2, 5>, <6, 3, 5, 4, 1, 2>).
Figure 1. (a) displays an inclined grid representing the relative positions between modules in a
sequence pair (Г+, Г−) = (<4, 3, 1, 6, 2, 5>, <6, 3, 5, 4, 1, 2>); (b) corresponds to the floorplan of
the sequence pair. Each module has the following dimensions: 1 (4 × 6), 2 (3 × 7), 3 (3 × 3), 4 (2 × 3),
5 (4 × 3), 6 (6 × 4).
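To make the four cases concrete, the following Python sketch (an illustration, not the code used in this paper) derives the relation between two modules directly from their indices in Г+ and Г−, using the sequence pair of Figure 1.

GAMMA_PLUS = [4, 3, 1, 6, 2, 5]   # Г+ of Figure 1
GAMMA_MINUS = [6, 3, 5, 4, 1, 2]  # Г− of Figure 1

def relation(bi: int, bj: int) -> str:
    # Return the position of module bi relative to module bj.
    before_plus = GAMMA_PLUS.index(bi) < GAMMA_PLUS.index(bj)
    before_minus = GAMMA_MINUS.index(bi) < GAMMA_MINUS.index(bj)
    if before_plus and before_minus:          # case (1)
        return "left of"
    if not before_plus and not before_minus:  # case (2)
        return "right of"
    if before_plus and not before_minus:      # case (3)
        return "above"
    return "below"                            # case (4)

print("module 3 is", relation(3, 2), "module 2")  # 3 precedes 2 in both sequences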
Figure 2. (a) represents the horizontal constraint graph and (b) represents the vertical constraint graph.
So, by constructing horizontal and vertical constraint graphs and calculating the longest
path lengths for both directions, we can determine the width and height of the minimum
bounding rectangle of the floorplan. Subsequently, floorplanning is performed for the pair
of sequences, thereby determining the size and position of the non-slicing plane graph.
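As a sketch of this evaluation step (under the assumption that each module size is given as width × height; this is not the authors' actual implementation), the Python function below computes module coordinates and the bounding box of a sequence pair by a simple O(n²) longest-path calculation over the horizontal and vertical constraints.

def evaluate_sp(gamma_plus, gamma_minus, sizes):
    # sizes maps module -> (width, height); returns (coords, W, H).
    # a is left of b when a precedes b in both sequences; a is below b when
    # a follows b in Г+ but precedes b in Г−. The x and y coordinates are the
    # longest-path lengths in the corresponding constraint graphs.
    pos_plus = {m: i for i, m in enumerate(gamma_plus)}
    x = {m: 0.0 for m in gamma_plus}
    y = {m: 0.0 for m in gamma_plus}
    order = list(gamma_minus)  # Г− order is a topological order of both graphs
    for j, b in enumerate(order):
        for a in order[:j]:
            if pos_plus[a] < pos_plus[b]:              # a is left of b
                x[b] = max(x[b], x[a] + sizes[a][0])
            else:                                      # a is below b
                y[b] = max(y[b], y[a] + sizes[a][1])
    width = max(x[m] + sizes[m][0] for m in x)
    height = max(y[m] + sizes[m][1] for m in y)
    return {m: (x[m], y[m]) for m in x}, width, height

# Example with the sequence pair and module sizes of Figure 1.
sizes = {1: (4, 6), 2: (3, 7), 3: (3, 3), 4: (2, 3), 5: (4, 3), 6: (6, 4)}
coords, W, H = evaluate_sp([4, 3, 1, 6, 2, 5], [6, 3, 5, 4, 1, 2], sizes)
print(coords, W, H, W * H)  # lower-left corners, bounding width/height, and area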
4. Reinforcement Learning
Reinforcement learning is a machine learning approach aimed at learning how to
make decisions to achieve specific goals through interaction with the environment. In
reinforcement learning, an agent observes the state of the environment, selects appropriate
actions, and continuously optimizes its strategy based on feedback from the environment
regarding its actions. This feedback is typically provided in the form of rewards [42,43] or
penalties, and the agent's objective is to learn the optimal strategy by maximizing the
long-term cumulative reward.
Almost all reinforcement learning satisfies the framework of Markov Decision Processes
(MDPs). A typical MDP, as shown in Figure 3, consists of four key elements:
(1) States S: a finite set of environmental states.
(2) Actions A: a finite set of actions taken by the reinforcement learning agent.
(3) State transition model P(s, a, s′): representing the probability of transitioning from state s ∈ S to the next state, s′ ∈ S, when action a ∈ A is taken.
(4) Reward function R(s, a): representing the numerical reward for taking action a ∈ A in state s ∈ S. This reward can be positive, negative, or zero.
The goal of an MDP is to find a policy π that maximizes the total accumulated
numerical reward. The expression for the total cumulative reward is as follows:
R_t = \sum_{t}^{\infty} \gamma^t r_t \quad (4)
where γ represents the reward discount factor, t denotes the time step, and r_t represents the
reward value at time step t. The state value function V_π(s) in an MDP is defined as the
expected reward value of state s under policy π, as defined in Equation (5).
V_\pi(s) = E_\pi[R_t \mid s_t = s] = E_\pi\left[\sum_{t}^{\infty} \gamma^t r_t \mid s_t = s\right] \quad (5)
Figure 3. A typical framework for MDPs.
In this context, E_π represents the expected value of the reward function under policy
π. Similarly, the state–action value function Q_π(s, a) is the expected reward value when
action a is taken in state s under policy π, defined as follows:
Q_\pi(s, a) = E_\pi[R_t \mid s_t = s, a_t = a] = E_\pi\left[\sum_{t}^{\infty} \gamma^t r_t \mid s_t = s, a_t = a\right] \quad (6)
4.1. The MDP Framework for Solving Floorplanning Problems
In floorplanning problems, the agent in reinforcement learning interacts with the
environment by selecting a perturbation to iteratively generate new floorplan solutions.
The objective is to minimize the total area and total wirelength, which serve as rewards
to encourage the agent to learn better strategies and ultimately find an optimal floorplan
solution. To explore better floorplan solutions, the following MDP is defined:
(1) State space S: for the floorplanning problem, a state s ∈ S represents a floorplan
solution, including a complete pair of sequences (Г+, Г−) and the orientation of each
module.
(2) Action space A: a neighboring solution of a floorplan is generated by predefined
perturbations in the action space. The following five perturbations are defined:
The reward r in this context refers to a local reward, representing the reward value obtained
when the current floorplan transitions from state s to state s′ through a perturbation.
Here, F represents the optimization objective function defined in Equation (3).
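The paper's exact reward expression is not reproduced above; a common choice that is consistent with this description, and the assumption used in the short sketch below, is the decrease in the objective F caused by the perturbation.

def local_reward(F_before: float, F_after: float) -> float:
    # Assumed reward: positive when the perturbation lowers the objective F
    # of Equation (3), negative when it raises it.
    return F_before - F_after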
θ ← θ + α∇θ J (θ ) (9)
α represents the learning rate. By taking the derivative of Formula (8), Formula (10) is
derived, which is utilized to update the values of the θ parameters. The definition of
this formula is as follows:
This is an expectation over trajectories τ obtained by sampling the policy π_θ.
R(τ) represents the reward accumulated over a single episode.
To train the policy network, the deep reinforcement learning algorithm shown in
Algorithm 1 is employed. At each step of an episode, the policy network predicts a
probability distribution over the actions available in the environment, given the state
description. The states, actions, rewards, and next states of each episode are recorded.
The discounted rewards are then used to calculate the gradient and update the weights of
the policy network.
Algorithm 1: Deep Reinforcement Learning Algorithm
Input: number of episodes, number of steps
Output: Policy π
1: Initialize θ (policy network weights) randomly
2: for e in episodes do
3: for s in steps do
4: Perform an action as predicted by the policy network
5: Record s, a, r, s′
6: Calculate the gradient as per Equation (10)
7: end
8: Update θ as per Equation (9)
9: end
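For concreteness, the PyTorch sketch below mirrors Algorithm 1 with a REINFORCE-style update; the environment interface (reset/step over sequence-pair perturbations), the state encoding, and the network sizes are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    # Maps a state vector to a probability distribution over actions.
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(state), dim=-1)

def train(env, state_dim, n_actions, episodes=100, steps=200, gamma=0.99, lr=1e-3):
    # env is assumed to expose reset() -> state and step(action) -> (state, reward),
    # where a state is a fixed-length float vector encoding the current floorplan.
    policy = PolicyNet(state_dim, n_actions)
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(episodes):
        state = env.reset()
        log_probs, rewards = [], []
        for _ in range(steps):
            probs = policy(torch.as_tensor(state, dtype=torch.float32))
            dist = torch.distributions.Categorical(probs)
            action = dist.sample()                    # pick a perturbation
            state, reward = env.step(action.item())   # apply it, observe reward
            log_probs.append(dist.log_prob(action))
            rewards.append(reward)
        # Discounted returns as in Equation (4), accumulated backwards.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns, dtype=torch.float32)
        loss = -(torch.stack(log_probs) * returns).sum()   # policy-gradient loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                   # update θ, Equation (9)
    return policy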
After numerous experiments for parameter tuning and optimization, the hyperparameter
settings of the algorithm in this paper are presented in Table 1.
5. Experimental Results
5.1. Experimental Environment and Test Data
The experiment was conducted on a computer with a 12th Gen Intel(R) Core(TM)
i5-12500 3.00 GHz CPU (Intel, Santa Clara, CA, USA) and 16.00 GB of RAM. The
algorithm we proposed was implemented in Python 3.8 using the PyTorch library [44], with
the Adam optimizer [45] applied to train the neural network model. During training, to
save time and prevent overfitting, we employed a stopping mechanism where the training
process would halt if no better solution was found in the final 50 steps. For the simulated
annealing algorithm, we adjusted its parameters through multiple experiments and selected
the best parameter set based on the floorplan results.
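A minimal sketch of the early-stopping rule described above (names are illustrative):

def should_stop(best_costs, patience=50):
    # Stop when the best cost seen so far has not improved during the last
    # `patience` steps; best_costs is the per-step history of costs.
    if len(best_costs) <= patience:
        return False
    return min(best_costs[-patience:]) >= min(best_costs[:-patience])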
Our algorithm was tested on two standard test circuit sets, MCNC and GSRC, and
compared with the simulated annealing algorithm and the deep Q-learning algorithm
proposed by He et al. [36]. The MCNC benchmark comprises five hard-module circuits and five
soft-module circuits, while the GSRC benchmark consists of six hard-module circuits and six
soft-module circuits. Hard modules can be rotated but cannot change their shape, whereas soft
modules are specified by area and aspect ratio, allowing multiple shapes. In this experiment, our test circuits
consisted of fixed-size hard modules. The basic information about the three test circuits in
the MCNC and GSRC test circuit sets is shown in Tables 2 and 3, respectively.
Table 5 presents the experimental results comparison for three GSRC test circuits,
revealing that the proposed algorithm outperforms the simulated annealing algorithm
and Reference [36] in terms of floorplan area and DS and is also superior to the simulated
annealing algorithm in wirelength. In comparison with the simulated annealing algorithm,
the proposed algorithm achieves an average improvement of 7.0% in DS and 8.8% in
wirelength. Furthermore, when compared to Reference [36], the proposed algorithm
obtains an average improvement of 3.7% in DS. From both tables, it can be observed
that, as the size of the test circuits increases, the DS for all three methods also increases,
indicating an increased difficulty in floorplan placement. However, the proposed algorithm
demonstrates a further improvement in performance compared to the simulated annealing
algorithm and Reference [36] for large-scale circuits, offering more pronounced advantages
in floorplan area and wirelength optimization.
5.2.2. Experimental Results of MCNC and GSRC Test Sets with Obstacles
The optimization of area and wirelength for MCNC benchmark circuits was con-
ducted, and the experimental results are presented in Table 6. Since the fixed module
placement constraint was employed, the proposed algorithm is only compared with the
simulated annealing algorithm in this context. It can be observed from the table that, for
all five test circuits, the proposed algorithm achieves a smaller floorplan area, DS, and
wirelength than the simulated annealing algorithm. The proposed algorithm demonstrates
an average improvement of 9.2% in wirelength and 3.4% in DS compared to the simulated
annealing algorithm. Therefore, regarding the MCNC benchmark circuits under the fixed
placement constraint of three modules, the proposed algorithm exhibits more significant
advantages in optimizing the floorplan area and wirelength compared to the simulated
annealing algorithm.
Table 7 compares the experimental results of three GSRC benchmark circuits, under
the constraint of fixed module placement. It can be observed that the proposed algorithm
in this paper outperforms the simulated annealing algorithm in terms of planar floorplan
area, wirelength, and DS. Compared to the simulated annealing algorithm, the proposed
algorithm achieves an average improvement of 11.2% in wirelength and 8.5% in DS. From
Tables 6 and 7, it can be seen that, as the size of the test circuits increases, both methods
experience an increase in wirelength and DS, making the floorplan and routing more chal-
lenging. However, under the constraint of pre-placing three fixed modules, the proposed
algorithm in this paper demonstrates superior performance compared to the simulated an-
nealing algorithm in large-scale circuit testing. Lastly, comparing the experimental results
with pre-placed fixed modules on the MCNC and GSRC benchmark circuits against those
without, we can observe that the latter are better, indicating that pre-placing fixed modules
makes the floorplanning problem more complex and challenging.
(a) (b)
Figure 4. The floorplans generated by the algorithm proposed in this paper (a) and the simulated
annealing algorithm (b) for the ami49 test circuit.
(a) (b)
Figure 5. The floorplan generated by the algorithm proposed in this paper (a) and the simulated
annealing algorithm (b) for the n100 test circuit.
5.3.2. Visualization of MCNC and GSRC Circuit Floorplan with Obstacles
Comparison of MCNC circuit floorplan results with obstacles is shown in Figure 6.
The DS of the planar floorplan generated by the algorithm in this paper is only 9.0%, while
the DS of the simulated annealing algorithm is 14.6%. Comparison of GSRC circuit floorplan
results with obstacles is shown in Figure 7. The DS of the floorplan generated by the
algorithm in this paper is only 9.1%, whereas the DS of the simulated annealing algorithm
is 19.3%. This demonstrates the effectiveness of the algorithm proposed in this paper.
(a) (b)
Figure 6. The floorplans of the ami49 test circuit generated by the algorithm proposed in this paper (a)
and the simulated annealing algorithm (b).
(a) (b)
Figure 7. The floorplans of the n100 test circuit generated by the algorithm proposed in this paper (a)
and the simulated annealing algorithm (b).
6. Conclusions
In this paper, we investigate the floorplanning problem in the integrated circuit design
flow and propose a sequence pair-based deep reinforcement learning floorplanning algo-
rithm. Experimental results on the MCNC and GSRC benchmark circuit sets demonstrate
that our algorithm outperforms the deep Q-learning algorithm and the simulated annealing
algorithm in terms of both DS and wirelength. Moreover, as the circuit size increases
and the difficulty of the floorplan and wiring grows, the advantages of our algorithm
become more pronounced. In recent years, machine learning-based methods have been
increasingly applied in the EDA field. However, the algorithm in this paper also has some
limitations, such as its long optimization time. Next, we aim to explore
novel approaches, such as graph neural networks, within deep learning algorithms to
address floorplanning problems. This integration may potentially enhance the intelligence
and precision of algorithms, thereby significantly improving the quality of the floorplan
optimization results.
Author Contributions: Conceptualization, S.Y.; methodology, S.Y.; software, S.Y.; validation, C.Y.;
formal analysis, C.Y.; investigation, S.D.; data curation, S.D.; writing—original draft preparation, S.Y.;
writing—review and editing, S.D.; visualization, S.Y.; project administration, S.D. All authors have
read and agreed to the published version of the manuscript.
Funding: This work was financially supported by the National Natural Science Foundation of
China (grant no. 61871244, 61874078, 62134002), the Fundamental Research Funds for the Provincial
Universities of Zhejiang (grant no. SJLY2020015), the S&T Plan of Ningbo Science and Technology
Department (grant no. 202002N3134), and the K. C. Wong Magna Fund in Ningbo University
of Science.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to further study.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Fleetwood, D.M. Evolution of total ionizing dose effects in MOS devices with Moore’s law scaling. IEEE Trans. Nucl. Sci. 2017, 65,
1465–1481. [CrossRef]
2. Wang, L.T.; Chang, Y.W.; Cheng, K.T. (Eds.) Electronic Design Automation: Synthesis, Verification, and Test; Morgan Kaufmann: San
Francisco, CA, USA, 2009.
3. Sherwani, N.A. Algorithms for VLSI Physical Design Automation; Springer Science & Business Media: Berlin/Heidelberg, Germany,
2012.
4. Adya, S.N.; Chaturvedi, S.; Roy, J.A.; Papa, D.A.; Markov, I.L. Unification of partitioning, placement and floorplanning. In
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, ICCAD-2004, San Jose, CA, USA, 7–11
November 2004; pp. 550–557.
5. Markov, I.L.; Hu, J.; Kim, M.C. Progress and challenges in VLSI placement research. In Proceedings of the International Conference
on Computer-Aided Design, San Jose, CA, USA, 5–8 November 2012; pp. 275–282.
6. Gubbi, K.I.; Beheshti-Shirazi, S.A.; Sheaves, T.; Salehi, S.; Pd, S.M.; Rafatirad, S.; Sasan, A.; Homayoun, H. Survey of machine
learning for electronic design automation. In Proceedings of the Great Lakes Symposium on VLSI 2022, Irvine, CA, USA, 6–8
June 2022; pp. 513–518.
7. Garg, S.; Shukla, N.K. A Study of Floorplanning Challenges and Analysis of macro placement approaches in Physical Aware
Synthesis. Int. J. Hybrid Inf. Technol. 2016, 9, 279–290. [CrossRef]
8. Subbulakshmi, N.; Pradeep, M.; Kumar, P.S.; Kumar, M.V.; Rajeswaran, N. Floorplanning for thermal consideration: Slicing with
low power on field programmable gate array. Meas. Sens. 2022, 24, 100491.
9. Tamarana, P.; Kumari, A.K. Floorplanning for optimizing area using sequence pair and hybrid optimization. Multimed. Tools Appl.
2023, 1–23. [CrossRef]
10. Nakatake, S.; Fujiyoshi, K.; Murata, H.; Kajitanic, Y. Module packing based on the BSG-structure and IC layout applications. IEEE
Trans. Comput.-Aided Des. Integr. Circuits Syst. 1998, 17, 519–530. [CrossRef]
11. Guo, P.N.; Cheng, C.K.; Yoshimura, T. An O-tree representation of non-slicing floorplan and its applications. In Proceedings of
the 36th annual ACM/IEEE Design Automation Conference, New Orleans, LA, USA, 21–25 June 1999; pp. 268–273.
12. Lin, J.M.; Chang, Y.W. TCG-S: Orthogonal coupling of P*-admissible representation with worst case linear-time packing scheme.
IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2004, 23, 968–980. [CrossRef]
13. Chang, Y.C.; Chang, Y.W.; Wu, G.M.; Wu, S.-W. B*-trees: A new representation for non-slicing floorplans. In Proceedings of the
37th Annual Design Automation Conference, Los Angeles, CA, USA, 5–9 June 2000; pp. 458–463.
14. Yu, S.; Du, S. VLSI Floorplanning Algorithm Based on Reinforcement Learning with Obstacles. In Proceedings of the Biologically
Inspired Cognitive Architectures 2023—BICA 2023, Ningbo, China, 13–15 October 2023; Springer Nature: Cham, Switzerland,
2023; pp. 1034–1043.
15. Zou, D.; Wang, G.G.; Sangaiah, A.K.; Kong, X. A memory-based simulated annealing algorithm and a new auxiliary function for
the fixed-outline floorplanning with soft blocks. J. Ambient. Intell. Humaniz. Comput. 2017, 15, 1613–1624. [CrossRef]
16. Liu, J.; Zhong, W.; Jiao, L.; Li, X. Moving block sequence and organizational evolutionary algorithm for general floorplanning
with arbitrarily shaped rectilinear blocks. IEEE Trans. Evol. Comput. 2008, 12, 630–646. [CrossRef]
17. Fischbach, R.; Knechtel, J.; Lienig, J. Utilizing 2D and 3D rectilinear blocks for efficient IP reuse and floorplanning of 3D-integrated
systems. In Proceedings of the 2013 ACM International symposium on Physical Design, Stateline, NV, USA, 24–27 March 2013;
pp. 11–16.
18. Fang, Z.; Han, J.; Wang, H. Deep reinforcement learning assisted reticle floorplanning with rectilinear polygon modules for
multiple-project wafer. Integration 2023, 91, 144–152. [CrossRef]
19. Tang, X.; Tian, R.; Wong, D.F. Fast evaluation of sequence pair in block placement by longest common subsequence computation.
In Proceedings of the Conference on Design, Automation and Test in Europe, Paris, France, 27–30 March 2000; pp. 106–111.
20. Tang, X.; Wong, D.F. FAST-SP: A fast algorithm for block placement based on sequence pair. In Proceedings of the 2001 Asia and
South Pacific design automation conference, Yokohama, Japan, 2 February 2001; pp. 521–526.
21. Dayasagar Chowdary, S.; Sudhakar, M.S. Linear programming-based multi-objective floorplanning optimization for system-on-
chip. J. Supercomput. 2023, 1–24. [CrossRef]
22. Tabrizi, A.F.; Behjat, L.; Swartz, W.; Rakai, L. A fast force-directed simulated annealing for 3D IC partitioning. Integration 2016, 55,
202–211. [CrossRef]
23. Chen, T.-C.; Chang, Y.-W. Modern floorplanning based on B*-tree and fast simulated annealing. IEEE Trans. Comput.-Aided
Des. Integr. Circuits Syst. 2006, 25, 637–650. [CrossRef]
24. Sadeghi, A.; Lighvan, M.Z.; Prinetto, P. Automatic and simultaneous floorplanning and placement in field-programmable gate
arrays with dynamic partial reconfiguration based on genetic algorithm. Can. J. Electr. Comput. Eng. 2020, 43, 224–234. [CrossRef]
25. Chang, Y.F.; Ting, C.K. Multiple Crossover and Mutation Operators Enabled Genetic Algorithm for Non-slicing VLSI Floorplan-
ning. In Proceedings of the 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 18–23 July 2022; pp. 1–8.
26. Tang, M.; Yao, X. A memetic algorithm for VLSI floorplanning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2007, 37, 62–69.
[CrossRef]
27. Xu, Q.; Chen, S.; Li, B. Combining the ant system algorithm and simulated annealing for 3D/2D fixed-outline floorplanning.
Appl. Soft Comput. 2016, 40, 150–160. [CrossRef]
28. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
29. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al.
Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [CrossRef]
30. Gu, S.; Holly, E.; Lillicrap, T.; Levine, S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy
updates. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3
June 2017; pp. 3389–3396.
31. Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016,
arXiv:1611.09940.
32. Zhou, C.; Wu, W.; He, H.; Yang, P.; Lyu, F.; Cheng, N.; Shen, X. Deep reinforcement learning for delay-oriented IoT task scheduling
in SAGIN. IEEE Trans. Wirel. Commun. 2020, 20, 911–925. [CrossRef]
33. Nazari, M.; Oroojlooy, A.; Snyder, L.; Takac, M. Reinforcement learning for solving the vehicle routing problem. Adv. Neural Inf.
Process. Syst. 2018, 31.
34. Huang, J.; Patwary, M.; Diamos, G. Coloring big graphs with alphagozero. arXiv 2019, arXiv:1902.10162.
35. Mirhoseini, A.; Goldie, A.; Yazgan, M.; Jiang, J.W.; Songhori, E.; Wang, S.; Lee, Y.-J.; Johnson, E.; Pathak, O.; Nazi, A.; et al. A
graph placement methodology for fast chip design. Nature 2021, 594, 207–212. [CrossRef]
36. He, Z.; Ma, Y.; Zhang, L.; Liao, P.; Wong, N.; Yu, B.; Wong, M.D.F. Learn to floorplan through acquisition of effective local search
heuristics. In Proceedings of the 2020 IEEE 38th International Conference on Computer Design (ICCD), Hartford, CT, USA, 18–21
October 2020; pp. 324–331.
37. Cheng, R.; Yan, J. On joint learning for solving placement and routing in chip design. Adv. Neural Inf. Process. Syst. 2021, 34,
16508–16519.
38. Agnesina, A.; Chang, K.; Lim, S.K. VLSI placement parameter optimization using deep reinforcement learning. In Proceedings of
the 39th International Conference on Computer-Aided Design, Virtual, 2–5 November 2020; pp. 1–9.
39. Vashisht, D.; Rampal, H.; Liao, H.; Lu, Y.; Shanbhag, D.; Fallon, E.; Kara, L.B. Placement in integrated circuits using cyclic
reinforcement learning and simulated annealing. arXiv 2020, arXiv:2011.07577.
40. Xu, Q.; Geng, H.; Chen, S.; Yuan, B.; Zhuo, C.; Kang, Y.; Wen, X. GoodFloorplan: Graph Convolutional Network and Reinforcement
Learning-Based Floorplanning. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2021, 41, 3492–3502. [CrossRef]
41. Shahookar, K.; Mazumder, P. VLSI cell placement techniques. ACM Comput. Surv. (CSUR) 1991, 23, 143–220. [CrossRef]
42. Gaon, M.; Brafman, R. Reinforcement learning with non-markovian rewards. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3980–3987.
[CrossRef]
43. Bacchus, F.; Boutilier, C.; Grove, A. Rewarding behaviors. Proc. Natl. Conf. Artif. Intell. 1996, 13, 1160–1167.
44. Zimmer, L.; Lindauer, M.; Hutter, F. Auto-pytorch: Multi-fidelity metalearning for efficient and robust autodl. IEEE Trans. Pattern
Anal. Mach. Intell. 2021, 43, 3079–3090. [CrossRef]
45. Zhang, Z. Improved adam optimizer for deep neural networks. In Proceedings of the 2018 IEEE/ACM 26th International
Symposium on Quality of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018; pp. 1–2.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.