Article

Triple-Threshold Path-Based Static Power-Optimization Methodology (TPSPOM) for Designing SOC Applications Using 28 nm MTCMOS Technology

1 Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510670, China
2 College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China
3 Electric Power Research Institute of Guizhou Power Grid Co., Ltd., Guiyang 550002, China
4 College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
5 School of Micro-Nano Electronics, Zhejiang University, Hangzhou 310027, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(6), 3471; https://doi.org/10.3390/app13063471
Submission received: 10 December 2022 / Revised: 6 March 2023 / Accepted: 7 March 2023 / Published: 8 March 2023
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

The threshold voltage distribution technique is an effective way to reduce the static power consumption of integrated circuits. Several gate-level distribution algorithms have been proposed, but their optimization effect and run time still need improvement when applied to very large-scale integration (VLSI) designs. This paper presents a triple-threshold path-based static power optimization methodology (TPSPOM) for low-power system-on-chip designs. This method obtains the path weights and cell weights from the paths' timing constraints and the cells' delay-to-power ratios, then uses them as indexes to assign each cell to low-threshold voltage (LVT), standard-threshold voltage (SVT), or high-threshold voltage (HVT). The experimental results based on a 28 nm circuit containing 385,781 cells show that the TPSPOM method reduces static power consumption by 15.16% more than the critical-path aware power consumption optimization methodology (CAPCOM). At the same time, run time is reduced by 96.85%.

1. Introduction

With the development of complex semiconductors and communication technologies, the research and development of very large-scale integration (VLSI) designs have gradually emerged since the 1970s. With further process development, VLSI is mainly limited by area, speed, and power consumption [1]. Researchers have moved their focus from high-speed to low-power designs to improve portable system endurance, decrease heat dissipation issues, improve reliability, and decrease packaging complexity [2,3,4,5]. The purpose of low-power design is to reduce power consumption while maintaining performance [6].
The circuit power consumption is composed of static power consumption and dynamic power consumption. It is observed that in deep submicron technology, the leakage current contributes 50% of total power consumption [7]. Meanwhile, at the sub-threshold level, the leakage current influences static stability [8]. Therefore, reducing static power consumption is an essential part of low-power design. The sub-threshold leakage current is given by [9]
$$I_{leakage} = I_o \cdot 10^{(V_{gs} - V_{th})/S}, \qquad S = \ln(10)\, V_T \left(1 + \frac{C_d}{C_{ox}}\right) \tag{1}$$
where $V_{gs}$ is the gate-to-source voltage, $V_{th}$ is the threshold voltage, $I_o$ is the drain current with $V_{th} = V_{gs}$, $C_d$ is the depletion layer capacitance, $C_{ox}$ is the gate-oxide capacitance, $V_T$ is the thermal voltage, and $S$ is the subthreshold slope.
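As a rough illustration of Equation (1), the following sketch evaluates the leakage ratio between two threshold voltages. All numeric values (thermal voltage, capacitance ratio, threshold voltages, `i_o`) are assumed for illustration and are not taken from the paper or from any 28 nm library.

```python
import math

def subthreshold_slope(v_thermal, cd_over_cox):
    """S = ln(10) * V_T * (1 + C_d / C_ox), in volts per decade."""
    return math.log(10) * v_thermal * (1.0 + cd_over_cox)

def leakage_current(i_o, v_gs, v_th, slope):
    """I_leakage = I_o * 10 ** ((V_gs - V_th) / S)."""
    return i_o * 10 ** ((v_gs - v_th) / slope)

# Illustrative, assumed values (not from the paper).
S = subthreshold_slope(v_thermal=0.026, cd_over_cox=0.5)
i_lvt = leakage_current(i_o=1e-7, v_gs=0.0, v_th=0.30, slope=S)
i_hvt = leakage_current(i_o=1e-7, v_gs=0.0, v_th=0.45, slope=S)

# A linear 0.15 V increase in V_th divides leakage by 10 ** (0.15 / S).
leakage_ratio = i_lvt / i_hvt
print(f"S = {S * 1000:.1f} mV/decade, I_LVT / I_HVT = {leakage_ratio:.1f}")
```

Under these assumed values, a 150 mV difference in threshold voltage changes the leakage current by more than an order of magnitude, which is the exponential dependence the text relies on.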
Techniques such as multi-threshold CMOS (MTCMOS), transistor stacking [10], the sleepy stack approach [11], the sleepy keeper approach [12], and body biasing [13] are used to minimize static power consumption. Among these techniques, multi-threshold CMOS is particularly effective at reducing leakage currents [14,15]. Low-threshold voltage (LVT) devices have low delay but high static power consumption, while high-threshold voltage (HVT) devices have the opposite characteristics. According to Equation (1), static power consumption increases exponentially as the threshold voltage decreases linearly. Thus, when a cell is swapped from LVT to HVT, the relative change in static power consumption is much larger than the relative change in delay [16]. Multi-threshold CMOS design therefore aims to keep the proportion of LVT cells as small as possible: LVT cells are used only on critical paths to meet the timing constraints, and HVT cells are used on non-critical paths to reduce static power consumption. However, the complexity of the circuit network makes the threshold voltages difficult to distribute. Many circuit designs also use a standard-threshold voltage (SVT) between LVT and HVT, which further increases the complexity of threshold voltage distribution.
The threshold voltage distribution algorithms in [17,18,19] effectively reduce static power consumption. However, these methods require updating the timing in the static timing analysis (STA) tools after each cell replacement, so the number of timing updates grows directly with the circuit scale. When applied to a VLSI design, their run time becomes impractical. In addition, when considering the circuit timing, these algorithms use the number of critical paths passing through a cell to decide whether the cell is critical, which cannot accurately reflect the connection between the cell and the path timing.
This paper describes a triple-threshold path-based static power optimization methodology (TPSPOM) that assigns one of three threshold voltages to each cell to reduce static power consumption. Its main contributions are as follows.
(1) We propose a new way of distinguishing critical cells. The criticality of a cell is determined by its delay-to-power ratio and by the criticality of the paths through it. The most critical cell on the most critical path is then selected in the replacement process.
(2) Our algorithm flow avoids most timing-update operations and is therefore suitable for VLSI designs.

2. Related Work

Various algorithms have been proposed for minimizing static power consumption using the dual-threshold technique at the transistor level [20,21,22]. However, transistor-level analysis and assignment is complex and unsuitable for current VLSI designs. Compared to the transistor level, the gate level uses standard cells as the basic unit of the circuit, which significantly reduces the complexity of VLSI designs. As a result, the design approach for threshold voltage distribution has also evolved from the transistor level to the gate level [17,18,19,23,24]. The methods in [23,24] significantly reduce the number of operations by batch replacement, while the methods in [17,18,19] choose node-by-node replacement for accuracy.
The gate-level dual-threshold static power-optimization methodology (GDSPOM) [17] provides a way to distribute threshold voltages at the gate level. The circuit’s initial state in this method is all-HVT. This method analyzes the timing by STA [25] and then swaps the cell with the highest cost to LVT, where the cost of the cell refers to the number of critical paths through it. This method then repeats the above operation of analyzing the timing and replacing the cell until the timing meets the constraints. For example, as shown in Figure 1, there are six paths in the design, three of which are critical. Cells A and D will be swapped to LVT first since their costs are the highest.
The cell-based leakage power reduction priority (CBLPRP) [18] optimization methodology views the static power consumption difference $\Delta p$ as a more effective indicator than the number of critical paths through the cell. The circuit's initial state in this method is all-LVT. This method calculates each cell's static power consumption difference $\Delta p$ between the LVT and HVT states, then swaps the cell with the highest $\Delta p$ to HVT. If the timing does not meet the constraints after replacement, the cell is swapped back to LVT. The swapping operation continues until all cells have been accessed. In Figure 2, for example, the cells are accessed in the order A, C, B, D based on the value of $\Delta p$.
Critical-path aware power-consumption optimization methodology (CAPCOM) [19] takes into account the cells' delay, their static power consumption, and the number of critical paths through them. This method divides cells into critical cells and non-critical cells. For non-critical cells, it calculates the delay difference $\Delta d$ and the static power difference $\Delta p$ of each cell between the LVT and HVT states. For critical cells, it additionally calculates the number of critical paths $N$ through each cell. The initial state of the circuit is all-LVT, and the non-critical cells are accessed first, in descending order of the power-to-delay ratio $\Delta p / \Delta d$. The critical cells are then accessed in descending order of $\Delta p / (\Delta d \cdot N)$. As with CBLPRP, if a swapped cell violates the timing constraints, it is swapped back, and the loop continues until all cells have been accessed. This method can also be applied to triple-threshold voltage distribution: it first checks whether each cell can be swapped from LVT to HVT and, after accessing all cells, checks whether the remaining LVT cells can be swapped to SVT.
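The access order described for CAPCOM can be sketched as two sorts. This is only an illustration of the ordering rule stated above; the record fields (`dp`, `dd`, `n_critical`) are hypothetical names, and treating a cell as critical when at least one critical path passes through it is an assumption, since the text does not give the exact criterion.

```python
def capcom_access_order(cells):
    """Return cell names in the visiting order described in the text.

    cells: list of dicts with hypothetical keys
      "name", "dp" (static power difference), "dd" (delay difference),
      "n_critical" (number of critical paths through the cell).
    """
    # Assumption: a cell is "critical" if any critical path passes through it.
    non_critical = [c for c in cells if c["n_critical"] == 0]
    critical = [c for c in cells if c["n_critical"] > 0]

    # Non-critical cells first, by descending dp / dd.
    non_critical.sort(key=lambda c: c["dp"] / c["dd"], reverse=True)
    # Then critical cells, by descending dp / (dd * N).
    critical.sort(key=lambda c: c["dp"] / (c["dd"] * c["n_critical"]),
                  reverse=True)
    return [c["name"] for c in non_critical + critical]

# Made-up numbers, for illustration only.
example_cells = [
    {"name": "A", "dp": 4.0, "dd": 2.0, "n_critical": 0},  # ratio 2.0
    {"name": "B", "dp": 9.0, "dd": 3.0, "n_critical": 0},  # ratio 3.0
    {"name": "C", "dp": 8.0, "dd": 2.0, "n_critical": 2},  # ratio 2.0
    {"name": "D", "dp": 6.0, "dd": 1.0, "n_critical": 1},  # ratio 6.0
]
order = capcom_access_order(example_cells)
print(order)
```

The sketch covers only the ordering; the swap-and-check step against the STA tool is omitted.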
However, these methods have the following drawbacks:
  • These methods need to update the timing every time they swap a cell, which can lead to significant runtime in VLSI designs. The operation to update the timing is performed in STA tools, which calls the composite current source (CCS) timing library to recalculate the cell’s delay and update the path timing. Calculating delay using the CCS library is time-consuming [26], and updating the paths requires traversing many paths. For the same number of cells, replacing one cell at a time takes longer than replacing cells in a batch.
  • VLSI designs contain a very large number of paths, so calculating the number of critical paths is also time-consuming. In GDSPOM, the number of critical paths must be updated after each cell replacement, while CAPCOM calculates it only once, in the initial stage. In terms of time complexity, CAPCOM is therefore much cheaper than GDSPOM and comparable to CBLPRP.
  • Targeted replacement is more effective if the timing constraints for each path are considered rather than just the number of critical paths.

3. Theoretical Analysis

This paper describes a triple-threshold path-based static power optimization methodology (TPSPOM). This method concentrates on the specific timing constraints of each path and considers how each cell affects both the overall circuit and the given path. At the same time, it saves run time by reducing the number of timing updates. For designs containing hundreds of thousands of cells, the TPSPOM method can efficiently distribute triple-threshold voltages.
The main steps of the method are as follows:
(1) Use a simplified path collection instead of all paths when considering the circuit's timing.
(2) Obtain the timing constraint inequality for each path from the cell and path information.
(3) Quantify the criticality of the paths and use it as the path weights.
(4) Obtain each cell's weight from the weights of all paths through the cell and the cell's delay-to-power ratio.
(5) Start from an all-HVT circuit. Use the path weights as the first-order key and the cell weights as the second-order key to select the cells to replace. After each replacement, update the path and cell weights using approximate calculations. Repeat swapping cells until all the paths in the path collection meet the timing constraints. This stage saves time by avoiding timing updates within the STA tools.
(6) Apply the assignment results obtained from step 5 in the STA tools. If there are still timing violations, expand the path collection and perform step 5 again until the STA results meet the constraints.

3.1. The Reason and Way of Simplifying Path Collection

The critical paths reflect the timing state of the circuit, and the information they provide is an important reference for threshold voltage distribution. However, the number of paths in a mesh structure is so large that traversing all critical paths is time-consuming or infeasible. For example, as shown in Figure 3, suppose that the structure in the dashed box is repeated 15 times. The maximum logic depth (the number of cells on one path) is then $5 + 15 \times 2 = 35$, which is common in VLSI designs. When the structure inside the dashed box is not considered, three paths pass through each of A and B. Each repetition of the structure inside the dashed box multiplies the number of paths by 2. Thus the number of paths is $2^{15} \times (3 + 3) = 196{,}608$, while the number of cells is $7 + 15 \times 3 = 52$. The number of paths may therefore be orders of magnitude larger than the number of cells.
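The arithmetic of this example can be reproduced with a few lines. The function and parameter names below are illustrative; the constants are the ones quoted in the text for Figure 3.

```python
def figure3_counts(repeats, paths_per_entry=3, entries=2,
                   base_depth=5, depth_per_repeat=2,
                   base_cells=7, cells_per_repeat=3):
    """Path count, logic depth, and cell count for the Figure 3 example."""
    # Every repetition of the boxed structure doubles the number of paths.
    paths = (2 ** repeats) * (paths_per_entry * entries)
    depth = base_depth + repeats * depth_per_repeat
    cells = base_cells + repeats * cells_per_repeat
    return paths, depth, cells

paths, depth, cells = figure3_counts(15)
print(paths, depth, cells)  # 196608 35 52
```

The path count grows exponentially with the number of repetitions, while the depth and the cell count grow only linearly.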
The high similarity of these paths means they provide similar information, so accessing all of them yields redundant information at a high cost. To reduce the computational complexity, the proposed method collects only the worst timing path through each pin. In the case of Figure 3, at most 68 paths are collected in this way, which is the same order of magnitude as the number of pins. The simplified path collection is not complete; Section 3.5 describes how the path collection is expanded to compensate for this drawback.

3.2. Obtaining Path-Based Timing Constraints

The CAPCOM method considers the cell delay without considering the path and uses the average delay of the cell as a parameter. However, the specific impact of the cell on a path needs to be considered. For example, in Figure 4, the delays of timing arc A-to-O and timing arc B-to-O are different, so the cell delay on a path through pin A differs significantly from the cell delay on a path through pin B.
On the other hand, CAPCOM only distinguishes paths into critical paths and non-critical paths. The method in this paper can obtain more accurate assignment results from distributing threshold voltages against path constraints. The path constraint is given by
$$\sum_{j \in CIP_i} \left( \alpha_j\, d_C(i,j) + d_N(i,j) \right) + t_{s_i} \le t_{r_i} \tag{2}$$
where $i$ refers to a particular path, $CIP_i$ is the set of cells in path $i$, $\alpha_j$ is the timing derate of cell $j$, $d_C(i,j)$ is the delay of cell $j$ in path $i$, $d_N(i,j)$ is the delay of the net connected to the output pin of cell $j$ in path $i$, $t_s$ is the library setup time, and $t_r$ is the required time.

3.3. Path Weight

The timing slack of the path is given by
$$t_{slack_i} = t_{r_i} - \sum_{j \in CIP_i} \left( \alpha_j\, d_C(i,j) + d_N(i,j) \right) - t_{s_i} \tag{3}$$
where $t_{slack}$ is the timing slack of the path. When $t_{slack}$ is not lower than 0, the timing of the path is met; otherwise it is violated.
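A minimal sketch of Equations (2) and (3), assuming each stage of a path is given as a `(derate, cell_delay, net_delay)` tuple. The tuple layout and the numbers are illustrative assumptions, not values from the paper.

```python
def path_slack(required_time, setup_time, stages):
    """Timing slack of one path, following Equation (3).

    stages: list of (derate, cell_delay, net_delay) tuples, one per cell
            on the path (hypothetical layout).
    """
    arrival = sum(alpha * d_cell + d_net for alpha, d_cell, d_net in stages)
    return required_time - arrival - setup_time

# Assumed example: three cells, times in nanoseconds.
stages = [(1.1, 0.20, 0.05), (1.1, 0.30, 0.04), (1.1, 0.10, 0.02)]
slack = path_slack(required_time=1.0, setup_time=0.05, stages=stages)

# Equation (2) holds exactly when the slack is not lower than 0.
meets_timing = slack >= 0
print(f"slack = {slack:.3f} ns, meets timing: {meets_timing}")
```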
Then we propose the concept of repairable timing, which means the total timing gain we can obtain when converting the path from all-HVT to all-LVT. The equation of the repairable timing is given by
$$t_{repairable_i} = \sum_{j \in CIP_i} \left( \alpha_{H_j}\, d_{C_H}(i,j) + d_{N_H}(i,j) - \alpha_{L_j}\, d_{C_L}(i,j) - d_{N_L}(i,j) \right) \tag{4}$$
where $t_{repairable}$ is the repairable timing of the path; $\alpha_{H_j}$, $d_{C_H}(i,j)$, and $d_{N_H}(i,j)$ are obtained when cell $j$ is in the HVT state, and $\alpha_{L_j}$, $d_{C_L}(i,j)$, and $d_{N_L}(i,j)$ are obtained when cell $j$ is in the LVT state.
Before distributing the threshold voltage, the circuit meets the timing at a given frequency. This means that the timing of a path meets the constraint if the path is all-LVT. Thus, the sum of $t_{repairable}$ and $t_{slack}$ is not lower than 0, which bounds the path weight by 1. The path weight $R_i$ is then given by
$$R_i = -\frac{t_{slack_i}}{t_{repairable_i}}, \qquad R_i \le 1 \tag{5}$$
When $R_i$ is less than 0, it indicates that path $i$ meets the timing. The closer $R_i$ is to 1, the higher the proportion of LVT required within the path.
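The repairable timing and the path weight can be sketched as follows. The per-cell tuples `(derate, cell_delay, net_delay)` for the HVT and LVT states are an assumed data layout, and the numbers are made up for illustration.

```python
def stage_gain(hvt, lvt):
    """Delay gained on one cell when it goes from HVT to LVT (one term of Eq. (4))."""
    a_h, dc_h, dn_h = hvt
    a_l, dc_l, dn_l = lvt
    return a_h * dc_h + dn_h - a_l * dc_l - dn_l

def path_weight(slack_all_hvt, stages):
    """Return (R_i, t_repairable_i) following Equations (4) and (5).

    stages: list of (hvt_tuple, lvt_tuple) pairs, one per cell on the path.
    """
    repairable = sum(stage_gain(h, l) for h, l in stages)
    if repairable <= 0:
        # Guard added for this sketch: Eq. (5) is undefined in this case.
        raise ValueError("path has no repairable timing")
    return -slack_all_hvt / repairable, repairable

# Assumed example: three identical cells, each gaining 0.15 ns when swapped.
stages = [((1.0, 0.40, 0.05), (1.0, 0.25, 0.05))] * 3
r, repairable = path_weight(slack_all_hvt=-0.30, stages=stages)
print(f"t_repairable = {repairable:.2f} ns, R = {r:.3f}")
```

In this example the path violates timing by 0.30 ns and can recover 0.45 ns in total, so roughly two thirds of the available gain is needed.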
When a cell is replaced, the path weight will be updated. The equation is given by
$$U_i = -\frac{t_{slack_i} + \left( \alpha_{H_n} d_{C_H}(i,n) + d_{N_H}(i,n) - \alpha_{L_n} d_{C_L}(i,n) - d_{N_L}(i,n) \right)}{t_{repairable_i} - \left( \alpha_{H_n} d_{C_H}(i,n) + d_{N_H}(i,n) - \alpha_{L_n} d_{C_L}(i,n) - d_{N_L}(i,n) \right)}, \qquad R'_i = \begin{cases} 0 & \text{if } U_i < 0 \\ U_i & \text{if } U_i \ge 0 \end{cases} \tag{6}$$
where $n$ refers to the cell swapped from HVT to LVT and $R'_i$ is the updated path weight. The value of $R_i$ becomes smaller after the update, which indicates that the probability of the remaining unreplaced cells being replaced with LVT decreases. When the weight $R$ of every path does not exceed 0, all paths in the path collection have met the timing.
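The update of Equation (6) amounts to moving the gain of the swapped cell from the repairable timing to the slack. A minimal sketch under that reading follows; the guard for an exhausted path is an addition for this sketch and is not described in the text.

```python
def updated_path_weight(slack, repairable, gain):
    """Apply Equation (6) after one cell with delay gain `gain` is swapped.

    Returns (new_weight, new_slack, new_repairable).
    """
    new_slack = slack + gain
    new_repairable = repairable - gain
    if new_slack >= 0:
        # U_i <= 0: the path meets timing, so the weight is clamped to 0.
        return 0.0, new_slack, new_repairable
    if new_repairable <= 0:
        # Guard added for this sketch: nothing left to swap on this path.
        raise ValueError("path cannot be repaired by further swaps")
    return -new_slack / new_repairable, new_slack, new_repairable

# Assumed numbers: the path starts with R = 0.30 / 0.45.
r_before = 0.30 / 0.45
r_after, slack, repairable = updated_path_weight(-0.30, 0.45, gain=0.15)
print(f"R before = {r_before:.3f}, R after = {r_after:.3f}")
```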
The larger the value of $R$, the larger the proportion of LVT cells required by the path and the higher the probability that cells on it will be replaced by LVT cells. If a low-probability cell is swapped first, a high-probability cell may still need to be swapped because the timing constraint of its path is tight, which causes redundant replacements. Conversely, replacing high-probability cells first means that many low-probability cells no longer need to be swapped. It is therefore necessary to prioritize the replacement of cells on paths with larger weights. In this paper, the algorithm first selects the path with the largest $R$, then selects a cell on it for replacement and updates $t_{slack}$, $t_{repairable}$, and $R$. This operation is iterated until all paths meet the timing constraints.

3.4. Cell Weight

When selecting a cell to be replaced within the path, we consider the cell’s delay-to-power ratio and the timing constraints of the paths through it. The weight of the cell is obtained from the path collection and is given by
$$W_j = \sum_{i \in PTC_j} \frac{\left( \alpha_{H_j} d_{C_H}(i,j) + d_{N_H}(i,j) - \alpha_{L_j} d_{C_L}(i,j) - d_{N_L}(i,j) \right) R_i}{p_{L_j} - p_{H_j}}, \qquad R_i > 0 \tag{7}$$
where $j$ refers to a cell, $PTC_j$ is the set of paths through cell $j$, and $p$ is the static power consumption of cell $j$. The cell's weight is based on its delay and static power consumption and depends on the criticality of the paths through it.
When the weight of a particular path $m$ through cell $j$ changes, the weight of cell $j$ decreases accordingly. This is shown in Equation (8), where $R'_m$ is obtained from Equation (6).
$$W'_j = W_j + \frac{\alpha_{H_j} d_{C_H}(m,j) + d_{N_H}(m,j) - \alpha_{L_j} d_{C_L}(m,j) - d_{N_L}(m,j)}{p_{L_j} - p_{H_j}} \left( R'_m - R_m \right) \tag{8}$$
By updating the weights, we can find the cell in the given path with the most significant impact on the current timing.
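The cell weight and its incremental update can be sketched as below. The dictionary layout (`gains_by_path`, `path_weights`) and all numbers are assumptions for illustration; the check at the end only confirms that the incremental update agrees with recomputing the full sum.

```python
def cell_weight(gains_by_path, path_weights, p_lvt, p_hvt):
    """Equation (7): sum of gain * R_i over paths with R_i > 0, per unit power.

    gains_by_path: {path_id: HVT-to-LVT delay gain of this cell on that path}
    path_weights:  {path_id: R_i}
    """
    d_power = p_lvt - p_hvt
    return sum(gain * path_weights[i] / d_power
               for i, gain in gains_by_path.items()
               if path_weights[i] > 0)

def update_cell_weight(weight, gain_on_m, p_lvt, p_hvt, r_old, r_new):
    """Equation (8): adjust the weight when path m changes from r_old to r_new."""
    return weight + gain_on_m / (p_lvt - p_hvt) * (r_new - r_old)

# Assumed example: one cell on two paths.
gains = {"p1": 0.15, "p2": 0.10}
weights = {"p1": 0.6, "p2": 0.2}
w = cell_weight(gains, weights, p_lvt=5.0, p_hvt=1.0)

# Path p1 becomes less critical (0.6 -> 0.5) after a swap elsewhere on it.
w_new = update_cell_weight(w, gains["p1"], 5.0, 1.0, r_old=0.6, r_new=0.5)
w_full = cell_weight(gains, {"p1": 0.5, "p2": 0.2}, p_lvt=5.0, p_hvt=1.0)
print(f"W = {w:.5f}, updated W = {w_new:.5f}, recomputed W = {w_full:.5f}")
```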

3.5. Reasons for Expanding Path Collection

In this paper, the algorithm finds the critical path by $R$ and then finds the most critical cell on this path by $W$. After that, it updates $t_{slack}$, $t_{repairable}$, $R$, and $W$, and repeats the above operation until the slack of every path is not lower than 0.
However, applying the obtained results to STA may not fully meet the timing constraints. There are two main reasons.
The first reason is that we use approximate calculations to get the delay of the cell. The timing arc delay of the HVT cells is obtained in the all-HVT case. The same goes for LVT cells. Therefore, there is a small gap between the delay in the calculation and the delay after applying STA, which may result in paths with minor violations remaining in the path collection.
Another reason is that the original path collection collects the most critical path of each pin, and some sub-critical paths still need to be collected. After meeting the constraints of the paths in the original path collection, some leftover sub-critical paths will still not meet the constraints.
In this case, the path collection needs to be supplemented. The legacy sub-critical paths are added to the original path collection, and the state of the sub-critical paths must be changed to all-HVT. The above algorithm is then re-run in the all-HVT state based on the new path collection. Repeat until the result of the threshold voltage distribution meets the timing constraints within the STA.

3.6. Advantages of the Method

3.6.1. Effectively Reduce Runtime

For single-cell replacement, updating the timing in the STA tools is replaced by updating the path weights and cell weights. The delay information of the cell is obtained at the initial stage, and the CCS-based calculation is replaced by an approximate calculation when updating the cell delay. The approximate calculation may lead to two cases: (1) a small number of redundant replacements slightly increases static power consumption; (2) the timing of a small number of paths is incompletely repaired, which can be solved by adding paths to the path collection. In return, the approximate computation substantially improves the algorithm's feasibility and reduces the operating cost.
Another way to improve the algorithm’s feasibility is to use some of the most critical paths instead of all as constraints. The disadvantage is the existence of sub-critical paths that are not considered, which can also be solved by expanding the path collection.

3.6.2. More Accurate Swapping Strategy

The CAPCOM method uses the delay-to-power ratio as the cell weight, leading to the optimal solution if the optimization object has only one path. Considering that the object is the overall timing (all paths), the CAPCOM method multiplies the delay-to-power ratio by the number of critical paths to represent the impact of the cell on the overall timing.
The advantage of the algorithm in this paper is that, compared with previous algorithms, the impact of cells on timing is considered more precisely by using specific path constraints rather than simply dividing them into critical or non-critical paths. The traversal is divided into two stages. The first stage obtains the path with the largest weight. The larger the path weight, the more significant the proportion of LVTs needed within the path, and the greater the probability that cells within this path will be replaced with LVTs. Prioritizing the cells with higher replacement probability can effectively avoid redundant replacements. The second stage is to obtain the cell with the largest weight within the path. The cell weight depends on the path weight and the delay-to-power ratio, which reflects the cell’s ability to influence the overall timing in more detail.

3.6.3. Effectively Compensate for the Impact of Simplification

The path collection is expanded to compensate for the small discrepancies due to approximate calculations and the leftover timing problems due to simplifying paths. After bulk replacement in the STA tools, the leftover path constraints are added to the original path collection. This loop continues until the STA results show that the timing constraints have been met. It ensures that the procedure converges and makes the operation of simplifying paths feasible.

4. Algorithm Flow

4.1. Basic Flow

As shown in Figure 5, the flow of TPSPOM can be divided into three parts.
  • Part 1: Initial data.
    Convert the circuit to all-HVT/all-SVT/all-LVT states separately, perform STA, and obtain the timing arc, derate, and power consumption information. Then obtain the initial path collection with the circuit at all-HVT.
  • Part 2: Sub algorithm.
    Obtain the result of the threshold voltage assignment based on the given path collection. The first path collection is provided in Part 1, and the subsequent one is in Part 3.
  • Part 3: Top algorithm.
    Apply the threshold voltage assignment result provided by Part 2, and perform STA. If the STA timing results do not meet the timing constraints, add new paths to the original path collection. The new path collection is then provided to Part 2, and the process repeats until the STA results meet the timing constraints.
The details of the steps of the above process are implemented in Algorithm 1 (sub algorithm) and Algorithm 2 (top algorithm).
When the path collection is given, the assignment of cell thresholds is obtained according to Algorithm 1 as follows.
  • Step 1: Initialize data.
    According to Equations (2)–(5) and (7), get $t_{slack_i}$, $t_{repairable_i}$, and $R_i$ of each path $i$, and get $W_{S_j}$ and $W_{L_j}$ of each cell $j$. $W_{S_j}$ denotes the weight associated with replacing the cell from HVT to SVT, and $W_{L_j}$ denotes the weight associated with replacing the cell from SVT to LVT. For the same cell, $W_{S_j}$ is greater than $W_{L_j}$, according to Equation (1). Swapping a cell to LVT is considered only after it has been swapped to SVT. $W_j$ equals $W_{S_j}$ when the cell is in HVT, $W_{L_j}$ when it is in SVT, and 0 when it is in LVT.
  • Step 2: Iterate to find replacement cells.
    Find the most critical path $m$ with the largest $R_m$, then find the cell $n$ with the largest weight $W_n$ on the path. After that, swap cell $n$ from HVT to SVT or from SVT to LVT.
  • Step 3: Update data during iteration.
    When a cell is replaced, update all paths through it, and update the cells’ weight on those paths.
Algorithm 2 calls Algorithm 1 and keeps updating the path collection to find the optimal solution until the STA results meet the timing constraints.
Algorithm 1 Sub triple-threshold voltage distribution algorithm
Input: path collection $C$, static power consumption $P$, timing arcs, the OCV derate $\alpha$
Output: sub cell swap list
  Initialize each path $i$'s $t_{slack_i}$, $t_{repairable_i}$ and $R_i$ from collection $C$
  Collect the replaceable cells on each path $i$ as collection $CIP_i$
  Collect the paths through each cell $j$ as collection $PTC_j$
  Initialize each cell $j$'s weights $W_{S_j}$ (HVT-to-SVT) and $W_{L_j}$ (SVT-to-LVT)
  Initialize the state of all cells to 0
  When cell $j$'s state is 0, $W_j = W_{S_j}$; when its state is 1, $W_j = W_{L_j}$; when its state is 2, $W_j = 0$
  repeat
    Select the path $m$ with the largest $R_m$ value
    Select the cell $n$ with the largest $W_n$ value in $CIP_m$
    if the state of cell $n$ is 0 then
      Swap cell $n$ from HVT to SVT and mark its state as 1
    else if the state of cell $n$ is 1 then
      Swap cell $n$ from SVT to LVT and mark its state as 2
    end if
    for each path $x$ in $PTC_n$ do
      Update $t_{slack_x}$, $t_{repairable_x}$, $R_x$ with Equations (3), (4) and (6)
      for each cell $y$ in $CIP_x$ do
        Update $W_{S_y}$, $W_{L_y}$ with Equation (8)
      end for
    end for
  until $t_{slack} \ge 0$ for all paths
  Return sub cell swap list
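The selection loop of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: path and cell weights are recomputed from scratch at every step instead of being updated incrementally with Equations (6) and (8), the data layout (`paths`, `gains`, `dpower`) is hypothetical, and the repairable timing of a path is taken as the sum of the gains still available on it.

```python
EPS = 1e-12  # tolerance for floating-point slack comparisons (assumption)

def assign_thresholds(paths, gains, dpower):
    """Greedy triple-threshold assignment on a fixed path collection.

    paths:  {path_id: {"slack": all-HVT slack, "cells": [cell_id, ...]}}
    gains:  {(path_id, cell_id): (gain HVT->SVT, gain SVT->LVT)}
    dpower: {cell_id: (p_svt - p_hvt, p_lvt - p_svt)}
    Returns {cell_id: state} with 0 = HVT, 1 = SVT, 2 = LVT.
    """
    state = {c: 0 for p in paths.values() for c in p["cells"]}
    slack = {i: p["slack"] for i, p in paths.items()}
    through = {}  # cell -> paths through it (PTC_j)
    for i, p in paths.items():
        for c in p["cells"]:
            through.setdefault(c, []).append(i)

    def path_weight(i):
        if slack[i] >= -EPS:
            return 0.0  # path already meets timing
        # Gain still available on the path from the cells' current states.
        repairable = sum(sum(gains[i, c][state[c]:]) for c in paths[i]["cells"])
        if repairable <= 0:
            raise ValueError(f"path {i} cannot be repaired")
        return -slack[i] / repairable

    def cell_weight(c):
        s = state[c]
        if s == 2:
            return 0.0  # already LVT, nothing left to swap
        total = sum(gains[i, c][s] * path_weight(i) for i in through[c])
        return total / dpower[c][s]

    while True:
        worst = max(paths, key=path_weight)
        if path_weight(worst) <= 0:
            return state  # every path in the collection meets timing
        candidates = [c for c in paths[worst]["cells"] if state[c] < 2]
        cell = max(candidates, key=cell_weight)
        step = state[cell]
        for i in through[cell]:  # the swap speeds up every path through the cell
            slack[i] += gains[i, cell][step]
        state[cell] = step + 1

# Made-up example: two violating paths sharing cell "b".
paths = {"p1": {"slack": -0.10, "cells": ["a", "b"]},
         "p2": {"slack": -0.02, "cells": ["b", "c"]}}
gains = {("p1", "a"): (0.06, 0.04), ("p1", "b"): (0.08, 0.05),
         ("p2", "b"): (0.08, 0.05), ("p2", "c"): (0.05, 0.03)}
dpower = {"a": (1.0, 3.0), "b": (1.0, 3.0), "c": (1.0, 3.0)}
result = assign_thresholds(paths, gains, dpower)
print(result)
```

In this example the shared cell `b` is swapped first because it helps both paths, after which `p2` already meets timing and cell `c` stays at HVT.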
Algorithm 2 Top triple-threshold voltage distribution algorithm
Input: Static timing analysis results
Output: Final cell-swapping list
  Initialize the circuit state to all-HVT
  Initialize the path collection $C$
  repeat
    Provide path collection $C$ to Algorithm 1
    Obtain the sub cell-swapping list from Algorithm 1
    Obtain the new path collection $C'$ after applying the sub cell-swapping list in STA tools
    Add the new path collection $C'$ to the original path collection $C$
    Restore the circuit state to all-HVT
  until STA results meet the timing constraints
  Return final cell-swapping list
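The outer loop of Algorithm 2 can be sketched with two callbacks standing in for the sub-algorithm and the STA tool. Both callbacks, their signatures, and the `max_rounds` safety limit are hypothetical and introduced only for this illustration.

```python
def top_algorithm(initial_paths, run_sub_algorithm, run_sta, max_rounds=20):
    """Iterate the sub-algorithm until STA reports no violating paths.

    initial_paths:     {path_id: path_data}, collected in the all-HVT state
    run_sub_algorithm: callable(collection) -> swap list       (hypothetical)
    run_sta:           callable(swap_list) -> {path_id: path_data} of paths
                       that still violate timing               (hypothetical)
    max_rounds:        safety limit added for this sketch
    """
    collection = dict(initial_paths)
    for _ in range(max_rounds):
        # The sub-algorithm always starts again from the all-HVT state.
        swap_list = run_sub_algorithm(collection)
        violating = run_sta(swap_list)
        if not violating:
            return swap_list, collection
        collection.update(violating)  # expand the path collection
    raise RuntimeError("STA constraints not met within max_rounds")

# Stub callbacks for demonstration. The fake swap list is simply the list of
# path ids handled, standing in for a real list of cells.
calls = []

def fake_sub(collection):
    calls.append(sorted(collection))
    return sorted(collection)

def fake_sta(swap_list):
    # Pretend STA finds one uncollected sub-critical path until it is added.
    return {} if "p3" in swap_list else {"p3": {"slack": -0.01}}

swap_list, collection = top_algorithm({"p1": {}, "p2": {}}, fake_sub, fake_sta)
print(swap_list, len(calls))
```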

4.2. Optimize the Sub Algorithm

We now discuss the time feasibility of Algorithm 1 by calculating its time complexity. Assume that the number of swapped cells is $A$, the number of paths in the path collection is $B$, and the maximum number of cells on a path is $C$.
The time complexity of the initialization phase is $O(BC)$.
In the algorithm, we need to retrieve the most critical path before each cell replacement, i.e., find the path with the worst timing among all the paths that have not yet met the timing constraints. As the algorithm executes, the number of paths that do not meet the timing constraints gradually decreases, so the number of paths that need to be traversed per replacement also decreases. Figure 6 shows the relationship between the number of paths to be traversed and the current number of replaced cells. The experimental circuit in this figure is the same as that in Section 5: a CPU circuit containing 385,781 cells with triple-threshold voltage. The experiment executed Algorithm 1 once based on the initial path collection. The results show that the number of paths to be traversed decreases approximately linearly as the number of replaced cells increases, so the time complexity of finding the worst path is $O(AB)$.
The time complexity of finding the cell with the highest weight in the worst path is $O(AC)$, while that of swapping cells is $O(A)$. The worst case is that every cell of every path is replaced. Thus the time complexity of updating the path weights is $O(BC)$. A change in each path causes the weights of its $C$ cells to be updated. Thus the time complexity of updating the cell weights is $O(BC^2)$.
In actual circuit designs, the logic depth $C$ of a path is orders of magnitude smaller than the number of replaced cells $A$ and the number of traversed paths $B$, so it can be treated as a constant when calculating the time complexity. Moreover, $A$ and $B$ can be considered to be of the same order of magnitude. Thus, the dominant term is the $O(AB) = O(B^2)$ cost of finding the worst path.
Then we update the sub-algorithm, which improves the running time in two ways while guaranteeing the same execution results.

4.2.1. Optimize the Way to Update Cells’ Weight

Weighting aims to find the most critical cell in a given path. The original approach was to update the weights of all associated cells whenever a cell is replaced, which requires many weight updates. We therefore propose a new updating strategy that effectively reduces the number of weight updates.
First, we do not update the cell weights immediately after a cell is replaced, so at this point the weights of the cells in the given path may be out of date. Moreover, a cell's updated weight never exceeds its weight before the update. Therefore, we only need to obtain a cell that satisfies the following conditions: (1) its weight has been updated; (2) its weight is greater than the weights of all other cells in the path (regardless of whether the weights of the other cells have been updated).
To this end, the following update operations are taken: (1) obtain the cell with the highest weight in the path; (2) if the weight of this cell has not been updated, update it and return to step (1); otherwise, output this cell.
Figure 7 shows an example. In step (a), we obtain cell A with the highest weight in the path. In step (b), we judge that the weight of cell A needs to be updated, so we update it and find a new cell F with the largest weight. In step (c), we judge that the weight of cell F still needs to be updated; after updating it, we find that cell F has met the abovementioned conditions, so we output it.
The above operations effectively reduce the number of operations to update the cell’s weight and the algorithm’s running time.
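The lazy update can be sketched as a short loop. The names `cached_weight`, `is_fresh`, and `recompute` are illustrative; the sketch relies on the property stated above that an updated weight never exceeds the stale one.

```python
def pick_cell_lazily(cells, cached_weight, is_fresh, recompute):
    """Return the cell with the largest true weight, updating as few as possible.

    cells:         cell ids on the selected path
    cached_weight: {cell: possibly stale weight}, an upper bound on the true weight
    is_fresh:      {cell: True if cached_weight[cell] is up to date}
    recompute:     callable(cell) -> current weight (hypothetical)
    """
    while True:
        best = max(cells, key=lambda c: cached_weight[c])
        if is_fresh[best]:
            # Its true weight beats every other cell's upper bound.
            return best
        cached_weight[best] = recompute(best)
        is_fresh[best] = True

# Made-up weights loosely following the Figure 7 walk-through.
true_weight = {"A": 0.5, "F": 0.7, "G": 0.3}
cached = {"A": 0.9, "F": 0.8, "G": 0.3}
fresh = {"A": False, "F": False, "G": False}
recomputed = []

def recompute(cell):
    recomputed.append(cell)
    return true_weight[cell]

chosen = pick_cell_lazily(["A", "F", "G"], cached, fresh, recompute)
print(chosen, recomputed)
```

Here only cells A and F are recomputed; the weight of G is never refreshed because its stale upper bound is already below the fresh weight of F.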

4.2.2. Optimize the Way to Find the Path

Updating path weight is the operation with the highest time complexity in Algorithm 1. Therefore, the following method is proposed to reduce the time complexity.
We set a range according to path weight R, obtain the paths with the worst timing in the path collection, and set it as the subpath collection. There is a fact that for each path, the path weight only decreases as the cells are swapped to a lower threshold voltage. As long as the following operations are performed, the path with the worst timing must be in the subpath collection: (1) replace and update the path’s weight in the subpath collection every time and remove the path that no longer belongs to the range. (2) When the subpath collection is empty, give a new range to obtain a new subpath collection.
Therefore, we can only traverse the collection of subpaths when looking for the worst path, avoiding traversing all paths. We obtain the subpath collection in the following way: set a parameter D ( D > 1 ) and set the initial range of the subpath set as ( 1 1 / D ) < R < 1 . When a new range is needed, reduce the left and correct bounds by 1 / D each.
We repeated the experiment shown in Figure 6 under the same circuit conditions, setting $D = B^{0.5} = 655$. The relationship between the number of traversed paths and the number of cell swaps is shown in Figure 8. Observe that the number of traversed paths is always less than $B/D$. According to the experimental results, the number of operations for extracting the sub-collection and traversing paths is less than $(AB/D + BD)$. We take $D = B^{0.5}$ in the experiment; thus, the time complexity of the new approach in the experimental results is $O(B^{1.5})$.
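The substitution behind this bound can be written out explicitly. The steps below only rearrange the quantities quoted above; the condition $A = O(B)$ is our assumption for the final step and is not stated in the text, and the value of $B$ is approximate because 655 may be a rounded square root.

```latex
\begin{aligned}
D = B^{0.5} = 655 \;&\Rightarrow\; B \approx 655^{2} = 429{,}025, \qquad \frac{B}{D} = B^{0.5} = 655,\\
\frac{AB}{D} + BD &= A\,B^{0.5} + B^{1.5},\\
A = O(B) \;&\Rightarrow\; A\,B^{0.5} + B^{1.5} = O\!\left(B^{1.5}\right).
\end{aligned}
```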

4.2.3. The Update Algorithm

After applying the above two methods, the updated version of Algorithm 1 is shown in Algorithm 3. The main updated steps are as follows.
  • Updated step 1: Optimize the way to update cells’ weight.
    The weights are not updated when replacing cells. Update the weights of some cells on the path after finding the most critical path. First, find the cell with the maximum weight on the path, update the weight of this cell, and then find the new cell with the maximum weight. Loop until the cell with the maximum weight has been updated.
  • Updated step 2: Optimize the way to find the path.
    Extract the worst part from the path collection $C$ as the sub-path collection $SC$, and denote the smallest $R$ value in the sub-collection as $R_F$. Find the most critical path $m$, with the largest $R_m$, in the sub-collection $SC$. When $R_m$ is smaller than $R_F$, update the sub-collection $SC$.
Algorithm 3 The updated sub triple-threshold voltage distribution algorithm
Input: path collection $C$, static power consumption $P$, timing arcs, the OCV derate $\alpha$
Output: sub cell swap list
Initialize each path $i$'s $t_{slack,i}$, $t_{repairable,i}$ and $R_i$ from collection $C$
Collect the replaceable cells on each path $i$ as collection $CIP_i$
Collect the paths through each cell $j$ as collection $PTC_j$
Initialize each cell $j$'s weights $W_{S,j}$ (HVT-to-SVT) and $W_{L,j}$ (SVT-to-LVT)
Initialize the state of all cells to 0
When cell $j$'s state is 0, $W_j = W_{S,j}$; when the state is 1, $W_j = W_{L,j}$; when the state is 2, $W_j = 0$
repeat
   Obtain sub-collection $SC$ from collection $C$, and get $R_F$, $R_m$ from $SC$
   Select the path $m$ with the largest $R_m$ value from $SC$
   while $R_m \geq R_F$ do
     repeat
        Select the cell $n$ with the largest $W_n$ in $CIP_m$
        Update cell $n$'s $W_{S,n}$ (when the state is 0) or $W_{L,n}$ (when the state is 1)
     until $W_n$ has been updated in this loop
     if cell $n$'s state = 0 then
        Swap cell $n$ from HVT to SVT, mark the cell's state as 1
     else if cell $n$'s state = 1 then
        Swap cell $n$ from SVT to LVT, mark the cell's state as 2
     end if
     for each path $x$ in $PTC_n$ do
        update $t_{slack,x}$, $t_{repairable,x}$, $R_x$ with Equations (3), (4) and (6)
        update collections $C$, $SC$
     end for
     Select the path $m$ with the largest $R_m$ value from $SC$
   end while
until all paths' $t_{slack} \geq 0$
Return sub cell swap list
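The cell-state bookkeeping used in Algorithm 3 (state 0 for HVT, 1 for SVT, 2 for LVT, with the active weight switching from the HVT-to-SVT weight to the SVT-to-LVT weight and finally to zero) can be sketched in Python as below. This is an illustrative fragment only: the `Cell` class, its field names, and the toy weights are hypothetical, and the timing updates of Equations (3), (4) and (6) are deliberately left out.

```python
from dataclasses import dataclass

VT_BY_STATE = ("HVT", "SVT", "LVT")  # state 0, 1, 2


@dataclass
class Cell:
    name: str
    ws: float       # weight of the HVT-to-SVT swap
    wl: float       # weight of the SVT-to-LVT swap
    state: int = 0  # every cell starts as HVT

    @property
    def weight(self):
        """Active weight: ws in state 0, wl in state 1, 0 in state 2."""
        return (self.ws, self.wl, 0.0)[self.state]

    def swap(self):
        """Move one step toward a lower threshold voltage; return the swap."""
        if self.state == 2:
            raise ValueError(f"{self.name} is already LVT")
        before = VT_BY_STATE[self.state]
        self.state += 1
        return (self.name, before, VT_BY_STATE[self.state])


cell = Cell("U1", ws=4.0, wl=1.5)   # hypothetical cell and weights
swap_list = [cell.swap(), cell.swap()]
print(swap_list, cell.weight)
```

Because an LVT cell has weight zero, it can no longer be the maximum-weight cell on a path that still contains a replaceable cell with positive weight.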

5. Experimental Results

The experiment is based on a CPU IP implemented in 28 nm technology, with 385,781 cells and a required frequency of 1 GHz. TPSPOM is applied to this circuit, and CBLPRP and CAPCOM are used for comparison.

5.1. Static Power Consumption Results

When the circuit is all-LVT, the static power consumption is 19.7909 mW.
First, we compare the results of the methodologies when only dual-threshold voltage (HVT and LVT) is used. Table 1 shows the power consumption, the number of LVT combinational cells, and the number of LVT sequential cells under various delay constraints. Figure 9 shows the static power consumption, and Figure 10 shows the percentage of LVT cells.
The results show that for static power consumption optimization, CAPCOM is slightly better than CBLPRP, while TPSPOM is significantly better than the other two methods. Moreover, the results indicate that under the shortest 1 ns delay constraint, compared to the all-LVT circuit, CBLPRP’s static power consumption is reduced to 35.12%, while CAPCOM’s is reduced to 34.91% and TPSPOM’s is reduced to 24.26%.
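The quoted percentages follow directly from the 1.0 ns row of Table 1 and the all-LVT value; a short check in Python (variable names are ours) is:

```python
ALL_LVT_MW = 19.7909  # static power of the all-LVT circuit (mW)

# Static power (mW) at the 1.0 ns constraint, dual-threshold case (Table 1).
dual_vt_mw = {"CBLPRP": 6.9498, "CAPCOM": 6.9088, "TPSPOM": 4.8008}

# Remaining static power as a percentage of the all-LVT circuit.
remaining_pct = {m: 100 * p / ALL_LVT_MW for m, p in dual_vt_mw.items()}
for method, pct in remaining_pct.items():
    print(f"{method}: {pct:.2f}%")
```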
For the percentage of LVT cells, CAPCOM is significantly lower than CBLPRP, while TPSPOM is significantly lower than the other two methods. This indicates that using the delay-to-power ratio as a basis for distributing threshold voltage can effectively reduce the proportion of LVT cells.
Next, we compare the results of the CAPCOM and TPSPOM methods when using triple-threshold voltage. The results are shown in Table 2, Figure 11 and Figure 12.
The results show that the TPSPOM method is more effective when using triple-threshold voltage. Under the shortest 1 ns delay constraint, compared to the all-LVT circuit, CAPCOM's static power consumption is reduced to 24.91%, while TPSPOM's is reduced to 9.75%. The percentage of SVT cells is 11.30% for CAPCOM and 29.61% for TPSPOM. Meanwhile, the percentage of LVT cells is 21.2% for CAPCOM and 4.95% for TPSPOM.
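The cell percentages can be recomputed from the 1.0 ns row of Table 2 and the total of 385,781 cells. The Python check below uses our reading of the table values (combinational and sequential counts summed); the results agree with the quoted figures to within rounding.

```python
TOTAL_CELLS = 385_781

# Cell counts at the 1.0 ns constraint (Table 2): (combinational, sequential).
counts = {
    "CAPCOM": {"SVT": (36_291, 7_311), "LVT": (73_094, 8_724)},
    "TPSPOM": {"SVT": (105_021, 9_216), "LVT": (17_459, 1_617)},
}

# Share of all cells, in percent, for each method and threshold voltage.
share_pct = {
    method: {vt: 100 * sum(pair) / TOTAL_CELLS for vt, pair in by_vt.items()}
    for method, by_vt in counts.items()
}
for method, by_vt in share_pct.items():
    print(method, {vt: round(pct, 2) for vt, pct in by_vt.items()})
```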
The CAPCOM method prioritizes swapping cells from LVT to HVT and only then considers swapping cells from LVT to SVT. As a result, it ends up with a higher percentage of LVT cells than an approach that gives priority to replacement with SVT and then HVT.
However, the difference in power consumption between LVT and SVT is much larger than that between SVT and HVT, so priority should be given to swapping cells from LVT to SVT. For example, in this circuit, the static power consumption of a particular inverter is $6.403 \times 10^{-10}$ W for HVT, $6.561 \times 10^{-9}$ W for SVT, and $9.663 \times 10^{-8}$ W for LVT. The delay difference between HVT and SVT is 0.03235 ns, while that between SVT and LVT is 0.01265 ns. Calculating the ratio of the static power consumption difference to the delay difference shows that the value for swapping from SVT to LVT is 38.9 times that for swapping from HVT to SVT. This indicates that the static power consumption cost of using LVT cells is too high.
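The factor of 38.9 can be reproduced with a few lines of Python. The sketch takes the inverter's static power as 6.403e-10 W (HVT), 6.561e-9 W (SVT) and 9.663e-8 W (LVT); the negative exponents are our reading of the values, chosen because they reproduce the quoted factor.

```python
# Static power (W) of the example inverter at each threshold voltage.
# Assumption: exponents are -10, -9 and -8 respectively.
p_hvt, p_svt, p_lvt = 6.403e-10, 6.561e-9, 9.663e-8

# Delay differences (ns) between neighbouring threshold voltages.
d_hvt_svt_ns, d_svt_lvt_ns = 0.03235, 0.01265

# Extra static power paid per ns of delay gained by each swap.
cost_hvt_to_svt = (p_svt - p_hvt) / d_hvt_svt_ns
cost_svt_to_lvt = (p_lvt - p_svt) / d_svt_lvt_ns

ratio = cost_svt_to_lvt / cost_hvt_to_svt
print(f"{ratio:.1f}")  # prints 38.9
```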
Therefore, when fixing timing violations, the design should prioritize converting the cell from HVT to SVT before considering converting the cell from SVT to LVT. The TPSPOM method implements a design philosophy that prioritizes HVT, followed by SVT, and finally LVT, which is more consistent with the current circuit design goals. Therefore, even if the percentage of SVT cells of TPSPOM is significantly larger than that of CAPCOM, it can achieve a lower static power consumption value because of the lower percentage of LVT cells.

5.2. Method Runtime

The experiments are run on a computer with 41.60 GFLOPS. Because CAPCOM produces a lower LVT percentage than CBLPRP, it needs fewer operations to replace cells that do not meet the timing from HVT back to LVT, which makes CAPCOM run faster than CBLPRP. In the triple-threshold case, under the timing constraint of 1 ns, CAPCOM runs for 139.6 h and TPSPOM for 4.4 h. This shows that another advantage of TPSPOM is that its run time is acceptable when applied to a VLSI design.
In addition, with the initial path collection, the running time of Algorithm 1 is 15.6 h, while that of Algorithm 3 is 6 min.
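For reference, the relative savings implied by these run times can be computed as follows (a simple arithmetic check in Python; variable names are ours):

```python
capcom_h, tpspom_h = 139.6, 4.4   # full flows, triple-threshold, 1 ns constraint
alg1_h, alg3_h = 15.6, 6 / 60     # Algorithm 1 vs. Algorithm 3 (6 min in hours)

flow_reduction_pct = 100 * (1 - tpspom_h / capcom_h)
flow_speedup = capcom_h / tpspom_h
alg_speedup = alg1_h / alg3_h

print(f"{flow_reduction_pct:.2f}% less run time, {flow_speedup:.1f}x faster flow")
print(f"Algorithm 3 is about {alg_speedup:.0f}x faster than Algorithm 1")
```

The first figure matches the 96.85% run-time reduction reported for TPSPOM relative to CAPCOM.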

6. Conclusions

This paper describes a triple-threshold path-based static power-optimization methodology (TPSPOM) using the MTCMOS technique. This method addresses the shortcomings of existing threshold voltage assignment algorithms that do not specifically consider the connection between cells and paths and frequently update the timing in STA tools. It improves both static power consumption results and algorithm running time:
(1) The algorithm proposed in this paper locates the cells to be replaced based on path weights and cell weights to optimize static power consumption. The path weight is obtained from the path constraint, and the cell weight is obtained from the path weight through the cell and its delay-to-power consumption ratio.
(2) The proposed algorithm uses a simplified collection of paths for running-time optimization to characterize the overall timing and avoids frequent timing updates in STA tools.
The experiments on a CPU IP containing 385,781 cells in a 28 nm process show that, with a cycle time of 1 ns, the TPSPOM method reduces static power consumption by 15.16% more than CAPCOM (to 9.75% versus 24.91% of the all-LVT value) while reducing the running time by 96.85%. This shows that the algorithm flow provided in this paper can be effectively applied to VLSI designs and supports triple-threshold voltage distribution, with a significant reduction in static power consumption and a low running time.
For future work, since the running speed of the algorithm is directly related to the scale of the circuit, we will further improve the running speed of the algorithm to be suitable for larger-scale circuits.

Author Contributions

Conceptualization and supervision, D.Z. and K.H.; methodology, P.L. and S.Z.; validation, P.L., S.Z., W.X. and C.X.; data curation, W.X. and C.X.; writing—original draft preparation, S.Z.; writing—review and editing, S.Z.; funding acquisition, D.Z. and K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key R&D Program of China (2020YFB0906000, 2020YFB0906001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kumar, S.V.; Rao, P.V.; Sharath, H.A.; Sachin, B.M.; Ravi, U.S.; Monica, B.V. Review on VLSI design using optimization and self-adaptive particle swarm optimization. J. King Saud Univ.-Comput. Inf. Sci. 2020, 32, 1095–1107.
  2. Singh, V.; Arya, S.K.; Kumar, M. Different Perspectives of Low Power Design for CMOS VLSI Circuits. Int. J. Electron. Eng. 2018, 604–613.
  3. Lu, Y.H.; De Micheli, G. Comparing system level power management policies. IEEE Des. Test Comput. 2001, 18, 10–19.
  4. Caldari, M.; Conti, M.; Crippa, P.; Orcioni, S.; Solazzi, M.; Turchetti, C. Dynamic power management in an AMBA-based battery-powered system. In Proceedings of the 9th International Conference on Electronics, Circuits and Systems, Dubrovnik, Croatia, 15–18 September 2002; Volume 2, pp. 525–528.
  5. Zhang, J.F.; Gao, R.; Duan, M.; Ji, Z.; Zhang, W.; Marsland, J. Bias Temperature Instability of MOSFETs: Physical Processes, Models, and Prediction. Electronics 2022, 11, 1420.
  6. Moradinezhad Maryan, M.; Amini-Valashani, M.; Azhari, S.J. An input controlled leakage restrainer transistor-based technique for leakage and short-circuit power reduction of 1-bit hybrid full adders. Int. J. Circuit Theory Appl. 2021, 49, 2382–2395.
  7. Ghosh, P.; Saha, T.; Kumari, B. Aspects of Low-Power High-Speed CMOS VLSI Design: A Review. In Industry Interactive Innovations in Science, Engineering and Technology; Springer: Singapore, 2018; pp. 385–394.
  8. Bai, N.; Hu, Z.; Wang, Y.; Xu, Y. Leakage Current Stability Analysis for Subthreshold SRAM. Electronics 2022, 11, 1196.
  9. Kao, J.T.; Chandrakasan, A.P. Dual-threshold voltage techniques for low-power digital circuits. IEEE J. Solid-State Circuits 2000, 35, 1009–1018.
  10. Narendra, S.; De, V.; Antoniadis, D.; Chandrakasan, A.; Borkar, S. Scaling of stack effect and its application for leakage reduction. In Proceedings of the 2001 International Symposium on Low Power Electronics and Design, Huntington Beach, CA, USA, 6–7 August 2001; pp. 195–200.
  11. Park, J.C.; Mooney, V.J.; Pfeiffenberger, P. Sleepy stack reduction of leakage power. In Proceedings of the International Workshop on Power and Timing Modeling, Optimization and Simulation, Santorini, Greece, 15–17 September 2004; pp. 148–158.
  12. Hanchate, N.; Ranganathan, N. A new technique for leakage reduction in CMOS circuits using self-controlled stacked transistors. In Proceedings of the 17th International Conference on VLSI Design, Proceedings IEEE, Washington, DC, USA, 5–9 January 2004; pp. 228–233.
  13. He, X.; Al-Kadry, S.; Abdollahi, A. Adaptive leakage control on body biasing for reducing power consumption in CMOS VLSI circuit. In Proceedings of the 2009 10th International Symposium on Quality Electronic Design, San Jose, CA, USA, 16–18 March 2009; pp. 465–470.
  14. Suguna, T.; Rani, M.J. Survey on power optimization techniques for low power VLSI circuit in deep submicron technology. Int. J. VLSI Des. Commun. Syst. 2018, 9, 1–15.
  15. Kuo, J.B.; Lin, S.C. Low-Voltage SOI CMOS VLSI Devices and Circuits; John Wiley & Sons: Hoboken, NJ, USA, 2004; pp. 241–244.
  16. Dixit, A. Transistor Leakage Mechanisms and Power Reduction Techniques in CMOS VLSI Design. Int. J. Adv. Res. Comput. Commun. Eng. 2016, 5, 102–107.
  17. Chung, B.; Kuo, J.B. Gate-level dual-threshold static power optimization methodology (GDSPOM) using path-based static timing analysis (STA) technique for SOC application. Integration 2008, 41, 9–16.
  18. Huang, H.X.; Shen, S.R.; Kuo, J.B. Cell-based leakage power reduction priority (CBLPRP) optimization methodology for designing SOC applications using MTCMOS technique. In Proceedings of the International Workshop on Power and Timing Modeling, Optimization and Simulation, Madrid, Spain, 26–29 September 2011; pp. 143–151.
  19. Lin, G.J.; Hsu, C.B.; Kuo, J.B. Critical-path aware power consumption optimization methodology (CAPCOM) using mixed-VTH cells for low-power SOC designs. In Proceedings of the 2014 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, Melbourne, Australia, 1–5 June 2014; pp. 1740–1743.
  20. Wei, L.; Chen, Z.; Johnson, M.; Roy, K.; De, V. Design and optimization of low voltage high performance dual threshold CMOS circuits. In Proceedings of the 35th Annual Design Automation Conference, San Francisco, CA, USA, 15–19 June 1998; pp. 489–494.
  21. Samanta, D.; Pal, A. Optimal dual-VT assignment for low-voltage energy-constrained CMOS circuits. In Proceedings of the ASP-DAC/VLSI Design 2002, 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design, Bangalore, India, 11 January 2002; pp. 193–198.
  22. Wang, Q.; Vrudhula, S.B. Algorithms for minimizing standby power in deep submicrometer, dual-Vt CMOS circuits. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2002, 21, 306–318.
  23. Xianrui, L.; Haibo, G.; Xinquan, K.; Yushan, L. Dynamic Threshold Static Power Optimization Techniques for the chip. J. Univ. Electron. Sci. Technol. China 2009, 38, 443–446. (In Chinese)
  24. Fan, R.; Dandan, Z.; Xiaolang, Y. An algorithm for reducing leakage power based on dual-threshold voltage technique. In Proceedings of the 2013 Fourth International Conference on Digital Manufacturing & Automation, Singapore, 13–15 December 2013; pp. 132–134.
  25. Bhasker, J.; Chadha, R. Static Timing Analysis for Nanometer Designs: A Practical Approach; Springer Science & Business Media: Berlin, Germany, 2009.
  26. El Motassadeq, T. CCS vs NLDM comparison based on a complete automated correlation flow between PrimeTime and HSPICE. In Proceedings of the 2011 Saudi International Electronics, Communications and Photonics Conference (SIECPC), Riyadh, Saudi Arabia, 24–26 April 2011; pp. 1–5.
Figure 1. A gate-level dual-threshold static power-optimization methodology (GDSPOM) cell-swapping example.
Figure 2. A cell-based leakage power reduction priority (CBLPRP) cell-swapping example.
Figure 3. A path number example.
Figure 4. An example of or-gate timing arcs.
Figure 5. Flow diagram of triple-threshold path-based static power optimization methodology (TPSPOM).
Figure 6. The relationship between the number of traversed paths and the times of swapping cells.
Figure 7. An example of the new way to update the cell weights.
Figure 8. The relationship between the number of traversed paths and the times of swapping cells when using the new way.
Figure 9. Static power consumption of the 28 nm circuit using CBLPRP, CAPCOM, and TPSPOM procedures with dual-threshold voltage.
Figure 10. Percentage of LVT cells of the 28 nm circuit using CBLPRP, CAPCOM, and TPSPOM procedures with dual-threshold voltage.
Figure 11. Static power consumption of the 28 nm circuit using CAPCOM and TPSPOM procedures with triple-threshold voltage.
Figure 12. Percentage of SVT/LVT cells of the 28 nm circuit using CAPCOM and TPSPOM procedures with triple-threshold voltage.
Table 1. Static power consumption, the number of low-threshold voltage (LVT) combinational cells and LVT sequential cells of the 28 nm circuit using CBLPRP, CAPCOM, and TPSPOM procedures with dual-threshold voltage.
| Period (ns) | CBLPRP Power (mW) | CBLPRP LVT C-Cells | CBLPRP LVT S-Cells | CAPCOM Power (mW) | CAPCOM LVT C-Cells | CAPCOM LVT S-Cells | TPSPOM Power (mW) | TPSPOM LVT C-Cells | TPSPOM LVT S-Cells |
|---|---|---|---|---|---|---|---|---|---|
| 1.2 | 4.3813 | 108,354 | 2182 | 4.3818 | 65,112 | 10,284 | 2.6220 | 47,326 | 3804 |
| 1.1 | 5.5906 | 128,749 | 4394 | 5.4998 | 85,966 | 12,923 | 3.5008 | 64,558 | 5545 |
| 1.0 | 6.9498 | 146,037 | 8001 | 6.9088 | 109,385 | 16,035 | 4.8008 | 88,315 | 7858 |
Table 2. Static power consumption, the number of LVT/standard-threshold voltage (SVT) combinational cells and LVT/SVT sequential cells of the 28 nm circuit using CAPCOM and TPSPOM procedures with triple-threshold voltage.
| Period (ns) | CAPCOM Power (mW) | CAPCOM SVT C-Cells | CAPCOM SVT S-Cells | CAPCOM LVT C-Cells | CAPCOM LVT S-Cells | TPSPOM Power (mW) | TPSPOM SVT C-Cells | TPSPOM SVT S-Cells | TPSPOM LVT C-Cells | TPSPOM LVT S-Cells |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.2 | 3.074 | 23,650 | 5190 | 41,462 | 5094 | 1.3285 | 47,141 | 4027 | 9848 | 860 |
| 1.1 | 3.8933 | 30,022 | 6080 | 55,944 | 6843 | 1.5586 | 71,128 | 6335 | 12,588 | 1175 |
| 1.0 | 4.9298 | 36,291 | 7311 | 73,094 | 8724 | 1.9289 | 105,021 | 9216 | 17,459 | 1617 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, P.; Zhu, S.; Xi, W.; Xu, C.; Zheng, D.; Huang, K. Triple-Threshold Path-Based Static Power-Optimization Methodology (TPSPOM) for Designing SOC Applications Using 28 nm MTCMOS Technology. Appl. Sci. 2023, 13, 3471. https://doi.org/10.3390/app13063471
