# Clock Network Power Saving Using Multi-Bit Flip-Flops in Multiple Voltage Island Design 

Jhen-Hong He ${ }^{1}$, Li-Wei Huang ${ }^{2}$, Jui-Hung Hung ${ }^{3}$, Yu-Cheng Lin ${ }^{4}$, Guo-syuan Liou ${ }^{5}$, Tsai-Ming Hsieh ${ }^{6}$<br>${ }^{1}$ Department of Information and Computer Engineering, Chung Yuan Christian University, Chung-Li, Taiwan, R.O.C.<br>${ }^{2}$ Faraday Technology Inc., Taiwan, R.O.C.<br>${ }^{3}$ Department of Electronic Engineering, Chung Yuan Christian University, Chung-Li, Taiwan, R.O.C.<br>${ }^{4,5}$ Department of Multimedia and E-Commerce, Kainan University, Luzhu Township, Taiwan, R.O.C.<br>${ }^{1,3,6}$ \{g9977032, g9777036, hsieh\}@cycu.edu.tw, ${ }^{2}$ willie_h@faraday-tech.com, ${ }^{4,5}$ \{linyu, m10008006\}@mail.knu.edu.tw


#### Abstract

Power consumption is an important issue in modern high-frequency and low-power designs. Multi-bit flip-flops are used to reduce the power of the clock system, and scaling with multiple supply voltages is an effective way to minimize dynamic power consumption. In this paper, we propose an effective multi-bit flip-flop merging approach to deal with the clock network power minimization problem and a placement method that avoids placing flip-flops in congested bins. Moreover, the proposed approach can be applied to both single and multiple supply voltage designs. Experimental results show that our approach can reduce the clock power by up to $25 \%$. In addition, for multiple supply voltage designs, the proposed approach can reduce the number of level shifters significantly.


Keywords-Clock network, Multi-bit flip-flop, Multiple supply voltage, Level shifter, Low power

## I. INTRODUCTION

Due to the rapid growth of chip density and the increase of clock frequency in modern high-performance designs, power consumption has become an important issue in chip manufacturing. A large portion of the total power dissipation in synchronous systems is due to the operation of flip-flops in the clock network [1, 2, 3, 4]. In conventional synchronous designs, all 1-bit flip-flops are considered independent components. In recent years, as process technology has advanced and the feature size of ICs has shrunk, a minimum-size clock driver has become able to trigger more than one flip-flop. As a result, merging 1-bit flip-flops into one multi-bit flip-flop that shares the inverters of the flip-flops can reduce the total clock dynamic power consumption and the total area contributed by flip-flops [5], as shown in Figure 1.


[^0]Figure 1. The four single-bit flip-flops are merged into a multi-bit flip-flop.

On the other hand, multiple supply voltage (MSV) design is also an effective way to minimize power consumption. It partitions a chip into regions called voltage islands [6]. The voltage islands can be operated at different voltage levels, or be turned off when they are idle [7], so that considerable power can be saved. In order to meet the design specification, level shifters need to be inserted between cells with different supply voltages to ensure that the circuit works properly. However, inserting too many level shifters into the circuit leads to a high design cost in terms of power and chip area [8].

In this paper, we present an effective and efficient approach to solve two problems. The first is the clock network power minimization problem with a single supply voltage. The second is the same problem with multiple supply voltages. In addition, the number of inserted level shifters is minimized to reduce the area overhead.
The remainder of this paper is organized as follows. Section II describes the problem. Section III presents our algorithms. Section IV reports our experimental results, and Section V gives our conclusions.

## II. PROBLEM DESCRIPTIONS

In a given design, there is a set of $m$ 1-bit flip-flops, $F=\left\{F F_{i} \mid 1 \leq i \leq m\right\}$. The 1-bit flip-flops are distributed within the chip, which is divided into $s \times t$ placement bins $B_{g h}$, $1 \leq g \leq s$, $1 \leq h \leq t$. The 1-bit flip-flop $F F_{i}$ has one input pin and one output pin, which are located at $\left(x_{1 i}, y_{1 i}\right)$ and $\left(x_{2 i}, y_{2 i}\right)$ respectively.
To ensure the correctness of the wire delays after flip-flops are merged, two sets of timing slacks (in terms of routing length), $S_{I N}=\left\{s_{11}, s_{12}, \ldots, s_{1 m}\right\}$ and $S_{\text {OUT }}=\left\{s_{21}, s_{22}, \ldots, s_{2 m}\right\}$, are given to restrict the location of merged flip-flops. We call this the timing slack constraint. In addition, the available area differs from bin to bin, so the position of each merged multi-bit flip-flop must be chosen accordingly. We call this available-area constraint the placement density constraint.
In this work, we do not consider the shapes of blocks, flip-flops, and pins. We assume that each of them is represented by its bottom-left corner, viewed as a point in the chip. The blocks, flip-flops, and pins must then be placed on grid points, and each grid point can contain only one block, one flip-flop, or one pin.


Figure 2. An example of chip layout.
The main objective of the clock network power saving problem is to find proper locations for the multi-bit flip-flops, so that the total power reduction is maximized while the timing slack constraint and the placement density constraint are satisfied.
For a single supply voltage system, the total power consumption is the sum of the power consumption of all flip-flops in the final design. For a multiple supply voltage system, it is the sum of the power consumption of all flip-flops and level shifters. Moreover, the proper locations of multi-bit flip-flops depend not only on the timing slack constraint and the placement density constraint but also on the distribution of voltage islands. Figure 2 shows an example of the chip layout of this problem.

## A. Timing Slack Constraint

All newly merged flip-flops must be placed in a feasible region (FR) in which the timing slack constraints are satisfied. We express the slack value of each net as a routing wirelength. Figure 3(a) illustrates the feasible regions of 1-bit flip-flops. Flip-flop $F F_{i}$ is located at $\left(x_{i}, y_{i}\right)$. The variables $d_{1 i}$ and $d_{2 i}$ denote the Manhattan distances from $F F_{i}$ to its input pin $\left(x_{1 i}, y_{1 i}\right)$ and its output pin $\left(x_{2 i}, y_{2 i}\right)$ respectively. Hence, the maximum feasible routing lengths from the input and output pins to $F F_{i}$, $L_{1 i}$ and $L_{2 i}$, can be calculated as follows:

$$
\begin{align*}
L_{1 i} & =d_{1 i}+s_{1 i}  \tag{1}\\
L_{2 i} & =d_{2 i}+s_{2 i} \tag{2}
\end{align*}
$$

The feasible region of $F F_{i}$ is then defined as below:

$$
\begin{equation*}
F R_{i}=\left\{(x, y)\ \middle|\ \left|x-x_{1 i}\right|+\left|y-y_{1 i}\right| \leq L_{1 i},\ \left|x-x_{2 i}\right|+\left|y-y_{2 i}\right| \leq L_{2 i}\right\} . \tag{3}
\end{equation*}
$$

In other words, a new multi-bit flip-flop used to replace $F F_{i}$ must be placed in the feasible region $F R_{i}$. Assume we want to merge two 1-bit flip-flops, $F F_{i}$ and $F F_{j}$, into a new 2-bit flip-flop, $M F F_{k}$. According to the timing slack constraints, $M F F_{k}$ must be placed in the intersection of $F R_{i}$ and $F R_{j}$, as shown in Figure 3(b).
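The feasible region of Eq. (3) can be handled compactly in rotated coordinates. The following minimal sketch assumes a continuous plane (grid snapping is ignored) and relies on the identity $|dx|+|dy| = \max(|dx+dy|, |dx-dy|)$, so that each Manhattan diamond becomes an axis-aligned square in $(u, v) = (x+y, x-y)$. The function names, the tuple layout, and the sample coordinates are illustrative assumptions, not taken from the paper.

```python
from typing import Optional, Tuple

# A region in rotated coordinates: (u_lo, u_hi, v_lo, v_hi), with u = x + y, v = x - y.
Region = Tuple[float, float, float, float]

def diamond(cx: float, cy: float, radius: float) -> Region:
    """Manhattan ball |x-cx| + |y-cy| <= radius, written as a square in (u, v)."""
    u, v = cx + cy, cx - cy
    return (u - radius, u + radius, v - radius, v + radius)

def intersect(a: Region, b: Region) -> Optional[Region]:
    """Intersection of two rotated rectangles, or None if it is empty."""
    u_lo, u_hi = max(a[0], b[0]), min(a[1], b[1])
    v_lo, v_hi = max(a[2], b[2]), min(a[3], b[3])
    if u_lo > u_hi or v_lo > v_hi:
        return None
    return (u_lo, u_hi, v_lo, v_hi)

def feasible_region(ff, pin_in, pin_out, slack_in, slack_out) -> Optional[Region]:
    """FR_i from Eqs. (1)-(3): both pin diamonds, each with radius L = d + s."""
    d1 = abs(ff[0] - pin_in[0]) + abs(ff[1] - pin_in[1])    # d_1i
    d2 = abs(ff[0] - pin_out[0]) + abs(ff[1] - pin_out[1])  # d_2i
    return intersect(diamond(*pin_in, d1 + slack_in),
                     diamond(*pin_out, d2 + slack_out))

def contains(region: Region, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the rotated rectangle."""
    u, v = x + y, x - y
    return region[0] <= u <= region[1] and region[2] <= v <= region[3]

# Hypothetical data: two flip-flops whose feasible regions overlap.
fr_i = feasible_region((2, 2), (0, 2), (4, 2), slack_in=2, slack_out=2)
fr_j = feasible_region((5, 2), (3, 2), (7, 2), slack_in=2, slack_out=2)
common = intersect(fr_i, fr_j)  # where a merged 2-bit flip-flop may be placed
```

In this example the point $(4, 2)$ lies in `common`, so a 2-bit flip-flop placed there respects the slack of both original flip-flops, whereas the original location $(2, 2)$ of the first flip-flop does not.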

(a)

(b)
Figure 3. An example of feasible region.

## B. Placement Density Constraint

The placement density $D_{g h}$ of bin $B_{g h}$ is defined as the sum of total flip-flops area and the total blocks area in bin $B_{g h}$. Suppose that total available area for bin $B_{g h}$ is $A_{g h}$, which is less than or equal to the area of bin $B_{g h}$. When placing a multi-bit flip-flop in bin $B_{g h}$, the placement density constraint must be satisfied,

$$
\begin{equation*}
D_{g h} \leq A_{g h} . \tag{4}
\end{equation*}
$$

## C. Multiple Supply Voltage

Suppose that the distribution of voltage islands is given and that the timing slacks of the input and output pins of all 1-bit flip-flops in the low and high voltage islands have been calculated. We want to minimize the total number of inserted level shifters so that the power consumed by level shifters is minimized. As an example, Figure 4 shows the distribution of the pins of flip-flops among voltage islands. A level shifter must be inserted between $\operatorname{pin}_{1}$ and $\mathrm{FF}_{\mathrm{A}}$, because the signal is delivered from the low voltage level to the high voltage level. No level shifter needs to be inserted between $\mathrm{pin}_{5}$ and $\mathrm{FF}_{\mathrm{C}}$, because the signal is delivered from the high voltage level to the low voltage level.
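The insertion rule described above can be written as a small predicate. This is only a sketch: the `Island` enum, the function names, and the representation of a net as a (driver, receiver) pair are assumptions made for illustration.

```python
from enum import Enum

class Island(Enum):
    LOW = 0
    HIGH = 1

def needs_level_shifter(driver: Island, receiver: Island) -> bool:
    """A level shifter is needed only when a signal goes from low to high voltage."""
    return driver is Island.LOW and receiver is Island.HIGH

def count_level_shifters(nets) -> int:
    """nets: iterable of (driver_island, receiver_island) pairs."""
    return sum(needs_level_shifter(d, r) for d, r in nets)

# Hypothetical nets in the spirit of Figure 4:
nets = [
    (Island.LOW, Island.HIGH),   # like pin_1 -> FF_A: shifter required
    (Island.HIGH, Island.LOW),   # like pin_5 -> FF_C: no shifter
    (Island.LOW, Island.LOW),    # same island: no shifter
]
shifters = count_level_shifters(nets)
```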


Figure 4. An example of level shifters insertion.


Figure 5. Flow Chart.

## III. ALGORITHM

To solve the clock network power reduction problem, a three-phase algorithm is proposed, as shown in Figure 5.
In phase 1, we find the feasible region of each 1-bit flip-flop and construct a feasible region graph. In phase 2, suitable flip-flops are selected and merged into groups. In the final phase, each multi-bit flip-flop is placed in an appropriate location according to the constraints.

## A. Determining the Feasible Regions

The input and output signals of each 1-bit flip-flop have their own timing slacks, so the region available to a merged multi-bit flip-flop is restricted. The feasible region graph $G(V, E)$ for a given distribution of $m$ 1-bit flip-flops in a chip is a constraint graph in which each node $v_{i}$ in $V$ is the feasible region $F R_{i}$, $1 \leq i \leq m$, and there is an edge $e_{i, j}$ between $v_{i}$ and $v_{j}$, $1 \leq i<j \leq m$, if and only if $F R_{i} \cap F R_{j} \neq \emptyset$. Every node $v_{i}$ is assigned a weight pair of nonnegative numbers $\left(\alpha_{i}, \beta_{i}\right)$, where $\alpha_{i}$ denotes the degree of node $v_{i}$ and $\beta_{i}$ denotes the number of bins that intersect $F R_{i}$.
An example is shown in Figure 6(a): the gray region $F R_{2}$ of $F F_{2}$ intersects two other feasible regions, $F R_{1}$ and $F R_{3}$, and intersects 7 bins. Therefore, the pair $(2,7)$ is assigned to node $v_{2}=F R_{2}$, as shown in Figure 6(b).
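One possible way to build the feasible region graph and its weight pairs is sketched below. It assumes that each feasible region is given as a rectangle `(u_lo, u_hi, v_lo, v_hi)` in rotated coordinates $u=x+y$, $v=x-y$, that bins are axis-aligned rectangles `(x0, y0, x1, y1)`, and that regions touching only on a boundary count as intersecting. These representations and all names are illustrative assumptions; the pairwise loop is quadratic in $m$ and is meant for clarity rather than speed.

```python
from itertools import combinations

def regions_overlap(a, b):
    """True if two rotated rectangles (u_lo, u_hi, v_lo, v_hi) share a point."""
    return (max(a[0], b[0]) <= min(a[1], b[1])
            and max(a[2], b[2]) <= min(a[3], b[3]))

def region_hits_bin(region, bin_rect):
    """Separating-axis test between a rotated rectangle and an axis-aligned bin."""
    u_lo, u_hi, v_lo, v_hi = region
    x0, y0, x1, y1 = bin_rect

    def overlap(lo1, hi1, lo2, hi2):
        return max(lo1, lo2) <= min(hi1, hi2)

    return (overlap(u_lo, u_hi, x0 + y0, x1 + y1)                       # axis x + y
            and overlap(v_lo, v_hi, x0 - y1, x1 - y0)                   # axis x - y
            and overlap((u_lo + v_lo) / 2, (u_hi + v_hi) / 2, x0, x1)   # axis x
            and overlap((u_lo - v_hi) / 2, (u_hi - v_lo) / 2, y0, y1))  # axis y

def build_graph(regions, bins):
    """Return (edges, weights) with weights[i] = (alpha_i, beta_i)."""
    edges = {i: set() for i in range(len(regions))}
    for i, j in combinations(range(len(regions)), 2):
        if regions_overlap(regions[i], regions[j]):
            edges[i].add(j)
            edges[j].add(i)
    weights = {i: (len(edges[i]), sum(region_hits_bin(r, b) for b in bins))
               for i, r in enumerate(regions)}
    return edges, weights

# Hypothetical data: three diamonds of radius 2 centred at (3,3), (6,3), (9,3)
# on a chip tiled by six 4x4 bins.
regions = [(4, 8, -2, 2), (7, 11, 1, 5), (10, 14, 4, 8)]
bins = [(x, y, x + 4, y + 4) for x in (0, 4, 8) for y in (0, 4)]
edges, weights = build_graph(regions, bins)
```

Here the middle region overlaps both neighbours and touches four bins, so it receives the weight pair $(2, 4)$, while the outer regions have degree 1.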

(a)

(b)

Figure 6. Construction of feasible region graph.

## B. Essential Prime Cover

If a feasible region $F R_{i}$ can only be merged with one other feasible region $F R_{j}$ due to the timing slack constraint, that is, the first component $\alpha_{i}$ of its assigned weight pair equals 1, we call the group $\left\{F R_{i}, F R_{j}\right\}$ an Essential Prime Cover (EPC). EPCs are given first priority during flip-flop grouping.

According to the design library, the total power consumption of $r$ 1-bit flip-flops is always higher than the power consumption of one $r$-bit flip-flop. The objective of finding EPCs is to minimize the number of 1-bit flip-flops left after the merging process.

In Figure 7(a), $F R_{2}$ and $F R_{8}$ are selected to be merged as a group, because the group consisting of $F R_{2}$ and $F R_{8}$ is an EPC. If flip-flop 8 were merged with other flip-flops instead, flip-flop 2 would remain a single-bit flip-flop. We therefore merge flip-flop 2 and flip-flop 8 into a multi-bit flip-flop, as shown in Figure 7(b).
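A minimal sketch of EPC detection on an adjacency-set representation of the feasible region graph follows. The graph below is hypothetical and only mimics the situation of $F R_{2}$ and $F R_{8}$; the rule that a neighbour already claimed by an earlier EPC is skipped is an assumption of this sketch, since the text does not discuss that case.

```python
def essential_prime_covers(edges):
    """Return pairs (i, j) where node i has degree 1 and j is its only neighbour."""
    pairs = []
    used = set()
    for i in sorted(edges):
        if len(edges[i]) == 1 and i not in used:
            (j,) = edges[i]            # the single neighbour of i
            if j not in used:          # assumption: first claim on j wins
                pairs.append((i, j))
                used.update((i, j))
    return pairs

# Hypothetical graph: node 2 can only merge with node 8.
edges = {
    2: {8},
    5: {6, 7, 8},
    6: {5, 7, 8},
    7: {5, 6},
    8: {2, 5, 6},
}
epcs = essential_prime_covers(edges)
```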

## C. Flip-flops Selection and Grouping

Section III.B handles the EPCs in the feasible region graph. If no EPC remains, a different merging criterion is applied. The objective of our selection method is to let the flip-flops with the fewest merging choices merge with other flip-flops first, because the more single-bit flip-flops we merge, the more power our algorithm saves. The selection criterion is shown in Figure 8. We first sort the vertices by degree and record a vertex of minimum degree. The while loop then scans the vertices that share this minimum degree and selects the one with the minimum intersection value.

(c)

(e)

Figure 7. Flip-flops selection and grouping.

```
Algorithm : flip-flops selection
    Input : G(V,E) /* feasible region graph */
    Output : a vertex with highest merging priority
    Flip-flops_Selection() {
        Sort_vertexes(); /* increasing order of degree α */
        j = 1;
        i = 2; /* 1 < i ≤ m */
        while (i ≤ m && α_i = α_j) {
            if (β_i < β_j)
                j = i;
            i = i + 1;
        } /* end while */
        return j;
    } /* end Flip-flops_Selection() */
```

Figure 8. Our algorithm of flip-flops selection.
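The selection criterion described in the text (smallest degree first, ties broken by the smallest number of intersecting bins) can be sketched in a few lines. The dictionary layout and the final tie-break on the vertex index are assumptions of this sketch, not part of the paper.

```python
def select_vertex(weights):
    """Pick the vertex with the smallest degree alpha; break ties by smallest beta.

    weights maps a vertex id to its pair (alpha, beta). The vertex id is used
    as a last tie-breaker so that the result is deterministic (an assumption).
    """
    return min(weights, key=lambda v: (weights[v][0], weights[v][1], v))

# Hypothetical weight pairs (alpha, beta):
weights = {1: (2, 7), 2: (1, 5), 3: (1, 3), 4: (3, 2)}
chosen = select_vertex(weights)  # vertices 2 and 3 tie on degree; 3 has fewer bins
```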
When a flip-flop is selected for merging, we have to update the merging group of feasible regions. The order in which $F R$s are added to a merging group is determined by the same criterion as the flip-flop selection rule. Furthermore, before an $F R$ is added to a merging group located in bin $B_{g h}$, two constraints must be satisfied: (1) the $F R$ must be connected to the $F R$s in the merging group; (2) the merging constraint must be satisfied, that is,

$$
\begin{equation*}
D_{g h}+A_{M B F F} \leq A_{g h} \tag{5}
\end{equation*}
$$

Here, $A_{\text {MBFF }}$ denotes the area of the multi-bit flip-flop corresponding to the group. The merging constraint ensures that the placement density constraint is not violated.
In Figure 7(c), there is no EPC, so we select $F R_{7}$, which has the minimum degree, and proceed to the grouping step. Both $F R_{5}$ and $F R_{6}$ are connected to $F R_{7}$, but $F R_{5}$ has a smaller degree than $F R_{6}$. Therefore, the two feasible regions $F R_{7}$ and $F R_{5}$ are merged into a group, as shown in Figure 7(d). The remaining flip-flops are merged and grouped using the same selection criterion. The resulting feasible region graph, shown in Figure 7(e), contains three groups.
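The two conditions for adding a feasible region to a merging group can be checked as in the sketch below. It interprets condition (1) as requiring an edge to every member already in the group, which is an assumption, and the density and available-area numbers are hypothetical; only the 2-bit area of 192 comes from Table 1.

```python
def can_join(candidate, group, edges, density, available, area_mbff):
    """Check the two grouping conditions for adding `candidate` to `group`.

    (1) candidate is connected to every FR already in the group (assumed reading);
    (2) merging constraint: D_gh + A_MBFF <= A_gh for the target bin.
    """
    connected = all(candidate in edges[member] for member in group)
    fits = density + area_mbff <= available
    return connected and fits

# Hypothetical neighbourhood of FR_7 as in Figure 7(c).
edges = {5: {6, 7}, 6: {5, 7}, 7: {5, 6}}
AREA_FF2 = 192  # area of a 2-bit flip-flop, Table 1

ok = can_join(5, [7], edges, density=199_000, available=200_000, area_mbff=AREA_FF2)
too_dense = can_join(5, [7], edges, density=199_900, available=200_000, area_mbff=AREA_FF2)
```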

## D. Flip-flops Placement

We treat the merged groups as new multi-bit flip-flops and assign each a proper location. In a single supply voltage system, we consider the timing slack constraint and the placement density constraint. In a multiple supply voltage system, we consider not only these two constraints but also the voltage island design.

In order to speed up program execution and reduce the total wirelength, the center of mass of all input and output pins is calculated. If the bin covering the center of mass satisfies the merging constraint, the new merged multi-bit flip-flop is placed at the center of mass. If the bin covering the center of mass is congested, we select an uncrowded bin that is covered by the $F R$ instead. Although this might slightly increase the wirelength, the chosen location avoids the congested bin. In an MSV system, if the feasible region of the new flip-flop spans both low and high voltage islands, we choose a location in the low voltage island to reduce the number of level shifters. An example of flip-flop placement is shown in Figure 9.
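The placement rule can be approximated by ranking candidate locations, as in the following sketch. It assumes that the common feasible region has already been enumerated as a list of grid points, that per-bin density and available area are stored in dictionaries keyed by bin index, and that closeness to the center of mass is measured by Manhattan distance. All of these, and every name used, are illustrative assumptions.

```python
def center_of_mass(pins):
    """Average position of all input and output pins of the merged group."""
    xs, ys = zip(*pins)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def bin_of(point, bin_size):
    """Index (g, h) of the bin that covers a point."""
    return int(point[0] // bin_size[0]), int(point[1] // bin_size[1])

def choose_location(pins, feasible_points, bin_size, density, available,
                    area_mbff, low_voltage_bins=frozenset()):
    """Pick a feasible point in an uncrowded bin, as close to the center of mass
    as possible; in MSV designs, points in low-voltage bins are preferred."""
    com = center_of_mass(pins)

    def fits(p):
        b = bin_of(p, bin_size)
        return density[b] + area_mbff <= available[b]   # merging constraint

    def rank(p):
        in_high = bin_of(p, bin_size) not in low_voltage_bins
        distance = abs(p[0] - com[0]) + abs(p[1] - com[1])
        return (in_high, distance)  # with no low-voltage bins, only distance matters

    candidates = [p for p in feasible_points if fits(p)]
    return min(candidates, key=rank) if candidates else None

# Hypothetical example: the center of mass (2, 2) lies in bin (0, 0).
pins = [(0, 0), (4, 0), (0, 4), (4, 4)]
points = [(2, 2), (6, 2)]
available = {(0, 0): 1000, (1, 0): 1000}
free = {(0, 0): 100, (1, 0): 100}
crowded = {(0, 0): 900, (1, 0): 100}

at_center = choose_location(pins, points, (4, 4), free, available, 192)
moved = choose_location(pins, points, (4, 4), crowded, available, 192)
low_first = choose_location(pins, points, (4, 4), free, available, 192,
                            low_voltage_bins={(1, 0)})
```

When the bin of the center of mass is free, the flip-flop stays at $(2, 2)$; when that bin is congested, or when the neighbouring bin is the only low-voltage one, the sketch moves it to $(6, 2)$.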



Figure 10. The distribution of voltage islands in experiments.

## IV. EXPERIMENTAL RESULTS

We implemented our algorithm in C/C++ on a 3.0 GHz Intel(R) Pentium(R) D machine with 3 GB of memory running Ubuntu 9.10. The benchmark consists of five circuits, t1, t2, t3, t4, and t5, from Faraday Company [10]. Testcases t6 and t7 are obtained by duplicating circuit t5 625 and 900 times respectively, so that the number of flip-flops rises to 75000 and 108000 respectively.
The information of the seven circuits is given in Table 2. The 2nd, 3rd, and 4th columns give the chip size, bin size, and grid size respectively. The 5th column (DC) is the placement density constraint. The 6th column is the number of flip-flops. The 9th column is the number of pins, and the 10th column is the number of blocks. For the MSV system, the numbers of flip-flops in the high and low voltage islands are given in the 7th and 8th columns. The library is described in Table 1, where bit and $P$ denote the bit width of each flip-flop type and its power consumption respectively.
For single supply voltage designs, the experimental results are shown in Table 3. From testcase t1, we randomly select 6000 and 600 flip-flops to form testcases t1A and t1B. In Table 3, the 2nd column gives the original power consumption, and the remaining columns are divided into two parts. The first part gives the results of our algorithm without considering EPC, and the second part gives the results with EPC considered. Without EPC, the total power consumption is reduced by $23.85 \%$ on average. With EPC, the power consumption is reduced by $25.48 \%$ on average at the cost of only about $1 \%$ more runtime. Comparing t1, t1A, and t1B in Table 3, the power ratio rises from $70.00 \%$ to $74.90 \%$ as the number of flip-flops decreases, so the fewer flip-flops a testcase contains, the less power can be saved. These results indicate that EPC is an effective approach to minimizing the total clock network power consumption.
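As a worked check of the power accounting, the sketch below evaluates the t1 library of Table 1 for the hypothetical case in which all 60000 flip-flops of t1 end up in 8-bit flip-flops. The helper name and the grouping are illustrative; the resulting ratio of $70.00 \%$ equals the value reported for t1 with EPC in Table 3, although the paper does not state which flip-flop types were actually used.

```python
# Bit width -> power consumption, from the t1 rows of Table 1.
LIBRARY_T1 = {1: 100, 2: 172, 4: 312, 8: 560}

def total_power(groups, library):
    """groups maps a bit width to the number of flip-flops of that width."""
    return sum(library[bits] * count for bits, count in groups.items())

original = total_power({1: 60000}, LIBRARY_T1)      # all 1-bit flip-flops
merged = total_power({8: 60000 // 8}, LIBRARY_T1)   # hypothetical: all 8-bit
ratio = merged / original                           # P ratio as defined in Table 3
```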
The experimental results for the multiple supply voltage system are shown in Table 4, which has two parts. In the left part, our algorithm does not consider the number of level shifters when placing merged flip-flops; in the right part, it does. In Table 4, we assume that each level shifter consumes 5 units of power and that the power consumption of a flip-flop at the low supply voltage is 0.7 times that of a flip-flop at the high supply voltage. Figure 10 shows the distribution of voltage islands in this experiment; our algorithm also works with more complex distributions of voltage islands. The experimental results show that considering the number of level shifters reduces that number by $31.76 \%$ on average compared with not considering it. Moreover, in the MSV system, we reduce the power consumption by $26.16 \%$ on average as well.
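The MSV power model stated above (5 units per level shifter, a factor of 0.7 for flip-flops at the low supply voltage) can be sketched as follows. The function and constant names are assumptions; the flip-flop counts come from Table 2 and the level shifter counts from Table 4. With no level shifters, the model gives 5103330 for t1, which equals the Original $P$ entry of Table 4.

```python
LOW_VOLTAGE_FACTOR = 0.7   # low-voltage flip-flop power relative to high voltage
LEVEL_SHIFTER_POWER = 5    # power units per level shifter

def msv_power(high_ff_power, low_ff_power_at_high_voltage, n_level_shifters):
    """Total power of flip-flops in both islands plus inserted level shifters."""
    return (high_ff_power
            + LOW_VOLTAGE_FACTOR * low_ff_power_at_high_voltage
            + LEVEL_SHIFTER_POWER * n_level_shifters)

# t1: 30111 flip-flops in the high island, 29889 in the low island (Table 2),
# each 1-bit flip-flop consuming 100 units at the high voltage (Table 1).
t1_original = msv_power(100 * 30111, 100 * 29889, 0)

# Power attributable to the level shifters removed for t1 (4289 -> 2921, Table 4).
t1_shifter_saving = LEVEL_SHIFTER_POWER * (4289 - 2921)
```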

## V. CONCLUSIONS

We propose an effective multi-bit flip-flop merging approach to deal with the clock network power minimization problem and an efficient placement method that avoids placing flip-flops in congested bins. Moreover, by using the concept of EPC, we can effectively reduce the clock power consumption in both single and multiple supply voltage designs. The experimental results show that we reduce the clock power by $25.48 \%$ on average in single supply voltage designs. In multiple supply voltage designs, we reduce the clock power by $26.16 \%$ and the number of level shifters by $31.76 \%$ on average.

Table 1. Library table.

| Testcase |  | FF1 | FF2 | FF4 | FFL4 | FF6 | FF8 | FF13 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| t1 | bit | 1 | 2 | 4 | - | - | 8 | - |
|  | $P$ | 100 | 172 | 312 | - | - | 560 | - |
|  | area | 100 | 192 | 385 | - | - | 725 | - |
| t2 | bit | 1 | 2 | 4 | - | 6 | - | 13 |
|  | $P$ | 100 | 172 | 312 | - | 450 | - | 900 |
|  | area | 100 | 192 | 385 | - | 550 | - | 1205 |
| t3 | bit | 1 | 2 | 4 | 4 | - | 8 | - |
|  | $P$ | 100 | 172 | 312 | 299 | - | 560 | - |
|  | area | 1000 | 1920 | 3850 | 3980 | - | 7250 | - |
| t4 | bit | 1 | 2 | 4 | - | - | - | - |
|  | $P$ | 100 | 172 | 312 | - | - | - | - |
|  | area | 100 | 192 | 385 | - | - | - | - |

## REFERENCES

[1] R. Y. Chen, N. Vijaykrishnan, M. J. Irwin, "Clock power issues in system-on-a-chip designs," In Proceedings of the IEEE Computer Society Workshop on VLSI'99, pp.48-53, 1999.
[2] M. Hansson, A. Alvandpour, "A low clock load conditional flip-flop," In Proceedings of the IEEE International SOC Conference, pp.169-170, 2004.
[3] T. Sakurai, H. Kawaguchi, T. Kuroda, "Low-power CMOS design through VTH control and low-swing circuits," In Proceedings of the 1997 International Symposium on Low Power Electronics and Design, pp.1-6, 1997.
[4] T. Sakurai, T. Kuroda, "Low-power circuit design for multimedia CMOS VLSI's," In Proceedings of the Workshop on Synthesis And System Integration of Mixed Information Technologies, 1996.
[5] D. Duarte, V. Narayanan, M. J. Irwin, "Impact of technology scaling in the clock system power," In Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2002.
[6] D. E. Lackey, P. S. Zuchowski, T. R. Bednar, D. W. Stout, S. W. Gould, J. M. Cohn, "Managing power and performance for system-on-chip designs using voltage islands," In Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, pp.52-57, 2002.
[7] Q. Ma, E. F. Y. Young, "Voltage island-driven floorplanning," In Proceedings of the 2007 IEEE/ACM International Conference on Computer-aided Design, pp.644-649, 2007.
[8] Q. A. Khan, S. K. Wadhwa, K. Misri, "A single supply level shifter for multi-voltage systems," In Proceedings of 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design, 2006.
[9] H. A. Chien, C. C. Lin, H. H. Huang, T. M. Hsieh, "Optimal supply voltage assignment under timing power and area constraints," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, pp.761-768, 2010.
[10] Computer Aided Design Contest of Integrated Circuit. (2010). Taiwan. http://140.112.42.200/cad10

Table 2. Benchmark specifications.

| Testcase | Chip size ($10^{3}$) | Bin size | Grid size | DC ($10^{3}$) | No. FF | H. FF | L. FF | No. Pin | No. Block |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| t1 | $10 \times 15$ | $100 \times 50$ | $5 \times 5$ | 200 | 60000 | 30111 | 29889 | 120000 | 100000 |
| t2 | $10 \times 10$ | $100 \times 50$ | $5 \times 5$ | 70 | 5524 | 2061 | 3463 | 11048 | 142010 |
| t3 | $3 \times 8$ | $100 \times 50$ | $5 \times 5$ | 70 | 953 | 351 | 602 | 1906 | 15000 |
| t4 | $3 \times 3$ | $500 \times 500$ | $5 \times 5$ | 19 | 120 | 45 | 75 | 240 | 200 |
| t5 | $3 \times 3$ | $500 \times 500$ | $5 \times 5$ | 25 | 120 | 52 | 68 | 240 | 200 |
| t6 | $1875 \times 1875$ | $500 \times 500$ | $5 \times 5$ | 25 | 75000 | 37414 | 37586 | 150000 | 125000 |
| t7 | $2700 \times 2700$ | $500 \times 500$ | $5 \times 5$ | 25 | 108000 | 54010 | 53990 | 216000 | 180000 |

Table 3. Experimental Results of single supply voltage.

| Testcase | Original $P$ | Our approach without considering EPC |  |  |  | Our approach considering EPC |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | $P$ | $P$ ratio | $D$ | Time(s) | $P$ | $P$ ratio | $D$ | Time(s) |
| t1 | 6000000 | 4205268 | 70.09\% | 48923 | 208.5 | 4200000 | 70.00\% | 48198 | 212.04 |
| t1A | 600000 | 424768 | 70.79\% | 48198 | 2.67 | 420000 | 70.00\% | 48198 | 2.7 |
| t1B | 60000 | 45972 | 76.62\% | 48198 | 0.36 | 44940 | 74.90\% | 48198 | 0.35 |
| t2 | 552400 | 416216 | 75.35\% | 60108 | 1.49 | 405546 | 73.42\% | 60108 | 1.49 |
| t3 | 95300 | 68232 | 71.60\% | 63413 | 0.16 | 66778 | 70.07\% | 63413 | 0.17 |
| t4 | 12000 | 9576 | 79.80\% | 17386 | 0.01 | 9392 | 78.27\% | 17342 | 0.01 |
| t5 | 12000 | 9664 | 80.53\% | 23469 | 0.01 | 9360 | 78.00\% | 23184 | 0.01 |
| t6 | 7500000 | 6021612 | 80.29\% | 23754 | 115.97 | 5851724 | 78.02\% | 23754 | 118.56 |
| t7 | 10800000 | 8671912 | 80.30\% | 23754 | 241.91 | 8426104 | 78.02\% | 23469 | 241.59 |
| Avg. | - | - | 76.15\% | - | 1 | - | 74.52\% | - | 1.01 |

$P$: the power consumption. $D$: the maximum density of bins. $P$ ratio $= P /$ original $P \times 100 \%$.

Table 4. Experimental Results of multiple supply voltage.

| Testcase | Original $P$ | Without reducing No. LS |  |  |  | With reducing No. LS |  |  |  | Imp. No. LS ratio |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  |  | P | P ratio | Time(s) | No. LS_1 | P | P ratio | Time(s) | No. LS_2 |  |
| t1 | 5103330 | 3578074 | 70.11\% | 212.33 | 4289 | 3539890 | 69.36\% | 212.02 | 2921 | 31.90\% |
| t2 | 448510 | 329205 | 73.40\% | 1.49 | 163 | 328110 | 73.16\% | 1.5 | 129 | 20.86\% |
| t3 | 77240 | 53606.6 | 69.40\% | 0.18 | 71 | 52710.6 | 68.24\% | 0.17 | 43 | 39.44\% |
| t4 | 9750 | 7590 | 77.85\% | 0.001 | 14 | 7289.2 | 74.76\% | 0.001 | 4 | 71.43\% |
| t5 | 9960 | 7613.6 | 76.44\% | 0.001 | 16 | 7516 | 75.46\% | 0.001 | 14 | 12.50\% |
| t6 | 6372420 | 4971992 | 78.02\% | 118.97 | 625 | 4966584 | 77.94\% | 118.98 | 427 | 31.68\% |
| t7 | 9180300 | 7163627 | 78.03\% | 242.32 | 606 | 7159697 | 77.99\% | 242.94 | 518 | 14.52\% |
| Avg. | - | - | 74.75\% | - | - | - | 73.84\% | - | - | 31.76\% |

No. LS_1, No. LS_2: the number of level shifters inserted. Imp. No. LS ratio = (1 - No. LS_2 / No. LS_1) * 100\%.


[^0]:    *This work was partially supported by NSC of Taiwan under Grant No NSC 101-2221-E-033-074 and NSC 101-2221-E-424-008.

