# Design of Low Power 12.5Gb/s 10:1 Multiplexer 

Shenglong Zhuo, Jun Feng<br>Institute of RF \& OE-ICs, Southeast University, Nanjing, China<br>e-mail: peterzhuo2006@gmail.com


#### Abstract

This paper presents the design of a $10: 1$ multiplexer in $0.18 \mu \mathrm{~m}$ CMOS technology for use in a $12.5 \mathrm{~Gb} / \mathrm{s}$ SerDes transmitter. The speed of a traditional $10: 1$ multiplexer is difficult to increase because of its serial $5: 1$ part. In this design, a $5 B / 4 B$ converter converts the 10 data channels into 8 data channels, so that a tree-type topology can be used to increase the speed and reduce the power consumption of the data path. In addition, a phase switching divider is used to reduce the power consumption of the clock path. The core occupies $800 \mu \mathrm{~m} \times 500 \mu \mathrm{~m}$. Post-layout simulation results show that the multiplexer operates correctly at $12.5 \mathrm{~Gb} / \mathrm{s}$ with a power consumption of less than 125 mW, a single-ended output peak-to-peak swing of 200 mV, and an output jitter of less than 0.1 UI.


## Key words—SerDes; Multiplexer

## I. INTRODUCTION

With the rapidly growing demand for high speed SerDes, the multiplexer has become a speed bottleneck. The speed of the multiplexer strongly affects the performance of the whole SerDes transceiver.

Several multiplexers have been reported in technologies such as SiGe, GaAs and InP at speeds of $10 \mathrm{~Gb} / \mathrm{s}$ or higher [1]-[3]. However, these processes are costly and difficult to integrate with the other CMOS blocks of a SerDes system. Multiplexers based on CMOS processes have been presented at $10 \mathrm{~Gb} / \mathrm{s}$ data rates [4]-[5]. All of these multiplexers have $2^{\mathrm{N}}$ input channels and can be implemented with the traditional tree-type topology, but they cannot be applied directly to an 8B/10B SerDes transmitter. A 20:1 multiplexer based on a CMOS process has been reported in [6]; it uses a traditional structure, and its data rate of nearly $6 \mathrm{~Gb} / \mathrm{s}$ is too low for a $10 \mathrm{~Gb} / \mathrm{s}$ SerDes.

This paper presents a $10: 1$ multiplexer based on a $0.18 \mu \mathrm{~m}$ CMOS process that can be directly integrated with an $8 B / 10 B$ encoder. By using a modified system structure and several circuit techniques, the multiplexer achieves a data rate of $12.5 \mathrm{~Gb} / \mathrm{s}$. Section II describes the system architecture. The circuit implementation is discussed in Section III. The layout and simulation results are presented in Section IV. Section V draws the conclusion.

## II. SYSTEM DESIGN OF 10:1 MULTIPLEXER

The basic multiplexer typically has three kinds of structures: serial-type, parallel-type and tree-type. Each of these structures has its own advantages and disadvantages, and the choice should be made based on the specification of the application. When the number of input data channels is more than three, only a combination of the basic types can achieve optimal performance.

The traditional 10:1 multiplexer is shown in Fig. 1. It is composed of a high speed $2: 1$ part and a low speed $5: 1$ part. In this work, the output data rate of the high speed part is $12.5 \mathrm{~Gb} / \mathrm{s}$ and that of the low speed part is $6.25 \mathrm{~Gb} / \mathrm{s}$.


Figure 1. The entire structure of traditional 10:1 multiplexer

A half rate structure ${ }^{[6]}$ is often used in the high speed $2: 1$ part, in which five latches and one selector operate with a 6.25 GHz clock. In a $0.18 \mu \mathrm{~m}$ CMOS process, this part must be implemented with CML circuits. A serial-type structure is often used in the low speed $5: 1$ part because the number of input channels is not equal to $2^{\mathrm{N}}$. A serial-type structure needs a full-rate clock, so ten latches and one divide-by-5 circuit operate at a clock frequency of 6.25 GHz and must also be implemented with CML circuits. This causes very high power consumption.

Therefore, this work introduces a $5 \mathrm{~B} / 4 \mathrm{~B}$ converter that converts the five parallel data streams into four parallel data streams, so that a tree-type structure can be used in the low speed part. The modified low speed 5:1 multiplexer is shown in Fig. 2. The five parallel $1.25 \mathrm{~Gb} / \mathrm{s}$ input streams are converted into four parallel $1.5625 \mathrm{~Gb} / \mathrm{s}$ streams and then multiplexed into one serial $6.25 \mathrm{~Gb} / \mathrm{s}$ stream by the $4: 1$ multiplexer.
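These rates are mutually consistent because the converter only regroups bits, so the aggregate throughput on both sides of it equals the serial rate of the low speed part:

$$
5 \times 1.25 \mathrm{~Gb} / \mathrm{s} = 4 \times 1.5625 \mathrm{~Gb} / \mathrm{s} = 6.25 \mathrm{~Gb} / \mathrm{s}
$$

Equivalently, four input word periods of 800 ps cover the same 3.2 ns as five output word periods of 640 ps.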


Figure 2. Modified low speed 5:1 multiplexer

The $4: 1$ multiplexer shown in Fig. 2 has $2^{\mathrm{N}}$ channels and can be implemented with the tree-type structure shown in Fig. 3. The four parallel $1.5625 \mathrm{~Gb} / \mathrm{s}$ input streams are first multiplexed into two parallel $3.125 \mathrm{~Gb} / \mathrm{s}$ streams by two $2: 1$ multiplexers and then multiplexed into one serial $6.25 \mathrm{~Gb} / \mathrm{s}$ stream by one $2: 1$ multiplexer.
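The two-stage tree can be illustrated with a small behavioral model. This is only a sketch of the bit interleaving, not of the circuit: the function names `mux2` and `tree_mux4` are hypothetical, and the pairing of inputs in the first stage (`d0` with `d2`, `d1` with `d3`) is an assumption chosen so that the serial output order is `d0, d1, d2, d3`; the paper does not state the pairing.

```python
from typing import List, Sequence


def mux2(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Behavioral 2:1 multiplexer: interleave two equal-rate streams, a first."""
    if len(a) != len(b):
        raise ValueError("both inputs must carry the same number of bits")
    out: List[int] = []
    for bit_a, bit_b in zip(a, b):
        out.extend((bit_a, bit_b))
    return out


def tree_mux4(d0: Sequence[int], d1: Sequence[int],
              d2: Sequence[int], d3: Sequence[int]) -> List[int]:
    """4:1 tree multiplexer built from three 2:1 multiplexers.

    Assumed pairing (not given in the paper): (d0, d2) and (d1, d3) in the
    first stage, so the serial output order is d0, d1, d2, d3.
    """
    upper = mux2(d0, d2)       # 2 x 1.5625 Gb/s -> 3.125 Gb/s
    lower = mux2(d1, d3)       # 2 x 1.5625 Gb/s -> 3.125 Gb/s
    return mux2(upper, lower)  # 2 x 3.125 Gb/s  -> 6.25 Gb/s


# Label each bit with its intended serial position to make the order visible.
serial = tree_mux4([0, 4], [1, 5], [2, 6], [3, 7])
print(serial)
```

With the inputs labeled by their intended serial position, the output list comes out in ascending order, which confirms that the assumed pairing preserves the bit sequence.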


Figure 3. $4: 1$ tree-type multiplexer

As shown in Figs. 2 and 3, the clock frequency of the $5 \mathrm{~B} / 4 \mathrm{~B}$ converter is less than 1.5625 GHz. In the $4: 1$ tree-type multiplexer, two of the $2: 1$ multiplexers are clocked at 1.5625 GHz and the third one at 3.125 GHz. These clock frequencies are much lower than in the traditional structure, so the circuits can be implemented in CMOS logic, which greatly reduces the power consumption.

As illustrated in Fig. 2, the low speed 5:1 multiplexer needs clocks at three different frequencies: $1.25 \mathrm{GHz}, 1.5625 \mathrm{GHz}$ and 3.125 GHz, each with a duty cycle of $1: 1$. As shown in Fig. 1, the input clock frequency is 6.25 GHz. Therefore, the clock path must produce signals at $1 / 2,1 / 4$ and $1 / 5$ of the input clock frequency. The $1 / 2$ and $1 / 4$ frequency signals can be obtained together from a divide-by-4 circuit composed of two DFFs, one of which works at the input clock frequency. A divide-by-5 circuit composed of three DFFs is often used to obtain the $1 / 5$ frequency signal, and all three of these DFFs operate at the input frequency. As a result, four DFFs work at 6.25 GHz; they must be implemented with CML circuits and consume a large amount of power. Therefore, in this work, the clock path is redesigned based on the principle of the non-balanced phase switching divider ${ }^{[7]}$.

The modified clock path is shown in Fig. 4. The input clock frequency is 6.25 GHz. A 3.125 GHz clock is produced by a divide-by-2 circuit, and a second divide-by-2 circuit generates four 1.5625 GHz signals. The phase difference between adjacent signals is $90^{\circ}$ and their duty cycle is $1: 3$. The phase switching circuit uses these four signals to produce the 1.25 GHz clock, and two of them are combined to produce a 1.5625 GHz clock with a duty cycle of $1: 1$. As a result, the clock path provides the same three clock frequencies as the previous design, but only one DFF works at the input clock frequency and must be implemented with CML circuits. The rest of the clock path can be implemented in CMOS logic, which greatly reduces the power consumption.
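The frequency plan of the modified clock path can be checked with exact arithmetic. The sketch below is illustrative only: `clock_plan` and its dictionary keys are hypothetical names, and the factor of 4/5 for the phase switching stage is an assumption, namely that each output period is lengthened by one quarter of the 1.5625 GHz period.

```python
from fractions import Fraction

F_IN_GHZ = Fraction(25, 4)  # 6.25 GHz input clock, as stated in the text


def clock_plan(f_in: Fraction) -> dict:
    """Return the clock frequencies (in GHz) along the modified clock path."""
    f_div2 = f_in / 2    # first divide-by-2, clocked at the input frequency
    f_div4 = f_div2 / 2  # second divide-by-2, source of the four 90-degree phases
    # Assumption: phase switching stretches every output period by a quarter
    # of the f_div4 period, which corresponds to a frequency ratio of 4/5.
    f_div5 = f_div4 * Fraction(4, 5)
    return {"div2": f_div2, "div4": f_div4, "div5": f_div5}


plan = clock_plan(F_IN_GHZ)
for name, freq in plan.items():
    print(f"{name}: {float(freq)} GHz")
```

Under that assumption the last stage lands exactly on one fifth of the input frequency, which agrees with the 1.25 GHz clock required by the $5 \mathrm{~B} / 4 \mathrm{~B}$ converter.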


Figure 4. Modified clock path

The modified structure of the entire multiplexer is shown in Fig. 5. The whole clock path forms a divide-by-5 circuit. The 6.25 GHz input clock controls the high speed $2: 1$ multiplexer, which is implemented with CML circuits. The 3.125 GHz clock produced by the first divide-by-2 circuit controls the high speed stage of the $4: 1$ multiplexer, and the 1.5625 GHz clock produced by the second divide-by-2 circuit controls its low speed stage. The 1.25 GHz clock produced by the complete divide-by-5 path controls the $5 \mathrm{~B} / 4 \mathrm{~B}$ converter.


Figure 5. Modified whole system architecture

## III. CIRCUIT DESIGN OF 10:1 MULTIPLEXER

## A. High speed half rate $2: 1$ multiplexer

The high speed half rate $2: 1$ multiplexer is implemented with CML circuits; its architecture is shown in Fig. 6. It is composed of five latches and one selector.


Figure 6. The structure of high speed half rate $2: 1$ multiplexer

In this work, shunt peaking ${ }^{[8]}$ is used in the selector to increase its speed. A center-tapped inductor is used instead of two separate inductors to save area. The selector is shown in Fig. 7.
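The speed benefit of shunt peaking can be estimated with the standard first-order model of a load resistor $R$ in series with an inductor $L$, driving a capacitance $C$. The sketch below evaluates the closed-form -3 dB bandwidth of that model relative to the plain $RC$ load. The ratio $m = R^{2} C / L$ and the function name `bandwidth_ratio` are notation introduced here for illustration; the paper does not give its component values, so no claim is made about the operating point actually chosen.

```python
import math


def bandwidth_ratio(m: float) -> float:
    """-3 dB bandwidth of a shunt-peaked load relative to the plain RC load.

    Model: Z(s) = (R + sL) / (1 + sRC + s^2 LC), with m = R**2 * C / L.
    A larger m means a smaller inductor; the ratio tends to 1 as m grows.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    a = -m * m / 2 + m + 1
    return math.sqrt(a + math.sqrt(a * a + m * m))


# Two commonly quoted design points of this idealized model.
for label, m in (("maximum bandwidth", math.sqrt(2)),
                 ("maximally flat response", 1 + math.sqrt(2))):
    print(f"{label}: m = {m:.2f}, bandwidth x{bandwidth_ratio(m):.2f}")
```

In this idealized model the bandwidth improves by roughly 1.7 to 1.85 times, depending on how much peaking in the frequency response is tolerated.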


Figure 7. CML selector

## B. $5 B / 4 B$ converter

The principle of the $5 \mathrm{~B} / 4 \mathrm{~B}$ converter is shown in Fig. 8(a). The four switches controlled by $\varphi 0, \varphi 1, \varphi 2, \varphi 3$ write the input data into the corresponding latches in order, and the five switches controlled by $\Psi 0, \Psi 1, \Psi 2, \Psi 3, \Psi 4$ transfer the data to the output latch in sequence.


Figure 8. $5 B / 4 B$ converter: (a) structure; (b) timing

The pulses of $\varphi 0$ and $\Psi 0$ must be separated properly so that the write time and the read time of the data are sufficiently far apart. The block timing is shown in Fig. 8(b). The shortest time between the write and the read of the data is 640 ps.
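At the bit level, the converter behaves like a gearbox: four write phases of five bits each fill 20 storage positions, and five read phases of four bits each empty them. The following sketch models only this regrouping. The function name `convert_5b_to_4b` is hypothetical, and the assumption that bits leave in the same order in which they arrived is not stated in the paper; the actual latch-to-output mapping is defined by Fig. 8.

```python
from typing import List, Sequence


def convert_5b_to_4b(words5: Sequence[Sequence[int]]) -> List[List[int]]:
    """Regroup 5-bit words into 4-bit words without changing the bit order.

    Every four 5-bit input words (20 bits) become five 4-bit output words.
    """
    if len(words5) % 4 != 0:
        raise ValueError("the number of 5-bit words must be a multiple of 4")
    if any(len(word) != 5 for word in words5):
        raise ValueError("every input word must hold exactly 5 bits")
    # Write side: four phases (phi0..phi3) store four 5-bit words.
    bits = [bit for word in words5 for bit in word]
    # Read side: five phases (Psi0..Psi4) each fetch one 4-bit word.
    return [bits[i:i + 4] for i in range(0, len(bits), 4)]


# Label each bit with its arrival index to make the regrouping visible.
words_in = [list(range(start, start + 5)) for start in range(0, 20, 5)]
words_out = convert_5b_to_4b(words_in)
print(words_out)
```

Because 20 bits enter and leave in each full cycle, the model needs no padding or flow control, which matches the fixed 5:4 ratio between the 1.5625 GHz and 1.25 GHz clocks.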

## C. Phase switching divider

The principle of the phase switching divider is shown in Fig. 9. The four input signals have the same frequency, a duty cycle of $1: 3$, and a phase difference of $90^{\circ}$ between adjacent signals. The best switching moment is C, and the switching must be completed before E. Therefore, the switching circuit has half of the input period to finish switching.
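The effect of phase switching on the output period can be reproduced with a simple edge-timing model. The sketch below is a simplified illustration, not the circuit of this work: it assumes a 640 ps input period (1.5625 GHz), that the selector moves to the next lagging phase once per output cycle, and that the switch takes effect half an input period after each output rising edge. All names in the code are hypothetical.

```python
T_PS = 640    # period of each 1.5625 GHz input phase, in picoseconds
N_PHASES = 4  # four phases spaced by 90 degrees, i.e. T_PS / 4 apart


def next_rising_edge(phase: int, t_min: int) -> int:
    """First rising edge of the given phase at or after t_min (in ps)."""
    offset = phase * T_PS // N_PHASES
    cycles = max(0, -(-(t_min - offset) // T_PS))  # integer ceiling division
    return offset + cycles * T_PS


def output_edges(count: int) -> list:
    """Rising edges of the divider output when the phase advances every cycle."""
    edges = [0]
    phase = 0
    for _ in range(count - 1):
        switch_time = edges[-1] + T_PS // 2  # assumed switching moment
        phase = (phase + 1) % N_PHASES       # select the next lagging phase
        edges.append(next_rising_edge(phase, switch_time))
    return edges


edges = output_edges(6)
periods = [later - earlier for earlier, later in zip(edges, edges[1:])]
print(edges)
print(periods)
```

Each switch delays the next rising edge by a quarter of the input period, so the output period becomes 800 ps, which corresponds to 1.25 GHz.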


Figure 9. Phase switching timing

The phase switching divider designed in this work is shown in Fig. 10. When the output of selector 1 is $\varphi 1$, the output of selector 2 is $\varphi 3$, and the state machine changes its state at the rising edge of VO2, which is moment C in Fig. 9.


Figure 10. The phase switching circuit architecture

## IV. SIMULATED RESULT

Fig. 11 shows the layout of the $10: 1$ multiplexer. The chip has a size of $1175 \mu \mathrm{~m} \times 975 \mu \mathrm{~m} \approx 1.15 \mathrm{~mm}^{2}$ and the core area is about $800 \mu \mathrm{~m} \times 500 \mu \mathrm{~m}=0.4 \mathrm{~mm}^{2}$.


Figure 11. 10:1 multiplexer layout

In the post-layout simulation, ten pseudo-random binary sequence (PRBS) signals are applied to the multiplexer inputs. The input signal voltage swing is 1.8 V (from GND to VDD). The input and output logic data, simulated with Cadence software, are shown in Fig. 12. The multiplexer operates correctly, and the power consumption is 125 mW at an output data rate of $12.5 \mathrm{~Gb} / \mathrm{s}$.


Figure 12. The input and output data logic of 10:1 multiplexer

The $12.5 \mathrm{~Gb} / \mathrm{s}$ output eye diagram is shown in Fig. 13. The single-ended output peak-to-peak swing is 200 mV and the jitter is less than 0.1 UI.
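For reference, these figures can be converted into absolute quantities. One unit interval at the output data rate is

$$
1 \mathrm{~UI} = \frac{1}{12.5 \mathrm{~Gb} / \mathrm{s}} = 80 \mathrm{~ps},
$$

so a jitter of less than 0.1 UI corresponds to less than 8 ps. Similarly, taking the simulated power of 125 mW at face value, the energy per transmitted bit is approximately

$$
\frac{125 \mathrm{~mW}}{12.5 \mathrm{~Gb} / \mathrm{s}} = 10 \mathrm{~pJ} / \mathrm{bit}.
$$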


Figure 13. Simulated eye diagram for $12.5 \mathrm{~Gb} / \mathrm{s} 10: 1$ multiplexer

## V. CONCLUSION

A $10: 1$ multiplexer has been designed in a $0.18 \mu \mathrm{~m}$ CMOS process. Post-layout simulation shows that it operates correctly at $12.5 \mathrm{~Gb} / \mathrm{s}$ at normal temperature with a power consumption of about 125 mW.

TABLE I. SUMMARY OF THE 10:1 MULTIPLEXER

| Function | $10: 1$ multiplexer |
| :---: | :---: |
| Input data rate | $1.25 \mathrm{~Gb} / \mathrm{s}$ |
| Output data rate | $12.5 \mathrm{~Gb} / \mathrm{s}$ |
| Supply voltage | 1.8 V |
| Power consumption | 125 mW |
| Core area | $800 \mu \mathrm{~m} \times 500 \mu \mathrm{~m}=0.4 \mathrm{~mm}^{2}$ |
| Technology | $0.18 \mu \mathrm{~m}$ CMOS, $f_{\mathrm{T}}=49 \mathrm{~GHz}$ |

## References

[1] M. Meghelli, "A 132-Gb/s 4:1 multiplexer in $0.13 \mu \mathrm{~m}$ SiGe-bipolar technology," IEEE J. Solid-State Circuits, vol. 39, no. 12, pp. 2403-2407, Dec. 2004.
[2] S. Tanaka, and H. Hida, " $120-\mathrm{Gb} / \mathrm{s}$ multiplexing and $110-\mathrm{Gb} / \mathrm{s}$ demultiplexing ICs," IEEE J. Solid-State Circuits, vol. 39, no. 12, pp. 2397-2402, Dec. 2004.
[3] Joakim Hallin, Torgil Kjellberg, and Thomas Swahn, "A $165-\mathrm{Gb} / \mathrm{s}$ 4:1 multiplexer in InP DHBT technology," IEEE J. Solid-State Circuits, vol. 41, no. 10, pp. 2209-2214, Oct. 2006.
[4] K. Kanda, D. Yamazaki, T. Yamamoto, M. Horinaka, J. Ogawa, H. Tamura, and H. Onodera, " $40 \mathrm{~Gb} / \mathrm{s}$ 4:1 MUX/1:4 DEMUX in 90 nm standard CMOS technology," in IEEE ISSCC Dig. Tech. Papers, Feb. 2005, pp. 152-153.
[5] D. Kehrer, H. D. Wohlmuth, H. Knapp, and A. L. Scholtz, "A 15 $\mathrm{Gb} / \mathrm{s} 4: 1$ parallel-to-serial data multiplexer in 120 nm CMOS," in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), Firenze, Italy, Sept. 2002, pp.227-230.
[6] Kao, M.S., Jen, C.H., Hsu, Y.H., Yang, P.L., Chiu, C.T., Wu, J.M., Hsu, S.H., Hsu, Y.S., "A 3.2 Gbit/s CML Transmitter With 20:1 Multiplexer In $0.18 \mu \mathrm{~m}$ CMOS Technology," in Mixed Design of Integrated Circuits and System, Gdynia, Poland, June 2006, pp. 179-183.
[7] X. P. Yu, M. A. Do, J. G. Ma, K. S. Yeo, R. Wu and G. Q. Yan, "Low power high speed CMOS dual-modulus prescaler design with imbalanced phase switching technique," IEE Proc. Circuits, Devices \& Systems, vol. 152, no. 2, pp. 127-132, Apr. 2005.
[8] Thomas H.Lee. The Design of CMOS Radio-Frequency Integrated Circuits, Second Edition, 2005.5

