# An Optimized Sample Rate Converter for a software radio receiver on FPGA

Dr.B.Lakshmi

Department of Electronics and Communication Engineering, National Institute of Technology, Warangal-506004

Ashok Agarwal

Department of Electronics and Communication Engineering, National Institute of Technology, Warangal-506004 ashok701143@nitw.ac.in

#### Abstract

Sample rate conversion (SRC) and channelization of a high intermediate frequency (IF) signal are computationally intensive tasks in the design of a radio receiver. In a multi-standard radio receiver, band pass sampling oversamples the required channel with a high oversampling ratio (OSR). In this paper we present an optimized cascaded integrator comb (CIC) filter implementation for SRC of various wireless standards, based on the method of factorization. A comparison of the hardware resource utilization of this architecture with a non-optimized CIC filter shows a reduction of more than twenty percent in hardware.

Keywords: IF signal, OSR, SRC, FPGAs, CIC filter.

#### 1. Introduction

The realization of a software defined radio (SDR) communication system provides a single device solution irrespective of the radio communication standard, geographic location, etc. An ideal SDR would be obtained if the antenna were followed immediately by an analog to digital converter (ADC). Such an ideal SDR is infeasible due to two practical limitations on the realization of ADCs.

- The requirement of an ADC which can digitize a signal of relatively high bandwidth, of the order of a few gigahertz, at the radio frequency stage.
- High sampling rate requirements leading to high power dissipation.

These limitations lead to a practical SDR being implemented in three stages. First, a Radio Frequency Stage, consisting of an analog RF oscillator, RF mixers, low noise amplifiers, band pass filters, etc., converts an RF signal to a high Intermediate Frequency (IF) signal in the receiver and vice versa in the transmitter. Second, an Intermediate Frequency Stage, consisting of a Numerically Controlled Oscillator, quadrature mixers, filters, etc., performs channelization and digital up conversion/digital down conversion (DDC) in the Tx/Rx chain. Finally, a Baseband Processing Stage performs the tasks of modulation and demodulation, encoding and decoding, etc., based on the standard.

The scope of this work lies in the Intermediate Frequency stage. The process of sample rate conversion requires two DDCs to generate the in-phase and quadrature signals of the baseband. Conventional digital signal processors may not meet the computational demands of this process.

Four high performance DSP processors clocked at 600 MHz with 50 percent spare capacity are needed to down convert an IF of 70 MHz, which is not a practical solution [2].

A prototype DDC architecture suitable for down conversion of an IF signal at 80 MHz to the baseband signal of a multi-mode, multi-standard radio is proposed. The pipelined architecture is simulated for the Virtex-6 XC6VCX240T-2FF484 device, with a maximum operating frequency greater than 200 MHz. We attempt to optimize the CORDIC architecture and the filter structure to use fewer resources on the FPGA device, and we compare the hardware resource utilization on the FPGA between the non-optimized and the optimized architecture.

This paper is organized as follows. Section 2 discusses the DDC, the CORDIC algorithm, digital filters and the specifications of various wireless standards. Section 3 presents the calculations for the realization of a multi-mode, multi-standard software radio and the implemented architecture of the DDC. Section 4 presents the results and a comparison of the hardware resource utilization of the two architectures on the Virtex-6 XC6VCX240T-2FF484 FPGA. Section 5 concludes the paper.

# 2. Theory

## 2.1 Digital Down Converter

Figure 1 shows the basic architecture of a digital down converter. Numerically Controlled Oscillators (NCO), mixers, a decimator and filters are the building blocks. NCOs in quadrature generate the local oscillator waveforms, and the mixers multiply the incoming IF signal with the NCO outputs to produce the sum and difference components, i.e., a high frequency component and a low baseband frequency component respectively. Frequency translation in the frequency domain is given by equation (1). The difference component is extracted by low pass filtering and then decimated.

Figure 1 Block diagram of a Digital Down converter

$$\begin{split} \sin(\omega_c t) \, x(t) &\leftrightarrow j0.5 \, [X(\omega + \omega_c) - X(\omega - \omega_c)] \\ \cos(\omega_c t) \, x(t) &\leftrightarrow 0.5 \, [X(\omega - \omega_c) + X(\omega + \omega_c)] \end{split} \tag{1}$$

where x(t) is the input signal,  $X(\omega)$  is its Fourier transform and  $\omega_c$  is the carrier frequency of the signal.
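The frequency translation in equation (1) can be checked numerically. The following is a minimal sketch, not the implemented design: the sample rate and tone frequencies are scaled-down illustrative values (assumptions, not the 80 MHz / 160 MSps figures used later), and `tone_amplitude` is a hypothetical helper that measures one spectral line with a single-bin DFT.

```python
import cmath
import math


def tone_amplitude(signal, freq, fs):
    """Amplitude of the sinusoidal component at `freq`, from a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, s in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


# Illustrative values only (assumptions): chosen so that every tone completes
# an integer number of cycles in the analysed block.
fs = 1000.0    # sample rate in Hz
f_c = 200.0    # local oscillator (carrier) frequency in Hz
f_m = 10.0     # message offset in Hz
n_samples = 1000

# Unit-amplitude IF tone at f_c + f_m, mixed with a cosine at f_c.
x_if = [math.cos(2 * math.pi * (f_c + f_m) * n / fs) for n in range(n_samples)]
mixed = [x * math.cos(2 * math.pi * f_c * n / fs) for n, x in enumerate(x_if)]

diff_amp = tone_amplitude(mixed, f_m, fs)            # difference component
sum_amp = tone_amplitude(mixed, 2 * f_c + f_m, fs)   # sum component
carrier_amp = tone_amplitude(mixed, f_c, fs)         # nothing is left at f_c
```

Each of the two components carries half of the input amplitude, which is the factor 0.5 in equation (1); the low pass filter that follows the mixer keeps only the difference component.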

## 2.2 CORDIC algorithm

The COordinate Rotation DIgital Computer (CORDIC) was developed by J.E. Volder in 1959 for the computation of trigonometric functions. The CORDIC algorithm implements the Givens rotation transform for the circular coordinate system using only shift and add operations [3]. A generalized CORDIC algorithm for circular, linear and hyperbolic coordinate systems was proposed by J.S. Walther in 1971, which facilitates the computation of other mathematical functions such as multiplication, division, exponentials and logarithms [4]. Due to its structural regularity it is well suited for VLSI implementation. A detailed study of the different CORDIC architectures proposed in the literature has been made by B. Lakshmi et al. Speed, power, throughput and area are the constraints governing the choice of the architecture for a particular application. An architecture optimized for area may have a low throughput, whereas a high throughput architecture may occupy more area [5].

Different number systems such as radix-2 and radix-4 can also be employed for its implementation. The radix-2 CORDIC architecture has a constant gain, whereas the gain of the CORDIC module in other radices is not constant [6]. In that case a scale factor compensation unit has to be implemented explicitly through computation. As area is our optimization goal, we choose the non-redundant radix-2 architecture. The iteration equations of the radix-2 CORDIC algorithm for vector rotation in the Cartesian coordinate system are given as

$$\begin{split} x_{i+1} &= x_i - \sigma_i y_i 2^{-i} \\ y_{i+1} &= y_i + \sigma_i x_i 2^{-i} \\ z_{i+1} &= z_i - \sigma_i \tan^{-1}(2^{-i}) \end{split} \tag{2}$$

where  $\sigma_i$  represents the direction of rotation in each iteration and  $\tan^{-1}2^{-i}$  is an elementary rotation angle. As the CORDIC rotation results in a constant gain (K), the magnitude of the vector increases. The norm of the vector is preserved by scaling the final coordinates by  $K^{-1}$ , where K is the CORDIC scale factor

$$\mathsf{K} = \prod_{i=1}^n \sqrt{1 + \sigma_i^2 2^{-2i}} \tag{3}$$

For a CORDIC architecture implemented in the radix-2 number system, scale factor compensation can be achieved through constant canonic signed digit multipliers. In the design of a receiver for communication systems, this gain can instead be taken care of in the automatic gain control unit of the receiver.
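The rotation-mode iteration and the gain compensation can be illustrated with a short floating-point model. This is only a sketch under stated assumptions: the iterations are indexed from i = 0, the shifts are modelled as multiplications by powers of two rather than fixed-point shifts, and the names `cordic_gain` and `cordic_rotate` are hypothetical.

```python
import math


def cordic_gain(n_iter):
    """Accumulated gain K of n_iter radix-2 iterations (sigma_i = +/-1)."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_rotate(x0, y0, z0, n_iter=16):
    """Rotate (x0, y0) by z0 radians in rotation mode and compensate the gain.

    Assumption: iterations start at i = 0, so angles up to roughly
    +/-1.74 rad converge.
    """
    x, y, z = x0, y0, z0
    for i in range(n_iter):
        sigma = 1.0 if z >= 0.0 else -1.0      # drive the residual angle to zero
        step = 2.0 ** (-i)                     # a right shift by i in hardware
        x, y = x - sigma * y * step, y + sigma * x * step
        z -= sigma * math.atan(step)           # elementary rotation angle
    k = cordic_gain(n_iter)
    return x / k, y / k, z                     # scale by 1/K to preserve the norm


# Rotating the unit vector (1, 0) by 30 degrees yields (cos, sin) of the angle.
cos_out, sin_out, residual = cordic_rotate(1.0, 0.0, math.pi / 6)
```

With 16 iterations the residual angle is bounded by the last elementary angle, about 2^-15 rad, so the outputs agree with the true cosine and sine to roughly four decimal places. In hardware the final division would be replaced by a constant multiplier or absorbed into the gain control, as described above.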

### 2.3 Digital Filters

Sampling rate decimation requires digital filters to suppress the aliasing components, so a decimator always follows a low pass filter. The implementation of digital filters uses multiplication and accumulation as its fundamental operations. Hence a high speed multiplier which operates at twice the frequency of the IF signal must be employed before decimation. To avoid multiplications, a special class of multiplier-less digital FIR filters called Cascaded Integrator Comb (CIC) filters was proposed by Hogenauer [8]. The process of sample rate conversion is greatly aided by CIC filters due to their multiplier-less architecture. An Nth order CIC filter designed for decimation has N integrators operating at the high sampling rate and N differentiators operating at the decimated sampling rate. The transfer function of the CIC filter is given by equation (4). In principle CIC filters are simple in design, but the integrators are prone to overflow because of bit growth. A large pass band droop is also observed in the frequency response of these filters. To restore the signal strength in the pass band, a droop compensation filter has to be designed.

$$H(z) = \left\{ \frac{1 - z^{-RM}}{1 - z^{-1}} \right\}^{N} \tag{4}$$

where R = Decimation Factor, M = Differential delay (1 or 2), N = No of Integrators/Combs
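The structure behind equation (4) can be sketched as a behavioural model: N integrators at the input rate, a decimator by R, and N combs with differential delay M at the output rate. The function name `cic_decimate` is hypothetical, and Python's unbounded integers stand in for registers that are wide enough to hold the full bit growth (an assumption of this sketch).

```python
def cic_decimate(samples, R, N=3, M=1):
    """Behavioural model of an N-stage CIC decimator with integer arithmetic."""
    integrators = [0] * N
    decimated = []
    for n, sample in enumerate(samples):
        acc = sample
        for k in range(N):                 # N cascaded integrators, input rate
            integrators[k] += acc
            acc = integrators[k]
        if (n + 1) % R == 0:               # keep every R-th integrator output
            decimated.append(acc)

    out = decimated
    for _ in range(N):                     # N cascaded combs, output rate
        delayed = [0] * M + out[:-M]       # comb delay line starts at zero
        out = [a - b for a, b in zip(out, delayed)]
    return out


# A constant input of 1 settles to the DC gain (R*M)**N = 4**3 = 64.
step_response = cic_decimate([1] * 40, R=4, N=3, M=1)
```

The first two outputs (20 and 60) are the filter transient; from the third output onward the response stays at 64, the DC gain of the filter.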

Table 1 Specifications for multi-standard software radio

| Radio Standard                | WiMAX 802.16 | WCDMA | CDMA 2000 | GSM 900    |
|-------------------------------|--------------|-------|-----------|------------|
| Intermediate Frequency(MHz)   | 80           | 80    | 80        | 80         |
| Sampling Rate(MSps)           | 160          | 160   | 160       | 160        |
| Single Channel Bandwidth(MHz) | 20           | 5     | 1.25      | 0.2        |
| Required Sampling Rate(MSps)  | 40           | 10    | 2.5       | 0.4        |
| Over Sampling Ratio           | 4            | 16    | 64        | 400        |
| Target Data Rate(Mbps)        | 10           | 3.84  | 1.2288    | 0.270      |
| Sample Rate Conversion Ratio  | 1/8          | 6/125 | 48/3125   | 677/200000 |

### 2.4 Specifications

The intermediate frequency of the signal for mobile communications is taken as 80 MHz. This signal is digitized with a high speed analog to digital converter and is then passed through a DDC, which performs the required frequency translation to produce a baseband signal. Table 1 shows the specifications for multi-mode, multi-standard radio communication systems.

#### 3. Architecture

In order to minimize the hardware, a numerically controlled oscillator capable of generating quadrature waveforms is realized using the CORDIC architecture. CORDIC improves the signal to noise ratio due to elimination of phase to phase mapping and phase distortions. As stated by J. Valls et al., CORDIC based DDCs exhibit good Spurious Free Dynamic Range (SFDR) in comparison to the conventional LUT based approach [7]. We have implemented a non-redundant radix-2 CORDIC architecture using carry look-ahead adders and a series of optimized CIC filters to utilize the hardware resources efficiently.

#### 3.1. Numerically Controlled Oscillator

The structure of a numerically controlled oscillator consists of two blocks, a phase accumulator and a phase to amplitude (sine/cosine) generator. The phase increment provided to the phase accumulator, also termed the frequency control word  $f_{\rm cw}$ , tunes the NCO to the required frequency. In each clock cycle the phase accumulator increments itself by that value until it overflows and wraps around. Thus the frequency of the NCO is given by

$$f_c = F_{clk} * \frac{f_{cw}}{2^{M-1}} \tag{5}$$

where,  $F_{clk}$  = Frequency of the clock,  $f_c$  = Required local oscillator frequency and  $f_{cw}$  =Frequency control word.

A digital to analog converter followed by a low pass filter produces the analog waveform and removes the unwanted aliasing components. The required throughput of the NCO depends on the intermediate frequency of the radio, which is 80 MHz and therefore requires a throughput of 160 MSps. The implemented non-redundant radix-2 CORDIC architecture has a throughput of 355 MSps on the Virtex-6 device, which is adequate for our design. The significance of the non-redundant CORDIC architecture is that it reduces the hardware by almost 50 percent, both in the CORDIC module and in the subsequent stages of the system design. The latency of this architecture is higher than that of the redundant architecture. In the design of an NCO, however, the throughput of the architecture, i.e., its ability to produce sinusoids at the required maximum frequency, is of more concern than the latency. We have configured the frequency control word such that the numerically controlled oscillator frequency is tuned to 80 MHz.

### 3.2. Digital Down Converter block

As stated earlier, equation (2) describes the rotation of a vector by an angle  $\theta$ . The CORDIC module is configured in the circular rotation mode with inputs  $x_0 = x_{if} \cos(\omega_{if} n)$ ,  $y_0 = 0$ ,  $z_0 = \omega_c n$  as shown in the figure. The output of the CORDIC module is given by equation (6), which generates the in-phase and quadrature phase mixer outputs.

$$\begin{split} x_{out} &= x_{if} \cos(\omega_{if} n) \cos(\omega_{c} n) \\ y_{out} &= x_{if} \cos(\omega_{if} n) \sin(\omega_{c} n) \end{split} \tag{6}$$

According to the trigonometric identities, the output of the CORDIC module has two frequency components, viz.  $\omega_{if} + \omega_{c}$  and  $\omega_{if} - \omega_{c}$ . We have chosen  $\omega_{if} = \omega_{m} + \omega_{c}$  with  $\omega_{c} = 80 MHz$  to generate a difference component of  $\omega_{m}$  at 160 MS/s, which has to be decimated by the decimation factor shown in Table 1 to reach the Nyquist sampling rate.
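As a worked step, the product-to-sum identities make the two components of equation (6) explicit. This is a sketch that ignores the CORDIC gain K and treats  $x_{if}$  as a constant amplitude, both assumptions made only for illustration:

$$\begin{split} x_{out} &= \frac{x_{if}}{2} \left[ \cos((\omega_{if} - \omega_{c}) n) + \cos((\omega_{if} + \omega_{c}) n) \right] \\ y_{out} &= \frac{x_{if}}{2} \left[ \sin((\omega_{if} + \omega_{c}) n) - \sin((\omega_{if} - \omega_{c}) n) \right] \end{split}$$

With  $\omega_{if} = \omega_{m} + \omega_{c}$  the difference term sits at  $\omega_{m}$  and the sum term at  $\omega_{m} + 2\omega_{c}$ ; the latter is the component that the subsequent low pass filtering has to remove.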

### 3.3 Filtering and Decimation

The process of decimation employs a low pass filter along with a decimator. As the incoming signal is at a very high sampling rate, high speed multipliers are needed to implement the MAC operations in the low pass filters. An alternative is to employ a multiplier-less filter architecture at the high sampling rate and conventional MAC operations at the lower sampling rate. The multiplier-less CIC filter architecture is the best choice to down convert a signal when a high decimation rate is required and a narrow band signal has to be extracted from a wide band signal [8].

We have implemented a pipelined architecture for the CIC filters in four stages, each using three integrators and three differentiators. Table 2 shows the decimation factors as calculated by the factorization method stated by Sheikh et al. [1]. As the CIC filter is implemented in four stages, all CIC filters in the chain apart from the first stage are clocked at a frequency of  $f_{\rm clk}/4$ ; hence the power dissipation is also reduced to a great extent.



Figure 2 Implemented Architecture of Stage 1 CIC filter



Figure 3 Architecture of Multi-stage CIC filter

Table 2 Decimation Factor of CIC Filters for multi-standard Radio

| Standard | OSR | CIC 1 | CIC 2 | CIC 3 | CIC 4 |
|----------|-----|-------|-------|-------|-------|
| WiMAX    | 4   | 4     | 1     | 1     | 1     |
| WCDMA    | 16  | 4     | 4     | 1     | 1     |
| CDMA2000 | 64  | 4     | 4     | 4     | 1     |
| GSM      | 400 | 4     | 4     | 4     | 6     |

We have optimized the architecture of the CIC filter by reducing the number of bits within a particular CIC stage and from one stage to another, by normalizing the gain of the filter to unity. The total bit growth of the filter is given by equation (7).

$$B_{Total} = N * \log_2 R + B_{in} \tag{7}$$

where  $B_{Total}$ : the total number of bits required, N: Order of the CIC filter, R: Required Decimation rate,  $B_{in}$ : Input width of the data.

The bit growth required for the implementation of a CIC filter is directly proportional to N. A CIC filter implementation with R=384, N=12 and 16-bit precision requires 124 bits at its input, leading to very high hardware resource usage. On the other hand, the same filter implemented in four stages with N=3 requires 43 bits at its input [8].
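The two register widths quoted above follow from equation (7) if  $\log_2 R$  is rounded up to a whole number of bits for each stage. That rounding is an assumption of this sketch (it is what reproduces the quoted figures), and the helper names are hypothetical. The stage factors are taken from the GSM row of Table 2.

```python
def ceil_log2(value):
    """Smallest integer b such that 2**b >= value, for value >= 1."""
    return (value - 1).bit_length()


def cic_total_bits(order, decimation, input_bits):
    """Equation (7): B_total = N * log2(R) + B_in, with log2(R) rounded up."""
    return order * ceil_log2(decimation) + input_bits


def cascade_total_bits(stages, input_bits):
    """Accumulated bit growth of a chain of (order, decimation) CIC stages."""
    growth = sum(order * ceil_log2(decimation) for order, decimation in stages)
    return growth + input_bits


# Single CIC filter with R = 384, N = 12 and a 16-bit input.
single_stage_bits = cic_total_bits(order=12, decimation=384, input_bits=16)

# Four stages with N = 3 each and factors 4 * 4 * 4 * 6 = 384 (GSM, Table 2).
gsm_stages = [(3, 4), (3, 4), (3, 4), (3, 6)]
multi_stage_bits = cascade_total_bits(gsm_stages, input_bits=16)
```

Here `single_stage_bits` evaluates to 12 × 9 + 16 = 124 and `multi_stage_bits` to 3 × (2 + 2 + 2 + 3) + 16 = 43, matching the values in the text.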

To further optimize the hardware resources the gain of the CIC filter is normalized to unity. The gain of the filter in each stage is given as

$$G_{\text{CIC}} = (R_i)^{N_i} \tag{8}$$

To minimize the hardware resources within the stage itself, the output of each integrator is arithmetically right shifted by  $log_2(R_i)$  bits, normalizing the gain of the filter to unity. Figure 2 shows the gain normalization in stage 1, which can be extended to the other stages of the CIC filter. Equation (9) gives the number of bits reduced and the number of adders optimized per CIC stage respectively.

$$\begin{split} B_{i,opt} &= N_i \log_2(R_i) \\ A_{i,opt} &= \log_2(R_i) + 2 \log_2(R_i) + \cdots + N_i \, [N_i \log_2(R_i)] \end{split} \tag{9}$$

where  $B_{i,opt}$ : Number of bits optimized in each integrator stage,  $A_{i,opt}$ : Number of adders optimized in  $i^{th}$  stage,  $R_i$ : Decimation factor in  $i^{th}$  stage,  $N_i$ : Order of  $i^{th}$  stage CIC filter.
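The gain normalization described above can be sketched by right shifting each integrator output by  $\log_2(R_i)$  bits, so that the overall gain  $R_i^{N_i}$  of equation (8) is cancelled inside the stage. This is a behavioural sketch, not the implemented RTL: it assumes M = 1 and a power-of-two decimation factor, the name `cic_decimate_normalized` is hypothetical, and the test input is a multiple of 64 so that no shift discards nonzero bits.

```python
def cic_decimate_normalized(samples, R, N=3):
    """CIC decimator (M = 1) with each integrator output shifted by log2(R)."""
    shift = R.bit_length() - 1
    if R < 1 or (1 << shift) != R:
        raise ValueError("this sketch assumes R is a power of two")

    integrators = [0] * N
    decimated = []
    for n, sample in enumerate(samples):
        acc = sample
        for k in range(N):
            integrators[k] += acc
            acc = integrators[k] >> shift   # arithmetic right shift by log2(R)
        if (n + 1) % R == 0:                # decimate by R
            decimated.append(acc)

    out = decimated
    for _ in range(N):                      # N combs at the decimated rate
        out = [a - b for a, b in zip(out, [0] + out[:-1])]
    return out


# A constant input of 64 passes through with unity gain once the filter settles.
normalized = cic_decimate_normalized([64] * 40, R=4, N=3)
```

After the two-sample transient the output equals the input level of 64, whereas the same filter without the shifts would settle at 64 × 4³. For general inputs the shifts truncate low-order bits, so the result is only approximately equal to that of the full-precision filter.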

When a filter architecture is implemented using P stages with filter order  $N=\sum_{i=1}^{P} N_i$  and required decimation factor  $R=\prod_{i=1}^{P} R_i$ , the total number of adders that can be optimized is given by equation (10).

$$A_{opt} = \sum_{i=1}^{P} \left\{ \log_2 R_i \left[ \frac{1}{2} (3N_i^2 - N_i) \right] + (2N_i) \sum_{j=0}^{i-1} N_j \log_2 R_j \right\} \tag{10}$$

where  $A_{opt}$ : Number of adders that can be optimized, P: Number of CIC stages

### 4. Results

The simulation setup for testing the prototype DDC, shown in Figure 4, uses three CORDIC based digital synthesizers/mixers: the first generates message signals of different frequencies, the second generates an amplitude modulated wave with a carrier frequency of 80 MHz, and the third demodulates it as a CORDIC mixer. CIC filters as described in Section 3.3 are implemented to extract the in-phase and quadrature components of the baseband signal at a lower sampling rate. The design is implemented on the Virtex-6 XC6VCX240T-2FF484 FPGA and the simulation results are shown in Figures 5 and 6. Table 3 compares the device utilization summary for digital down conversion of the in-phase and quadrature channels of a multi-channel radio with and without optimization of the number of bits in the stages of the CIC filter implementation. The table shows that the hardware resources are reduced significantly in the optimized architecture compared with the non-optimized architecture.

We have configured the frequency control word of the first CORDIC module such that sinusoidal signals of frequencies 20 kHz, 39 kHz, 60 kHz and 80 kHz are generated. The second CORDIC module generates an amplitude modulated wave with a carrier frequency of 80 MHz, the message signal being the sinusoidal signal of the first module. Finally, the third CORDIC module is used as a demodulator, which generates the sum and difference components with frequencies of about twice the carrier frequency and the message signal frequency respectively. This demodulated signal is passed through the CIC filters for sampling rate decimation.

Figure 5 shows the mixer output with  $F_m$ = 39 kHz and  $F_c$ = 80 MHz decimated by a factor of 384 for the GSM 900 specifications. The number of samples contained in the message signal is found to be approximately 10, where it was originally 4096. The simulation results were also found to be satisfactory for the CDMA2000, WCDMA and WiMAX standards with decimation factors of 64, 16 and 4 respectively.

Table 3 Hardware Resource Utilization on FPGA XC6VCX240T-2FF484

| Hardware Resources        | Optimized | Non-Optimized     |
|---------------------------|-----------|-------------------|
| No. of Slice Registers    | 3337      | 3969              |
| No. of Slice LUTs         | 5898      | 8406              |
| No. of Slice LUT-FF Pairs | 2486      | 2636              |
| Frequency(MHz)            | 355       | 323               |

### 5. Conclusions

We have designed and simulated an optimized architecture for sample rate conversion of a multi-standard radio on the Virtex-6 XC6VCX240T-2FF484 FPGA based on the method of factorization. The use of the CORDIC architecture facilitates the implementation of the numerically controlled oscillator and the quadrature mixer, producing the in-phase and quadrature components without explicit phase shifters and multipliers and thus reducing hardware resources. We have also compared the hardware resource utilization of the multi-standard radio employing an optimum number of adders in the CIC filter structure with that of the non-optimized CIC structure. The comparison shows a significant reduction of about twenty percent in the hardware resources for the optimized architecture. As future work, a programmable interpolation and sample rate conversion filter remains to be implemented on FPGA.

Figure 4 Simulation Setup for CORDIC based DDC

Figure 5 Simulation result of GSM 900 down converted output Decimation factor=384

Figure 6 Simulation result of WiMAX802.16 down converted output Decimation factor=4

## References

- 1. Faheem Sheikh, Shahid Masud, "Sample rate conversion filter design for multi-standard software radios", *Elsevier*, *Digital Signal Processing*, vol.20, pp.3-12, 2009.
- 2. Paul Burns, "Software Defined Radio for 3G", Chapter 2, 6, Artech House, *Mobile Communication Series*, 2003.
- 3. J. E. Volder, "The CORDIC trigonometric computing technique", *IRE Trans. Electronic Computers*, vol. 8, no. 3, pp. 330-334, Sept. 1959.
- 4. J.S. Walther, "A unified algorithm for elementary functions", in *AFIPS Spring Joint Computer Conference*, vol.38, pp. 379-85, 1971.
- 5. B. Lakshmi, A.S. Dhar, "CORDIC Architectures: A Survey", *Journal of VLSI Design*, vol.2010, article ID 794891, 19 pages.
- 6. B. Lakshmi, A.S. Dhar, "VLSI architecture for low latency radix-4 CORDIC", *Journal of Computers and Electrical Engineering*, vol.37, pp.1032-1042, 2011.
- 7. J. Valls, T. Sansaloni, A. Perez-Pascual, V. Torres, V. Almenar, "The use of CORDIC in Software Defined Radios: A tutorial", *IEEE Communications Magazine*, vol.44, pp. 46-50, 2006.
- 8. E.B. Hogenauer, "An economic class of Digital Filters for Decimation and Interpolation", *IEEE Transactions on Acoustics, Speech and Signal Processing*, vol.2, 1981.
- 9. Jacques C. Rudell, Jeffrey A. Weldon, Jia-Jiunn Ou, Li Lin, and Paul Gray, "An integrated GSM/DECT Receiver: Design specifications", UCB Electronics Research Laboratory Memorandum, Memo: UCB/ERL M97/82, 1998.