# TMR-Based Soft Error Tolerance Techniques in ASIC Design

He Xin<sup>1, a</sup>, Huang Xu<sup>2,b</sup> and Li Yujing<sup>3,c</sup>

<sup>1</sup>Sichuan Institute of Solid State Circuits, Chongqing, P.R. China <sup>2</sup>Sichuan Institute of Solid State Circuits, Chongqing, P.R. China <sup>3</sup>Sichuan Institute of Solid State Circuits, Chongqing, P.R. China <sup>a</sup>letter1988@163.com, <sup>b</sup>hxtt103@163.com, <sup>c</sup>martinlyj@163.com

#### Keywords: TMR; Soft Error; SEU; ASIC.

**Abstract.** As process technology scales and design techniques advance, soft errors pose an increasingly serious threat to the reliability of integrated circuits. In the space radiation environment, single event upsets (SEUs) caused by high-energy particles seriously degrade integrated circuit reliability. On top of process-level, layout-level, and other more complex soft error tolerance techniques, a redundant structure can mask single event upsets. Triple modular redundancy (TMR) is easy to implement and greatly improves the radiation tolerance of the whole circuit.

## Introduction

Digital integrated circuits are widely used in military applications. Radiation in outer space, nuclear explosions, and other harsh environments strongly affect integrated circuits, causing serious performance degradation or even loss of function. In a radiation environment, a single high-energy particle passing through a sensitive region of a device deposits charge along its track. When this charge is collected by the device, it can cause upsets, latch-up, and voltage or current errors that corrupt stored information. Because such errors are transient and reversible, they are known as soft errors [1]. As process and design technology evolve, shrinking gate sizes, lower supply voltages, larger circuits, and higher operating frequencies make soft errors an increasingly serious threat to reliability. This raises the reliability requirements on circuits and makes soft error hardening an important issue.

## Soft Error Tolerance

From the perspective of design methodology, soft error tolerance techniques can be applied at each stage of digital integrated circuit design: the process level, the layout level, the circuit level, and the gate level.

Current digital integrated circuits are built on a given manufacturing process, and the soft error tolerance of a standard process is relatively poor, so a special process is needed. The foremost such technology is silicon on insulator (SOI). In an SOI process, a thin single-crystal silicon film is epitaxially grown on an insulator, and p-channel and n-channel devices are then formed with different masking and doping steps. Because the devices in the circuit are isolated from one another by the insulator, SOI completely eliminates the parasitic latch-up of bulk silicon CMOS circuits, but its manufacturing cost is much higher than that of bulk CMOS.

To improve reliability, a triple-well process can also be used, which adds a deep n-well in addition to the p-well. The deep n-well effectively isolates substrate noise, which improves NMOS transistor performance and protects storage cells from substrate noise and $\alpha$ particles.

Soft error tolerance can also be implemented at the layout level. Ionizing radiation degrades not only the gate dielectric but also the field oxide: a large amount of positive oxide charge accumulates in the field oxide and forms a parasitic field channel, and the resulting increase in leakage current causes the field region to fail. Because the drain of an annular (ring) gate device is fully surrounded by the gate, this layout completely eliminates the radiation-induced parasitic leakage current at the field edge of a MOS device. A ring gate occupies a large chip area, however, so ring gates are used for small and medium-scale digital integrated circuits and strip gates for large-scale integrated circuits. An example of layout-level hardening with a ring gate is shown in Fig.1.



Fig.1 (a) Standard cell layout (b) Cell layout hardened with a ring gate

Circuit-level soft error tolerance uses novel circuit configurations to prevent the various types of transient fault in a circuit. Circuit-level hardening mainly comprises single event transient (SET) glitch filtering for combinational logic and SEU hardening for flip-flops, latches, and SRAM memory cells [2]. These circuit-level hardened cells must be designed full-custom, which makes the design complex and the implementation cycle long.

Gate-level soft error tolerance introduces spatial or temporal redundancy to mask single event upset (SEU) faults or to filter single event transient (SET) faults.

## Triple Modular Redundancy (TMR)

For integrated circuit design, SEU is a complex and difficult problem. Triple modular redundancy is a very widely used fault tolerance technique in circuit design and is often used for fault-tolerant design in FPGA devices [3]. Its core idea is to use a redundant architecture to mask the impact of a fault on the whole circuit. The triple modular redundancy structure is briefly introduced first.

**TMR theory.** The basic idea of triple modular redundancy is to create two additional copies of the module to be hardened and to pass the outputs of the three modules through a voter to obtain the final result. Even if one module fails, the circuit output remains correct, which greatly improves circuit reliability.
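The reliability gain can be made concrete with the textbook TMR reliability expression. This is a sketch under assumptions that are not stated in this paper: the three modules fail independently, each has reliability $R$, and the voter itself never fails. The circuit then works if all three modules work or if exactly two work:

$$R_{TMR} = R^3 + 3R^2(1 - R) = 3R^2 - 2R^3.$$

For example, $R = 0.9$ gives $R_{TMR} = 3(0.81) - 2(0.729) = 0.972$. Under these assumptions the redundant structure is better than a single module only when $R > 0.5$.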

**TMR structure.** For sequential cells, the cell to be hardened is copied into three cells that receive the same data signal and a clock of the same frequency and phase. This structure mainly addresses logic errors caused by single event upsets [4].

The voter of a triple modular redundancy circuit is generally a majority voter. With voter inputs A, B, and C and output Q, the truth table and logic view of the voter are shown in Fig.2(a) and (b), and the output equation is Eq.1.

$$Q = AB + AC + BC. \tag{1}$$



Fig.2 (a) Truth table of the voter (b) Logic view of the voter
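The majority function of Eq.1 can be checked with a few lines of Python. This is only an illustrative sketch: the function names `majority` and `truth_table` are chosen here and do not come from the paper, and the bitwise form also votes whole words bit by bit.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote, Q = AB + AC + BC (Eq.1)."""
    return (a & b) | (a & c) | (b & c)


def truth_table():
    """List (A, B, C, Q) for all eight single-bit input combinations."""
    return [(a, b, c, majority(a, b, c)) for a, b, c in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print("A B C | Q")
    for a, b, c, q in truth_table():
        print(f"{a} {b} {c} | {q}")
```

Q is 1 exactly when at least two inputs are 1, so a single wrong input never changes the voted output.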

The sequential cell TMR structure is shown in Fig.3(a). The clock signal needs special consideration: if a single event transient occurs on the clock net, the resulting glitch can trigger all three sequential cells at once, so that all three output wrong values, the voter no longer helps, and the circuit fails. To prevent this, as shown in Fig.3(b), the clock of the D flip-flops is split into three clocks, each with its own clock tree, and the skew between the three clock trees is kept small enough.
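The difference between a shared clock and three separate clock trees can be illustrated with a small behavioral model. It is a simplified sketch, not the circuit of Fig.3: the `DFlipFlop` class, the `voted_output_after_clock_glitch` function, and the assumption that the D input holds a wrong value when the spurious edge arrives are all introduced here for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority voter, Q = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


class DFlipFlop:
    """Minimal D flip-flop model: captures D on every clock edge it sees."""

    def __init__(self, q: int = 0):
        self.q = q

    def clock_edge(self, d: int) -> None:
        self.q = d


def voted_output_after_clock_glitch(shared_clock: bool) -> int:
    """Return the voted output after a glitch on one clock net.

    Assumption: the correct stored value is 1, and the D input is at 0
    (not yet valid) when the spurious clock edge arrives.
    """
    flops = [DFlipFlop(q=1) for _ in range(3)]
    d_during_glitch = 0
    # With one shared clock net the glitch reaches all three flip-flops;
    # with three clock trees it reaches only the flip-flop on the struck tree.
    affected = (0, 1, 2) if shared_clock else (0,)
    for i in affected:
        flops[i].clock_edge(d_during_glitch)
    return majority(*(f.q for f in flops))


if __name__ == "__main__":
    print("shared clock     :", voted_output_after_clock_glitch(True))
    print("triplicated clock:", voted_output_after_clock_glitch(False))
```

In this model the shared clock lets the glitch corrupt all three copies, so the vote is wrong, while the triplicated clock confines the error to one copy and the voter masks it.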



Fig.3 (a) Sequential cell TMR structure (b) Sequential cell space-time TMR structure

For combinational logic cells, the method is simpler because there is no clock input. The combinational logic cell is copied into three cells, and the input signal is split three ways and connected to the three identical cells. Fig.4 shows the combinational logic cell TMR structure.



Fig.4 Combinational logic cell TMR structure
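A simple fault-injection experiment shows what the triplicated combinational structure can and cannot mask. The sketch below is illustrative only: the full-adder carry function stands in for an arbitrary combinational cell, and `tmr_eval` and its `faulty_copies` parameter are hypothetical names introduced here.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Majority voter, Q = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


def full_adder_carry(x: int, y: int, cin: int) -> int:
    """Example combinational cell (an arbitrary choice for this sketch)."""
    return (x & y) | (cin & (x ^ y))


def tmr_eval(cell, inputs, faulty_copies=()):
    """Evaluate three copies of `cell` on the same inputs and vote.

    Copies whose index is in `faulty_copies` have their output bit
    flipped to model a transient fault in that copy.
    """
    outputs = []
    for i in range(3):
        out = cell(*inputs)
        if i in faulty_copies:
            out ^= 1
        outputs.append(out)
    return majority(*outputs)


if __name__ == "__main__":
    for inputs in product((0, 1), repeat=3):
        golden = full_adder_carry(*inputs)
        one_fault = tmr_eval(full_adder_carry, inputs, faulty_copies=(1,))
        two_faults = tmr_eval(full_adder_carry, inputs, faulty_copies=(0, 2))
        print(inputs, golden, one_fault, two_faults)
```

A fault in any single copy is masked for every input combination, whereas simultaneous faults in two copies defeat the voter.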

**TMR method.** This paper presents a triple modular redundancy design flow that operates on the gate-level netlist of an integrated circuit. First, the design code is synthesized to obtain an initial netlist. The cells in the initial netlist are then replaced with their triple modular redundant equivalents to obtain an intermediate netlist, which is run through automated synthesis again to produce a new netlist with triple modular redundancy. Finally, an equivalence check confirms that the netlist before TMR insertion and the netlist with TMR are functionally identical. A netlist that passes this check is a correct triple modular redundancy netlist.
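The replacement step of this flow can be sketched on a toy netlist. This is not the authors' tool. The `Cell` record, the cell kinds `DFF` and `MAJ3`, and the `_tmr`/`_vote` naming are assumptions made for illustration, and only sequential cells are triplicated here. Synthesis and equivalence checking are left to the EDA tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str      # instance name
    kind: str      # cell type, e.g. "DFF", "AND2", "MAJ3" (hypothetical names)
    inputs: tuple  # input net names
    output: str    # output net name


def triplicate_flops(netlist):
    """Replace every DFF with three DFF copies feeding a majority voter.

    The voter drives the original output net, so downstream cells
    keep their connections unchanged.
    """
    result = []
    for cell in netlist:
        if cell.kind != "DFF":
            result.append(cell)
            continue
        copy_nets = []
        for i in range(3):
            net = f"{cell.output}_tmr{i}"
            copy_nets.append(net)
            result.append(Cell(f"{cell.name}_tmr{i}", "DFF", cell.inputs, net))
        result.append(Cell(f"{cell.name}_vote", "MAJ3", tuple(copy_nets), cell.output))
    return result


if __name__ == "__main__":
    initial = [
        Cell("u1", "AND2", ("a", "b"), "n1"),
        Cell("r1", "DFF", ("n1", "clk"), "q1"),
        Cell("r2", "DFF", ("q1", "clk"), "q2"),
    ]
    for cell in triplicate_flops(initial):
        print(cell)
```

Each flip-flop becomes four cells in this sketch (three copies and one voter), which gives a rough feel for why the gate count grows by more than a factor of three.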

## TMR-Based Design and Analysis

The SoC chip, an independent research project based on an ARM CPU, is intended for control applications and performs SPI bus control, CAN bus access, data processing, and other functions. Considering circuit performance and the radiation hardness targets, the circuit uses a hardened 0.18 µm CMOS process, and the internal SRAMs are full-custom hardened designs. In addition to the hardened process and library cells, the circuit uses the triple modular redundancy architecture described above to meet the SEU LET requirement.

The circuit has 168 pads in total, of which 58 are power pins (8 power/ground pairs for the core and 21 power/ground pairs for the pads) and the rest are I/O ports. The chip area is 6.75 mm × 3.55 mm. The chip layout is shown in Fig.5.



Fig.5 The chip layout

As shown in Table 1, the gate count of the design with overall triple modular redundancy rises to about 426% of the original (10254 to 43685 gates) and the critical path delay grows by 60% (3.5 ns to 5.6 ns), while the SEU LET threshold of the TMR design improves from ≥ 15 to ≥ 37 MeV·cm²/mg.

Table 1 Result of design without TMR and with TMR

| Parameter          | Without TMR     | With TMR        |
|--------------------|-----------------|-----------------|
| Gate Count         | 10254           | 43685           |
| Critical Path Time | 3.5 ns          | 5.6 ns          |
| Anti-γ total dose  | ≥ 100 krad(Si)  | ≥ 100 krad(Si)  |
| SEL LET            | ≥ 75 MeV·cm²/mg | ≥ 75 MeV·cm²/mg |
| SEU LET            | ≥ 15 MeV·cm²/mg | ≥ 37 MeV·cm²/mg |
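The overhead figures can be recomputed directly from the Table 1 values. The short script below is only a worked check: the dictionary and function names are chosen here, and gate count is used as a stand-in for area.

```python
# Values copied from Table 1.
gate_count = {"without_tmr": 10254, "with_tmr": 43685}
critical_path_ns = {"without_tmr": 3.5, "with_tmr": 5.6}
seu_let = {"without_tmr": 15, "with_tmr": 37}  # lower bounds in MeV·cm²/mg


def ratio(metric: dict) -> float:
    """Value with TMR divided by value without TMR."""
    return metric["with_tmr"] / metric["without_tmr"]


def percent_increase(metric: dict) -> float:
    """Relative growth in percent when TMR is added."""
    return (ratio(metric) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"gate count   : {ratio(gate_count):.2f}x "
          f"(+{percent_increase(gate_count):.0f}%)")
    print(f"critical path: {ratio(critical_path_ns):.2f}x "
          f"(+{percent_increase(critical_path_ns):.0f}%)")
    print(f"SEU LET bound: {ratio(seu_let):.2f}x")
```

The gate count with TMR is about 4.26 times the original, which is an increase of about 326%, and the critical path is 1.6 times as long, an increase of 60%.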

The triple modular redundancy method is reliable and relatively simple to implement, but it requires more than three times the area, increases power consumption, and lengthens the delay. This cost is high, so in constrained designs triple modular redundancy may be applied only to the most important modules or subsystems.

## Conclusions

As aerospace technology continues to advance, it places higher requirements on integrated circuit design. In addition to process-level and layout-level soft error tolerance techniques, this circuit uses an overall triple modular redundancy architecture. At the expense of area and timing, the circuit meets its specifications, and the soft error tolerance of the whole circuit is improved.

## References

- [1] M. P. Baze, et al. A Digital CMOS Design Technique for SEU Hardening. IEEE Transactions on Nuclear Science. (2000).
- [2] T. Calin, M. Nicolaidis, R. Velazco. Upset Hardened Memory Design for Submicron CMOS Technology. IEEE Transactions on Nuclear Science. (1996).
- [3] N. Rollins, M. Wirthlin, M. Caffrey, et al. Evaluating TMR Techniques in the Presence of Single Event Upsets. Proc. of Conf. on Military and Aerospace Programmable Logic Devices. Washington, DC. (2003).
- [4] W. Chen, R. Gong, et al. Two New Space-Time Triple Modular Redundancy Techniques for Improving Fault Tolerance of Computer Systems. Proceedings of the Sixth IEEE International Conference on Computer and Information Technology. (2006).