A Memristor-Crossbar/CMOS Integrated Network for Pattern Classification and Recognition

L. Zhang  
Institute of Microelectronics  
Tsinghua University  
Beijing 100084, China

Z.J. Chang  
Medical School  
Tsinghua University  
Beijing 100084, China

Abstract—A novel circuit model based on a trainable memristor-crossbar network integrated with a CMOS circuit is proposed and analyzed for pattern classification and recognition. The configurable memristors along each column wire of the crossbar are trained by a standard pattern applied to the row wires, so that each column represents one pattern. After training, the crossbar network classifies unknown patterns applied to the row wires: the output current from each column wire is normalized by the CMOS circuits to give the probability that the unknown pattern belongs to the standard pattern associated with that column wire. The probabilities can be further processed by a winner-take-all competition circuit for decision making. Circuit simulation results demonstrate that the proposed circuit, based on our experimentally demonstrated memristor devices, can classify patterns by calculating the probabilities and can recognize patterns with distortions. Moreover, the circuit delay for classifying a pattern remains below 1 μs even when the pattern scales up to large dimensions. The large-scale parallel signal processing of the memristor-crossbar/CMOS circuit enables it to classify and recognize patterns of high dimensionality and complexity much faster than software-based computers.

Keywords—pattern classification; recognition; memristor; crossbar; CMOS analog integrated circuits; probability

I. INTRODUCTION

It is challenging for machines to recognize complex, high-dimensional patterns in speech, handwritten characters, medical images, bioinformatics, and stock data. Although human brains classify and recognize such patterns in a seemingly effortless fashion, it is immensely difficult for a computer to achieve the same tasks. The software-based computing time for pattern classification increases exponentially with the pattern dimensions (the so-called curse of dimensionality), which makes the classification of high-dimensional patterns extremely time-consuming and even prohibitive for modern computers [1]. Complementary metal-oxide-semiconductor (CMOS) analog integrated circuits have also been designed and fabricated for pattern classification [2-4], but the circuits needed to recognize high-dimensional patterns are complex, expensive, and energy-consuming.

In this paper, we propose an electronic circuit for the classification and recognition of high-dimensional patterns based on experimentally demonstrated memristors [5], configurable resistors [6], and crossbar networks [7]. The conductance of the memristor at each cross point of the crossbar circuit can be set to an arbitrary analog value proportional to the amplitude of the input signal from the pattern during the training process, which is similar to the memory function of a synapse in a neural network. The memristor crossbar is a simple network architecture that can easily be expanded to large scale to process the signals from a pattern with high-dimensional inputs efficiently in parallel, and the device density can potentially exceed 10^11/cm^2 in crossbar circuits fabricated by nanoscale lithography [7]. The memristor-crossbar array is integrated with a Si-based CMOS circuit, and the output of the circuit gives the probability distribution for classifying the input pattern with respect to the memorized patterns. The electronic pattern classification is a high-speed parallel process and can efficiently recognize patterns of high dimensionality and complexity. The low-cost fabrication process for high-density nanoscale circuits and modern CMOS technology also make the proposed circuit model practically viable and cost-efficient.

II. CIRCUIT ARCHITECTURE

The pattern recognition circuit aims to classify patterns based on knowledge of the patterns established statistically from supervised or unsupervised training processes, and the circuit classifies a pattern based on its probability of belonging to each class. An arbitrary pattern can be abstracted as a vector \( \vec{y} = (y_1, y_2, \cdots, y_N) \), where \( y_i \) represents the statistically independent variable of the pattern at the \( i \)th dimension with \( i = 1, 2, \cdots, N \). The pattern can also be represented by a normalized probability \( p_i = y_i / \sum_{i=1}^{N} y_i \) at the \( i \)th dimension. Assuming that the patterns can be classified into \( M \) different classes denoted by \( j \) with \( j = 1, 2, \cdots, M \), the law of total probability gives

\[
p_i = \sum_j p^j p_i^j \quad (1)
\]

where \( p^j \) is the probability for the pattern with the probability function \( p_i \) to be classified to class \( j \), and \( p_i^j \) is the probability function of class \( j \) at the \( i \)th dimension, sometimes referred to as the likelihood function. Eqn. (1) can be rewritten in an alternative form by multiplying both sides by \( p_i^k \) and summing over \( i \),

\[
\sum_i p_i p_i^k = \sum_i \sum_j p^j p_i^j p_i^k = \sum_j p^j \sum_i p_i^j p_i^k \quad (2)
\]
The goal of the circuit is to classify an input pattern by obtaining $p^j$ from eqn. (2). In software-based methodologies, the latency for solving this equation increases exponentially with the dimensions of the patterns, so it is necessary to design an electronic circuit that can solve the problem efficiently.
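The relation between the law of total probability and eqn. (2) can be checked numerically. The following is a minimal sketch, assuming made-up likelihoods and class probabilities (the numbers and the names `likelihood` and `class_prob` are illustrative and not taken from the paper): it builds $p_i$ as a mixture of the class likelihoods and confirms that both sides of eqn. (2) agree for every class $k$.

```python
# Hypothetical likelihoods p_i^j for M = 2 classes over N = 4 dimensions.
# Illustrative numbers only; each row sums to 1.
likelihood = [
    [0.50, 0.50, 0.00, 0.00],  # class 0
    [0.00, 0.25, 0.25, 0.50],  # class 1
]
class_prob = [0.25, 0.75]  # assumed class probabilities p^j (sum to 1)

M = len(likelihood)
N = len(likelihood[0])

# Law of total probability: p_i = sum_j p^j * p_i^j
p = [sum(class_prob[j] * likelihood[j][i] for j in range(M)) for i in range(N)]


def lhs(k):
    """Left side of eqn. (2): sum_i p_i * p_i^k."""
    return sum(p[i] * likelihood[k][i] for i in range(N))


def rhs(k):
    """Right side of eqn. (2): sum_j p^j * sum_i p_i^j * p_i^k."""
    return sum(
        class_prob[j] * sum(likelihood[j][i] * likelihood[k][i] for i in range(N))
        for j in range(M)
    )


for k in range(M):
    print(k, lhs(k), rhs(k))
```

Read in the other direction, eqn. (2) is a set of $M$ linear equations in the unknown $p^j$, which is the system the circuit is meant to solve in analog form.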

The proposed pattern classification circuit is shown in fig. 1. The circuit can be divided into two parts: (1) a signal inference circuit composed of a crossbar memristor network and CMOS operational amplifiers (opamps), and (2) a winner-take-all (WTA) decision-making competition circuit.

In the inference stage, an input pattern is represented by a vector $\vec{V}^I = (V_1^I, V_2^I, \cdots, V_N^I)$, where $V_i^I$ denotes the voltage input to the $i$th row wire in the memristor-crossbar circuit (shown in fig. 1) with $i = 1, 2, \cdots, N$. $V_i^I$ is applied through an input conductance $g^I$ on each row wire. A memristor with a configurable conductance $g_{ij}$ at each cross point of the crossbar connects the $i$th row wire and the $j$th column wire. The voltage on the $j$th column wire is denoted by $V_j^P$ with $j = 1, 2, \cdots, M$. $V_j^P$ is applied to a resistor with a conductance of $g_j^O$ on each column wire and also to the negative input terminal of the opamp $A_{1j}$. The structure of the opamp is illustrated in fig. 2(a). It uses complementary input differential pairs composed of $M_{0j}-M_{4}$ and $M_{7}-M_{11}$ to provide a rail-to-rail input dynamic range, and employs a Class-AB output stage realized by $M_{5}$, $M_{6}$, $M_{12}$, and $M_{13}$ to maximize the output current swing. $M_{6}$ and $M_{13}$ also provide a duplicated output to the WTA competition circuit, and transistors $M_{14}-M_{16}$ serve biasing purposes. $V_j^O$ denotes the output voltage from the opamp $A_{1j}$ connected to the $j$th column wire of the crossbar with $j = 1, 2, \cdots, M$. $V_j^O$ and its duplicate are output from the “o” and “oc” terminals of opamp $A_{1j}$, respectively.

Applying Kirchhoff’s current law at both the row and column wires of the crossbar circuit shown in fig. 1 gives

\[
(V_i^I - V_i^M) g^I - \sum_j (V_i^M - V_j^P) g_{ij} = 0 \quad (3)
\]

\[
\sum_i (V_i^M - V_j^P) g_{ij} - (V_j^P - V_j^O) g_j^O = 0 \quad (4)
\]

where $V_i^M$ is the voltage on the $i$th row wire. Assuming that the condition $g^I \gg \sum_j g_{ij}$ holds in the circuit design, eqn. (3) gives $V_i^M \approx V_i^I + \sum_j V_j^P g_{ij} / g^I$, which can be substituted for $V_i^M$ in eqn. (4) written for the $k$th column wire,

\[
\sum_i V_i^I g_{ik} + \frac{\sum_j V_j^P \sum_i g_{ij} g_{ik}}{g^I} - V_k^P \sum_i g_{ik} - (V_k^P - V_k^O) g_k^O = 0 \quad (5)
\]

The conductance $g_{ij}$ can be set proportional to $p_i^j$ in eqn. (2) by the training process described below, so that $p_i^j = g_{ij} / \sum_i g_{ij} = g_{ij} / g_j^O$. The input voltage $V_i^I$ is also proportional to $p_i$ in eqn. (2). Comparing eqn. (2) and (6), $p^j$, the probability for an input pattern to be classified to class $j$, can be expressed as

\[
p^j = \frac{V_j^O g_j^O}{2 \sum_k V_k^O g_k^O} = \frac{I_j^O}{2 \sum_k I_k^O} \quad (7)
\]

where $I_j^O$, the current on conductor $g_j^O$, determines the distribution of $p^j$.
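To make the inference stage concrete, the following is a minimal numerical sketch of the ideal (DC, parasitic-free) crossbar equations. It assumes the two design conditions $V_j^O = 2V_j^P$ and $g_j^O = \sum_i g_{ij}$, under which the column equation (4) reduces to $\sum_i V_i^M g_{ik} = 0$. The 4×4 bitmaps, the function names, and the choice of normalizing the shares to sum to one are assumptions of this sketch and are not taken from the paper. The value `g_input = 2e-5` S assumes that the 50 kΩ quoted for $g^I$ is the corresponding resistance.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def column_shares(patterns, v_in, g_on=1e-6, g_input=2e-5):
    """Relative shares of the column currents I_j^O for one input vector.

    patterns[j][i] is 1 or 0: memristor (i, j) has conductance g_on or 0 S.
    """
    m, n = len(patterns), len(v_in)
    g = [[g_on * patterns[j][i] for j in range(m)] for i in range(n)]  # g[i][j]
    row_sum = [sum(g[i]) for i in range(n)]
    g_out = [sum(g[i][j] for i in range(n)) for j in range(m)]  # g_j^O
    # Row voltage from eqn. (3), without the g^I >> sum_j g_ij approximation:
    #   V_i^M = (V_i^I g^I + sum_j V_j^P g_ij) / (g^I + sum_j g_ij)
    # Inserting it into sum_i V_i^M g_ik = 0 gives a linear system in V_j^P.
    weight = [1.0 / (g_input + row_sum[i]) for i in range(n)]
    a = [[sum(g[i][k] * g[i][j] * weight[i] for i in range(n))
          for j in range(m)] for k in range(m)]
    b = [-sum(g[i][k] * v_in[i] * g_input * weight[i] for i in range(n))
         for k in range(m)]
    v_p = solve_linear(a, b)
    # Current through g_j^O: (V_j^P - V_j^O) g_j^O = -V_j^P g_j^O
    currents = [-v_p[j] * g_out[j] for j in range(m)]
    total = sum(currents)
    return [c / total for c in currents]


# Hypothetical 4x4 bitmaps (row-major), NOT the patterns used in the paper.
PATTERNS = [
    [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1],  # "A"
    [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],  # "C"
    [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],  # "L"
    [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1],  # "U"
]

# Apply the "U" bitmap with 50 mV for "1" and 0 V for "0".
shares = column_shares(PATTERNS, [0.05 * x for x in PATTERNS[3]])
print(shares)
```

When the input equals one of the stored bitmaps and the stored bitmaps are linearly independent, the shares in this idealized model come out close to a one-hot vector, up to floating-point round-off.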

Owing to the virtual-short property of the opamp, the resistor ladder ensures the condition $V_{j}^O = 2V_{j}^P$ by doubling $V_{j}^P$.
The condition \(g^O_j = \sum_i g_{ij}\) is enforced by the transistors \(M^O_j\), resistors \(g_j\), and capacitors \(C^O_j\), as shown in fig. 1 and fig. 2(b), together with the incorporated opamps \(A_{ij}\) and \(A_3\), whose structures are shown in fig. 2(a). When the circuit is operated in the normalization mode, all the switches \(S_{ij}\) and \(S_3\) are closed and an all-“1” input voltage vector \((V_R, V_R, \ldots, V_R)\) is imposed on the row wires, with \(V_R\) representing the reference voltage, while the output voltage \(V^O_j\) is forced to \(-V_R\) on all the column wires by the feedback mechanism introduced by the second stage of opamp \(A_3\). At equilibrium, \(g^O_j\), the conductance of the series combination of \(g_j\) and the transistor \(M^O_j\), is self-adjusted to the value of \(\sum_i g_{ij}\) on every column wire, and the gate-source voltage of \(M^O_j\) is memorized in capacitor \(C^O_j\). When the circuit is operated in the recognition mode, the switches \(S_{ij}\) and \(S_3\) are open while \(S_{3j}\) remains closed; the voltage memorized by the capacitor \(C^O_j\) biases \(M^O_j\) in the triode region, and the condition \(g^O_j = \sum_i g_{ij}\) is thus sustained. Owing to the current leakage through \(S_{2j}\), the memorized voltages on \(C^O_j\) have to be refreshed periodically.
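
As a sketch of how these two conditions lead to the probability readout (it only restates the approximations already made and is not an additional result), substitute the row-voltage approximation \(V_i^M \approx V_i^I + \sum_j V_j^P g_{ij}/g^I\) into the column equation (4) for the \(k\)th column wire, where \(V_i^I\) is the input voltage on the \(i\)th row wire and \(g^I\) is the input conductance, and then impose \(V_k^O = 2V_k^P\) and \(g_k^O = \sum_i g_{ik}\). The two terms in \(V_k^P\) cancel, leaving

\[
\sum_i V_i^I g_{ik} = -\frac{1}{g^I} \sum_j V_j^P \sum_i g_{ij} g_{ik}
\]

Dividing by \(g_k^O\) and writing \(p_i^j = g_{ij}/g_j^O\) gives

\[
\sum_i V_i^I p_i^k = -\frac{1}{g^I} \sum_j \left(V_j^P g_j^O\right) \sum_i p_i^j p_i^k
\]

which has the same structure as eqn. (2) with \(p_i \propto V_i^I\) and \(p^j \propto -V_j^P g_j^O = -V_j^O g_j^O / 2\). Under these idealized assumptions, the class probabilities therefore follow from the relative sizes of the column output currents.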

The comparison of the probabilities \(p^j\) is realized by the WTA competition among the input currents \(I^O_j\) from different columns. In the WTA circuit shown in fig. 2(c), each unit, associated with one of the column wires, takes the duplicate of \(I^O_j\) from terminal “oc” of the opamp \(A_{1j}\) at the port “y” and charges the capacitor \(C_0\). When the voltage exceeds the threshold of the transistor \(N_3\), \(N_1\) is turned on and feeds a current back to charge \(C_0\) further, increasing the output voltage at “y”; meanwhile, it provides a leakage current at the “yout” port to all the other units to decrease their conductances. The current mirror composed of \(N_4\) and \(N_5\) duplicates the leakage currents from all the units associated with the other column wires, input from “yin”, to discharge \(C_0\). As shown in fig. 1, all the WTA units are connected through the “yin” and “yout” ports to a global inhibitory node, and the transistors \(M^R_1, M^R_2, \ldots, M^R_M\) in fig. 1 serve to initiate and reset the competition.
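The competition can be illustrated at a purely behavioral level. The sketch below is not a model of the transistor circuit in fig. 2(c). It swaps in a MAXNET-style lateral-inhibition iteration, a standard textbook abstraction of winner-take-all, in which every unit is suppressed in proportion to the summed activity of all the other units (playing the role of the global inhibitory node). The function name, the inhibition strength, and the iteration limit are assumptions of this sketch.

```python
def winner_take_all(currents, inhibition=None, max_iter=1000):
    """MAXNET-style lateral inhibition; returns a one-hot list.

    currents: non-negative inputs, one per column (e.g. the I_j^O values).
    """
    m = len(currents)
    if m < 2:
        return [1] * m
    # Inhibition strength below 1/(m - 1) keeps the largest unit alive.
    eps = inhibition if inhibition is not None else 0.5 / (m - 1)
    x = [max(0.0, c) for c in currents]
    for _ in range(max_iter):
        if sum(1 for v in x if v > 0.0) <= 1:
            break
        total = sum(x)
        # Each unit is reduced by a fraction of the activity of all others.
        x = [max(0.0, v - eps * (total - v)) for v in x]
    winner = max(range(m), key=lambda j: x[j])
    return [1 if j == winner else 0 for j in range(m)]


print(winner_take_all([0.1, 0.2, 0.3, 0.4]))
```

If two inputs are exactly equal, this iteration cannot separate them and the sketch simply returns the first of the tied units after `max_iter` steps.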

III. SIMULATION AND DISCUSSIONS

The proposed pattern recognition circuit has been designed and simulated using a 0.18-\(\mu m\) CMOS process. The conductance \(g^I\) has to be carefully selected to satisfy the condition \(g^I \gg \sum_j g_{ij}\) while remaining within the limit set by the maximal output current of the opamp \(A_{1j}\). In this circuit, \(g^I\) is set to correspond to a resistance of 50 k\(\Omega\).

During the training process, all the switches \(S_{ij}\), \(S_{2j}\), and \(S_3\) are open. An input voltage vector \(\overline{V^T} = (V^T_1, V^T_2, \ldots, V^T_N)\) representing a standard pattern is applied to the row wires, and the training voltage \(V^T_k\) on the corresponding column wire is set to a value exceeding the training threshold voltage, so that the configurable memristors at the cross points along that column are trained to conductances \(g_{ik}\) proportional to the components \(V^T_i\) of the input voltage vector \(\overline{V^T}\). The voltages on the other column wires, \(V^T_j\) (\(j \neq k\)), are set to a value below the training threshold voltage, so the configurable memristors at the cross points along the other column wires are not configured. A more advanced feedback training mechanism can be included to tolerate and correct defects in the \(g_{ij}\) values and achieve the expected classification probability \(p^j\).
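The column-selective training step can be summarized in a few lines. This is a minimal behavioral sketch, assuming an idealized linear mapping from row voltage to conductance with the largest component mapped to `g_max`. The function name and parameters are hypothetical, and the threshold behavior of the memristors is represented only by the fact that the unselected columns are left untouched.

```python
def train_column(conductance, column, v_train, g_max=1e-6):
    """Return a copy of the conductance matrix with one column programmed.

    conductance[i][j] is g_ij in siemens (row i, column j).
    v_train[i] is the training voltage on row wire i.
    Assumed ideal mapping: g_ij = g_max * V_i / max(V).
    """
    v_max = max(v_train)
    if v_max <= 0:
        raise ValueError("training vector needs at least one positive component")
    trained = [row[:] for row in conductance]
    for i, v in enumerate(v_train):
        # Only the selected column sees a voltage above the training threshold.
        trained[i][column] = g_max * v / v_max
    return trained


# Three row wires, two column wires, all memristors initially at 0 S.
g = [[0.0, 0.0] for _ in range(3)]
g_trained = train_column(g, 1, [0.05, 0.0, 0.025])
print(g_trained)
```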

The condition \(V^O_k = 2V^P_k\) is satisfied by the resistor ladder with two 1 M\(\Omega\) resistors in series. The opamp draws a biasing current of about 40 \(\mu A\), and the aspect ratios of nMOS and pMOS in the opamps and WTA units are chosen as \(W_p/L=12/0.6\) and \(W_n/L=24/0.6\), respectively. Owing to the use of the global inhibitory node, all transistors in the WTA units are designed with the standard sizes (\(W_p=24 \mu m\), \(W_n=12 \mu m\), and \(L=0.6 \mu m\)), which in turn saves physical area. To remove the body effect and ensure that the resistances of \(M^O_j\) are independent of their source voltages, \(V_{BS}\) of \(M^O_j\) is forced to zero by using pMOS transistors with shorted body and source terminals.

A. Pattern Classification

In the simulation, four characters, “A”, “C”, “L”, and “U”, shown in fig. 3(a), are represented by sixteen-pixel patterns with their corresponding input voltage vectors \(\overline{V^I} = (0,1,1,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1)\), \(\overline{V^I} = (1,0,0,1,0,0,0,0,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)\), \(\overline{V^I} = (1,0,0,1,0,0,0,0,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)\), where the black and white pixels in the patterns are denoted by “1” (\(V^I_i = 50 mV\)) and “0” (\(V^I_i = 0 V\)), respectively. Four columns in the crossbar are set to represent the characters “A”, “C”, “L”, and “U”, respectively, and the conductances of the configurable memristors in the crossbar are set to \(10^{-6} S\) or 0 S, corresponding to the 1 or 0 components of the corresponding vector \(\overline{V^I}\), respectively.

To evaluate the circuit, the four input voltage vectors $V^I_A$, $V^I_C$, $V^I_L$, and $V^I_U$ were applied to the row wires of the crossbar, and the output current $I_j^O$ was simulated as a function of time. For example, when the vector $V^I_U$ is input to the circuit, the currents $I_j^O$ evolve as shown in fig. 3(b). After an initial increase of all $I_j^O$ for a period of 567 ns, only $I_U^O$ keeps increasing, while $I_A^O$, $I_C^O$, and $I_L^O$ decrease gradually. Within 160 ns after the WTA competition is initiated, the output probability $p^j$ of the column with the largest $I_j^O$ ($I_U^O$ in this case) is set to 1 and the $p^j$ of the remaining columns are set to 0, so the final output from the circuit can be expressed as a vector $\vec{p} = (p^A, p^C, p^L, p^U) = (0, 0, 0, 1)$.

When the voltage vectors $V^I_A$, $V^I_C$, $V^I_L$, and $V^I_U$ are input, the output probability vectors obtained from the circuit are $\vec{p} = (1, 0, 0, 0)$, $(0, 1, 0, 0)$, $(0, 0, 1, 0)$, and $(0, 0, 0, 1)$, respectively, which demonstrates the basic classification function of the circuit.

B. Distorted Patterns

The proposed circuit is further assessed in terms of its response to distorted patterns. Fig. 4(a) shows the distorted patterns as they transform from the standard “A” to “C”. The Hamming distance, defined as the number of components that differ between the input voltage vectors of the distorted and original patterns, increases from 0 to 8 (the intrinsic Hamming distance between the standard “A” and “C” is 8). Fig. 4(b) shows the output classification probability $p^j$ (before the WTA competition) as a function of the Hamming distance for the input patterns transformed from “A” to “C”. As the pattern transforms from “A” to “C”, $p^A$, the probability of classifying the pattern as “A”, decreases from 1 to 0, while $p^C$, the probability of classifying the pattern as “C”, increases from 0 to 1. Patterns at approximately equal Hamming distances from “A” and “C” may confuse the competition circuit and lead to erroneous decisions.
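The Hamming distance and the gradual transformation between two patterns can be sketched as follows. The two 4×4 bitmaps are hypothetical stand-ins rather than the patterns of fig. 3(a) (they also happen to differ in 8 pixels), and the order in which pixels are flipped is an assumption of this sketch.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def morph(start, end):
    """Yield patterns from start to end, flipping one differing pixel per step."""
    current = list(start)
    yield list(current)
    for i, (x, y) in enumerate(zip(start, end)):
        if x != y:
            current[i] = y
            yield list(current)


# Hypothetical 4x4 bitmaps (row-major), NOT the patterns used in the paper.
A = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1]
C = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]

steps = list(morph(A, C))
for pattern in steps:
    print(hamming(pattern, A), hamming(pattern, C), pattern)
```

Each intermediate pattern in `steps` could be fed to a crossbar model to trace how the class probabilities change with the Hamming distance.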

C. Circuit Delay

High speed is one of the major advantages of the proposed pattern recognition circuit. Two circuit delays are analyzed: the time for the circuit outputs to bifurcate in different directions in the inference stage (as shown in fig. 3(b)), and the time for the outputs to settle to stable values in the competition stage.

In the proposed circuit, the delay depends not only on the parameters of the device elements but also on the input patterns, as shown in fig. 4(c). When the pattern transforms from “A” to “C”, the delays of both the inference and competition stages change as a function of the Hamming distance of the distorted pattern from the standard patterns, and are maximized where the pattern is at approximately equal Hamming distances from “A” and “C”. The inference stage accounts for 70%–80% of the overall delay.

D. Dimension Scaling

When the dimension of the input pattern, $N$, scales up, the delay can be characterized by the equivalent circuit illustrated in fig. 5(a), where $g^I$ is the average input conductance, $\bar{g}$ and $\bar{C}$ are the average conductance and capacitance at the cross points of the crossbar, and $R_L$ and $C_L$ are the load resistance and capacitance of the opamp, respectively. The opamp can be modelled as a single-pole system with the transconductance function $g_m(s) = g_m^0 \omega_p / (s + \omega_p)$, where $g_m^0$ is the DC transconductance, $\omega_p$ is the dominant pole of the opamp, and $s$ is the Laplace variable. Applying Kirchhoff’s law at the points A and B shown in fig. 5(a) gives

\[
\frac{(g + sC)g'(V - v_n)}{g(g + sC + g')} = v_n - 2v_p \quad (8)
\]

\[
\frac{(v_y - v_r)g_m^0 + \left(\frac{g + sC}{g + sC + g^0}\right)N g (V - v_r)}{s + \omega_p} = v_y \left(\frac{1}{R_L + sC_L} + \frac{1}{R}\right) \quad (9)
\]

Based on our circuit design, the conditions \(g_m^0 (R_L \parallel R) \gg 1\), \(g R_L \sim 1\), and \(N \gg 1\) are satisfied, so the characteristic equation simplifies to

\[
2N\left(\frac{g + sC}{s + \omega_p}\right) - g_m^0 (R_L \parallel R) \omega_p = 0
\]

and \(s\) is solved as

\[
s = \frac{-\omega_p + \sqrt{\frac{\omega_p^2}{4} - \frac{g_m^0 (R_L \parallel R) \omega_p^2}{4N}}}{2} \quad \text{and} \quad \omega_p = \frac{g_m^0 (R_L \parallel R) \omega_p}{4N} \quad (10)
\]

The 3-dB bandwidth of the circuit approaches \(\omega_p\) as the input vector dimension \(N\) scales up; therefore, the circuit delay initially increases with \(N\) and then saturates at the intrinsic delay of the opamp. Fig. 5(b) shows the simulated speed characteristics of the inference stage as the input vector dimension scales up. The circuit delay increases drastically with \(N\) in the low-dimensional range (\(N < 8\)) but saturates at approximately 800 ns in the high-dimensional range (\(N > 16\)). The results indicate that the proposed circuit is potentially capable of classifying high-dimensional patterns much faster than a software-based computer.

**FIGURE V.** (A) EQUIVALENT CIRCUIT FOR DELAY ANALYSIS. (B) CIRCUIT DELAY SHOWN AS A FUNCTION OF THE DIMENSION OF INPUT PATTERNS. THE SQUARES ARE THE SIMULATED RESULTS ON INPUT DIMENSIONS OF 2, 4, 16, 32, 64, 128, AND 256, RESPECTIVELY.

E. Energy Consumption

The energy consumption of the circuit is calculated to be \(\sim 1\) pJ per dimension to recognize a pattern. The maximal power density at each cross point in the crossbar is \(\sim 1 \mu W/\mu m^2\), which falls within the range of the power density of a standard Si CMOS circuit. The typical operating power of each opamp in the circuit is 40 μW. It is worth noting that the delay of the proposed pattern classification circuit is associated with the 3-dB bandwidth of the opamp. The power consumption can be reduced further by incorporating opamps with smaller bandwidth, at the cost of an increased delay and therefore lower speed. A careful trade-off has to be made between power and speed for specific applications.
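For a rough sense of scale, the quoted figures can be combined in a back-of-envelope estimate. The sketch below assumes that the energy is about 1 pJ per dimension and that an opamp runs at its typical 40 μW for the roughly 800 ns saturated inference delay. Whether the per-dimension figure already includes the opamp contribution is not stated, so the two numbers are reported separately rather than added. The function names are hypothetical.

```python
ENERGY_PER_DIMENSION_PJ = 1.0  # approximate energy per dimension quoted above
OPAMP_POWER_UW = 40.0          # typical opamp operating power quoted above
SATURATED_DELAY_NS = 800.0     # assumed: saturated inference-stage delay


def pattern_energy_pj(n_dimensions):
    """Energy to recognize one pattern, scaling linearly with dimension."""
    return ENERGY_PER_DIMENSION_PJ * n_dimensions


def opamp_energy_pj(delay_ns=SATURATED_DELAY_NS):
    """Energy drawn by one opamp over one classification.

    uW * ns = 1e-6 W * 1e-9 s = 1e-15 J = 1e-3 pJ
    """
    return OPAMP_POWER_UW * delay_ns * 1e-3


for n in (16, 64, 256):
    print(n, pattern_energy_pj(n), opamp_energy_pj())
```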

IV. CONCLUSION

A novel circuit model for pattern classification has been proposed based on statistical theory. The circuit architecture is composed of a memristor-crossbar/CMOS hybrid network, with each column of the crossbar representing a pattern class. When a pattern is input from the row wires of the crossbar, the output current from each column is normalized by a CMOS analog circuit, and the probability of classifying the input pattern to a class is given by the normalized output current from the corresponding column. By comparing the output currents in a winner-take-all competition circuit, the input pattern is finally assigned to the class with the largest probability. The delay to classify a pattern is on the sub-μs order and saturates as the pattern dimension scales up, which gives the circuit the capability to recognize high-dimensional patterns much faster than a software-based computer.

ACKNOWLEDGEMENTS

The authors thank the National Natural Science Foundation of China (grants 61101001 and 61204026) and the Tsinghua University Initiative Scientific Research Program for their support.

REFERENCES


