# EBC-Estimator of Multidimensional Bayesian Threshold in Case of Two Classes

^{*}

^{, }

^{}

^{*}Email: o.kubaichuk@kpi.ua

- DOI
- 10.2991/jsta.d.200824.001How to use a DOI?
- Keywords
- Estimator; Multidimensional Bayesian threshold; Mixture with varying concentrations
- Abstract
Some threshold-based classification rules in case of two classes are defined. In assumption, that a learning sample is obtained from a mixture with varying concentration, the empirical-Bayesian classification (EBC)-estimator of multidimensional Bayesian threshold is constructed. The conditions of convergence in probability of estimator are found.

- Copyright
- © 2020 The Authors. Published by Atlantis Press B.V.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

## 1. INTRODUCTION

The model of a mixture of several probability distributions was mentioned for the first time by Newcomb [1] and Pearson [2]. Such mixtures naturally arise in many areas. In particular, in the theory of reliability and time of life, mixtures of gamma distributions [3] are used. Examples of the use of mixtures of normal distributions in the processing of biological and physiological data are given in [4]. In Slud [5], a mixture of two exponential distributions is used to describe the debugging process of the software. Some applications of the model of mixtures in medical diagnostics were given in [6,7].

The technique of a nonparametric analysis of mixtures where concentrations changes from observation to observation develops, actively. The problem of distributions estimating in case at known concentrations is considered in the works of Maiboroda [8,9]. Estimates of concentrations in two-component mixtures in work [10]. Works by Sugakova [11] and Ivanko [12] are devoted to the evaluation of component distribution densities. The correction algorithms for weighted empirical distribution functions are proposed in [13].

For the theoretical study of problems of nonparametric regression the nonhomogeneous weighted empirical distribution functions used by Stoune [14]. These were applied by Maiboroda in the tasks of analyzing the mixture. In particular, in Maiboroda [8] found conditions under which the weighted empirical distribution functions are unbiased and minimal estimators of unknown distribution functions of components of the mixture.

Object classification by its numerical characteristic is an important theoretical problem and has practical significance, for example, the definition of a person as “not healthy,” if the temperature of its body exceeds 37°C. To solve this problem we consider the threshold-based rule of type

According to this rule, an object is classified to the first class if its characteristic does not exceed a threshold

However, it is often necessary to classify an object in case of more than one threshold, for example, the definition of a person as “not healthy,” if the temperature of its body exceeds 37°C or lower than 36°C. Another example: the person is sick, if the level of its hemoglobin exceeds 84 units or lower than 72 units. Accordingly, one can apply the classifiers of type

In particular, this problem is discussed in [20,21].

The case of two thresholds and three prescribed classes deserves special attention. An example is the classification of the disease stages. Thus, during the diagnosis of breast cancer a tumor marker CA 15-3 is used. If the value is less than 22 IU/mL, then the person is healthy; if its level is in the range from 22 to 30 IU/mL—precancerous conditions can be diagnosed; if the index is above 30 IU/mL—patient has cancer. When solving some technical problems it is needed to consider the substance in its various aggregate forms: gaseous, liquid, solid. The transition from state to state occurs at a specific temperature. According to this, a boiling point and a melting point are used. Accordingly, 6 classifiers of forms

## 2. SETTING OF THE PROBLEM

The problem of classification of an object

The a priori probabilities

The family of classifiers is denoted by

Let,

Analogically, for

A classification rule

Let,

Denote Bayesian threshold for classifier

Analogically, let

For

Let,

For

Let,

For

Denote

When determining the best threshold, one faces the problem of estimating the threshold by using a learning sample, whose members are classified correctly. We consider the Bayesian empirical classification method, in assumption, that a learning sample is obtained from a mixture with varying concentration.

The distribution functions

Here

To estimate the distribution functions

One can apply kernel estimators to estimate the densities of distributions:

The empirical-Bayesian estimator is constructed as follows. At first, one determines the set

Second, one chooses

An example of multidimensional threshold in case of two classes is shown on Figures 1 and 2 (Mathcad v.13 was used).

## 3. MAIN RESULTS

## 3.1. Choice of Classifier

The choice of the classifier

### Theorem 3.1.1.

*If root of* (4) *minimizes* *then the classifier* *is selected, but if it minimizes* *then it is selected*

### Proof.

The statement follows from the properties

### Remark 3.1.1.

The next theorem can be proved analogically to Theorem 3.1.1.

### Theorem 3.1.2.

*If root of* (3) *minimizes* *then the* *is selected, but if it minimizes* *then it is selected*

### Proof.

The statement follows from the properties

### Remark 3.1.2.

## 3.2. The Convergence in Probability of EBC-Estimator

In what follows we assume that

(A). The threshold

(Bk). The limits

### Lemma 3.2.1.

*Let conditions (A) and (Bk) hold. Assume that densities* *and* *exist and are continuous*, *is a continuous function, and*

*Then* *for* *where* *and*

### Proof.

According to Theorem 1 of [11], the assumptions of the theorem imply that

Now we shall that

Since,

Thus,

### Lemma 3.2.2.

*Let assumptions of* *Lemma 3.2.1* *for* *hold. Then*

*for*

*for*

### Proof.

Fix

Choose

Fix an arbitrary

### Theorem 3.2.1.

*Assume that conditions of* *Lemma 3.2.1* *hold. Then* *or* *in probability as* *namely* *or* *in probability as*

### Proof.

Since (5)

This completes the proof of the theorem, since

## 4. CONCLUSION

In this paper, we found the conditions of convergence in probability of the estimator for the Bayesian threshold constructed by the method of empirical-Bayesian classification for a sample from a mixture with variable concentrations.

## CONFLICT OF INTEREST

The author has no conflicts of interest to declare.

## ACKNOWLEDGMENTS

I thank the reviewers whose insightful comments helped me to improve this paper.

## REFERENCES

### Cite this article

TY - JOUR AU - Oksana Kubaychuk PY - 2020 DA - 2020/09/08 TI - EBC-Estimator of Multidimensional Bayesian Threshold in Case of Two Classes JO - Journal of Statistical Theory and Applications SP - 342 EP - 351 VL - 19 IS - 3 SN - 2214-1766 UR - https://doi.org/10.2991/jsta.d.200824.001 DO - 10.2991/jsta.d.200824.001 ID - Kubaychuk2020 ER -