An Application of Dynamic Bayesian Networks to Condition Monitoring and Fault Prediction in a Sensored System: a Case Study

Javier Cózar; José M. Puerta; José A. Gámez

doi:10.2991/ijcis.2017.10.1.13


Volume 10, Issue 1, 2017, Pages 176 - 195

Authors

Javier Cózar¹ (javier.cozar@uclm.es), José M. Puerta¹ (jose.puerta@uclm.es), José A. Gámez¹ (jose.gamez@uclm.es)

Received 8 June 2016, Accepted 24 September 2016, Available Online 1 January 2017.

DOI: 10.2991/ijcis.2017.10.1.13
Keywords: Bayesian networks; anomaly detection; sensor networks; predictive maintenance; condition monitoring
Abstract: Bayesian networks have been widely used for classification problems. These models, i.e. the structure of the network and/or its parameters (probability distributions), are usually built from a data set. Sometimes we do not have information about all the possible values of the class variable, e.g. data about a reactor failure in a nuclear power station. This problem is usually approached as an anomaly detection problem. Based on this idea, we have designed a general-purpose decision support system tool.
Copyright: © 2017, the Authors. Published by Atlantis Press.
Open Access: This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).

1. Introduction

Decision support systems (DSS) are widely used for industrial damage detection ^2,6,19. Maintaining industrial units is neither easy nor cheap, as every component suffers continuous damage due to usage. However, there are several advantages in using automated systems to carry out this maintenance. This type of system provides a health status prediction for the monitored units ^5,24, which is very useful for human operators because it tells them which components might be failing or are close to failing. This translates into a reduction in the time required to detect a failure and to identify which unit is failing.

There is a wide range of industrial environments with different requirements. For some of them, e.g. electric companies, working without interruptions is a priority ^1,15. In those cases, decision support tools are used to carry out preventive maintenance, helping to detect any problem and solve it as soon as possible, or even before it occurs.

Different data mining techniques have been used for what is usually called health management ². They use information about the machinery components to learn a behavioural model which is used to predict their health status. Usually, this kind of problem is treated as a classification problem, which means that the prediction will be one choice from a finite set of options ^29,30,31,32. Supervised classification techniques need data from all these options to carry out the learning process. However, in some situations such data are not available. For instance, in a nuclear power station, there might not be data from a reactor failure. For this kind of problem there are approaches which, instead of learning a behaviour model for all the possible outcomes, focus on detecting anomalies with respect to standard functioning ^9,25. In this work we propose a method based on Bayesian networks to detect anomalous behaviours in this type of system, encouraged by the significant examples of successful applications of Bayesian networks to condition monitoring ^27,22, predictive maintenance ^6,3,18 and fault detection/diagnosis ^28,26.

In order to design that DSS, we will start from the methodology for detecting faults and abnormal behaviours described in ²². Afterwards, we will design a new metric to improve the behaviour of the previous methodology in some general cases. In order to demonstrate the usefulness of our new proposal, we will generate some artificial datasets and we will compare the performance between the original methodology and our proposal.

This study is structured as follows. In Section 2 we briefly introduce Bayesian networks. Section 3 describes our DSS, as well as the methodology proposed in ²². Later, in Section 4 we explain the kind of industrial system our application is intended for, and we discuss the benefits of our proposal through an artificial dataset. Some tests are also carried out in order to evaluate its performance. Section 5 contains our concluding remarks. Finally, in Appendix A we describe the designed web-based decision support application, while in Appendix B we show the parameters of the Bayesian networks (probability tables) used to generate the synthetic datasets.

2. Bayesian Networks

Bayesian networks (BNs) are mathematical objects which inherently deal with uncertainty ^11,13. When used for probabilistic reasoning, a BN represents the knowledge base of a probabilistic expert system. From a descriptive point of view, we can distinguish two different parts in a BN, which account for the qualitative and quantitative parts of the model, respectively. Figure 1 shows an example of a simple BN.

The qualitative part of the network is represented by a directed acyclic graph (DAG), 𝒢, whose nodes represent the random variables in the problem domain and whose edges codify relevance relations between the variables they connect. When the network is built by hand with the help of domain experts, these relevance relations are usually of causal nature, while when the network is learnt from data, we can only talk about probabilistic dependence, but not causality. The whole graphical model codifies the (in)dependence relations among all variables and can be interpreted by using the D-separation criterion ²³ in order to carry out a qualitative or relevance analysis.

On the other hand, the quantitative part of the model consists of a set of conditional probability distributions, one for each node (variable): P(X_i|pa_𝒢 (X_i)), where pa_𝒢 (X_i)^a are the parent nodes of X_i in the DAG 𝒢. From the independences codified by the DAG, the joint probability distribution can be recovered through the BN factorization, as shown in Eq. (1).

$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid pa(X_i)). \tag{1}$$
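As a small worked illustration of the factorization in Eq. (1), the following sketch evaluates the joint distribution of a two-node network. The structure (AP → T), the state names and all probability values are hypothetical and chosen only for illustration; they are not the parameters used in this study.

```python
# Hypothetical two-variable network AP -> T (illustrative numbers only).
p_ap = {"Low": 0.6, "High": 0.4}              # P(AP)
p_t_given_ap = {                               # P(T | AP)
    "Low": {"Low": 0.8, "High": 0.2},
    "High": {"Low": 0.3, "High": 0.7},
}


def joint(ap, t):
    """P(AP=ap, T=t) = P(AP=ap) * P(T=t | AP=ap), following Eq. (1)."""
    return p_ap[ap] * p_t_given_ap[ap][t]


# The factorized joint must sum to one over all configurations.
total = sum(joint(ap, t) for ap in p_ap for t in ("Low", "High"))
print(joint("High", "High"), total)
```

Each factor only involves a node and its parents, which is what keeps the number of parameters manageable when the network grows.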

Once a BN has been built for a given problem domain, it becomes a powerful tool for probabilistic reasoning, with a wide range of exact and approximate algorithms available to form the inference engine ⁴. Depending on the target domain and the availability of domain experts and/or data, the network can be manually constructed by using knowledge engineering techniques ^14,12, automatically learnt from data ^7,21, or built by combining both techniques.

An important and attractive feature of BNs is their ability to incorporate the temporal dimension, thus allowing reasoning over time. While a static BN represents a fixed snapshot of the domain environment at a given instant, dynamic BNs (DBNs) explicitly represent different instances of the same variables over time, as well as the temporal relations between them. Figure 2 shows the most common way of dealing with DBNs in the literature ²⁰, which consists of a basic structure (slice) representing a static BN, together with a set of temporal relations representing the dependences from time t − 1 to time t. This structure is unfolded as many times as needed in order to forecast the values of variables at time t + k. As can be noticed, the Markovian condition is assumed in DBNs.

3. Fault Diagnosis Methodology

One of the main applications of DSS is to help carry out predictive maintenance. The DSS builds a model from a set of examples, which is then used to predict the health status of the monitored system ². Usually, data are labelled and we know the class of each instance (whether it is normal behaviour or, on the contrary, some component is failing), so supervised data mining techniques can be used to build the prediction model. However, sometimes this information cannot be acquired, and other approaches have to be used. For example, nearest neighbour based anomaly detection techniques use a distance or similarity measure between data instances to determine whether an example is an anomaly or not ^47,46. Clustering based anomaly detection techniques try to group similar data instances into clusters. Afterwards, if an incoming instance does not belong to any cluster it is considered an outlier ^33,34. Sometimes this assumption is relaxed, and an incoming instance is marked as an outlier if it is far away from the centroid of the cluster that it belongs to ^35,36, or if it belongs to a sparse cluster ^37,38. Statistical anomaly detection techniques consider an observation to be an anomaly if it is not generated by the assumed stochastic model. These models are built from data using parametric ^45,44,43 or non-parametric techniques ^42,41. Finally, classification based anomaly detection techniques have also been widely used ^40,39,22. These techniques learn a model which distinguishes between normal and anomalous classes.

In this paper we propose a failure detection tool whose goal is to detect generic failures from the information collected in a sensored system. This tool is based on a technique from the last group mentioned (classification based anomaly detection techniques) and, more specifically, uses Bayesian networks as models to represent the behaviour of the system. Therefore, our BN-based system is not targeted at the detection of a specific type of failure in a particular machine or set of machines; on the contrary, our aim is to be able to detect any anomalous behaviour of the system that can be inferred from sensor readings. Obviously, dealing with such a general problem is a disadvantage with respect to tailored systems, due to the absence of problem domain knowledge. On the other hand, it is more general and realistic because we can detect previously unknown failures. In our case, the system being monitored can be viewed as a black box (Figure 3).

We cannot deal with this problem by using supervised classification methodologies because of the nature of the input data. In particular, we do not have data about each possible wrong behaviour of the system; in fact, we only have data corresponding to one or more states of the machinery working in the absence of failures. Our goal is to predict whether the current behavioural state of the physical system is correct or, on the contrary, corresponds to some kind of failure. Therefore, instead of a fully supervised classification methodology, we opt for one based on anomaly detection ².

In this type of methodology, a model is constructed to represent the failure-free behaviour of the physical system, and it is used to check whether an incoming sensor reading matches it, producing a failure warning (or alarm) when it does not. We start from the methodology detailed in ²², adapted to our case of such general problems. However, as we will discuss in Section 4, the previous methodology has problems in some specific circumstances. In order to improve the performance of our DSS in such cases, we have introduced a new model definition (in addition to the failure-free behaviour one) to represent the anomalous behaviour. As an overview, the workflow of our DSS is shown in Figure 4.

Every cycle (reading-detection) has the following behaviour: (1) readings are taken from the sensors and stored in a data base; (2) these readings, possibly manipulated, give rise to the observations entered into the expert system for classification; (3) the probabilistic expert system (composed of two Bayesian networks) computes the prediction (fault or non-fault); (4) if a possible failure has been identified, an alarm is generated and a human operator supervises it; finally (5) the knowledge-base is updated according to this information (false or true positive alarm).

Therefore, the proposed DSS has been designed as a generic tool, where no specific problem domain has been considered. Nevertheless, it is noteworthy that if some previous knowledge is available, it can be incorporated into our system, as this is one of the main advantages of using a Bayesian network to represent the knowledge base. In order to deal with such a generic failure detection system, we impose the following two assumptions:

• The tool needs a first stage in which it collects data (sensor readings) from the monitored system working properly, i.e. without failures. From these readings, the system will construct/learn a correct behaviour model. This is not a strong assumption, since long periods without failures are common in industrial machinery.
• Some degree of supervision is needed. As we do not have prior information about what a failure looks like, once a suspicious behaviour is detected from the sensor readings because it deviates from the (learnt) correct behaviour model, we need a human operator to confirm whether the detected anomalous behaviour actually corresponds to a failure in the system or is just a false positive. This information will be used as feedback to improve the prediction models.

This section is structured in the following subsections. First, in Subsection 3.1 we are going to describe both models for failure-free and anomalous behaviour. Then, in Subsection 3.2 we detail the metrics used in our DSS and how we combine them to detect anomalous behaviours. Finally, in Subsection 3.3 we detail how these models are updated when a human operator tells the system about false or true positive alarms.

3.1. Bayesian networks to detect anomalous behaviours

As we said before, we are going to use two Bayesian networks: ℳ_c, the model to represent failure-free behaviours and ℳ_f for anomalous behaviour.

Bayesian networks assume discrete values as inputs. However, sensor readings, e.g. temperature, are usually continuous. One option would be to use hybrid Bayesian networks ⁴⁸ instead, which can deal with real-valued inputs using a conditional Gaussian model. However, this approach has some drawbacks, such as imposing certain restrictions on the topology of the network. Another option, used in this work, consists of applying a pre-processing discretization step, where the discretization intervals can be provided by domain experts, obtained by applying some unsupervised discretization technique ¹⁶, or simply obtained by equal-width binning.
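The equal-width option can be sketched as follows. This is only a minimal illustration: the function name, the three labels and the temperature values are hypothetical, and the bin limits are taken from the observed minimum and maximum, which is one possible choice among several.

```python
def equal_width_bins(values, labels):
    """Map each real value to one of len(labels) equal-width intervals."""
    n_bins = len(labels)
    lo, hi = min(values), max(values)
    # Guard against a constant signal, for which the width would be zero.
    width = (hi - lo) / n_bins or 1.0
    discretized = []
    for v in values:
        # The maximum value falls exactly on the upper edge, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        discretized.append(labels[idx])
    return discretized


# Hypothetical temperature readings (degrees Celsius).
temperatures = [20.0, 21.5, 35.0, 49.9, 50.0]
print(equal_width_bins(temperatures, ["Low", "Medium", "High"]))
```

In a deployed system the interval limits would normally be fixed during the training stage and reused for new readings, so that the same value is always mapped to the same state.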

In the following subsubsections we detail how ℳ_c and ℳ_f are built (Subsubsection 3.1.1 and 3.1.2 respectively).

3.1.1. Probabilistic model for failure-free behaviour: ℳ_c

The physical system for which we intend to carry out the predictive maintenance can be in a series of (hidden) states. We have no direct access to these inner states, but only to some observable measures provided by a set of sensors (e.g. light, vibration, temperature, etc.) attached to the machinery. Let us assume we have n sensors {S₁,…,S_n}, each one taking values in dom(S_i) = {v_1^i,…,v_r^i}. That is, we assume each sensor takes values in a finite set of discrete/nominal values.

The main goal is to obtain a probabilistic model which represents the behaviour of the system working properly, that is, without any defective component and therefore assuming correct sensor readings. This probabilistic model, which we will call ℳ_c, needs to deal with the n sensors but also with the temporal relations among them, as the evolution of the value of a given sensor will provide clues about its correct or incorrect functioning. Because of these requirements, we have chosen the formalism of DBNs ²⁰ to represent our knowledge base.

For the sake of simplicity and because of the lack of prior domain knowledge we have resorted to a simple DBN model with fixed graphical structure. In the model, each sensor is independent of the rest^b but depends on itself at time t − 1, that is, there are no arcs of type S_i → S_j, but we include temporal arcs of type S_i^{t−1} → S_i^t. In this way, as there are no relations between different sensors, the model can be seen as a set of independent Bayesian networks (one per sensor). However, the definition that we propose in this work contemplates a more general situation, where a sensor can depend on other sensors at the same time t.

In practice, for better readability, we actually deal with a static BN, in which temporal relations are taken into account by unfolding the DBN for a given number of temporal slices (24 in this work). The usual way to proceed with a DBN is to unfold one layer at a time, forgetting (or not) the past layers. However, we decided to unfold the 24 layers at once because of the repetitive behaviour of these types of environments: even if the machinery carries out different operations depending on the hour of the day, this pattern is usually the same every day (24 hours). The resulting model is shown in Figure 5, where we represent 24 consecutive hourly measures for each sensor, that is, we model a full day of the physical system.

Once we have described the structure of our probabilistic graphical model ℳ_c, we need to provide the numerical parameters, that is, the conditional probability tables. To do this we use the dataset containing sensor readings captured during a period of correct functioning of the monitored system. Our first task is to transform the captured dataset, containing hourly readings for the n variables (sensors), into a new one containing 24 × n variables. The process is described in Figure 6.
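A possible implementation of this transformation is sketched below. It assumes that the readings arrive in chronological order, that each day contributes exactly 24 hourly readings, and that the new variables are named `<sensor>_<hour>`; the function name and this naming scheme are assumptions made for the example and may differ from the layout in Figure 6.

```python
def to_daily_rows(readings, sensors, hours=24):
    """Turn a chronological list of hourly readings (one dict per hour)
    into one row per day with hours * len(sensors) variables."""
    if len(readings) % hours != 0:
        raise ValueError("the readings must cover whole days")
    rows = []
    for start in range(0, len(readings), hours):
        day = readings[start:start + hours]
        row = {}
        for t, reading in enumerate(day, start=1):
            for sensor in sensors:
                # Hypothetical naming scheme: sensor name plus slice index.
                row[f"{sensor}_{t}"] = reading[sensor]
        rows.append(row)
    return rows


# Two days of made-up readings for two sensors.
hourly = [{"T": "Low", "V": "High"} for _ in range(48)]
daily = to_daily_rows(hourly, ["T", "V"])
print(len(daily), len(daily[0]))
```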

From the transformed dataset, where each s_i^k belongs to dom(S_i), we estimate the marginal {P(S_i^1)}_{i=1}^{n} and conditional {P(S_i^t|S_i^{t−1})}_{i=1,t=2}^{i=n,t=24} probability distributions (tables) by using Laplace smoothing ¹⁷. In general, if more dependence relations are included in the graphical model, then the set of conditional probability tables would be estimated as {P(S_i^t|pa(S_i^t))}_{i=1,t=1}^{i=n,t=24}.
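For a single sensor, the estimation with Laplace (add-one) smoothing can be sketched as follows. The function names and the toy data are hypothetical; `values` stands for the observations of one variable S_i^1 across days, and `pairs` for the observed (S_i^{t−1}, S_i^t) combinations for one slice t.

```python
from collections import Counter


def laplace_marginal(values, domain):
    """Estimate P(S) with add-one smoothing."""
    counts = Counter(values)
    total = len(values) + len(domain)
    return {v: (counts[v] + 1) / total for v in domain}


def laplace_conditional(pairs, domain):
    """Estimate P(S^t | S^{t-1}) from (previous, current) pairs,
    returned as table[previous][current]."""
    counts = Counter(pairs)
    table = {}
    for prev in domain:
        row_total = sum(counts[(prev, cur)] for cur in domain) + len(domain)
        table[prev] = {cur: (counts[(prev, cur)] + 1) / row_total
                       for cur in domain}
    return table


domain = ["Low", "High"]
marginal = laplace_marginal(["Low", "Low", "High"], domain)
conditional = laplace_conditional(
    [("Low", "Low"), ("Low", "High"), ("Low", "Low")], domain)
print(marginal, conditional)
```

Thanks to the smoothing, a parent configuration that never appears in the training data (here `High`) still receives a valid, uniform row instead of undefined probabilities.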

3.1.2. Probabilistic model for anomalous behaviour: ℳ_f

Apart from ℳ_c, which models the normal behaviour of the physical system, we propose the use of a failure model ℳ_f which models the anomalous behaviour of the physical system, that is, what the readings look like when some failure is happening or is about to happen. In our case, as previously described, due to the generality of the approach we have no data to learn this model. Therefore, we have decided to use a random failure model, ℳ_f, which has the same graphical structure as ℳ_c, but whose parameters (local probability distributions) have been randomly generated using a uniform distribution. The rationale behind this model is that normal sensor readings will have a high likelihood of being generated by ℳ_c and a very low likelihood of being generated by ℳ_f. However, in the case of abnormal sensor readings, the likelihood with respect to ℳ_c should decrease, while the one with respect to ℳ_f should increase or at least not change substantially.
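One way to generate such random parameters for a single-sensor chain is sketched below. The text only states that a uniform distribution is used, so the concrete scheme here (drawing one uniform weight per state and normalizing each table row) is an assumption, as are the function names.

```python
import random


def random_distribution(domain, rng):
    """Draw one uniform weight per state and normalize them to sum to one."""
    # 1.0 - rng.random() lies in (0, 1], so the total is never zero.
    weights = [1.0 - rng.random() for _ in domain]
    total = sum(weights)
    return {v: w / total for v, w in zip(domain, weights)}


def random_chain_model(domain, n_slices, rng):
    """Random P(S^1) plus one random table P(S^t | S^{t-1}) per later slice."""
    initial = random_distribution(domain, rng)
    transitions = [
        {prev: random_distribution(domain, rng) for prev in domain}
        for _ in range(n_slices - 1)
    ]
    return initial, transitions


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
initial_f, transitions_f = random_chain_model(["Low", "Medium", "High"], 24, rng)
print(initial_f)
```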

3.2. Anomaly detection procedure

After learning ℳ_c and generating ℳ_f, they are used to process new sensor readings. The goal is to predict whether a new reading represents a failure-free behaviour for the machinery, or on the contrary it represents some anomalous behaviour, which can be associated with a failure or with a warning that indicates a forthcoming failure.

In the following subsubsections we present two metrics and the way they are combined in order to classify input readings into failure-free and anomalous behaviours. In Subsubsection 3.2.1 we show a metric based exclusively on ℳ_c, used in the methodology proposed in ²². Then, in Subsubsection 3.2.2, we explain our proposal to improve the performance of the previous methodology. Finally, in Subsubsection 3.2.3 we detail how we detect anomalous behaviours by combining the previous metrics.

3.2.1. Metric conf (e)

The methodology proposed in ²² is based on measuring the conflict between the model and the sensor readings. A fault (an anomaly) will be detected when that measurement reaches a threshold. To detect if a particular case or reading is coherent with the model ℳ_c or not, it uses a procedure which is based on the well known conflict (conf) measure proposed in ¹⁰ (see also ¹¹). The measurement is described in Eq. (2).

$$\mathrm{conf}(e) = \ln \frac{P_{\mathcal{M}}(e_1) \times \cdots \times P_{\mathcal{M}}(e_n)}{P_{\mathcal{M}}(e)}, \tag{2}$$

where e stands for the joint configuration of n findings, i.e., e = (e₁,e₂,…,e_n).

The idea behind conf measure is that findings coherent with the model should be positively correlated, thus, P_ℳ (e) should be greater than the product of independent (marginal) probabilities, P_ℳ (e₁),…,P_ℳ (e_n). Therefore, a negative number would indicate that e is coherent with the model ℳ, while a positive one is an indicator of a conflict. The bigger the number, the more probable the conflict is.

From a computational point of view, computing conf (e) requires two propagations (inferences)¹¹ over the probabilistic graphical model: (1) in the first one, no evidence is entered into the network, so, marginal probabilities P(e₁),…,P(e_n) are obtained; and (2) in the second one, all the findings e₁,…,e_n are entered as evidence, and P(e) = P(e₁,…,e_n) is obtained after the propagation by computing in any node the normalization constant.

In our case, as we are interested in the predictive maintenance of the machinery, we use as findings the sequence of readings from time t_i to time t_j, where 1 ≤ i, j ≤ 24 and w = j − i is a positive integer. This is because a failure usually does not happen abruptly but gradually, so we consider the sequence of readings in order to evaluate possible trends. However, we set a maximum time window size w (the time difference between t_j and t_i), because if we took the whole set of readings into account, the information provided by the last measurements would have a small impact, as it would be diluted by the rest of the readings.
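For the simple structure in which a sensor only depends on its own previous reading, conf(e) over a window of consecutive readings can be computed directly from the probability tables, as in the following sketch. The function names, the two-state domain and all probabilities are hypothetical; a general BN would instead require the two propagations described above.

```python
import math


def slice_marginals(initial, transitions, domain):
    """Propagate P(S^1) forward to obtain P(S^t) for every slice."""
    marginals = [dict(initial)]
    for table in transitions:
        prev = marginals[-1]
        marginals.append(
            {cur: sum(prev[p] * table[p][cur] for p in domain) for cur in domain})
    return marginals


def conf(evidence, start, initial, transitions, domain):
    """conf(e) for readings observed at slices start, start + 1, ... (0-based)."""
    marginals = slice_marginals(initial, transitions, domain)
    # Numerator of Eq. (2): product of the individual marginals.
    log_indep = sum(math.log(marginals[start + k][v])
                    for k, v in enumerate(evidence))
    # Denominator of Eq. (2): joint probability of the window under the chain.
    log_joint = math.log(marginals[start][evidence[0]])
    for k in range(1, len(evidence)):
        log_joint += math.log(transitions[start + k - 1][evidence[k - 1]][evidence[k]])
    return log_indep - log_joint


domain = ["Low", "High"]
initial = {"Low": 0.5, "High": 0.5}
sticky = {"Low": {"Low": 0.9, "High": 0.1}, "High": {"Low": 0.1, "High": 0.9}}
transitions = [sticky, sticky]  # three slices for brevity
coherent = conf(["Low", "Low", "Low"], 0, initial, transitions, domain)
conflicting = conf(["Low", "High", "Low"], 0, initial, transitions, domain)
print(coherent, conflicting)
```

With these made-up tables the stable sequence gives a negative value and the oscillating sequence a positive one, in line with the interpretation of the sign of conf(e).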

3.2.2. Metric rcf (e)

The previous metric detects anomalies by paying attention to the dependencies between variables. As will be discussed in Section 4, an uncommon sequence of readings might be considered a failure-free behaviour. In order to improve the performance of our DSS, we introduce a second measurement, which uses both models (ℳ_c and ℳ_f). This new measure, called ratio correct vs fault (rcf), is shown in Eq. (3).

$$\mathrm{rcf}(e) = \ln \frac{P_{\mathcal{M}_f}(e)}{P_{\mathcal{M}_c}(e)} \tag{3}$$

At the beginning, when the parameters of ℳ_f have been initialized randomly, a change in the behaviour of the system should not substantially affect P_{ℳ_f} (e). In contrast, if that change corresponds to an abnormal behaviour, P_{ℳ_c}(e) should decrease and therefore the ratio rcf(e) should increase. On the other hand, if the readings correspond to a normal situation, P_{ℳ_c}(e) should be higher than P_{ℳ_f} (e), as the parameters of ℳ_c have been learnt from data in the absence of failures. Moreover, once the model ℳ_f has been updated with data from anomalous behaviours, the rcf (e) measure should perform better, since P_{ℳ_c}(e) and P_{ℳ_f} (e) should then behave in opposite ways.

As in conf(e), we will use as evidence the sequence of readings from time t_i to t_j, where e has the same meaning as in Eq. (2) and the same window size w is used.
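The rcf measure of Eq. (3) can be sketched for a single sensor as follows. To keep the example short, each model is reduced to a pair (distribution of the first reading in the window, one transition table shared by all slices); this time-homogeneous simplification, the exactly uniform failure model and all numbers are assumptions made for illustration.

```python
import math


def window_log_prob(evidence, first, transition):
    """ln P(e) for consecutive readings of one sensor under a chain model."""
    logp = math.log(first[evidence[0]])
    for prev, cur in zip(evidence, evidence[1:]):
        logp += math.log(transition[prev][cur])
    return logp


def rcf(evidence, correct_model, failure_model):
    """Eq. (3): ln P_f(e) - ln P_c(e)."""
    return (window_log_prob(evidence, *failure_model)
            - window_log_prob(evidence, *correct_model))


# Hypothetical models: (first-slice distribution, transition table).
model_c = ({"Low": 0.8, "High": 0.2},
           {"Low": {"Low": 0.9, "High": 0.1}, "High": {"Low": 0.6, "High": 0.4}})
uniform = {"Low": 0.5, "High": 0.5}
model_f = (uniform, {"Low": uniform, "High": uniform})

rcf_normal = rcf(["Low", "Low", "Low"], model_c, model_f)
rcf_abnormal = rcf(["High", "High", "High"], model_c, model_f)
print(rcf_normal, rcf_abnormal)
```

Readings that the correct-behaviour model explains well give a negative value, whereas readings that it considers unlikely push the ratio above zero.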

3.2.3. Use of conf (e) and rcf (e) for anomaly detection

In this section we explain how we use conf(e) and rcf(e) to detect anomalous behaviours. The basic idea is to use two thresholds, one for each metric. Each time a measure is greater than its associated threshold, an anomalous behaviour is detected. However, we realised that using the whole evidence e to compute each measure has some drawbacks. As mentioned above, component failures are unusual, and when one happens usually only a few sensors are involved. Because of that, if the system is composed of a large number of sensors, the information about an anomaly can be diluted by the rest of the sensors and the failure can go unnoticed.

In order to deal with that problem, the previously described measures (conf and rcf) are computed separately for each sensor in our system, using as evidence the readings from that sensor and from all its ancestors in the network (for time window w). This gives us information about the probability that a particular sensor reading comes from an anomalous behaviour or not. Moreover, we are able to detect whether there is a failure in the machinery and, if so, which component is defective.

Finally, we use the information of all these individual measures to decide whether the input represents a (forthcoming) failure, in which case an alarm must be fired. We have set two thresholds (t_conf and t_rcf), one for each measure. If any of the computed measures is greater than its associated threshold, then an alarm (related to the evaluated sensor) is triggered. For the sake of clarity, in our experiments (see Section 4) we do not pay attention to which component is failing, but only to whether any component in the whole system is failing or not.
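The decision rule can be written in a few lines. The function name, the dictionary-based interface and the measure values are hypothetical; only the rule itself (an alarm for a sensor when either measure exceeds its threshold) comes from the description above.

```python
def sensors_in_alarm(conf_by_sensor, rcf_by_sensor, t_conf, t_rcf):
    """Return the sensors whose conf or rcf value exceeds its threshold."""
    return sorted(
        sensor for sensor in conf_by_sensor
        if conf_by_sensor[sensor] > t_conf or rcf_by_sensor[sensor] > t_rcf
    )


# Made-up per-sensor measures for one time window.
conf_values = {"T": -0.4, "V": 1.7, "AP": 0.2}
rcf_values = {"T": -2.0, "V": 0.3, "AP": 1.1}
alarms = sensors_in_alarm(conf_values, rcf_values, t_conf=1.0, t_rcf=1.0)
system_alarm = bool(alarms)  # system-level view: is any sensor in alarm?
print(alarms, system_alarm)
```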

3.3. Models updating

Even if no domain knowledge is used to build the model, the information about its performance can be used as feedback. Therefore, if at some point the system triggers an alarm, a person who checks the status of the monitored system can tell the DSS if the behaviour has been correctly classified or not. This information can be used to update the models and improve their predictions.

Given a certain alarm, if it is marked as a false positive it means that the data come from a correct behaviour, so the model ℳ_c will be updated using this information (as described below). On the contrary, if it is marked as an actual failure, it means that the data come from an anomalous behaviour, so the model ℳ_f will be updated. The data used to update the models correspond to the contiguous readings from the first detection of the alarm to the last one (i.e. the last reading before the data are again considered normal behaviour).

The update process keeps the model structure unchanged but modifies the parameters (probability tables). Let {P(S_i^1)}_{i=1}^{n} be the marginal probability tables for each sensor i in the first layer, and {P(S_i^t|S_i^{t−1})}_{i=1,t=2}^{i=n,t=24} the conditional probability tables for each sensor i in layer t. First we calculate the probabilities using only the data from the alarm detection, following the same procedure detailed in Subsubsection 3.1.1. That is, we first apply the database transformation procedure (see Figure 6) and then estimate the marginal {P(S_i^1)′}_{i=1}^{n} and conditional {P(S_i^t|S_i^{t−1})′}_{i=1,t=2}^{i=n,t=24} probability tables by using Laplace smoothing ¹⁷. Finally, we combine both sets of parameters to get the updated probability tables ({P_u(S_i^1)}_{i=1}^{n} and {P_u(S_i^t|S_i^{t−1})}_{i=1,t=2}^{i=n,t=24}) using a weighted average, as shown in Eq. (4) and Eq. (5):

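$$\{P_u(S_i^1)\}_{i=1}^{n} = \alpha \cdot \{P(S_i^1)'\}_{i=1}^{n} + (1-\alpha) \cdot \{P(S_i^1)\}_{i=1}^{n} \tag{4}$$

$$\{P_u(S_i^t \mid S_i^{t-1})\}_{i=1,\,t=2}^{i=n,\,t=24} = \alpha \cdot \{P(S_i^t \mid S_i^{t-1})'\}_{i=1,\,t=2}^{i=n,\,t=24} + (1-\alpha) \cdot \{P(S_i^t \mid S_i^{t-1})\}_{i=1,\,t=2}^{i=n,\,t=24} \tag{5}$$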

The parameter α determines the “memory” of the system about past readings. Its range is [0, 1]. As it approaches 0, more importance is given to the information about the past, so the model requires more time to adapt to a new behaviour. On the contrary, as it approaches 1, the model tends to represent exclusively the recent behaviour of the monitored system, forgetting the past behaviour within a short window of time.
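The weighted average of Eq. (4) and Eq. (5) can be sketched for one marginal table and one conditional table as follows. The function names and the probability values are hypothetical; `old` stands for the current table and `new` for the table estimated from the alarm data only.

```python
def blend(old, new, alpha):
    """Eq. (4): alpha * new + (1 - alpha) * old, state by state."""
    return {v: alpha * new[v] + (1 - alpha) * old[v] for v in old}


def blend_conditional(old, new, alpha):
    """Eq. (5): the same average applied to each row P(. | previous state)."""
    return {prev: blend(old[prev], new[prev], alpha) for prev in old}


old_marginal = {"Low": 0.8, "High": 0.2}
new_marginal = {"Low": 0.2, "High": 0.8}
updated = blend(old_marginal, new_marginal, alpha=0.5)
print(updated)
```

Because both inputs are probability distributions and the weights α and 1 − α sum to one, every updated row is again a valid distribution; with α = 0 the old table is returned unchanged.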

4. Simulated Case of Study

In this section we test the predictive capability of the original methodology explained in ²² and compare it with our proposed DSS. To do so, we generate synthetic data representing different scenarios. Our aim is to test the following situations. The system has been trained using data in the absence of failures. Then:

• The behaviour does not change, so no alarms should be triggered.
• The behaviour suddenly changes. Alarms should be triggered, but two scenarios can be tested in this case depending on the given feedback: the change comes from an anomaly, or from a change in the behaviour of the monitored system.

For that, we have designed a simulated environment with four different sensors usually present in industrial machinery: Temperature (T), Humidity (H), Vibration (V) and Active Power (AP). To generate the synthetic data from that environment, we have modelled its behaviour using a Bayesian network ℬ_b and then generated samples from it. In order to represent a change in the behaviour of the monitored system, we have used two alternatives:

• Keep the Bayesian network structure of ℬ_b but change its parameters.
• Change the structure of the Bayesian network ℬ_b.

Next we are going to explain in detail the models and the process to generate the synthetic data from them.

The first model ℬ_b, used to generate the initial behaviour and the first alternative, is shown in Figure 7. We have set a direct dependence between Active Power and Temperature, and also between Humidity and Vibration. These dependences have been replicated in the 24 hourly layers, that is from [0:00-1:00) to [23:00-0:00). Regarding the domain for each variable, we have considered that the four variables take values in the set {Low, Medium, High}. Therefore, as we deal directly with discrete values instead of real numbers, there is no need to apply the discretization stage.

After setting the structure of the model, we have parameterized it in two different ways:

• Basic behaviour, which represents the usual machinery functioning (𝒲_b). In this case, ℬ_b has been parameterized in the following way: all sensors tend to generate the Low value with more probability than the Medium one, and the Medium value with more probability than the High one. Furthermore, due to the dependences in the graphical model, Vibration and Temperature tend to follow the measures of Humidity and Active Power respectively, and each sensor tends to follow its own measure in the previous layer (time). The specific values are shown in Table B.1 in Appendix B.
• Alternative behaviour 1, which represents a situation in which we detect more vibration than usual (𝒲_a1). In this case, the network ℬ_b is parameterized in a different way: Temperature, Humidity and Active Power behave similarly to the basic behaviour (𝒲_b), but Vibration tends to take higher values. The specific values for the Vibration variables are shown in Table B.2 in Appendix B.

The second model structure, ℬ_a, is similar to ℬ_b, but all dependences involving the variable Vibration have been removed (see Figure 8). The parametrization of this network is as follows:

• Alternative behaviour 2, which represents a situation in which vibration readings follow the uniform distribution (𝒲_a2). We take ℬ_a as the basic structure and parameterize it as follows: all the states of Vibration have the same probability, while the remaining probability distributions are set as in the basic behaviour (𝒲_b).

In order to obtain the datasets for the simulations, the models are sampled by layers (first t = 1, then t = 2, etc.), and inside each layer probabilistic logic sampling ⁸ is applied following a topological ordering (e.g. AP, H, T, V). We consider two cases:

• Initial time slice (t = 1). First, variables without parents in the network are (independently) sampled from their marginal distributions, that is, P(AP₁) and P(H₁) for 𝒲_b and 𝒲_a1, and P(AP₁), P(H₁) and P(V₁) for 𝒲_a2. Once the values for these variables are known (call them ap, h and v), the rest are sampled from the conditional distributions: P(T₁|AP₁ = ap) and P(V₁|H₁ = h) for 𝒲_b and 𝒲_a1, and P(T₁|AP₁ = ap) for 𝒲_a2.
• Rest of time slices (t ≥ 2). Now the values for all the variables in time slice t − 1 are known, so the conditional distributions to be sampled, in order, are: (1) P(AP_t|AP_{t−1} = ap_{t−1}), P(H_t|H_{t−1} = h_{t−1}), P(T_t|AP_t = ap_t, T_{t−1} = t_{t−1}) and P(V_t|H_t = h_t, V_{t−1} = v_{t−1}) for 𝒲_b and 𝒲_a1; (2) P(V_t), P(AP_t|AP_{t−1} = ap_{t−1}), P(H_t|H_{t−1} = h_{t−1}) and P(T_t|AP_t = ap_t, T_{t−1} = t_{t−1}) for 𝒲_a2.
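The layer-by-layer sampling can be sketched for the structure of ℬ_b (AP → T, H → V, plus the temporal self-dependences) as follows. The conditional distributions used here are stand-ins: the prior and the "copy one parent value with probability p_copy" rule are hypothetical and only mimic the qualitative description of the basic behaviour, not the actual tables of Appendix B.

```python
import random

# Hypothetical prior favouring Low over Medium over High.
PRIOR = {"Low": 0.6, "Medium": 0.3, "High": 0.1}


def sample_given(parent_values, rng, p_copy=0.7):
    """Stand-in conditional distribution: copy one parent value with
    probability p_copy, otherwise draw from PRIOR."""
    if parent_values and rng.random() < p_copy:
        return rng.choice(parent_values)
    return rng.choices(list(PRIOR), weights=list(PRIOR.values()), k=1)[0]


def sample_day(rng, hours=24):
    """Sample one day slice by slice, in the topological order AP, H, T, V."""
    day, prev = [], None
    for _ in range(hours):
        ap = sample_given([prev["AP"]] if prev else [], rng)
        h = sample_given([prev["H"]] if prev else [], rng)
        temp = sample_given([ap] + ([prev["T"]] if prev else []), rng)
        vib = sample_given([h] + ([prev["V"]] if prev else []), rng)
        prev = {"AP": ap, "H": h, "T": temp, "V": vib}
        day.append(prev)
    return day


rng = random.Random(1)  # fixed seed only to make the sketch reproducible
dataset = [reading for _ in range(180) for reading in sample_day(rng)]
print(len(dataset), dataset[0])
```

Sampling 180 days of 24 hourly readings gives 4320 readings, each with one value per sensor.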

From these models we have sampled four datasets. Each one contains 4320 readings (〈ap,t,v,h〉), corresponding to 6 months, 30 days per month and one reading every hour.

•
BasicT. This dataset is sampled from the Basic behaviour model 𝒲_b and will be used to Train the correct behaviour model (ℳ_c).
•
BasicV. This dataset is sampled from the Basic behaviour model 𝒲_b and will be used to validate the correct behaviour model (ℳ_c).
•
AlternativeU. This dataset is sampled from the Alternative behaviour model 1 (𝒲_a1) and will be used in two different ways: to update the correct behaviour model (ℳ_c) and to update the failure behaviour model (ℳ_f). In other words, the DSS is told either that the alarms correspond to a change in the behaviour of the system or that they correspond to an anomaly.
•
AlternativeR. This dataset is sampled from the Alternative behaviour model 2 (𝒲_a2) and will also be used in two different ways: to update the correct behaviour model ℳ_c (change in the behaviour of the system) and to update the failure behaviour model ℳ_f (anomaly).

4.1. Simulation data

In this section we discuss how the two models ℳ_c and ℳ_f and their behaviours evolve during the simulation. The parameters used are w = 24 for the time window, t_conf = t_rcf = 1.0 for the alarm thresholds and α = 0.5 for updating the models. Note that even though w = 24, in this experiment the nodes in the last layer of the Bayesian networks are not connected to the nodes in the first layer. This means that at t = 1 only the data from the first hour of the day are used, while at t = 24 the information of the whole day is used (this can be seen as a variable window size, from w = 1 to w = 24). The experiment is as follows.
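
Because the layers are not connected across days, the evidence available at a given hour is just the readings of the current day so far. A minimal sketch of this growing window is given below. It assumes that readings are stored in a flat hourly list starting at the first hour of a day; the helper name `evidence_window` is ours.

```python
def evidence_window(readings, index, w=24):
    """Return the readings of the current day up to position `index` (0-based).

    The window length grows from 1 (first hour of the day) to w (last hour)
    and then resets, because no information is carried over between days.
    """
    day_start = (index // w) * w
    return readings[day_start:index + 1]


hours = list(range(48))  # two days of dummy hourly readings
sizes = [len(evidence_window(hours, i)) for i in range(48)]
```

Here `sizes` runs from 1 to 24 twice, which is the variable window size mentioned above.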

Firstly, we create the structure of both models ℳ_c and ℳ_f as described in Section 3; that is, dependence relations between sensors in the same layer are not included, because we assume that no such domain information is available. Of course, if these relations (or others) are available as problem domain knowledge, they can be added to the graphical structure. Then, we use the BasicT dataset to learn the parameters of ℳ_c, that is, P(AP₁), P(T₁), P(H₁), P(V₁) and P(AP_t|AP_t−1), P(T_t|T_t−1), P(H_t|H_t−1), P(V_t|V_t−1). In the case of ℳ_f these distributions are initialized at random (uniform distribution).
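
As an illustration of this learning step, the following sketch estimates P(X₁) and P(X_t|X_t−1) for a single sensor by counting. It assumes a discretized hourly series, one transition table shared by all layers t ≥ 2, and add-one (Laplace) smoothing. These choices and the name `learn_sensor_model` are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter

STATES = ("Low", "Medium", "High")


def normalize(counts, smoothing):
    """Turn counts over STATES into a smoothed probability distribution."""
    total = sum(counts[s] + smoothing for s in STATES)
    return {s: (counts[s] + smoothing) / total for s in STATES}


def learn_sensor_model(series, w=24, smoothing=1.0):
    """Estimate (prior, transition) for one sensor from a flat hourly series.

    Every w-th reading opens a new day and contributes to P(X_1); all other
    readings contribute to P(X_t | X_t-1). Days are treated as independent.
    """
    prior_counts = Counter()
    trans_counts = {s: Counter() for s in STATES}
    for i, value in enumerate(series):
        if i % w == 0:
            prior_counts[value] += 1
        else:
            trans_counts[series[i - 1]][value] += 1
    prior = normalize(prior_counts, smoothing)
    transition = {s: normalize(trans_counts[s], smoothing) for s in STATES}
    return prior, transition
```

With smoothing, a parent state that never appears in the training data (for example Medium in a series that only contains Low and High) gets a uniform transition row instead of an undefined one.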

After ℳ_c and ℳ_f have been built, five different situations are tested: BasicV is used to validate the correct behaviour model (ℳ_c); AlternativeU is interpreted first as normal behaviour and afterwards as anomalous behaviour, in order to check the adaptability of the system; finally, the same experiments are repeated with the dataset AlternativeR.

For all the experiments we show three plots. The first two correspond to the hourly values of conf and rcf respectively, while the last one shows the number of anomalies detected per day by our proposal, and hence the number of alarms sent. For the first and second plots we show only the first 1440 measures (two months) instead of all 4320 (the whole six months), because the plots are clearer and the trend in the data does not change appreciably thereafter.

4.1.1. BasicV as correct behaviour

The dataset BasicV is used to test the proposal. As BasicV comes from the same distribution as BasicT, the process should detect few anomalies, and so the number of alarms sent should also be small. In this case, as we know that the inputs (sensor readings) correspond to correct machinery functioning, the operator will identify each alarm as false and the model ℳ_c will be updated/refined accordingly.
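
The operator feedback loop used here and in the following experiments can be sketched as below. The routing (a false alarm refines ℳ_c, a confirmed anomaly refines ℳ_f) follows the text. The update itself is only one plausible reading of the role of α: a convex combination of the stored distribution and an estimate obtained from the current window. The actual rule is the one given in Section 3.3, and the names `blend` and `apply_feedback` are hypothetical.

```python
def blend(old, new, alpha=0.5):
    """Convex combination of two distributions over the same states."""
    return {s: (1 - alpha) * old[s] + alpha * new[s] for s in old}


def apply_feedback(model_c, model_f, window_estimate, is_failure, alpha=0.5):
    """Update the model selected by the operator's answer to an alarm.

    model_c, model_f and window_estimate map a parameter name (for example
    "P(V1)") to a distribution; only the selected model is modified.
    ASSUMPTION: the blend rule is illustrative, not the paper's rule.
    """
    target = model_f if is_failure else model_c
    for name, dist in window_estimate.items():
        target[name] = blend(target[name], dist, alpha)
    return target
```

A convex combination keeps every updated distribution normalized, and with α = 0.5 the stored parameters and the new evidence are weighted equally.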

As we can observe in Figure 9, the proposed process works properly and the number of alarms stays low or even decreases as the days go on and the model is refined. In this case, both measures, conf and rcf, reach their maximum value in the early hours of the day and their minimum in the last one. This is because the measures have the least information available at the beginning of the day, as we only use readings from the same day. As the day progresses and more information can be used, both conf and rcf move towards their lowest values. As both measures behave similarly, the performance of the initial methodology and our proposal would be very similar.

4.1.2. AlternativeU as correct behaviour

The dataset AlternativeU is now used to test the proposal, but interpreting it as a change in the operation mode of the machinery. That is, something in the functioning, environmental conditions, etc. has changed, which produces differences in the sensor readings with respect to the data used for training. However, each time an alarm is sent, the operator marks it as correct behaviour (i.e. a false positive).

As we can observe in Figure 10, the number of anomalies detected is very low even in the first days. This is because the model ℳ_c is quickly updated/refined according to the false anomalies detected, and the new data is understood as normal behaviour. It is worth pointing out that only the vibration readings have changed with respect to the basic behaviour (BasicV), and this change consists of higher values for these readings over time. Therefore, the most relevant change in the probability distributions lies in P(V₁), because in the following layers (as in the basic behaviour BasicV) sensors tend to follow their own measures in the previous layer.

Finally, again as both measures have a similar behaviour, the performance of the DSS would be very similar, whether we use the proposed improvement or not.

4.1.3. AlternativeU as anomalous behaviour

Now we use the same dataset AlternativeU but understanding its readings as failures. Thus, we start with the model ℳ_c trained with BasicT data. As in this case all the alarms sent by the algorithms are confirmed as anomalies by the operator, the updated model is ℳ_f and not ℳ_c.

As we can observe in Figure 11, in the first days the number of anomalies detected is small. However, after a few days, when ℳ_f is refined, almost all the (24) readings are classified as anomalies. We can see that the values of conf are similar to those in the previous case, where AlternativeU is understood as normal behaviour (only in the early hours of the day do the values tend to be higher). This is because conf detects anomalies by looking at the dependencies between variables and, as we said above, the model ℳ_c has learnt that sensors tend to follow their own measure in the previous layer, so if vibration readings are high it considers that the most probable next reading is also a high value. In contrast, once ℳ_f is refined, rcf takes higher values because P_{ℳ_f}(e) ≥ P_{ℳ_c}(e).

In this case, the measure conf does not detect any anomalous behaviour. However, once the first triggered alarms are marked as failures, rcf starts to identify the new ones correctly, and is directly responsible for the increase in the number of alarms sent. In this case, our proposal is able to adapt to the new situation and classify the new instances as anomalous behaviour, while the methodology that only uses the measure conf is not.
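
The different behaviour of the two measures in this experiment can be reproduced on a toy example. The sketch below is written under explicit assumptions, since the exact definitions are those of Section 3: conf is taken as the usual conflict measure, log(∏ P(e_i) / P(e)), evaluated on a single first-order chain; rcf is taken as the log-likelihood ratio log(P_{ℳ_f}(e) / P_{ℳ_c}(e)); natural logarithms are used; and an alarm is raised when either measure exceeds its threshold. All numbers in the toy models are made up for illustration.

```python
import math

STATES = ("Low", "Medium", "High")


def chain_loglik(seq, prior, trans):
    """log P(e) of a state sequence under a first-order chain."""
    ll = math.log(prior[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll


def chain_marginals(n, prior, trans):
    """Marginals P(X_t), t = 1..n, obtained by forward propagation."""
    margs = [dict(prior)]
    for _ in range(n - 1):
        prev = margs[-1]
        margs.append({b: sum(prev[a] * trans[a][b] for a in STATES) for b in STATES})
    return margs


def conf(seq, prior, trans):
    """ASSUMED form: log of (product of marginals / joint probability)."""
    margs = chain_marginals(len(seq), prior, trans)
    independent = sum(math.log(m[x]) for m, x in zip(margs, seq))
    return independent - chain_loglik(seq, prior, trans)


def rcf(seq, model_c, model_f):
    """ASSUMED form: log-likelihood ratio of the failure and correct models."""
    return chain_loglik(seq, *model_f) - chain_loglik(seq, *model_c)


def alarm(seq, model_c, model_f, t_conf=1.0, t_rcf=1.0):
    """ASSUMED combination: alarm if either measure exceeds its threshold."""
    return conf(seq, *model_c) > t_conf or rcf(seq, model_c, model_f) > t_rcf


# Toy models: readings tend to repeat the previous value ("sticky" chain).
sticky = {s: {r: (0.8 if r == s else 0.1) for r in STATES} for s in STATES}
flat = {s: {r: 1 / 3 for r in STATES} for s in STATES}
model_c = ({"Low": 0.8, "Medium": 0.15, "High": 0.05}, sticky)
model_f_initial = ({s: 1 / 3 for s in STATES}, flat)
model_f_refined = ({"Low": 0.05, "Medium": 0.15, "High": 0.8}, sticky)

high_run = ["High"] * 6  # persistently high vibration
```

On `high_run`, conf stays below the threshold because a run of equal readings is internally coherent for a sticky chain, even though high values are rare under the correct model. rcf is also below the threshold while the failure model is still uniform, and exceeds it once the failure model has been refined towards high readings.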

4.1.4. AlternativeR as correct behaviour

The dataset AlternativeR is now used to test the proposal, but interpreting it as a change in the operation mode of the machinery. That is, something in the functioning, environmental conditions, etc. has changed, which produces differences in the sensor readings with respect to the data used for training. However, each time an alarm is sent, the operator marks it as correct behaviour.

As we can observe in Figure 12, the number of anomalies detected is high in the first days and decreases as the model ℳ_c is updated and the new data is understood as normal behaviour. This is because we are now in a more complex situation than when using AlternativeU as correct behaviour: apart from updating P(V₁), P(V_t|V_t−1) also needs to be re-trained in order to incorporate the behavioural change into ℳ_c.

4.1.5. AlternativeR as anomalous behaviour

Finally we use the same dataset AlternativeR but understanding it as failures. Thus, we start with the model ℳ_c trained with BasicT data. As in this case all the alarms sent by the algorithms are confirmed as anomalies by the operator, the updated model is ℳ_f and not ℳ_c.

As we can observe in Figure 13, in the first days almost all the (24) readings are classified as anomalies. It is worth pointing out that, even if it looks as though both measures, conf and rcf, have the same importance in this case, the first one is more important. If we pay attention to the first day, we can see that rcf follows the trend of conf. What is really happening is that conf detects the anomaly first (but rcf does not), and only after a few updates of ℳ_f is rcf able to detect the anomalies as well as conf does. However, our proposal uses a combination of both measures. Therefore all the cases are correctly detected as failures, and the performance of both methodologies would be quite similar.

5. Conclusions

We have designed a general and robust decision support system tool for health management in industrial environments. The core of the system is a probabilistic expert system based on dynamic Bayesian networks. Fault detection is based on both conflict analysis and a likelihood-ratio test.

Different types of failures have been tested and, thanks to the use of two measures to trigger alarms, they have been correctly detected. It is worth pointing out that the second measure, based on the likelihood-ratio test, has a direct effect in only one of the tested cases. However, it is an important case because it could represent a change in the usual operation mode of the monitored system. Additionally, although dependences between sensors are not considered by default, if some knowledge about the problem is available it can be included, since Bayesian networks are used for modelling.

The expert-system-based application has been implemented using multi-platform technology, so it can be deployed on any operating system. In order to avoid problems derived from editing the system configuration in parallel, we only allow one person to edit the system description at a time. Because of that, it is advisable that a single person acts as the manager of the system.

Finally, even if the tool can be used on any kind of system, the time window w and the thresholds used to trigger alarms have to be fixed by an expert in order to obtain a good performance.

Acknowledgments

This work has been partially funded by FEDER funds, the Spanish Government and JCCM through projects TIN2013-46638-C3-3-P, TSI-020100-2011-140 and PEII-2014-049-P. Javier Cózar is also funded by the MICINN grant FPU12/05102.

Appendix A Application

The designed software is used as a DSS, so it can be used both to monitor data and to check, through the predictions, whether the system might be failing. It follows the web-like client/server model: the information is managed and stored in a centralized system (the server) and clients can access this information on demand.

On the server side we have two components, which may or may not be deployed on the same machine: the database server and the web page server. The first one stores the data provided by the sensors, while the web page server provides a web interface for clients to use the application. The models are built and the predictions are made in Java, while PHP is used to generate the web page and to implement the web services (used by clients through AJAX). There are also some XML files that store information about the monitored system and the preferences that clients can configure.

On the client side we use HTML5 plus JavaScript to generate the web page. The client also uses AJAX to dynamically load the requested data.

The monitored system is logically divided into a hierarchical structure (see Figure A.1). Motes are the basic components; they represent the physical sensors and are defined by the way we can access their measures. To provide a flexible abstraction layer, those measures are accessed through a database. Hence, physical sensors send their measures to a server in charge of storing the data in a database. Because of that, there is a small delay between the sensor readings and their processing in our DSS. However, for the scope of this application, this delay is acceptable.

Machines represent a whole working unit, formed by a set of motes. It is not required that the sets of motes are disjoint. This allows the user to specify physical working units (physical machines with their associated sensors) as well as logical working units (a set of components in charge of some specific tasks, which might be shared between different physical machines).

Operations are associated with machines. They are represented by a subset of sensors (motes) from the associated machine. This is useful because sometimes it is not desirable to monitor all the sensors of a particular machine; e.g., if we know the activity of a machine under supervision (cutting, polishing, etc.) we would probably prefer to monitor only the sensors located in the module in charge of that operation.
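
The mote/machine/operation hierarchy can be summarized with a small data model. This is a sketch for illustration only: the class and field names (including the `table` attribute used to locate a mote's readings in the database) are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mote:
    """A physical sensor, identified by where its measures are stored."""
    name: str
    table: str  # hypothetical: database table holding this mote's readings


@dataclass
class Machine:
    """A working unit (physical or logical) formed by a set of motes."""
    name: str
    motes: frozenset


@dataclass
class Operation:
    """A task of a machine, monitored through a subset of its motes."""
    name: str
    machine: Machine
    motes: frozenset

    def __post_init__(self):
        # An operation may only use motes that belong to its machine.
        extra = self.motes - self.machine.motes
        if extra:
            names = sorted(m.name for m in extra)
            raise ValueError(f"motes not in machine {self.machine.name}: {names}")


vib = Mote("vibration-1", "readings_vibration_1")
temp = Mote("temperature-1", "readings_temperature_1")
hum = Mote("humidity-1", "readings_humidity_1")

# Two machines may share a mote: the sets need not be disjoint.
cutter = Machine("cutter", frozenset({vib, temp}))
line = Machine("line", frozenset({temp, hum}))
cutting = Operation("cutting", cutter, frozenset({vib}))
```

Modelling the mote sets as `frozenset` values makes the two constraints of the text easy to express: sharing between machines is allowed, while an operation is rejected if it refers to a mote outside its machine.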

In Figure A.2 we can see the interface to manage the configuration of the motes. Through this interface we can specify the whole set of sensors used in our environment.

With respect to visualization, the interface is divided into two parts. The first one is designed to manage machine and operation definitions, as well as to monitor the sensor readings. The second part of this interface corresponds to the health status prediction.

We can see the first part in Figure A.3. For management, we can select the desired machine or operation from the respective drop-down list and use the edit or remove buttons. To add new machines or operations, the corresponding option appears in the drop-down lists.

For monitoring, the requested data is plotted in two different widgets: a speedometer and a timeline. The former shows the last measure, while the latter shows the trend. In timeline widgets we can define intervals and associate a color with each one. Finally, we can set a time window to select the data to be monitored. At fixed intervals (specified in an XML configuration file) this time window moves forward by the same period of time.

The second part of this interface corresponds to the health status prediction (see Figure A.4). We can learn the models by specifying a period of time (data in that interval will be used to build the models), or delete them in order to re-learn them later. Once the models have been learnt, we can see the values of the functions conf(e) and rcf(e) described in Section 3 through two timeline plots. When these values indicate a failure alert, this information is shown in the table below the plots.

Appendix B Bayesian network parameters

In this appendix we detail the parameters of the BN ℬ_b. Note that the parameters of the BN ℬ_a are the same except for those of the variable Vibration, which are 0.33 for each of its three possible values (Low, Medium and High).

In Table B.1 we show the parameters for the network ℬ_b. It contains three tables. The first one (Table B.1a) shows the parameters for nodes without parents, that is, AP₁ and H₁. Table B.1b shows the parameters for nodes with only one ascendant, which are, ∀t > 1, X ∈ {T₁, V₁, AP_t, H_t}. Parent(T₁) refers to AP₁, Parent(V₁) to H₁, Parent(AP_t) to AP_t−1 and Parent(H_t) to H_t−1. Finally, the third table (Table B.1c) shows the parameters for nodes with exactly two ascendants, which are, ∀t > 1, X ∈ {T_t, V_t}. Parent(T_t) refers to AP_t and Parent(V_t) to H_t.

	Low	Medium	High
P(X)	0.600	0.300	0.100

(a) Parameters for nodes without ascendants (X ∈ {AP₁,H₁}).

Parent(X)	Low	Medium	High
P(X =Low)	0.750	0.200	0.050
P(X =Medium)	0.500	0.400	0.100
P(X =High)	0.350	0.450	0.200

(b) Parameters for nodes with only one ascendant (∀t > 1, X ∈ {T₁,V₁,AP_t,H_t}).

X_t−1	Low			Medium			High
Parent(X_t)	Low	Medium	High	Low	Medium	High	Low	Medium	High
P(X_t =Low)	0.930	0.066	0.004	0.815	0.174	0.011	0.724	0.248	0.028
P(X_t =Medium)	0.815	0.174	0.011	0.595	0.381	0.024	0.467	0.480	0.053
P(X_t =High)	0.724	0.248	0.028	0.467	0.480	0.053	0.335	0.555	0.110

(c) Parameters for nodes with two ascendants (∀t > 1, X ∈ {T_t,V_t}).

Table B.1: Parameters for nodes in the Bayesian network ℬ_b.

H₁	Low	Medium	High
P(V₁ =Low)	0.029	0.194	0.777
P(V₁ =Medium)	0.010	0.198	0.792
P(V₁ =High)	0.004	0.123	0.873

(a) Parameters for V₁.

V_t−1	Low			Medium			High
H_t	Low	Medium	High	Low	Medium	High	Low	Medium	High
P(V_t =Low)	0.220	0.390	0.390	0.086	0.457	0.457	0.040	0.345	0.614
P(V_t =Medium)	0.086	0.457	0.457	0.030	0.485	0.485	0.014	0.355	0.631
P(V_t =High)	0.040	0.346	0.614	0.014	0.355	0.631	0.006	0.234	0.760

(b) Parameters for V_t where t > 1.

Table B.2: Parameters for Vibration nodes in the Bayesian network ℬ_b for the Alternative behaviour 1.

In Table B.2 we show the parameters for ℬ_b used exclusively for the Alternative behaviour 1. It contains two tables. In Table B.2a we show the parameters for V₁, while Table B.2b shows the parameters for V_t where t > 1.
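
A quick consistency check on Table B.2 is sketched below. It copies the printed values and verifies that each triple of three consecutive entries sums to one up to rounding. This suggests that every triple is a distribution over the three states of the Vibration variable (in the order Low, Medium, High) for one parent configuration; this reading of the table layout is our assumption.

```python
# Values as printed in Table B.2a, one triple per table row.
TABLE_B2A = [
    (0.029, 0.194, 0.777),
    (0.010, 0.198, 0.792),
    (0.004, 0.123, 0.873),
]

# Values as printed in Table B.2b, three triples per table row.
TABLE_B2B = [
    (0.220, 0.390, 0.390), (0.086, 0.457, 0.457), (0.040, 0.345, 0.614),
    (0.086, 0.457, 0.457), (0.030, 0.485, 0.485), (0.014, 0.355, 0.631),
    (0.040, 0.346, 0.614), (0.014, 0.355, 0.631), (0.006, 0.234, 0.760),
]


def max_deviation(triples):
    """Largest absolute difference between a triple's sum and 1."""
    return max(abs(sum(t) - 1.0) for t in triples)


def high_dominates(triples):
    """True if the third entry is never smaller than the other two."""
    return all(t[2] >= t[0] and t[2] >= t[1] for t in triples)
```

Both tables pass the check with a tolerance of 0.002, and the third entry of each triple is never smaller than the other two, in line with the description of Alternative behaviour 1 as favouring high vibration values.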

Footnotes

a

From now on we will simply write pa(X_i) instead of pa_𝒢(X_i) when no confusion about the graph/network is possible.

b

Notice that this is only a modelling assumption, not a strong constraint. In fact, if problem domain knowledge is available indicating a direct dependence between two sensors, then this dependence can simply be added to the model, as we will see in Section 4.

References

1.C Athanasopoulou and V Chatziathanasiou, Intelligent system for identification and replacement of faulty sensor measurements in thermal power plants (ippamas: Part 1), Expert Systems With Applications, Vol. 36, No. 5, 2009, pp. 8750-8757.

2.V Chandola, A Banerjee, and V Kumar, Anomaly detection: A survey, ACM Comput. Surveys, Vol. 41, No. 3, 2009, pp. 15:1-15:58.

3.S-P Cheon, S Kim, S-Y Lee, and C-B Lee, Bayesian networks based rare event prediction with sensor data, Knowledge-Based Systems, Vol. 22, No. 5, 2009, pp. 336-343.

4.A Darwiche, Modeling and reasoning with Bayesian networks, Cambridge University Press, 2009.

5.MC Garcia, MA Sanz-Bobi, and J del Pico, SIMAP: Intelligent system for predictive maintenance: Application to the health condition monitoring of a windturbine gearbox, Computers in Industry, Vol. 57, No. 6, 2006, pp. 552-568.

6.E Gilabert and A Arnaiz, Intelligent automation systems for predictive maintenance: A case study, Robotics and Computer-Integrated Manufacturing, Vol. 22, No. 5, 2006, pp. 543-549.

7.D Heckerman, D Geiger, and DM Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning, Vol. 20, No. 3, 1995, pp. 197-243.

8.M Henrion, Propagating uncertainty in Bayesian networks by probabilistic logic sampling, in Uncertainty in Artificial Intelligence 2, Annual Conference on Uncertainty in Artificial Intelligence (UAI-86), Elsevier Science, 1986, pp. 149-163.

9.DJ Hill, BS Minsker, and E Amir, Real-time Bayesian anomaly detection in streaming environmental data, Water Resources Research, Vol. 45, No. 4, 2007, pp. 1-16.

10.FV Jensen, B Chamberlain, T Nordahl, and F Jensen, Analysis in hugin of data conflict, Uncertainty in Artificial Intelligence, Elsevier, Vol. 6, 1991, pp. 519-528.

11.FV Jensen and TD Nielsen, Bayesian networks and decision graphs, 2nd edition, Springer, 2007.

12.UB Kjaerulff and AL Madsen, Bayesian networks and influence diagrams: A guide to construction and analysis, 1st edition, Springer Publishing Company, Incorporated, 2010.

13.D Koller and N Friedman, Probabilistic graphical models: Principles and techniques, The MIT Press, 2009.

14.K Korb and AE Nicholson, Bayesian artificial intelligence, CRC Press, Inc., Boca Raton, FL, USA, 2003.

15.A Kusiak and W Li, The prediction and diagnosis of wind turbine faults, Renewable Energy, Vol. 36, No. 1, 2011, pp. 16-23.

16.H Liu, F Hussain, CL Tan, and M Dash, Discretization: An enabling technique, Data Mining and Knowledge Discovery, Vol. 6, No. 4, 2002, pp. 393-423.

17.CD Manning and H Schütze, Foundations of statistical natural language processing, MIT press, 1999.

18.N Mehranbod, M Soroush, and C Panjapornpon, A method of sensor fault detection and identification, Journal of Process Control, Vol. 15, No. 3, 2005, pp. 321-339.

19.A Muller, M-C Suhner, and B Iung, Formalisation of a new prognosis model for supporting proactive maintenance implementation on industrial system, Reliability Engineering & System Safety, Vol. 93, No. 2, 2008, pp. 234-253.

20.K Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, PhD thesis, UC Berkeley, Computer Science Division, 2002.

21.RE Neapolitan, Learning Bayesian networks, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2003.

22.TD Nielsen and FV Jensen, On-line alert systems for production plants: A conflict based approach, International Journal of Approximate Reasoning, Vol. 45, No. 2, 2007, pp. 255-270.

23.J Pearl, Probabilistic reasoning in intelligent systems: Networks of plausible inference, Morgan Kaufmann Publishers Inc, 1988.

24.M Pecht, Prognostics and health management of electronics, Wiley Online Library, 2008.

25.J Rabatel, S Bringay, and P Poncelet, Anomaly detection in monitoring sensor data for preventive maintenance, Expert Systems with Applications, Vol. 38, No. 6, 2011, pp. 7003-7015.

26.S Verron, T Tiplica, and A Kobi, Fault diagnosis of industrial systems by conditional Gaussian network including a distance rejection criterion, Engineering Applications of Artificial Intelligence, Vol. 23, No. 7, 2010, pp. 1229-1235.

27.G Weidl, A Madsen, and S Israelson, Applications of object-oriented Bayesian networks for condition monitoring, root cause analysis and decision support on operation of complex continuous processes, Computers & Chemical Engineering, Vol. 29, No. 9, 2005, pp. 1996-2009.

28.BG Xu, Intelligent fault inference for rotating flexible rotors using Bayesian belief network, Expert Systems with Applications, Vol. 39, No. 1, 2012, pp. 816-822.

29.S Jakubek and T Strasser, Fault-diagnosis using neural networks with ellipsoidal basis functions, American Control Conference, Vol. 5, 2002, pp. 3846-3851.

30.G Williams, R Baxter, H He, S Hawkins, and L Gu, A comparative study of RNN for outlier detection in data mining, IEEE, 2002, pp. 709-709.

31.C De Stefano, C Sansone, and M Vento, To reject or not to reject: that is the question-an answer in case of neural classifiers, Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, Vol. 30, 2000, pp. 849-94.

32.D Barbará, J Couto, S Jajodia, and N Wu, ADAM: a testbed for exploring the use of data mining in intrusion detection, ACM Sigmod Record, Vol. 30, No. 4, 2001, pp. 15-24.

33.M Ester, H Kriegel, J Sander, and X Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Kdd, Vol. 96, No. 34, 1996, pp. 226-231.

34.L Ertöz, M Steinbach, and V Kumar, Finding topics in collections of documents: A shared nearest neighbor approach, Clustering and Information Retrieval, 2003, pp. 83-84.

35.Smith, Bivens, Embrechts, Palagiri, and Szymanski, Clustering approaches for anomaly based intrusion detection, Proceedings of intelligent engineering systems through artificial neural networks, 2002, pp. 579-584.

36.M Ramadas, S Ostermann, and B Tjaden, Detecting anomalous network traffic with self-organizing maps, Recent Advances in Intrusion Detection, 2003, pp. 36-54.

37.A Pires and C Santos-Pereira, Using clustering and robust estimators to detect outliers in multivariate data, in Proceedings of the International Conference on Robust Statistics (2005).

38.M Otey, S Parthasarathy, A Ghoting, G Li, S Narravula, and D Panda, Towards nic-based intrusion detection, in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining (2003), pp. 723-728.

39.Das and Schneider, Detecting anomalous records in categorical datasets, in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (2007), pp. 220-229.

40.D Barbara, N Wu, and S Jajodia, Detecting Novel Network Intrusions Using Bayes Estimators, SDM, 2001, pp. 1-17.

41.Yeung and Chow, Parzen-window network intrusion detectors, in Pattern Recognition, 2002. Proceedings. 16th International Conference on (2002), Vol. 4, pp. 385-388.

42.L Eskin and Stolfo, Modeling system calls for intrusion detection with dynamic window sizes, in DARPA Information Survivability Conference & Exposition II, 2001. DISCEX’01. Proceedings (2001), Vol. 1, pp. 165-175.

43.Agarwal, An empirical bayes approach to detect anomalies in dynamic multidimensional arrays, in Data Mining, Fifth IEEE International Conference on (2001), pp. 8. –

44.Abraham and Chuang, Outlier detection and time series modeling, Technometrics, Vol. 31, No. 2, 1989, pp. 241-248.

45.Solberg and Lahti, Detection of outliers in reference distributions: performance of Horns algorithm, Clinical chemistry, Vol. 51, No. 12, 2005, pp. 2326-2332.

46.Breunig, Kriegel, Ng, and Sander, Optics-of: Identifying local outliers, Principles of data mining and knowledge discovery, 1999, pp. 262-270.

47.Byers and Raftery, Nearest-neighbor clutter removal for estimating features in spatial point processes, Journal of the American Statistical Association, Vol. 93, No. 442, 1998, pp. 577-584.

48.PA Aguilera, A Fernández, F Reche, and R Rumí, Hybrid Bayesian network classifiers: application to species distribution models, Environmental Modelling & Software, Vol. 25, No. 12, 2010, pp. 1630-1639.

Journal: International Journal of Computational Intelligence Systems
Volume-Issue: 10 - 1
Pages: 176 - 195
Publication Date: 2017/01/01
ISSN (Online): 1875-6883
ISSN (Print): 1875-6891