Fine-Tuning the Fuzziness of Strong Fuzzy Partitions through PSO

We study the influence of fuzziness of trapezoidal fuzzy sets in the strong fuzzy partitions (SFPs) that constitute the database of a fuzzy rule-based classifier. To this end, we develop a particular representation of the trapezoidal fuzzy sets that is based on the concept of cuts, which are the cross-points of fuzzy sets in an SFP and fix the position of the fuzzy sets in the Universe of Discourse. In this way, it is possible to isolate the parameters that characterize the fuzziness of the fuzzy sets, which are subject to fine-tuning through particle swarm optimization (PSO). In this paper, we propose a formulation of the parameter space that enables the exploration of all possible levels of fuzziness in an SFP. The experimental results show that the impact of fuzziness is strongly dependent on the defuzzification procedure used in fuzzy rule-based classifiers. Fuzziness has little influence in the case of winner-takes-all defuzzification, while it is more influential in weighted sum defuzzification, which however may pose some interpretation problems.


INTRODUCTION
The design of fuzzy inference systems promotes interpretability as a key factor to express the embedded knowledge in a plain, readable, and understandable way. As a matter of fact, interpretability is the most important quality that justifies the adoption of fuzzy inference systems in real-world applications [1,2,3,4]. When such systems have to be acquired through data-driven approaches, two main design issues arise: (i) the resulting fuzzy inference system should adequately fit data; (ii) the knowledge base should be interpretable to end-users. This led to the development of design methodologies that take into account both accuracy and interpretability [5,6,7,8,9,10,11]; in parallel, the very concept of interpretability, its definition, and its assessment are a matter of current research [12,13,14,15].
In this paper we focus on a specific type of fuzzy inference system, namely the fuzzy rule-based classifier, which adopts a knowledge base consisting of a set of fuzzy classification rules in the form

Rule $r$: IF $x_1$ is $A_1^{(r)}$ AND $\cdots$ AND $x_m$ is $A_m^{(r)}$ THEN $c^{(r)}$

where a complex antecedent consists of a conjunction of soft constraints binding each variable $x_i$ to a linguistic term $A_i^{(r)}$, and a simple consequent represents a class label $c^{(r)}$. (Further details on fuzzy rule-based classifiers can be found in the literature [16,17,18].) The linguistic terms, bound to the same variable in all rules, form a linguistic variable, which includes information to map each linguistic term to a fuzzy set through an operation of interpretation. In other words, the (explicit) semantics of a linguistic term is defined by a fuzzy set, whose membership function is usually determined through some data-driven process. On the other hand, a linguistic term is usually drawn from natural language (e.g., "low," "old," etc.), therefore it carries an implicit semantics that, in a given context, is assigned by a user when reading the term. The ultimate objective of interpretability-driven design of linguistic variables is to define fuzzy sets so that explicit and implicit semantics of the corresponding linguistic terms highly overlap [19].
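As an illustration of how rules of this form can be evaluated, the following minimal Python sketch scores a toy rule base under two decision schemes. Everything in it is an assumption made for the example: the two-term partition `low`/`high`, the rule base, the use of the minimum as AND operator, and the simple "sum of rule strengths per class" reading of a weighted-sum scheme. It is not the classifier studied in this paper.

```python
from collections import defaultdict


def low(v):
    """Hypothetical term: 1 up to 0.25, 0 from 0.75, linear in between."""
    return max(0.0, min(1.0, (0.75 - v) / 0.5))


def high(v):
    """Complement of `low`, so that low(v) + high(v) = 1 everywhere."""
    return 1.0 - low(v)


def rule_strength(x, antecedent):
    """Activation of one rule: AND of the memberships (minimum, by assumption)."""
    return min(mu(x_i) for mu, x_i in zip(antecedent, x))


def classify(x, rules, scheme="winner"):
    """rules: list of (antecedent, class_label) pairs."""
    strengths = [(rule_strength(x, ant), label) for ant, label in rules]
    if scheme == "winner":
        # Winner-takes-all: the class of the single most active rule.
        return max(strengths, key=lambda s: s[0])[1]
    # Otherwise: accumulate the rule strengths per class and pick the largest.
    totals = defaultdict(float)
    for strength, label in strengths:
        totals[label] += strength
    return max(totals, key=totals.get)


rules = [
    ((low, low), "A"),
    ((low, high), "B"),
    ((high, low), "B"),
    ((high, high), "B"),
]
print(classify((0.4, 0.45), rules, "winner"), classify((0.4, 0.45), rules, "sum"))
```

In this toy case the two schemes disagree on the same input, because several weakly active rules of one class can outweigh the single strongest rule of another.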
The interpretation of linguistic terms determines a collection of fuzzy sets: a standard approach to impose interpretability of linguistic variables is to consider such an ensemble of fuzzy sets as a granulation (or fuzzy partition) of the variable domain. In particular, a convenient way to define such fuzzy sets is through strong fuzzy partitions (SFPs), i.e., collections of complementary fuzzy sets where the sum of the membership degrees over all fuzzy sets is always equal to one, whatever the variable value in the domain. SFPs are convenient tools because they help in satisfying a number of basic interpretability constraints and enable efficient inference, since complementarity prevents many rules from being simultaneously active for a given input [20]. Usually, the fuzzy sets of an SFP are defined through triangular or trapezoidal membership functions, although less common alternatives exist [21,22,23]. In this work we focus on trapezoidal fuzzy sets because they can be efficiently computed and offer greater flexibility than triangular partitions, the latter being a special case.
Typical data-driven design techniques use optimization methods to set the parameters of the fuzzy sets so as to adapt to data. (Possibly, such optimization is constrained to preserve the well-formedness of the involved SFPs.) In most cases, fuzzy sets are optimized in terms of both their position and their fuzziness. The former is related to the sub-regions of the domain where the fuzzy sets mostly influence the inference; the latter indicates how much the fuzzy sets differ from crisp sets. Fuzziness and position can be measured in different ways [24,25]. For the sake of the present study, position can be defined in terms of the 0.5-cut; fuzziness can be defined in terms of the slopes of the oblique sides of the trapezoidal fuzzy sets. In this way, it is possible to cleanly separate position and fuzziness, and analyze them separately.
In this work we address the problem of assessing the impact of fuzziness on the performance of a fuzzy rule-based classifier, provided that the position of its fuzzy sets is kept fixed. We consider this issue to be a very important one because many data-driven design methods mainly focus on the position of fuzzy sets, while fuzziness is relegated to a more marginal role. As an example, in some works SFPs are defined by looking at prototypes (often obtained after some optimization process), then fuzzy sets of triangular shape are defined so as to form an SFP [26,27,28,29]: in such a case, the information coming from prototypes actually settles the position of the fuzzy sets, while the determination of fuzziness only serves to preserve the well-formedness of the partition.
In a previous work of ours, we showed that trapezoidal fuzzy sets offer more degrees of freedom than triangular fuzzy sets, but they still preserve the well-formedness of SFPs [30]. The adoption of triangular fuzzy sets is therefore self-limiting, as it introduces a bias which may hinder the attainment of acceptable accuracy levels. On the other hand, trapezoidal fuzzy sets require the setting of more hyperparameters, therefore some existing design techniques (which compute the prototypes of fuzzy sets) cannot be directly applied. To overcome this problem, data-driven optimization can be split into two steps: optimization of position and fine-tuning of fuzziness. We already proposed two methods for fine-tuning the fuzziness of trapezoidal fuzzy sets [31]. Those methods, however, cannot guarantee that the space of all possible SFPs is thoroughly spanned in search of an optimal solution. In this paper we propose an extension of those methods and we prove that this extension is capable of exploring the entire search space. In this way, we are able to empirically assess the effects of fuzziness on the overall accuracy of a fuzzy rule-based classifier (while position is kept unchanged), with different inference settings.
After a brief outline of the related work reported in Section 2, in Section 3 we formally define the concepts of SFPs and trapezoidal fuzzy sets, together with the constraints that such fuzzy sets must fulfill to preserve the well-formedness of an SFP. In Section 4 we introduce PSO and we sketch the way to represent trapezoidal SFPs as particles. In Section 5 we formally prove that the proposed representation allows PSO to span the entire space of SFPs with fixed positions. Section 6 summarizes the experimental results on synthetic and benchmark datasets. Finally, Section 7 concludes the work by highlighting the main results and setting the direction of future research.

RELATED WORK
The design of SFPs is a long-standing topic in fuzzy modeling. The simplest design approach is uniform granulation, where the domain of a feature (assumed to be a closed interval in the real line) is partitioned by detecting a number of equally-spaced points in the domain, which then serve as prototypes of triangular fuzzy sets. The number of such prototypes is usually user-defined, although some computational methods have been proposed in the literature for its determination [32].
Noticeably, the position and fuzziness of fuzzy sets determined by uniform granulation do not depend on data: the reasons to use this method are related to the appreciable interpretability of the resulting partitions and the relative difficulty of determining data-driven SFPs with a comparable interpretability level. Nevertheless, uniform granulation is too rigid as it does not adapt to data. For this reason, several methods have been proposed to enable a data-driven design of fuzzy partitions, especially in the realm of evolutionary computation [33].
Starting from some pioneering works on data-driven structure identification of fuzzy rule bases [34], a huge literature developed around the problem of data-driven design of fuzzy partitions [5,35,36,37]. Two general approaches can be identified when attempting to classify data-driven design methods for fuzzy partitions [38]: First Interpretability Then Accuracy (FITA), where fuzzy partitions are first defined (often through uniform granulation) and then fine-tuned to data in order to maximize accuracy; First Accuracy Then Interpretability (FATI), where fuzzy partitions are first generated from data (often through data clustering) and then adjusted to meet a number of interpretability requirements. In most cases, fuzzy sets are adjusted both in their position and fuzziness in order to optimize some objective function.
In some cases, fuzziness is explicitly addressed separately from position. As an example, Alcalá et al. [39] use a particular representation of linguistic terms involving two different parameters expressing the variation of position and fuzziness with respect to a reference SFP. Specifically, fuzziness is quantified as the length of the support of triangular fuzzy sets. These variations are used as parameters to be optimized through a genetic algorithm (GA). Results show that by adjusting both position and fuzziness, the accuracy of the resulting fuzzy rule-based systems can be improved. This comes at the price of decreased interpretability, since the resulting partitions are no longer strong. Noticeably, the reported experiments show that the introduction of fuzziness optimization leads to a slight improvement with respect to a scheme where only position is optimized.
Sanz et al. [40] use GAs for improving the performance of fuzzy rule-based classification systems by adapting the fuzziness of (triangular) fuzzy sets as measured by the length of their support. In this work, interval-valued fuzzy sets are used to represent linguistic terms, and inference depends on lower and upper triangular fuzzy sets that may not satisfy the requirements of SFPs. Also, the use of triangular fuzzy sets only (though of type-2) does not ensure that all possible fuzziness degrees are exploited to find the final fuzzy partitions. Nevertheless, experimental results show that accuracy can be improved by acting on the fuzziness of the fuzzy sets.
In many cases, triangular fuzzy sets are used to define the semantics of linguistic terms. In a few works, however, trapezoidal fuzzy sets have been employed, resulting in greater flexibility. For example, Nguyen et al. [41] showed that the extension of hedge algebra semantics by trapezoidal fuzzy sets (in place of triangular ones) leads to an average improvement of the resulting fuzzy rule-based classifiers in terms of both structural complexity and accuracy.
Evolutionary algorithms have been employed to optimize trapezoidal fuzzy sets for data-driven design of fuzzy partitions [42]. However, differently from triangular fuzzy sets, special care must be paid to the way the parameters of the fuzzy sets are fixed: without proper constraints, it is not guaranteed that the resulting partition satisfies the requirements of an SFP.
The aforementioned research works, alongside subsequent developments, stimulated a research question concerning the evaluation of the influence of fuzziness alone in the data-driven optimization of a fuzzy partition, given a certain type of fuzzy rule-based system. To conduct this study, an optimization framework is needed with some key features:
i. the optimization process must operate in the space of fuzziness only, so as to avoid the interference due to changing the position of fuzzy sets. To this purpose, a proper formalization of fuzziness is required;
ii. the optimization process should potentially span all possible values of fuzziness for all the fuzzy sets involved in the fuzzy partitions that constitute the database of a fuzzy rule-based system;
iii. the optimization process must not consider configurations of fuzziness that violate the requirements of SFPs.
The following sections are devoted to illustrating the proposed approach for optimizing the fuzziness in the way declared above.
The main assumptions in this work are:
1. we consider fuzzy rule-based classifiers only, with different defuzzification schemes and no rule weights involved. This is the simplest structure of rules that can be useful to highlight the influence of fuzziness in the final quality of the model;
2. we use trapezoidal fuzzy sets because they are more flexible than triangular fuzzy sets and provide a wider search space in terms of fuzziness. Additionally, fuzziness of trapezoidal fuzzy sets can be easily formalized in terms of the slopes of their oblique sides.
As a final note of caution, we underline how our proposal does not concern a new modeling technique (which would be improper since fuzzy sets are not modified in terms of position), rather it is focused on the analysis of the influence of fuzziness in the performance of the selected models.

CUT-BASED REPRESENTATION OF SFPs
With the purpose of separating position and fuzziness of the trapezoidal fuzzy sets involved in a fuzzy partition, we resort to a specific representation that is focused on the crossing points between adjacent fuzzy sets, which are henceforth called cuts.
An SFP can be composed on a Universe of Discourse $U = [m_U, M_U]$ by a sequence of normal and convex fuzzy sets $A_1, \dots, A_{n+1}$ whose membership degrees sum to one at every point of $U$. For $i = 1, \dots, n$, we assume that the fuzzy sets involved in any couple $(A_i, A_{i+1})$ intersect in such a way that there exists a single point $t_i \in U$ where $A_i(t_i) = A_{i+1}(t_i) = 0.5$, due to the particular arrangement of the SFP. In other words, a sequence of points $t_1, \dots, t_n$, such that $t_i < t_{i+1}$, can be identified in $U$ and any range $[t_i, t_{i+1}]$ defines the boundaries of the 0.5-cut related to the fuzzy set $A_{i+1}$ (see Figure 1 as an illustrative example with $n = 4$). Such points are referred to as cuts and play a crucial role in the context of our study. In fact, we are going to take the sequence of cuts as a constraint to arrange an optimal SFP: given the points $t_1, \dots, t_n$, we want to explore the way to build onto them a partition which could prove to produce better results when employed in a fuzzy inference system. For each fuzzy set in the SFP, this approach allows decoupling its position, which is identified by two consecutive cuts, from its fuzziness, which stands as a degree of freedom that is open to optimization.
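Different types of fuzzy sets can be involved in the realization of an SFP; among them, the trapezoidal fuzzy sets are characterized by a four-parameter function $T[a, b, c, d]$ defined as

$$T[a, b, c, d](x) = \begin{cases} 1 & \text{if } b \le x \le c \\ \dfrac{x - a}{b - a} & \text{if } a < x < b \\ \dfrac{d - x}{d - c} & \text{if } c < x < d \\ 0 & \text{otherwise} \end{cases}$$

with the parameters $a, b, c, d$ subjected to the condition

$$a \le b \le c \le d$$

which ensures the well-formedness of the fuzzy sets. Obviously, a trapezoidal fuzzy set assumes a triangular shape in the particular case $b = c$.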
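The membership function above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function name `trapezoid` is ours, and the core interval is tested first so that rectangular trapezoids (with $a = b$ or $c = d$) never trigger a division by zero.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in T[a, b, c, d], with a <= b <= c <= d."""
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")
    if b <= x <= c:            # core: full membership
        return 1.0
    if x <= a or x >= d:       # outside the support
        return 0.0
    if x < b:                  # rising side, here a < x < b
        return (x - a) / (b - a)
    return (d - x) / (d - c)   # falling side, here c < x < d


print(trapezoid(2.5, 0.0, 1.0, 2.0, 3.0))
```

Setting `b == c` gives a triangular fuzzy set, while `a == b` or `c == d` gives the rectangular trapezoids used at the borders of a partition.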
Indeed, the choice of the fuzzy set typology is significant in our research. Since we are dealing with the problem of designing SFPs constrained by cuts, it is worth recalling that the employment of triangular fuzzy sets is a limiting choice, which makes it impossible to produce triangular SFPs when some particular sequences of cuts are assigned. The adoption of trapezoidal fuzzy sets, instead, is flexible enough to guarantee the realization of well-formed SFPs whatever sequence of cuts is assigned, as demonstrated in a previous work of ours [30].
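The fuzzy sets $A_i = T[a_i, b_i, c_i, d_i]$ involved in a trapezoidal SFP must be such that

$$a_{i+1} = c_i, \qquad b_{i+1} = d_i, \qquad i = 1, \dots, n \qquad (2)$$

It can be noticed how the sequence of fuzzy sets is confined at the extremes by rectangular trapezoids leaning against the limits $m_U$ and $M_U$ of $U$ (that is, $a_1 = b_1 = m_U$ and $c_{n+1} = d_{n+1} = M_U$). To tailor a trapezoidal SFP to an assigned sequence of cuts $t_1, \dots, t_n$, an additional couple of conditions (which are of course equivalent due to (2)) must hold:

$$t_i = \frac{c_i + d_i}{2}, \qquad t_i = \frac{a_{i+1} + b_{i+1}}{2}, \qquad i = 1, \dots, n \qquad (3)$$

The conditions expressed in (3) place the cut $t_i$ at the middle point between two consecutive vertices of a trapezoidal fuzzy set.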
Several methods may be adopted to build a trapezoidal SFP from cuts. The Constant Slope (CS) method [30] is a very simple and intuitive one, which can be applied in full compliance with the aforementioned conditions (2-3). According to the CS method, a common slope is set for all the trapezoidal fuzzy sets involved in the resulting SFP. This specific slope is the one associated to the sides of the triangular fuzzy set centered on the middle point between the nearest cuts in the assigned sequence. Figure 2 illustrates an application of the CS method: the sequence of cuts $t_1, t_2, t_3$ is assigned to produce a partition involving 4 fuzzy sets. Firstly, the minimum distance $\Delta_{\min}$ among cuts is identified:

$$\Delta_{\min} = \min_{i = 0, \dots, 3} (t_{i+1} - t_i)$$

where $m_U = t_0$ and $M_U = t_4$ have been imposed to integrate the sequence of cuts (this practice is going to be repeated in the rest of the present work). Then, a triangular fuzzy set (labeled as $A_2$) is built onto $\Delta_{\min}$ and its specific slopes on the left and the right sides are imposed to construct the remaining trapezoidal fuzzy sets composing the resulting SFP.
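The CS construction can be sketched as follows. The sketch rests on our reading of the description above, not on a formula given in the text: a triangle centered between the two nearest cuts and crossing 0.5 at those cuts has oblique sides that each span a horizontal width of $\Delta_{\min}$, so every oblique side is centered on its cut, giving $c_i = t_i - \Delta_{\min}/2$ and $d_i = t_i + \Delta_{\min}/2$. The function name and the sample cuts are hypothetical.

```python
def constant_slope_edges(cuts, m_u, M_u):
    """Return (delta_min, [(c_i, d_i), ...]) under the Constant Slope reading."""
    points = [m_u, *cuts, M_u]                     # t_0 = m_U, ..., t_{n+1} = M_U
    delta_min = min(r - l for l, r in zip(points, points[1:]))
    half = delta_min / 2.0
    # Each oblique side spans [t_i - half, t_i + half], so t_i is its midpoint.
    return delta_min, [(t - half, t + half) for t in cuts]


delta_min, edges = constant_slope_edges([2.0, 3.0, 6.0], 0.0, 10.0)
print(delta_min, edges)
```

With these sample cuts the nearest pair is $(t_1, t_2) = (2, 3)$, and the second fuzzy set, delimited by $d_1 = 2.5$ and $c_2 = 2.5$, degenerates into a triangle, as in the situation described for Figure 2.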
Other methods can be adopted to derive trapezoidal SFPs from cuts: each of them is supposed to provide different results [30]. All of the partitions are characterized by a number of trapezoidal fuzzy sets with fixed positions (due to the prearranged sequence of cuts) and exhibit dissimilar fuzziness as determined by the different slope configurations of the oblique sides. However, a single method is able to produce a single partition of fuzzy sets, against an infinite number of possibilities. Hence the necessity to devise a way of fully exploring this space of possibilities, in order to properly investigate how fuzziness affects the performance of such SFPs when they are applied in some application contexts.

PARTICLE SWARM OPTIMIZATION AS A TOOL TO OPTIMIZE FUZZINESS
Particle swarm optimization (PSO) is a computational method devised to perform stochastic optimization [43]. Originally introduced in 1995 [44], it soon started attracting interest from researchers and underwent further analysis, variations, and development [45,46,47,48].
PSO emulates social behavior with special reference to such organized groups as insect swarms, fish schools, and bird flocks. All of them are composed of a large number of individuals which together interact and coordinate their movements to contribute to the global repositioning of the group. In this way, the whole set of individuals is able to reach some goal of common interest. Drawing inspiration from this natural behavior, PSO incorporates a number of entities moving through a search space to find a (nearly-)optimal solution for a given problem. Movement and velocity are key concepts, so the involved entities are called "particles," a term which best suits these notions. Just like in a flock or a swarm, the particles interact with one another and, at the same time, they learn from their own experience. To keep track of advancements, PSO evaluates the particles' positions through an objective function (which must be minimized) and then moves the population at each successive iteration of the process. The movement of each particle through the search space is determined by combining information about its current and best position, the best positions reached by one or more other particles in the swarm, and some random perturbations. As a global result, the population gradually moves toward better regions of the search space, reaching a final position which can be regarded as the solution of the problem at hand.
Given an objective function $f : E \to \mathbb{R}$ to be minimized, a formal description of the $R$-dimensional search space $E$ to be explored by PSO can be expressed as follows:

$$E = [l_1, u_1] \times [l_2, u_2] \times \dots \times [l_R, u_R] \qquad (4)$$

being $[l_r, u_r] \subset \mathbb{R}$ for $r = 1, 2, \dots, R$. Let $S$ be the number of particles composing the swarm which is going to explore $E$: at time $\tau$, each of them is associated to three $R$-dimensional vectors:
• $x_i(\tau)$, a position vector encoding the current location of the particle;
• $v_i(\tau)$, a velocity vector which previously contributed to determine the current position vector of the particle;
• $p_i(\tau)$, a memory vector encoding the best position reached so far,
for $i = 1, \dots, S$. A further piece of information regards the particle neighborhood (possibly, the entire swarm) and it is encoded in the $R$-dimensional vector:
• $p_g(\tau)$, a position vector encoding the best position globally reached so far.
By the term "best" position, we refer to the evaluation of a position vector performed through the objective function, which is purposely designed depending on the problem at hand.
PSO is organized as an iterative process: at each step, the position and velocity vectors of each particle are updated as follows:

$$v_i(\tau + 1) = \omega\, v_i(\tau) + u(0, \phi_1) \circ \big(p_i(\tau) - x_i(\tau)\big) + u(0, \phi_2) \circ \big(p_g(\tau) - x_i(\tau)\big)$$
$$x_i(\tau + 1) = x_i(\tau) + v_i(\tau + 1) \qquad (5)$$

where $\circ$ is the component-wise multiplication; $\omega$ is a positive value (the inertia weight) which can be regarded as the fluidity of the swarm motion; $u(0, \phi_i)$ are $R$-dimensional vectors composed by random numbers uniformly distributed in $[0, \phi_i]$; $\phi_1, \phi_2$ are positive values (the cognitive and social coefficients) which can be regarded as the attraction of the $i$-th particle towards the best of its positions and the global best position, respectively.
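The iterative process described above can be sketched as a small, self-contained Python routine using the inertia-weight form of the update. Several choices here are assumptions made for the example and are not taken from the paper: the default values of `omega`, `phi1`, and `phi2` (commonly used settings), the clamping of positions to the bounds, the zero initial velocities, the immediate update of the global best within an iteration, and the toy objective function.

```python
import random


def pso_minimize(f, bounds, swarm_size=20, iterations=100,
                 omega=0.7298, phi1=1.49618, phi2=1.49618, seed=0):
    """Minimize f over the box `bounds` = [(l_1, u_1), ..., (l_R, u_R)]."""
    rng = random.Random(seed)
    xs = [[rng.uniform(l, u) for l, u in bounds] for _ in range(swarm_size)]
    vs = [[0.0] * len(bounds) for _ in range(swarm_size)]
    p_best = [x[:] for x in xs]                  # personal best positions
    p_val = [f(x) for x in xs]
    g = min(range(swarm_size), key=p_val.__getitem__)
    g_best, g_val = p_best[g][:], p_val[g]       # global best so far
    for _ in range(iterations):
        for i in range(swarm_size):
            for r, (l, u) in enumerate(bounds):
                cognitive = rng.uniform(0, phi1) * (p_best[i][r] - xs[i][r])
                social = rng.uniform(0, phi2) * (g_best[r] - xs[i][r])
                vs[i][r] = omega * vs[i][r] + cognitive + social
                # Assumption: keep the particle inside the search space.
                xs[i][r] = min(max(xs[i][r] + vs[i][r], l), u)
            val = f(xs[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = xs[i][:], val
                if val < g_val:
                    g_best, g_val = xs[i][:], val
    return g_best, g_val


def toy_objective(x):
    """Hypothetical objective: squared distance from the point (0.3, 0.3, 0.3)."""
    return sum((v - 0.3) ** 2 for v in x)


best, value = pso_minimize(toy_objective, [(0.0, 1.0)] * 3)
print(best, value)
```

Since an accuracy measure is to be maximized while this routine minimizes, one simple option would be to pass the classification error (one minus the accuracy) as objective function.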
To some degree, the PSO working machinery resembles another nature-inspired algorithm, i.e., the GA, which is also a stochastic population-based optimization process. Both particles and chromosomes similarly act as candidate solutions, and the objective function in PSO clearly recalls the fitness function of GAs. However, some differences between these methods can be highlighted [49]. PSO does not implement selection and lacks genetic operators such as crossover and mutation (even if a kind of crossover is represented by the combined information of particle and neighborhood best positions, useful to evaluate acceleration). Also, PSO is intrinsically directional in its process, while the GA mutation is omnidirectional in its nature. All in all, PSO is able to produce satisfactory results in terms of optimization while showing the benefit of an easier implementation and a reduced number of parameters to adjust when compared with GA. It is, therefore, a suitable candidate to carry out the exploration of the search space we are going to investigate in the present work.
The scenario under study is such that the SFPs are meant to be exploited to build up a fuzzy rule-based inference system from data. Therefore, we deal with the following setting: (a) the cuts are supposed to be imposed by the specific context and derived from the analysis of the dataset at hand (by means of a particular clustering process); (b) the objective function steering the PSO process is represented by the average accuracy obtained while evaluating the rule base corresponding to the derived SFP; (c) the particles' position in the search space must be related to the shape of trapezoidal fuzzy sets.
It can be observed how, according to (a), the locations of the involved trapezoidal fuzzy sets are fixed in our setting.
We already applied PSO to design trapezoidal SFPs in a preliminary study of ours [31]. In particular, concerning point (c), we put a direct correspondence between the particles and the edges of a trapezoidal fuzzy set. More precisely, given a sequence of cuts $t_1, t_2, \dots, t_n$, recalling that an SFP is fully characterized by the sequence of $4(n + 1)$ parameters

$$a_1, b_1, c_1, d_1, \; a_2, b_2, c_2, d_2, \; \dots, \; a_{n+1}, b_{n+1}, c_{n+1}, d_{n+1} \qquad (6)$$

we observe that each sub-sequence $c_i, d_i, a_{i+1}, b_{i+1}$ can be reduced to $c_i$. In fact, from (2) and (3) we get $a_{i+1} = c_i$ and $b_{i+1} = d_i = 2t_i - c_i$. Since we also have $a_1 = b_1 = m_U$ and $c_{n+1} = d_{n+1} = M_U$, we conclude that the sequence (6) can be fully recovered by the sequence

$$c_1, c_2, \dots, c_n \qquad (7)$$

which includes the only free parameters that can be modified in order to get different SFPs, provided that the bounds $m_U, M_U$ and the cuts $t_1, \dots, t_n$ are preserved.
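The recovery of the full parameter sequence from the free edges $c_1, \dots, c_n$ can be sketched as follows, using only the relations stated above ($a_{i+1} = c_i$, $b_{i+1} = d_i = 2t_i - c_i$, $a_1 = b_1 = m_U$, $c_{n+1} = d_{n+1} = M_U$). The function names and the sample values are hypothetical; the helper `membership` is included only to check numerically that the memberships add up to one.

```python
def expand_edges(cuts, c, m_u, M_u):
    """Rebuild [(a, b, c, d), ...] for the n + 1 trapezoids from c_1..c_n."""
    if len(c) != len(cuts):
        raise ValueError("one free edge per cut is required")
    d = [2.0 * t - ci for t, ci in zip(cuts, c)]   # d_i = 2 t_i - c_i
    a_all = [m_u] + list(c)                        # a_1 = m_U, a_{i+1} = c_i
    b_all = [m_u] + d                              # b_1 = m_U, b_{i+1} = d_i
    c_all = list(c) + [M_u]                        # c_{n+1} = M_U
    d_all = d + [M_u]                              # d_{n+1} = M_U
    return list(zip(a_all, b_all, c_all, d_all))


def membership(x, a, b, c, d):
    """Trapezoidal membership, with the core tested first."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)


sfp = expand_edges([2.0, 3.0, 6.0], [1.5, 2.5, 5.5], 0.0, 10.0)
totals = [sum(membership(x, *fs) for fs in sfp)
          for x in (0.0, 1.7, 2.0, 2.5, 4.0, 6.2, 10.0)]
print(sfp)
print(totals)
```

Every sampled point receives a total membership of one (up to rounding), which is the defining property of a strong fuzzy partition.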
Moving from those assumptions, it is straightforward to consider an $n$-dimensional search space where the particles can shift their position vectors, which ultimately correspond to sequences of trapezoid edges as in (7). From (4) we know that the search space is composed by bounded intervals, and in our past study we related them to the ranges where each trapezoid edge $c_i \in [l_i, u_i]$ must find place.
Up to now, we considered the SFP which can be drawn on a single UoD. When dealing with real-world problems, the related datasets usually involve several dimensions (say, $D$), thus implying that the same reasoning scheme must be repeated for all the involved UoDs. In this sense, a sequence of cuts must be provided for each dimension, $(t_1, t_2, \dots, t_{n_1}), \dots, (t_1, t_2, \dots, t_{n_D})$, which in turn requires the PSO to look for $D$ sequences of free parameters $(c_1, c_2, \dots, c_{n_1}), \dots, (c_1, c_2, \dots, c_{n_D})$, thus enlarging the dimension of the search space up to the value $n_1 + n_2 + \dots + n_D$. For the sake of simplicity, we take this clarification for granted and keep it implicit, while continuing to refer in the following to the case of a single UoD.
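For the multi-dimensional case just mentioned, a particle is a single flat vector of length $n_1 + n_2 + \dots + n_D$. A minimal sketch of how such a vector could be split back into one block of free parameters per feature is given below; the function name and the sample data are hypothetical.

```python
def split_particle(position, cuts_per_feature):
    """Split a flat position vector into one block of free parameters per feature."""
    sizes = [len(cuts) for cuts in cuts_per_feature]
    if len(position) != sum(sizes):
        raise ValueError("position length must equal n_1 + ... + n_D")
    blocks, start = [], 0
    for size in sizes:
        blocks.append(list(position[start:start + size]))
        start += size
    return blocks


# Two features with 2 and 3 cuts respectively: a 5-dimensional search space.
blocks = split_particle([0.1, 0.2, 0.3, 0.4, 0.5],
                        [[1.0, 2.0], [5.0, 6.0, 7.0]])
print(blocks)
```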
Since the sequence of cuts $m_U = t_0, t_1, t_2, \dots, t_n, t_{n+1} = M_U$ is assigned, it can be easily inferred that

$$c_i \in [t_{i-1}, t_i], \qquad i = 1, \dots, n$$

and special attention must be paid to correctly specify the subsets of the intervals defined by the cuts where every $c_i$ should be placed. In fact, their arbitrary positioning inside $[t_{i-1}, t_i]$ could compromise the well-formedness of the resulting trapezoidal fuzzy sets, which is to be preserved instead.
Therefore, we developed a couple of strategies to constrain the placement of the $c_i$ edges inside each range, basically founded on a geometrical analysis of the SFP design problem. The strategies we adopted were termed Leftmost Slope Constraint and Constant Slope Constraint (we refer the interested reader to Ref. [31] for further details about them): they have been exploited to properly define the bounded intervals composing the search space, which has then been explored by the PSO process with the assurance that all the trapezoidal SFPs are well-formed.
The results we obtained were encouraging; however, we were conscious that the imposed constraints reduced the search space as a side effect. In other words, during the PSO process the particles could not explore all the infinite possible solutions of the problem at hand, but only a subset of them (which is different depending on the specific strategy adopted to constrain the ranges $[l_i, u_i]$). This is the reason why we developed an alternative strategy to relate the particles' positions and the edges of the trapezoidal fuzzy sets, as we are going to discuss in the following section.

A NEW DEFINITION OF THE PSO SEARCH SPACE
The preliminary tests showed that PSO stands as a suitable tool to support the design of SFPs based on cuts. However, to fully appreciate the effectiveness of this approach, a thorough exploration of the search space must be performed. In this way, we can evaluate how much the PSO algorithm is able to fine-tune the fuzziness of the fuzzy sets involved in an inference system whose realization is constrained by some assigned cuts. In this section, therefore, we introduce a novel formalization of the particles involved in the PSO process and we demonstrate that it is instrumental in shaping a search space including all (and only) the trapezoidal SFPs that can stand on a UoD.
To this aim, we alter the meaning of the particle positions in the search space: instead of being regarded as the set of edges $c_1, \dots, c_n$ of the trapezoidal fuzzy sets, they are simply interpreted as a set of values $\alpha_1, \dots, \alpha_n \in [0, 1]$. By doing so, the bounded intervals composing the search space (see equation (4)) assume a fixed form:

$$[l_i, u_i] = [0, 1], \qquad i = 1, \dots, n$$

To derive the edges of the fuzzy sets, we proceed as follows. Assigned a sequence of cuts $m_U = t_0, t_1, t_2, \dots, t_n, t_{n+1} = M_U$ over a UoD $U = [m_U, M_U]$, we define the points $a_i, b_i, c_i, d_i$ through the positions (8a)-(8e). It can be observed how all the points $a_i, b_i, c_i, d_i$ framing the fuzzy sets are defined for $i = 1, \dots, n + 1$. Particularly, the sequence $c_1, \dots, c_n$ is evaluated in terms of $\alpha_1, \dots, \alpha_n$, i.e., the previously introduced values provided by the PSO search. Also, they are defined in terms of $s_1, \dots, s_n$, being

$$s_i = \max\{b_i, \; 2t_i - t_{i+1}\}, \qquad i = 1, \dots, n$$

Figure 3 provides an intuitive understanding of the $s_i$ placement with respect to the positions of the cuts. The configuration depicted in Figure 3(a) is such that $b_i < (2t_i - t_{i+1})$: in this case a safe area to set the $c_i$ point is determined (represented by the green region in the figure), ranging from $(2t_i - t_{i+1})$ to $t_i$. Setting $c_i$ outside this area would disrupt the SFP design, as shown by the red dashed trapezoidal fuzzy set in the figure, which intersects the cut $t_{i+1}$. The configuration depicted in Figure 3(b) is such that $(2t_i - t_{i+1}) < b_i$: in this case the safe area to set the $c_i$ point ranges from $b_i$ to $t_i$. Setting $c_i$ outside the safe area would disrupt the well-formedness of the trapezoidal fuzzy set.
Moving from the above assumptions, we are going to demonstrate that, as the values $\alpha_1, \dots, \alpha_n$ vary in $[0, 1]$, it is possible to explore the whole space of the trapezoidal SFPs which can be drawn on a UoD. The proof is split into two theorems to prove that each sequence $\alpha_1, \dots, \alpha_n$ allows deriving: (i) only well-formed trapezoidal SFPs; (ii) all the obtainable well-formed trapezoidal SFPs.
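Firstly we must consider the following lemmas.

Lemma 1 Given the position (8b), the following inequality holds:

$$b_i \le t_i$$

Proof. We can evaluate $c_i$ by distinguishing two alternative cases, (11a) and (11b), each of which leads to the stated inequality.

Lemma 2 Given the same premises of Lemma 1, for $i = 1, \dots, n$ the following inequality holds:

$$s_i \le t_i \qquad (13)$$

Proof. For $i = 1$ we should prove that $s_1 \le t_1$ while considering the possible values of $s_1$: either $s_1 = b_1 = m_U \le t_1$, or $s_1 = 2t_1 - t_2 < t_1$ because $t_1 < t_2$. For $i = 2, \dots, n$, we can prove that $s_i \le t_i$ by reductio ad absurdum while considering the possible values of $s_i$. Whenever $(2t_i - t_{i+1}) > b_i$ we get $s_i = 2t_i - t_{i+1}$ and, by contradiction with (13), the following inequality would result:

$$2t_i - t_{i+1} > t_i, \quad \text{i.e.,} \quad t_i > t_{i+1}$$

which is absurd since we posed that $t_1, t_2, \dots, t_n \in U$, with $t_i < t_{i+1}$. Whenever $(2t_i - t_{i+1}) \le b_i$ we get $s_i = b_i$ and, by contradiction with (13), the following inequality would result:

$$b_i > t_i$$

which is absurd due to Lemma 1.

Theorem 1 Given an assigned sequence of cuts, every sequence $(\alpha_1, \dots, \alpha_n) \in [0, 1]^n$ yields, through the positions (8a)-(8e), a well-formed trapezoidal SFP.

Proof. To construct a well-formed trapezoidal SFP the following relationship must hold:

$$a_i \le b_i \le c_i \le d_i, \qquad i = 1, \dots, n + 1 \qquad (15)$$

as well as the equalities expressed in (8b)-(8d).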
From Lemma 2 we argue that $c_i \in [s_i, t_i]$ for $i = 1, \dots, n$. Since $b_i \le s_i$, we are certain about the relationship $b_i \le c_i \le t_i$ for $i = 1, \dots, n$. Therefore, from (8d), we also get $c_i \le t_i \le d_i$ for $i = 1, \dots, n$ (the case related to $i = n + 1$ refers to the rightmost rectangular trapezoid, where $c_i$, $d_i$, and $t_i$ collapse to the same point $M_U$). Finally, from (8c), we get $a_{i+1} \le t_i \le b_{i+1}$ for $i = 2, \dots, n$ (the case related to $i = 1$ refers to the leftmost rectangular trapezoid, where $a_i$, $b_i$, and $t_0$ collapse to the same point $m_U$).
In summary, the relationship (15) is proven and we can argue that, however the values of $(\alpha_1, \alpha_2, \dots, \alpha_n)$ are set in $[0, 1]^n$, the resulting partition of the UoD is a well-formed trapezoidal SFP.

Theorem 2 Given an assigned sequence of cuts, every well-formed trapezoidal SFP that can be built on them is obtained from some sequence $(\alpha_1, \dots, \alpha_n) \in [0, 1]^n$.

Proof. If the values of $(\alpha_1, \dots, \alpha_n)$ are to be involved in the construction of the SFPs, then all the vertices framing the trapezoidal fuzzy sets must be such that $c_i \in [s_i, t_i]$. This is due to the definition expressed in (8b) when we consider that $s_i \le t_i$ (from Lemma 2) and $\alpha_i \in [0, 1]$ for $i = 1, \dots, n$. By contradiction of the thesis, therefore, there should be a well-formed trapezoidal SFP such that, for some $i$, either

$$c_i < s_i \qquad (16a)$$

or

$$c_i > t_i \qquad (16b)$$

However, the inequality in (16b) contrasts with (15), since it gives $d_i = 2t_i - c_i < c_i$, thus ruining the well-formedness of the resulting trapezoidal SFP. Concerning the inequality in (16a), from $s_i = \max\{b_i, 2t_i - t_{i+1}\}$ we get either

$$c_i < b_i \qquad (17a)$$

or

$$c_i < 2t_i - t_{i+1} \qquad (17b)$$

The inequality in (17a) again contrasts with (15). The inequality in (17b) implies

$$b_{i+1} = d_i = 2t_i - c_i > t_{i+1}$$

which would ruin the well-formedness of the resulting trapezoidal SFP. In summary, neither (16a) nor (16b) would lead to the definition of well-formed trapezoidal SFPs, thus proving that all of them can only be obtained in compliance with the theorem hypotheses.
We have shown that by varying the values of ( 1 ,  2 , … ,  n ) inside the hypercube [0, 1] n it is possible to derive all and only the trapezoidal SFPs which can be designed on the UoD U. The alternative design strategy represented by the positions (8a-8e), therefore, can be adopted to exhaustively explore the SFP search space, exploiting the PSO algorithm to generate optimal sequences of the parameters  i .
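As an illustration of the construction, the following minimal Python sketch maps a parameter vector in [0, 1] n to the vertices of a trapezoidal SFP. Since the positions (8a-8e) are not restated here, the sketch rests on assumptions: c i is taken as a linear interpolation between s i and t i, each cut t i is taken as the mid-point between c i and d i, and consecutive fuzzy sets share their oblique sides (a i+1 = c i, b i+1 = d i). The expression s i = max[(2t i − t i+1 ), b i ] follows the caption of Figure 3. The names build_sfp, membership, and params are hypothetical.

```python
def build_sfp(cuts, lo, hi, params):
    """Return the (a, b, c, d) vertices of the n+1 trapezoids of an SFP.

    cuts   : strictly increasing cuts t_1..t_n inside (lo, hi)
    params : one value in [0, 1] per cut (the fuzziness parameters)
    The interpolation rule and the mid-point rule are assumptions of
    this sketch, not a restatement of positions (8a-8e).
    """
    n = len(cuts)
    t = list(cuts) + [hi]            # t[n] plays the role of t_{n+1} = M_U
    traps = []
    a, b = lo, lo                    # leftmost set: a_1 = b_1 = m_U
    for i in range(n):
        s = max(2 * t[i] - t[i + 1], b)      # s_i = max(2 t_i - t_{i+1}, b_i)
        c = s + params[i] * (t[i] - s)       # assumed: c_i moves linearly in [s_i, t_i]
        d = 2 * t[i] - c                     # assumed: t_i is the mid-point of [c_i, d_i]
        traps.append((a, b, c, d))
        a, b = c, d                          # assumed: a_{i+1} = c_i, b_{i+1} = d_i
    traps.append((a, b, hi, hi))             # rightmost set collapses on M_U
    return traps


def membership(x, trap):
    """Membership degree of x in the trapezoid (a, b, c, d)."""
    a, b, c, d = trap
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Three cuts on the UoD [0, 10] and one parameter per cut.
sfp = build_sfp([2.0, 5.0, 6.0], 0.0, 10.0, [0.3, 0.8, 0.0])
```

Under these assumptions, every parameter vector yields vertices with a ⩽ b ⩽ c ⩽ d, membership degrees that sum to one over the UoD, and a membership of 0.5 at each cut whenever the corresponding parameter is smaller than 1.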
It can be observed that the parameters c i do not linearly depend on the parameters a j (j < i) used to determine them. In principle, this means that the SFPs are not uniformly distributed in the search space to be explored. That is true at the beginning of the PSO process, even if the parameters c i are generated by a uniform random sequence ( 1 ,  2 , … ,  n ). To attenuate the effects of the lack of uniformity, PSO will be set up with a large number of particles and a large number of iterations to ensure that low-probability configurations have the chance of being explored if their fitness is high.

EXPERIMENTAL RESULTS
Trapezoidal SFPs may be useful to design the structure of inference systems based on fuzzy rules. We already mentioned how this idea has been injected into the very definition of the previously discussed PSO algorithm, whose objective function is the accuracy of the resulting fuzzy system on a given dataset. The cuts themselves can be interpreted as constraints intrinsic to the problem at hand, which can be extracted from the analysis of the dataset. In our work, we are not interested in designing an optimal fuzzy inference engine for a specific task. Our goal is instead to test the suitability of the developed PSO strategy for fine-tuning the design of trapezoidal SFPs based on cuts. To this aim, we refer to a number of datasets related to classification problems: we use them to evaluate how the performance of some fuzzy rule-based systems evolves while the fuzziness of the fuzzy sets in the underlying fuzzy partitions is optimized.
We set up an array of datasets, including both synthetic and real benchmark data. The former are described in Table 1: they are all bidimensional datasets purposely designed to include a diverse number of samples and classes. The benchmark datasets are described in Table 2: they come from publicly accessible repositories and can be retrieved online [50,51,52]. Again, they represent a variety of cases in terms of samples, features, and classes.
The cuts needed to trigger the design process can be acquired directly from data by means of a clustering process. In particular, we employed DC*, an algorithm which takes its name from Double Clustering with A*, i.e., a mechanism designed to extract interpretable information from data [53]. That is accomplished through a two-step process which first clusters the data around some detected prototypes and then projects the prototypes themselves along each feature axis to allow a further clustering performed by the A* algorithm. By doing so it is possible to derive a partition over each dimension: this represents the setting where the SFPs can be grounded, thus enabling the design of a fuzzy rule-based system incorporating the fuzzy sets which compose the obtained fuzzy partitions. This process is carried out by the DC* algorithm intervening on every dimension at the same time: some of them could be unaffected by the partition construction, thus implicitly reducing the dimensionality of the problem at hand. The interested reader can find a thorough description of DC* in Ref. 53.
The cuts play a pivotal role in the previously described algorithm since they represent the mid-points placed on each axis between a couple of projections referring to prototypes of different classes.As such, they contribute to the definition of the final configuration of partitions: a subset of cuts is identified through the A* search to split the space of the problem into sub-spaces containing homogeneous prototypes.
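The notion of a cut as a mid-point between projections of prototypes of different classes can be sketched in a few lines of Python. This is only an illustration of how candidate cuts may be collected along one feature axis, under the assumption that only adjacent projections are compared; it does not reproduce DC* nor the A* search that selects the final subset of cuts. The function name candidate_cuts is hypothetical.

```python
def candidate_cuts(projections, labels):
    """Candidate cuts along one axis.

    projections : prototype coordinates projected on the axis
    labels      : class label of each prototype
    A cut is placed at the mid-point of each pair of adjacent
    projections carrying different class labels (assumption of
    this sketch: only adjacent projections are considered).
    """
    pairs = sorted(zip(projections, labels))
    cuts = []
    for (x0, c0), (x1, c1) in zip(pairs, pairs[1:]):
        # Coincident projections cannot be separated by a cut on this axis.
        if c0 != c1 and x0 < x1:
            cuts.append((x0 + x1) / 2.0)
    return cuts


# Four prototypes of classes A and B projected on a single feature.
cuts = candidate_cuts([1.0, 7.0, 2.0, 4.0], ["A", "A", "A", "B"])
```

In this toy configuration, the prototype of class B lies between prototypes of class A, so one candidate cut is produced on each of its sides.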
For the purposes of the present work, we exploit DC* not to benefit from its main results (namely, the definition of interpretable granules of information to be embedded into an inference engine in the form of fuzzy rules), but to profit from its side products, i.e., the cuts derived from data. As previously asserted, they are necessary to start up the trapezoidal SFP optimization process.
In this sense, we are not concerned with the attainment of the best clustering results which DC* may provide. For that reason, in some cases we contented ourselves with sub-optimal cut configurations. That can be noticed by referring to Figure 4, where the datasets sd1-sd8 are represented together with the cuts obtained at the end of the application of the DC* algorithm (being bi-dimensional, the synthetic datasets lend themselves to a handy graphical illustration). In a similar way, a configuration of cuts has been obtained through DC* for each of the benchmark datasets listed in Table 2.
Once the cuts are assigned, an infinite number of partitions can be drawn on the space of data. To assess the capability of the PSO algorithm as a fine-tuning tool for trapezoidal SFPs, we first set a reference method to be used for the sake of comparison. That is the CS algorithm described in Section 3, which is one of the default methods adopted by DC* to produce the trapezoidal fuzzy sets composing the fuzzy inference system resulting from the data clustering process. We launched the PSO algorithm following the scheme reported in Equation (5), posing  = 0.7298 and  1 =  2 = 1.49618: these parameter values are consistent with some suggestions reported in literature [54,55,56]. Furthermore, we set 1000 particles per swarm and 1000 iterations, with early stopping after 200 iterations without improvement of the objective function. It should be recalled that the particles are free to move in the search space E and the trapezoidal fuzzy set corresponding to each particle position is derived by applying equations (8a-8e), as reported in Section 5.
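The setup described above can be sketched with a generic global-best PSO loop. The sketch is not a restatement of Equation (5): it assumes the common inertia-weight update rule, reads 0.7298 as the inertia weight and 1.49618 as the two acceleration coefficients, and clamps positions to the unit hypercube. The names pso_maximize and toy_fitness are hypothetical, and the toy objective merely stands in for the classification accuracy; smaller swarm and iteration counts are used to keep the example light.

```python
import random


def pso_maximize(fitness, dim, n_particles=30, max_iter=200, patience=50,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO over [0, 1]^dim with early stopping (illustrative)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    stale = 0                                   # iterations without improvement
    for _ in range(max_iter):
        improved = False
        for k in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][j] = (w * vel[k][j]
                             + c1 * r1 * (pbest[k][j] - pos[k][j])
                             + c2 * r2 * (gbest[j] - pos[k][j]))
                # Clamping to [0, 1] is an assumption of this sketch.
                pos[k][j] = min(1.0, max(0.0, pos[k][j] + vel[k][j]))
            f = fitness(pos[k])
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
                    improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:                   # early stopping
            break
    return gbest, gbest_fit


def toy_fitness(p):
    """Stand-in objective with its maximum at the centre of the hypercube."""
    return -sum((x - 0.5) ** 2 for x in p)


best, best_fit = pso_maximize(toy_fitness, dim=3)
```

In the experimental setting, the fitness function would instead build the SFP corresponding to the particle position and return the accuracy of the resulting classifier.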
A twofold inference strategy has been set up to assess the performance of the fuzzy systems built up on the SFPs obtained through the CS method. On the one hand, a winner-takes-all strategy (labeled as "max" method) assigns to each sample in the dataset the class related to the fuzzy rule which activates at the highest degree. On the other hand, an alternative strategy (labeled as "sum" method) distinguishes the groups of fuzzy rules providing the same classes and assigns to each sample in the dataset the class related to the group producing the highest cumulative strength. Similarly, the objective function driving the search of the PSO algorithm embeds both the max and the sum methods, so that the fuzzy systems corresponding to each particle position produce two kinds of inference results too.
In this way, a congruent comparison can be performed and the fine-tuning capabilities of the PSO can be assessed for both the inference methods.
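The difference between the two inference methods can be made concrete with a small Python sketch. It takes the activation strengths of the rules for a single sample as given (how they are computed from the membership degrees is left out), and the function names classify_max and classify_sum are hypothetical.

```python
from collections import defaultdict


def classify_max(strengths, labels):
    """Winner-takes-all: class of the single most activated rule."""
    best = max(range(len(strengths)), key=lambda r: strengths[r])
    return labels[best]


def classify_sum(strengths, labels):
    """Class whose rules reach the highest cumulative activation strength."""
    totals = defaultdict(float)
    for s, c in zip(strengths, labels):
        totals[c] += s
    return max(totals, key=totals.get)


# Three rules fired by one sample: one rule for class A, two for class B.
strengths = [0.6, 0.4, 0.35]
labels = ["A", "B", "B"]
max_class = classify_max(strengths, labels)
sum_class = classify_sum(strengths, labels)
```

Here the single strongest rule points to class A, whereas the two rules of class B jointly accumulate a higher strength, so the two methods disagree on this sample. Changing the slopes of the fuzzy sets alters the activation strengths and can therefore shift the cumulative totals even when the most activated rule stays the same.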
Tables 3 and 4 illustrate a comparison (in terms of accuracy performance) of the fuzzy rule-based systems resulting from the partitions obtained by the CS method and the PSO full search (FS). The comparison refers to the classification of the synthetic data applying the max and sum inference methods, respectively. In the same way, Tables 5 and 6 illustrate an analogous comparison referred to the classification of the benchmark data.
As a general remark, we can observe the suitability of the PSO algorithm as a tool for fine-tuning the derived SFPs based on cuts. In fact, in several cases its application modifies the fuzziness of the involved fuzzy sets in such a way that the performance of the resulting classifiers improves with respect to the baseline CS method. For some datasets the accuracy gain appears to be quite considerable. Also, none of the experiments caused a decrease in classification accuracy. It should be stressed, however, that the goal of our experiments was the analysis of the influence of fuzziness on the final performance, rather than the definition of the best single model to be employed as an optimal classifier. In other words, we did not perform model selection; that is the reason why we did not apply any cross-validation scheme, as is usually done in experimental sessions whose scope differs from ours.
As a further remark, the results reported in the tables show that the application of the PSO algorithm is much more profitable when fuzzy classifiers equipped with the sum inference method are considered. This allows some interesting observations which can be brought to the attention of the research community working on fuzzy system design. On the one hand, the performance of the max inference method proved rather insensitive to the actual orientation of the oblique sides of the trapezoidal fuzzy sets.
In this sense, the main contribution to the final performance of the fuzzy classifiers comes from the position of the fuzzy partitions (which in our experimental session has been fixed by the cuts obtained from data through the application of the DC* algorithm). On the other hand, the performance of the sum inference method proved to be much more sensitive to the fuzziness component, allowing a greater degree of variation in the optimization results while considering both the synthetic and the benchmark data.
As a consequence, to fully exploit the contribution coming from the fuzziness component, it makes sense to combine the sum inference method with trapezoidal fuzzy sets. As previously asserted, in fact, this kind of fuzzy set lends itself to greater flexibility in terms of fuzziness, which cannot be attained with different set shapes (e.g., triangular fuzzy sets). It is worth noting that such observations are supported by an empirical analysis which has been thoroughly conducted for both inference methods. Indeed, as formally demonstrated in Section 5, the PSO algorithm has been applied to the problem at hand so as to exhaustively explore the search space composed of all the trapezoidal SFPs that can be defined on a UoD.
Finally, to explicitly illustrate the tuning process, Figures 5 and 6 depict the comparison of the trapezoidal SFPs derived for some datasets through the application of the baseline CS method and the PSO algorithm. In particular, the figures illustrate the results concerning a selected subset of synthetic and benchmark datasets, respectively.

CONCLUSIONS
Designing fuzzy classifiers is a challenging endeavor involving a number of research issues. We dealt with this matter by analyzing a convenient way to define the SFPs which underlie a fuzzy inference system. In particular, we highlighted two features that characterize the design of the fuzzy sets involved in a SFP, namely their position and fuzziness. While focusing on a specific shape among others, i.e., trapezoidal fuzzy sets, we investigated the contribution coming from fuzziness when the performance of fuzzy classifiers is evaluated. To this end, we set up an optimization method based on the PSO algorithm to fine-tune the slopes of trapezoidal sets.
We described a mechanism to model the fuzzy sets by adopting a particular PSO implementation: we formally demonstrated that our proposal enables an exhaustive exploration of the search space composed of all the possible SFPs traceable on a UoD once the position of the fuzzy sets is fixed. In other words, we were able to provide a thorough assessment of the contribution of fuzziness to the performance of fuzzy classifiers. In this sense, this work represents a completion of some previous research of ours, where the PSO algorithm was adopted to provide a partial exploration of the above-described search space. Additionally, the experimental session brought into focus some interesting issues which can be of some relevance for the research community working on the design of fuzzy inference systems. These remarks are connected to two inference methods which may be adopted to trigger the classification process of a fuzzy system: we termed them max and sum methods.
As a hint for future research, we regard the theoretical investigation of such inference methods as a major issue to better understand the semantics behind the working engine of a fuzzy classifier.

Figure 1 A strong fuzzy partition (SFP) over the Universe of Discourse composed by 5 fuzzy sets whose positions are determined by the arrangement of 4 cuts.

Figure 2 A trapezoidal strong fuzzy partition (SFP) tailored on the basis of 3 assigned cuts through the application of the constant slope (CS) method.

Figure 3 Placement of s i = max[(2t i − t i+1 ), b i ].According to the illustrated cuts configurations, s i = (2t i − t i+1 ) in (a); s i = b i in (b).

Figure 4 Graphical representation of the synthetic datasets adopted for the experimental session.The figures include the cuts produced by DC*.

Figure 5 Comparison of the trapezoidal SFPs obtained through the application of the CS (upper row) and FS (lower row) methods. The involved synthetic datasets: sd3 (sum inference method, first feature), sd5 (max inference method, first feature), sd6 (sum inference method, first feature).

Figure 6 Comparison of the trapezoidal SFPs obtained through the application of the CS (upper row) and FS (lower row) methods. The involved benchmark datasets: BS (sum inference method, first feature), Ir (max inference method, fourth input), Nth (sum inference method, first feature).

Table 1
Description of the synthetic datasets involved in the experimental session.

Table 2
Description of the benchmark datasets involved in the experimental session.(#S = number of samples, #F = number of features, #C = number of classes.)

Table 3
Fuzzy classification of synthetic data: fine-tuning operated by the PSO search with respect to CS using the max inference method.

Table 4
Fuzzy classification of synthetic data: fine-tuning operated by the PSO search with respect to CS using the sum inference method.

Table 5
Fuzzy classification of benchmark data: fine-tuning operated by the PSO with respect to CS using the max inference method.

Table 6
Fuzzy classification of benchmark data: fine-tuning operated by the PSO with respect to CS using the sum inference method.