Clustering bipartite graphs in terms of approximate formal concepts and sub-contexts

The paper first offers a parallel between two approaches to conceptual clustering, namely formal concept analysis (augmented with the introduction of new operators) and bipartite graph analysis. It is shown that a formal concept (as defined in formal concept analysis) corresponds to the idea of a maximal bi-clique, while sub-contexts, which correspond to independent “conceptual worlds” that can be characterized by means of the new operators introduced, are disconnected sub-graphs in a bipartite graph. The parallel between formal concept analysis and bipartite graph analysis is further exploited by considering “approximation” methods on both sides. This suggests new ideas for providing simplified views of datasets, also taking inspiration from the search for approximate itemsets in data mining (with relaxed requirements) and from the detection of communities in hierarchical small worlds.


Introduction
The human mind tries to make sense of a complex set of data usually by conceptualizing it by some means. Roughly speaking, this generally amounts to putting labels on subsets of data that are judged to be similar enough. Formal concept analysis 25,24 offers a theoretical setting for defining the notion of a formal concept as a pair made of (i) the set of objects that constitutes the extension of the concept and (ii) the set of properties that are shared by these objects and that characterize them as a whole. This set of properties defines the intension of the concept. Thus, particular subsets of objects are associated with conjunctions of properties that identify them in a one-to-one way. This provides a formal basis for data mining algorithms 46.
Formal concept analysis exploits a relation that links objects with properties. Such a relation can equally be viewed as a bi-graph (or bipartite graph), i.e. a graph having two kinds of vertices, and whose links are only between vertices of different kinds.
Besides, the discovery that real-world complex networks from many different domains (linguistics, biology, sociology, computer science, ...) share some non-trivial characteristics has raised considerable interest 53,3,42,26. These networks are indeed sparse, highly clustered, and the average length of their shortest paths is rather small with regard to the graph size 53, hence their name of "small worlds". Moreover, most of their parameters, and in particular their vertex degrees, follow a power-law distribution 4,42, which indicates a hierarchical organization. One of the most active fields of this new network science concerns the problem of graph clustering 48,23. This problem is often called "community detection" in the literature due to its application to social networks.
Intuitively speaking, a cluster (or community) corresponds to a group of vertices with a high density of internal links and only a few links with external vertices. Nevertheless, there is no universally accepted formal definition of a cluster 23, and making a parallel with formal concept analysis may provide some relevant views for defining graph clusters. Many real-world large networks are bipartite, and it has been shown that such networks also share properties similar to the above-mentioned ones 38. While clustering is usually done on projected graphs, some authors address the problem of community detection directly on bipartite networks 5,39. Besides, techniques inspired by formal concept analysis have also been used for detecting human communities in social bipartite networks 50.
The purpose of this paper is first to investigate the parallel between formal concept analysis and the graph-based detection of communities in bipartite graphs. In fact, we do not restrict ourselves here to standard formal concept analysis, but rather consider an enlarged setting with new operators 17,18. This setting includes the classical Galois connection that is at the basis of the definition of formal concepts, but also another connection that characterizes independent sub-contexts. It is the graph counterpart of this enlarged setting that is discussed here, from a bipartite graph point of view. Moreover, extensions of this setting that allow various forms of approximation of formal concepts and sub-contexts are then paralleled and compared with methods used in bi-graph clustering.
The paper is organized as follows. The basic elements of formal concept analysis are first restated and the other operators are introduced in Section 2. Then, after a short background on graphs, it is shown in Section 3 that a formal concept corresponds to a maximal bi-clique in a bi-graph, while conceptual worlds (i.e., independent sub-contexts), obtained by the second connection, correspond to disconnected sub-parts of the graph. Different ways of introducing various types of approximation, or gradualness, in formal concept analysis, data mining, or community detection are then reviewed in Section 4, before discussing and illustrating their counterpart in the bi-graph setting by proposing a two-step clustering procedure in Section 5.

Extended formal concept analysis
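Let $R$ be a binary relation between a set $O$ of objects and a set $P$ of Boolean properties. We denote by $\mathcal{R} = (O, P, R)$ the tuple formed by these sets of objects and properties and the binary relation. It is called a formal context 25. The notation $(x, y) \in R$ means that object $x$ has property $y$. Let $R(x) = \{y \in P \mid (x, y) \in R\}$ be the set of properties of object $x$. Similarly, $R^{-1}(y) = \{x \in O \mid (x, y) \in R\}$ is the set of objects having property $y$.
Formal concept analysis 25 defines two set operators, here denoted $(\cdot)^{\Delta}$ and $(\cdot)^{-1\Delta}$, called intent and extent operators respectively, such that $\forall Y \subseteq P$ and $\forall X \subseteq O$:
$$X^{\Delta} = \{y \in P \mid \forall x \in X, (x, y) \in R\} \qquad (1)$$
$$Y^{-1\Delta} = \{x \in O \mid \forall y \in Y, (x, y) \in R\} \qquad (2)$$
$X^{\Delta}$ is the set of properties possessed by all objects in $X$. $Y^{-1\Delta}$ is the set of objects having all properties in $Y$. These two operators induce an antitone Galois connection between $2^O$ and $2^P$. This means that the following property holds 25:
$$X \subseteq Y^{-1\Delta} \Leftrightarrow Y \subseteq X^{\Delta}$$
A pair $(X, Y)$ such that $X^{\Delta} = Y$ and $Y^{-1\Delta} = X$ is called a formal concept; $X$ is its extent and $Y$ its intent. In other words, a formal concept is a pair $(X, Y)$ such that $X$ is the set of objects having all properties in $Y$ and $Y$ is the set of properties shared by all objects in $X$. It can be shown that formal concepts correspond to maximal pairs $(X, Y)$ such that $X \times Y \subseteq R$.
A recent parallel between formal concept analysis and possibility theory 17 has emphasized the interest of three other remarkable set operators $(\cdot)^{\Pi}$, $(\cdot)^{N}$ and $(\cdot)^{\nabla}$. These three operators and the already defined intent operator can be written as follows, $\forall X \subseteq O$:
$$X^{\Pi} = \{y \in P \mid R^{-1}(y) \cap X \neq \emptyset\} \qquad (3)$$
$$X^{N} = \{y \in P \mid R^{-1}(y) \subseteq X\} \qquad (4)$$
$$X^{\Delta} = \{y \in P \mid R^{-1}(y) \supseteq X\} \qquad (5)$$
$$X^{\nabla} = \{y \in P \mid R^{-1}(y) \cup X \neq O\} \qquad (6)$$
Note that (5) is equivalent to the definition of operator $(\cdot)^{\Delta}$ in (1). $X^{\Pi}$ is the set of properties that are possessed by at least one object in $X$. $X^{N}$ is the set of properties such that any object that satisfies one of them is necessarily in $X$. $X^{\Delta}$ is the set of properties shared by all objects in $X$. $X^{\nabla}$ is the set of properties that some object outside $X$ misses.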
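The four operators can be tried out on a small example. The sketch below is only illustrative: the context `O`, `P`, `R` is a hypothetical toy dataset (it is not the context of Figure 1), and the function names `r_inv`, `op_pi`, `op_n`, `op_delta` and `op_nabla` are introduced here for convenience. Each function follows the verbal definition of the corresponding operator given above.

```python
# Hypothetical toy context (illustrative data, not the paper's Figure 1).
O = {1, 2, 3, 4}
P = {"a", "b", "c", "d"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "c"), (4, "c"), (4, "d")}


def r_inv(y):
    """R^{-1}(y): the set of objects having property y."""
    return {x for (x, p) in R if p == y}


def op_pi(X):
    """X^Pi: properties possessed by at least one object of X."""
    return {y for y in P if r_inv(y) & X}


def op_n(X):
    """X^N: properties whose holders all belong to X."""
    return {y for y in P if r_inv(y) <= X}


def op_delta(X):
    """X^Delta: properties shared by all objects of X."""
    return {y for y in P if r_inv(y) >= X}


def op_nabla(X):
    """X^nabla: properties missed by at least one object outside X."""
    return {y for y in P if (r_inv(y) | X) != O}


X = {1, 2}
print(op_pi(X), op_n(X), op_delta(X), op_nabla(X))
```

On this toy context every property is held by at least one object but not by all of them, so the inclusions between $X^{N}$ and $X^{\Pi}$, and between $X^{\Delta}$ and $X^{\nabla}$, can be checked directly with the subset operator `<=`.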
It is usually assumed that the relation $R$ is such that $R^{-1}(y) \neq \emptyset$ and $R^{-1}(y) \neq O$ ("bi-normalization"), which respectively means that there is no property $y$ that is possessed by no object, or by all objects. This guarantees that $X^{N} \subseteq X^{\Pi}$ and $X^{\Delta} \subseteq X^{\nabla}$ hold, as expected.
Operators $(\cdot)^{-1\Pi}$, $(\cdot)^{-1N}$, $(\cdot)^{-1\Delta}$ and $(\cdot)^{-1\nabla}$ are defined similarly on a set $Y$ of properties by substituting $R^{-1}$ for $R$ and by exchanging $O$ and $P$. $Y^{-1\Pi}$, $Y^{-1N}$, $Y^{-1\Delta}$ and $Y^{-1\nabla}$ are respectively i) the set of objects having at least one property in $Y$, ii) the set of objects whose properties are all in $Y$, iii) the set of objects that have all the properties in $Y$, and iv) the set of objects that are missing at least one property outside $Y$. Moreover, we also assume the bi-normalization of $R$ for objects, namely $R(x) \neq \emptyset$ and $R(x) \neq P$, i.e., no object misses all properties or has all properties.
These new operators lead to considering a new connection 20,16 that corresponds to pairs $(X, Y)$ such that $X^{\Pi} = Y$ and $Y^{-1\Pi} = X$ (or equivalently such that $X^{N} = Y$ and $Y^{-1N} = X$), while $(\cdot)^{\nabla}$ and $(\cdot)^{\Delta}$ lead to the same remarkable pairs, which define formal concepts. Pairs $(X, Y)$ such that $X^{\Pi} = Y$ and $Y^{-1\Pi} = X$ do not define formal concepts, but rather independent sub-contexts. Indeed, it has been recently shown 16,20 that pairs $(X, Y)$ of sets exchanged through the new connection operators are subsets such that
$$(X \times Y) \cup (\overline{X} \times \overline{Y}) \supseteq R$$
For instance, the pairs ({1, 2, 3, 4}, {g, h, i}) and ({5, 6, 7, 8}, {a, b, c, d, e, f}) in Figure 1 are two independent sub-contexts, whereas the pairs ({1, 2, 3, 4}, {g, h}), ({5, 6}, {a, b, c, d, f}) and ({5, 6, 7, 8}, {a, c, d}) are examples of formal concepts. However, note that in general, an independent sub-context in a binary relation $R$ may still be further decomposed into smaller sub-contexts.
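The difference between the two connections can be made concrete with a short sketch. The context below is a hypothetical toy example (not Figure 1), and the helper names `intent`, `extent`, `pi`, `pi_inv`, `is_formal_concept` and `is_independent_subcontext` are chosen here for illustration only. The last helper checks the covering condition on $R$ stated above.

```python
# Hypothetical toy context (illustrative data, not the paper's Figure 1).
O = {1, 2, 3, 4}
P = {"a", "b", "c", "d"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "c"), (4, "c"), (4, "d")}


def intent(X):
    """X^Delta: properties shared by all objects of X."""
    return {y for y in P if all((x, y) in R for x in X)}


def extent(Y):
    """Y^{-1 Delta}: objects having all properties of Y."""
    return {x for x in O if all((x, y) in R for y in Y)}


def pi(X):
    """X^Pi: properties held by at least one object of X."""
    return {y for (x, y) in R if x in X}


def pi_inv(Y):
    """Y^{-1 Pi}: objects holding at least one property of Y."""
    return {x for (x, y) in R if y in Y}


def is_formal_concept(X, Y):
    return intent(X) == Y and extent(Y) == X


def is_independent_subcontext(X, Y):
    return pi(X) == Y and pi_inv(Y) == X


def covers_relation(X, Y):
    """Check that R lies inside (X x Y) united with (complement of X) x (complement of Y)."""
    return all((x in X) == (y in Y) for (x, y) in R)


print(is_formal_concept({3, 4}, {"c"}), is_independent_subcontext({3, 4}, {"c", "d"}))
```

In this toy context the pair ({3, 4}, {c}) is a formal concept but not a sub-context, whereas ({3, 4}, {c, d}) is an independent sub-context but not a formal concept, since object 3 lacks property d.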
Thus, in the setting of formal concept analysis, by means of two companion connections, two key aspects of the idea of clustering are at work. On the one hand, independent sub-contexts are characterized, and on the other hand, inside each sub-context, formal concepts $(X, Y)$ are identified, where every pair $(x, y)$ such that $x \in X$, $y \in Y$ is in relation (while no $x \in X$ is in relation with any $y \in \overline{Y}$ when $(X, Y)$ and $(\overline{X}, \overline{Y})$ are two independent sub-contexts). In particular, two formal concepts belonging to two different sub-contexts are clearly well-separated. A recent discussion paper 19 has indeed emphasized a parallel between the characterizations of formal concepts and sub-contexts in formal concept analysis and the characterization of fuzzy clusters in the setting of the extensional fuzzy set approach 34. The relation with clustering is made still clearer in the next section by providing a bipartite graph reading of formal concept analysis.

Graph reading of formal concept analysis
Let us start by restating some graph theory definitions. A graph is a pair of sets $G = (V, E)$, where $V$ is a set of vertices and $E$ a set of edges. In this paper only undirected graphs are considered, which means that edges are unordered pairs of vertices.
A graph is bipartite if the vertex set $V$ can be split into two sets $A$ and $B$ such that there is no edge between vertices of the same set (in other words, for every edge $\{u, v\}$ either $u \in A$ and $v \in B$, or $u \in B$ and $v \in A$). We denote such a graph by $G = (A, B, E)$, where $A$ and $B$ constitute two classes of vertices.
A vertex $v$ is a neighbour of a vertex $u$ if $\{v, u\} \in E$; we then say that $u$ and $v$ are adjacent. $\Gamma(u)$ is the set of neighbours of a given vertex $u$; it is called the neighbourhood of $u$. An ordinary graph is complete if every pair of distinct vertices of $V$ is adjacent. A bi-graph is complete if every pair of vertices from $A \times B$ is adjacent.
The subgraph of $G$ induced by a set of vertices $S$ is the graph composed of the vertex set $S \subseteq V$ and of the edge set $E(S)$ that contains all edges of $E$ that bind vertices of $S$ ($\forall u, v \in S$, $\{u, v\} \in E \Leftrightarrow \{u, v\} \in E(S)$). A set of vertices $S$ that induces a complete subgraph is called a clique. If no vertex could be added to this induced subgraph without losing the clique property, then the clique is maximal. It is straightforward that every subgraph of a bi-graph is still bipartite, every vertex keeping the same class. A set of vertices $S$ that induces a complete subgraph (in the bipartite sense) of a bi-graph $G$ is called a bi-clique, and if no vertex could be added without losing this bi-clique property, then the bi-clique is maximal.
A path from a vertex $u$ to a vertex $v$ is a sequence of vertices starting with $u$ and ending with $v$ such that from each of its vertices there exists an edge to the next vertex in the sequence. The length of a path is the length of this sequence of vertices minus one (that is to say, the number of edges that run along the path). Two vertices are connected if there is a path between them. We denote by $S^k$ the set of vertices connected to at least one vertex of $S$ by a path of length smaller than or equal to $k$. By definition $S^0 = S$. One can observe that $\forall k$, $S^k \subseteq S^{k+1}$. $S^*$ is the set of vertices connected to at least one vertex of $S$ by a path of any length; we have $S^* = \bigcup_{k \geq 0} S^k$. Two vertices are disconnected if there is no path between them. Two subsets $A$, $B$ of vertices are disconnected if every vertex of $A$ is disconnected from every vertex of $B$. A subset of vertices $S$ is connected if there is a path between every pair of vertices of $S$. An induced subgraph that is connected is called a connected component. If no vertex could be added to this induced subgraph without losing the property of connectedness, then the connected component is maximal. Note that "connected component" is often used for speaking of a "maximal connected component".
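The sets $S^k$ and $S^*$ defined above can be computed by a breadth-first exploration. The following sketch assumes the graph is given as a dictionary mapping each vertex to its set of neighbours; the graph `adj` and the function names `reach_within` and `reach_closure` are hypothetical and serve only as an illustration.

```python
def reach_within(adj, S, k):
    """S^k: vertices joined to some vertex of S by a path of length at most k."""
    reached = set(S)
    frontier = set(S)
    for _ in range(k):
        # Vertices one edge further away that were not reached before.
        frontier = {v for u in frontier for v in adj[u]} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def reach_closure(adj, S):
    """S^*: vertices joined to some vertex of S by a path of any length."""
    # No shortest path needs more edges than there are vertices.
    return reach_within(adj, S, len(adj))


# Hypothetical bi-graph: integers are o-vertices, letters are p-vertices.
adj = {
    1: {"a", "b"}, 2: {"a", "b"}, 3: {"c"}, 4: {"c", "d"},
    "a": {1, 2}, "b": {1, 2}, "c": {3, 4}, "d": {4},
}
print(reach_within(adj, {3}, 2), reach_closure(adj, {3}))
```

On this example the sets $S^0 \subseteq S^1 \subseteq S^2 \subseteq \dots$ grow from {3} until they stabilise on the vertices of one disconnected part of the graph.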

From formal context to bi-graph
For every formal context $\mathcal{R} = (O, P, R)$, we can build an undirected bi-graph $G = (V_o, V_p, E)$ such that there is a direct correspondence between the set of objects $O$ and a set $V_o$ of "o-vertices", between the set of properties $P$ and a set $V_p$ of "p-vertices", and between the binary relation $R$ and a set of edges $E$. In other words, there is one o-vertex for each object, one p-vertex for each property, and one edge between an o-vertex and a p-vertex if and only if the corresponding object possesses the corresponding property (according to $R$).
The four operators $(\cdot)^{\Pi}$, $(\cdot)^{N}$, $(\cdot)^{\Delta}$ and $(\cdot)^{\nabla}$ can be redefined for a set of vertices in this graph framework by replacing, in equations (3) to (6), $O$ by $V_o$, $P$ by $V_p$ and $R^{-1}(y)$ by $\Gamma(y)$. Operators $(\cdot)^{\Pi}$ and $(\cdot)^{\Delta}$ can also be rewritten in the following way:
$$X^{\Pi} = \bigcup_{x \in X} \Gamma(x), \qquad X^{\Delta} = \bigcap_{x \in X} \Gamma(x)$$
These notations are interesting since only the neighbourhood of vertices of $X$ is involved. They make it possible to immediately understand operators $(\cdot)^{\Pi}$ and $(\cdot)^{\Delta}$ in terms of neighbourhood in the bi-graph: $X^{\Pi}$ is the union of the neighbourhoods of the vertices of $X$, whereas $X^{\Delta}$ is the intersection of these neighbourhoods. Note that with this writing and interpretation there is no difference between $(\cdot)^{\Pi}$ and $(\cdot)^{-1\Pi}$, nor between $(\cdot)^{\Delta}$ and $(\cdot)^{-1\Delta}$.
Graph interpretations of $(\cdot)^{N}$ and $(\cdot)^{\nabla}$ are less straightforward. Nevertheless, $X^{N}$ can be understood as the set of neighbours of vertices of $X$ that have no neighbour outside $X$. In other words, it is the set of vertices exclusively connected with vertices of $X$ (but not necessarily all of them). In contrast, $X^{\nabla}$ is the set of p-vertices that are not connected to all o-vertices outside $X$.
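The neighbourhood reading of the operators can be sketched as follows. The construction assumes that object names and property names are disjoint, so that both kinds of vertices can share one adjacency dictionary; the toy context and the names `build_bigraph`, `graph_pi`, `graph_delta` and `graph_n` are hypothetical.

```python
def build_bigraph(objects, properties, relation):
    """Neighbourhood map Gamma of the bi-graph associated with a formal context."""
    gamma = {v: set() for v in [*objects, *properties]}
    for x, y in relation:
        gamma[x].add(y)
        gamma[y].add(x)
    return gamma


def graph_pi(gamma, X):
    """X^Pi: union of the neighbourhoods of the vertices of X."""
    return set().union(*(gamma[x] for x in X))


def graph_delta(gamma, X):
    """X^Delta: intersection of the neighbourhoods of the vertices of X."""
    if not X:
        raise ValueError("X must be non-empty in this sketch")
    return set.intersection(*(gamma[x] for x in X))


def graph_n(gamma, X):
    """X^N: neighbours of X that have no neighbour outside X."""
    return {y for y in graph_pi(gamma, X) if gamma[y] <= set(X)}


# Hypothetical toy context (not the paper's Figure 1).
O = {1, 2, 3, 4}
P = {"a", "b", "c", "d"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "c"), (4, "c"), (4, "d")}
gamma = build_bigraph(O, P, R)
print(graph_pi(gamma, {3, 4}), graph_delta(gamma, {3, 4}), graph_n(gamma, {4}))
```

Because only neighbourhoods are used, the same functions apply unchanged to a set of p-vertices, which mirrors the remark that the direct and inverse operators coincide in the graph reading.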

Two views of graph clusters in terms of connections
The connections induced by $(\cdot)^{\Delta}$ and $(\cdot)^{\Pi}$ can also be understood in the graph setting. On the bi-graph [...] It means that the subgraph induced by $X \cup Y$ is complete. Moreover, there is no vertex adjacent to all vertices of $X$ (resp. $Y$) that is not in $X^{\Delta}$ (resp. [...]) and there exists no vertex that is adjacent to all vertices of [...]
Proposition 2 For a pair $(X, Y)$ the two following propositions are equivalent: [...]
[...] is given by definition. We then assume that there exists $k$ such that $(X \cup Y)^k \subseteq (X \cup Y)$. We can notice that $(X \cup Y)^{k+1} \subseteq ((X \cup Y)^k)^1$, by considering that a path of length $k + 1$ is a path of length $k$ followed by a one-edge step. So [...] This implies by induction that $\forall k \geq 0$, [...] We still have to show that any vertex $v$ of $X \cup Y$ has at least one neighbour, which is straightforward if [...]
A set $S$ such that $S^* = S$ is not exactly a maximal connected component, but it is a set of vertices disconnected from the rest of the graph. So if there is no non-empty strict subset $S'$ of $S$ satisfying $(S')^* = S'$, it means that there is no subset of $S$ disconnected from the other vertices of $S$. In other words, $S$ is connected, and then $S$ is a maximal connected component. Therefore, the following property holds:
Proposition 3 For a pair $(X, Y)$ the two following propositions are equivalent: [...]
$X \cup Y$ is a maximal connected component (which has at least two vertices).
According to Propositions 1-3, it is worth noting that the two Galois connections correspond to extreme definitions of what a cluster (or a community) could be:
1. a group of vertices with no link missing inside;
2. a group of vertices with no link with outside.
On the one hand, a maximal bi-clique is a maximal subset of vertices with a maximal edge density. Vertices cannot be moved closer, and in that sense one cannot build a stronger cluster. On the other hand, a set of vertices disconnected from the rest of the graph cannot be more clearly separated from other vertices. It corresponds to another type of cluster. In fact, only the smallest of such sets are really interesting, and they are nothing else than maximal connected components. These two extreme definitions were already pointed out for clusters in unipartite graphs 51.
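The two extreme views of a cluster can be checked on a small graph. The sketch below is a plain illustration under simple assumptions: the bi-graph `gamma` is a hypothetical toy example stored as a neighbourhood dictionary, and the names `is_maximal_biclique` and `connected_components` are introduced here only for the example.

```python
def is_maximal_biclique(gamma, side_o, side_p, X, Y):
    """True if X (o-vertices) and Y (p-vertices) form a maximal bi-clique."""
    if not all(y in gamma[x] for x in X for y in Y):
        return False  # a link is missing inside
    extra_p = any(X <= gamma[y] for y in side_p - Y)
    extra_o = any(Y <= gamma[x] for x in side_o - X)
    return not (extra_o or extra_p)


def connected_components(gamma):
    """Maximal connected components, found by depth-first exploration."""
    seen, components = set(), []
    for start in gamma:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(gamma[u] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical bi-graph: integers are o-vertices, letters are p-vertices.
V_o, V_p = {1, 2, 3, 4}, {"a", "b", "c", "d"}
gamma = {
    1: {"a", "b"}, 2: {"a", "b"}, 3: {"c"}, 4: {"c", "d"},
    "a": {1, 2}, "b": {1, 2}, "c": {3, 4}, "d": {4},
}
print(is_maximal_biclique(gamma, V_o, V_p, {3, 4}, {"c"}))
print(connected_components(gamma))
```

Here {3, 4} together with {c} has no link missing inside and cannot be extended, while {3, 4, c, d} has no link with the outside; the first is a maximal bi-clique and the second a maximal connected component.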

Approximate conceptual structure: comparative state of the art
Formal concepts correspond to maximal bi-cliques, while independent sub-contexts correspond to disconnected subparts. These two notions may need to be relaxed for various reasons.
Motivations are mainly twofold: first, data may be noisy (some links in the graph may be missing or wrongly present); second, one may need to have a simplified view of the set of concepts at work in the data. The first motivation is the one most frequently emphasized in the literature. However, in some sense, any exceptional piece of data, even if there is no doubt about its validity, may be considered as contributing some "noise" that blurs the picture and prevents one from obtaining a simplified image of the data. This suggests forgetting some "details" in order to summarize the information more easily. For instance, one may forget an edge because it simplifies the view by disconnecting weakly connected parts (for example the link (4, d) in Figure 2), or introduce some missing edges in order to reinforce the connectedness inside a potential cluster (missing links (1, h) and (5, c) for example in Figure 2) and lay bare a simpler and more general concept. Such ideas are encountered in formal concept analysis, when looking for relevant, or for approximate / pseudo formal concepts, but also have counterparts in other areas such as frequent itemset mining, or graph clustering (now also known as the "community detection" problem). In this section, we provide an overview of the existing literature in these different areas, starting with formal concept analysis. We end the section by considering different weighted extensions of formal concept analysis, where the weights bearing on the links between objects and properties may provide an indication of the importance of a link for deciding whether to keep it or cut it in an approximation process.

Relevant or approximate formal concepts
A first line of research for simplifying a set of formal concepts, which tends to be large (especially when data are noisy), is to select only the relevant concepts by means of appropriate measures 35. The stability measure 36,32 is the most commonly used. The stability of a concept $(X, Y)$ is all the greater as more subsets $X' \subseteq X$ are such that $(X')^{\Delta} = Y$. This means in practice that the objects have in general no common properties outside $Y$, or in other words that the concept is likely to be "stable" if one removes some objects. These approaches only select relevant concepts among the whole set of existing ones. So they cannot produce larger approximate concepts that would cover concepts that are distinct due to some missing links. One may think of doing that either by defining approximate formal concepts, or by using fuzzy formal concepts.
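As an illustration of the stability idea, the sketch below counts by brute force the subsets of an extent whose intent is still $Y$. It assumes the common normalisation of this count by $2^{|X|}$, which is not spelled out in the text above; the toy context and the names `intent` and `stability` are hypothetical, and the enumeration is exponential, so it is only meant for very small extents.

```python
from itertools import combinations

# Hypothetical toy context (not the paper's Figure 1).
P = {"a", "b", "c", "d"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "c"), (4, "c"), (4, "d")}


def intent(A):
    """A^Delta: properties shared by all objects of A (all of P when A is empty)."""
    return {y for y in P if all((x, y) in R for x in A)}


def stability(X, Y):
    """Proportion of subsets A of X whose intent is exactly Y (assumed normalisation)."""
    members = sorted(X)
    hits = 0
    for size in range(len(members) + 1):
        for A in combinations(members, size):
            if intent(A) == Y:
                hits += 1
    return hits / 2 ** len(members)


print(stability({1, 2}, {"a", "b"}))
print(stability({3, 4}, {"c"}))
```

In this toy context the concept ({1, 2}, {a, b}) keeps its intent when either object is removed, whereas ({3, 4}, {c}) loses it when object 3 is removed, so the former gets the higher value.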
Defining an approximate formal concept can be done in a rather straightforward manner. Indeed, the definition of $X^{\Delta}$ may be softened by looking for the set of properties shared by "most" objects in $X$ rather than all. This leads to defining 20 an operator $X^{\Delta,k}$ which allows for at most $k$ exceptions among objects (provided that $X$ has more than $k$ elements). Namely,
$$X^{\Delta,k} = \{y \in P \mid |\{x \in X \mid (x, y) \notin R\}| \leq k\}$$
and an approximate formal concept is a pair $(X, Y)$ such that $Y = X^{\Delta,k}$ and $X = Y^{\Delta,j}$. While a formal concept $(X, Y)$ corresponds to the Cartesian product $X \times Y$, an approximate formal concept thus may have at most $k$ holes per column and at most $j$ holes per line. This idea is at work in data mining when looking for error-tolerant (closed) itemsets; see Section 4.2. Similarly, one may think of defining approximately independent sub-contexts $(X, Y)$ and $(\overline{X}, \overline{Y})$ by tolerating a limited number of elements in $(X \times \overline{Y}) \cup (\overline{X} \times Y)$.
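A possible reading of the softened operator is sketched below: a property is kept when at most $k$ objects of $X$ miss it, and symmetrically for objects with at most $j$ missing properties. The context and the names `approx_intent`, `approx_extent` and `is_approx_concept` are hypothetical and only illustrate the verbal definition above.

```python
# Hypothetical context with one "hole": object 1 misses property c.
O = {1, 2, 3, 4}
P = {"a", "b", "c", "d"}
R = {
    (1, "a"), (1, "b"),
    (2, "a"), (2, "b"), (2, "c"),
    (3, "a"), (3, "b"), (3, "c"),
    (4, "d"),
}


def approx_intent(X, k):
    """Properties missed by at most k objects of X (X must have more than k elements)."""
    if len(X) <= k:
        raise ValueError("X must have more than k elements")
    return {y for y in P if sum((x, y) not in R for x in X) <= k}


def approx_extent(Y, j):
    """Objects missing at most j properties of Y (Y must have more than j elements)."""
    if len(Y) <= j:
        raise ValueError("Y must have more than j elements")
    return {x for x in O if sum((x, y) not in R for y in Y) <= j}


def is_approx_concept(X, Y, k, j):
    return approx_intent(X, k) == Y and approx_extent(Y, j) == X


print(approx_intent({1, 2, 3}, 0), approx_intent({1, 2, 3}, 1))
```

With no exception allowed the pair ({1, 2, 3}, {a, b, c}) is not a concept because of the single missing link, but it becomes an approximate one as soon as one hole per column and per line is tolerated.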
In a related spirit, another approach consists in looking for pseudo concepts 44. Pseudo concepts are a way to enlarge formal concepts: a set of objects $X$ is associated with a set of primary properties $Y$, and with a secondary set of properties such that a majority of the objects are associated with the properties in this latter set.

Looking for error tolerant itemsets
Methods for mining association rules 1,52 look for sets of items that are frequently present together in a transaction database. The search for itemsets can be related to formal concept analysis 46. In that perspective, the transaction database is viewed as a formal context where each transaction stands for an object and each item in this transaction corresponds to a property satisfied by the object. Then the intent of each formal concept corresponds to a closed itemset, and frequent itemsets can be found by pruning the lattice of the formal concepts. In particular the rule: [...]
The presence of noise in data has led to the development of different ways of finding error-tolerant itemsets 30. Roughly speaking, the idea is to no longer require that every item in a frequent itemset appears in each supporting transaction. This idea is very close to the idea of tolerating "holes" in formal concepts (which are maximal rectangles included in the formal context). As already said for approximate formal concepts, it is not only the proportion of holes in the rectangle which matters but also their relative positions. Similar issues can be found with error-tolerant itemsets, where one distinguishes weak error-tolerant itemsets and strong error-tolerant itemsets 54. In the former, tolerance is global, while in the latter the number of possibly missing items (of the considered itemset) is limited inside each supporting transaction, and the number of supporting transactions missing an item (of the considered itemset) is also limited. Moreover, strong error-tolerant itemsets may be further constrained by requiring that a given proportion of supporting transactions include all items (of the considered itemset) 10.
Besides, once the error-tolerant itemsets are obtained, one may think of merging them (by taking their union) on a similarity basis in order to reduce their number.Similarity may be judged by requiring that the result of the merging is a strong error-tolerant itemset with respect to some tolerance thresholds 9 .
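The contrast between a global and a local tolerance can be sketched by counting the holes of a candidate rectangle. The thresholds `eps`, `max_row` and `max_col`, the function names and the transaction database below are all assumptions made for the example; they are not the exact definitions used in the cited works.

```python
def hole_profile(transactions, itemset, support):
    """Holes per supporting transaction, holes per item, and overall hole ratio."""
    per_transaction = {t: len(itemset - transactions[t]) for t in support}
    per_item = {i: sum(i not in transactions[t] for t in support) for i in itemset}
    ratio = sum(per_transaction.values()) / (len(itemset) * len(support))
    return per_transaction, per_item, ratio


def is_weakly_tolerant(transactions, itemset, support, eps):
    """Global tolerance: the overall proportion of holes is at most eps."""
    return hole_profile(transactions, itemset, support)[2] <= eps


def is_strongly_tolerant(transactions, itemset, support, max_row, max_col):
    """Local tolerance: holes are bounded in each transaction and for each item."""
    rows, cols, _ = hole_profile(transactions, itemset, support)
    return max(rows.values()) <= max_row and max(cols.values()) <= max_col


# Hypothetical transaction database.
transactions = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b", "c"},
    "t3": {"a", "b", "c"},
    "t4": {"a"},
}
itemset = {"a", "b", "c"}
support = ["t1", "t2", "t3", "t4"]
print(is_weakly_tolerant(transactions, itemset, support, eps=0.2))
print(is_strongly_tolerant(transactions, itemset, support, max_row=1, max_col=1))
```

The example shows why the position of the holes matters: the two holes are few overall, but both fall in the same transaction, so a per-transaction bound of one rejects the rectangle.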

Community detection in hierarchical small worlds
There is a large amount of literature about graph clustering 51,23, especially since the emergence of the so-called community detection problem 42. Most of these works concern unipartite graphs. However, such graphs are often obtained from bipartite ones (for example, co-authorship graphs between authors are based on the author-paper bipartite graph). Some authors prefer to address the clustering problem on bipartite graphs rather than on unipartite graph projections (where a part of the information is lost). In the following we briefly review existing graph clustering methods that have been adapted for bipartite graphs.
The classical graph partitioning problem 33 consists in splitting the vertex set into a given number of nearly equally-sized subsets such that the number of edges binding two vertices belonging to two different groups (i.e. the cut size) is minimal. This view has traditional applications in problems such as electronic circuit partitioning or load balancing in parallel computing. However, it has also been applied to finding conceptual clusters in both unipartite and bipartite graphs 14. The main drawback of this approach is that the number of clusters has to be known a priori, which makes sense for example in parallel computing when the number of resources is known. However, when clustering communities, the number of clusters is usually unknown. In addition, as mentioned by Newman 43, the minimal cut argument may be too naive an approach to find "natural" structures in networks (some edges may be more significant than others according to the graph structure).
Modularity is a more complex quality measure of a partition of the graph vertices. It is equal to the number of edges contained inside the clusters minus the same number for a graph built by rewiring edges randomly while preserving the degree of each vertex. One looks for a partition that maximizes this modularity measure. This also amounts to minimizing the number of edges between clusters minus the same number in the same randomly rewired graph. It was initially introduced by Newman and Girvan 42 as an argument to select a particular cut in a dendrogram (resulting from a hierarchical clustering algorithm). It has then been used as an objective function in various optimization algorithms 11,7. Different bipartite adaptations of this quality measure have been proposed recently 5,29,40.
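For reference, modularity can be computed in a few lines. The sketch below uses the usual normalised form for unipartite graphs, $Q = \sum_c \left[ e_c / m - (d_c / 2m)^2 \right]$, where $m$ is the number of edges, $e_c$ the number of edges inside cluster $c$ and $d_c$ the sum of the degrees of its vertices. This normalisation and the function name `modularity` are assumptions of the example, and the bipartite adaptations cited above are not covered.

```python
def modularity(edges, communities):
    """Normalised modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {v: i for i, c in enumerate(communities) for v in c}
    inside = [0] * len(communities)   # e_c: edges with both ends in cluster c
    degree = [0] * len(communities)   # d_c: sum of vertex degrees in cluster c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(inside, degree))


# Hypothetical graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))
print(modularity(edges, [{0, 1, 2, 3, 4, 5}]))
```

Splitting the two triangles gives a positive value, while putting all vertices in a single cluster gives zero, since the observed and expected numbers of internal edges then coincide.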
Another common approach for graph clustering consists in using random walks. The intuitive idea is that random walks reveal the graph structure. More precisely, a random walker tends to be trapped inside clusters, since once inside a cluster the probability of getting out is low. This is because there are only few paths that would enable the walker to go out, and many paths that remain inside the cluster. This has been used for computing similarity degrees between graph vertices 26,27, which are then used for building clusters 47 (this algorithm has been adapted to bipartite graph clustering 41). The same idea may also be used in a different way for defining quality measures of a partition of graph vertices 13,49. Such a quality measure is then used as an objective function in optimization algorithms for obtaining a partition of the graph into clusters. The quality measure proposed by Rosvall and Bergstrom 49 seems especially promising 37, but has not been applied to bipartite graphs yet.
Another family of approaches views cliques as kernels of potential communities. A first idea is to look for adjacent cliques. It is illustrated by the well-known method CFinder 45, which defines clusters as chains of adjacent $k$-cliques (a $k$-clique is a complete subgraph of size $k$, and two $k$-cliques are adjacent iff they have at least $k - 1$ vertices in common). This method has been adapted to bipartite graphs 39 [...] In other words, each (big enough) maximal bi-clique is contained in a cluster (while maximal bi-cliques that are too small are ignored). Other approaches based on cliques work in a more global way. Some authors have proposed to perform an agglomerative hierarchical clustering algorithm over the edges (i.e. 2-cliques) of unipartite graphs 2. It has also been proposed to transform a unipartite graph into the graph of its $k$-cliques and then to compute a partitioning of the obtained graph 22,21.
Apparently ignored by the above proposals, an older method is worth mentioning 31. It starts with a bipartite graph for which one computes all the maximal cliques; then a bipartite graph is built by associating these maximal cliques to one of the two vertex sets. Finally this new graph is partitioned by minimizing the cut size.
In all these clique-based methods, the clusters of vertices that are obtained are allowed to overlap (since it is expected that individuals may belong to several communities). As emphasized by Hu et al. 31, the methods working with cliques better preserve the maximal cliques which constitute the core of communities, while all the previously presented methods that focus on vertices may split these meaningful maximal cliques.

Weighted extensions of formal concept analysis
There may be several good reasons for having a weighted formal context, i.e. a context where the links between objects and properties are weighted. Indeed, the weights may be understood in two different ways. First, they may reflect the idea of allowing the properties to be a matter of degree.
Another type of weights corresponds to the situation where the properties remain binary but are pervaded with uncertainty. In both cases, this supposes that the information about the gradedness, or the uncertainty, is available. The first way, which has been the most investigated until now, amounts to considering that objects may have properties only to a degree. Such fuzzy formal concept analysis 6 is based on the operator:
$$X^{\Delta}(y) = \bigwedge_{x \in O} (X(x) \rightarrow R(x, y))$$
where now $R$ is a fuzzy relation, $X$ and $X^{\Delta}$ are fuzzy sets of objects and properties respectively, $\bigwedge$ denotes the min conjunction operator and $\rightarrow$ an implication operator. A suitable choice of connective (the residuated Gödel implication: $a \rightarrow b = 1$ if $a \leq b$, and $a \rightarrow b = b$ if $a > b$) still enables us to see a fuzzy formal concept in terms of its level cuts $X_{\alpha}$, $Y_{\alpha}$ such that $(X_{\alpha} \times Y_{\alpha}) \subseteq R_{\alpha}$ where $X_{\alpha} \times Y_{\alpha}$ are maximal, with $R_{\alpha} = \{(x, y) \mid R(x, y) \geq \alpha\}$, $X_{\alpha} = \{x \in O \mid X(x) \geq \alpha\}$ and $Y_{\alpha} = \{y \in P \mid Y(y) \geq \alpha\}$.
Another way 16,15 is related to the idea of uncertainty. The possibilistic manner of representing uncertainty here is to associate with each link $(x, y)$ a pair of numbers $(\alpha, \beta)$ such that $\alpha, \beta \in [0, 1]$ and $\min(\alpha, \beta) = 0$, expressing respectively to what extent it is certain that the link exists ($\alpha$) and does not exist ($\beta$). A link in a classical formal context corresponds to the pair (1, 0), the absence of a link to the pair (0, 1), and the pair (0, 0) models complete ignorance on the existence or not of a link. On this basis a link may be all the more easily added (resp. deleted) as $\alpha$ (resp. $\beta$) is larger.
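The fuzzy intent operator with the Gödel implication can be sketched as follows. Fuzzy sets and the fuzzy relation are represented by dictionaries of membership degrees, with missing entries read as 0; this representation, the membership values and the names `godel_implication` and `fuzzy_intent` are assumptions made for the example.

```python
def godel_implication(a, b):
    """Residuated Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def fuzzy_intent(X, R, objects, properties):
    """Degree to which each property is shared by the fuzzy set of objects X."""
    return {
        y: min(godel_implication(X.get(x, 0.0), R.get((x, y), 0.0)) for x in objects)
        for y in properties
    }


# Hypothetical fuzzy context and fuzzy set of objects.
objects = [1, 2]
properties = ["a", "b"]
R = {(1, "a"): 1.0, (1, "b"): 0.5, (2, "a"): 0.75, (2, "b"): 0.25}
X = {1: 1.0, 2: 0.5}
print(fuzzy_intent(X, R, objects, properties))
```

When all degrees are 0 or 1 the same function gives back the crisp intent, since the implication then only fails for an object of $X$ that lacks the property.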
One may also consider that some properties are less important, or that some objects are more typical 15. Then weights are no longer put on links or edges, but rather on the nodes. Thus forgetting a non-compulsory property (e.g. the ability to fly for a bird) may help building a larger concept (e.g. birdness, although typical birds fly). Forgetting an object or a property also suppresses links, which may also help obtaining disconnected subparts.
These three views require different kinds of additional knowledge which are not often available, especially in large data sets. Moreover, the last two views have been introduced only recently and have not yet been investigated in detail. However, such weights may help in building larger formal concepts and smaller sub-contexts. We shall see a way of producing weights by exploiting the structure of a binary relation through a diffusion process; see Section 5.2.

Looking for an approximate conceptual view of data
We have reviewed different trends of research aiming either at defining approximate concepts, subcontexts, closed itemsets, or at clustering communities in bipartite graphs.Among these methods, some focus on the identification of what may be retrospectively understood as approximate independent sub-contexts, while others rather look for approximate (formal) concepts.Indeed some methods partition the bipartite graph into nonoverlapping subgraphs, while the others use formal concepts (or bi-cliques) for building larger groups (which are still allowed to overlap).We recognize here the two views for characterizing clusters (no link with outside vs. no missing link inside, see section 3.2).The partitioning methods suppress (or forget) links that are judged unimportant, whereas approximate concept methods tend to add (or to compensate) missing links.
It is worth noticing that the reviewed methods adopt one point of view (either looking for subcontexts that are as much as possible independent, or approximate concepts that are as dense as possible).But none of these methods may both add and suppress links (depending on the situation), at least in the initial context.However, there exists a two step approach that first looks for the con-cepts in the initial context, and then try to partition the set of concepts obtained.This may be done by considering the meta-context built from the links relating concepts and the objects of their extents and looking for approximatively independent subcontext in this meta-context, or doing the same in terms of bipartite graphs 31 .Moreover from a formal concept analysis point of view, it would seem natural to look first for approximate independent sub-contexts and then inside each of these subcontexts to look for the concepts.
Generally speaking, one may think of two types of approaches for identifying meaningful clusters (or communities), namely the ones that try to modify the initial context (by suppressing or adding links) in order to simplify the resulting clusters, and the ones that rather start from the set of concepts associated with the initial context and then try to simplify this set (for instance, by gathering or selecting relevant concepts). The first type requires some evaluation of the links in order to be able to decide whether to add or suppress them, while the second type needs some measure of the goodness of what is produced.
In the following, we present a two-step procedure that aims at providing a simplified, structured view of a set of data. These two steps correspond respectively to the two types of approaches mentioned above. The first step uses a random walk method for transforming the initial context into a graded one. This graded context is in turn reduced to a simplified binary context that is expected to have a smaller number of formal concepts. The second step then tries to merge together sets of concepts that overlap sufficiently. Before discussing this two-step procedure, we introduce the use of random walks, which also leads to a noteworthy parallel with extended versions of formal concept analysis.

Random walks and formal concept analysis
A large panel of approaches developed within the community detection literature use random walks for identifying communities. As already mentioned in Section 4.3, the underlying idea is that random walkers tend to be trapped inside communities. Let us consider a random walk 8 on a bipartite graph. In the following we continue to use the formal concept analysis notations, taking advantage of the strict parallel with bipartite graphs established in Section 3.1. The relation $R$ is now replaced by a probabilistic transition matrix for going from a vertex $x$ to a vertex $y$, or conversely. The probability is generally equally shared between the edges directly connected to the starting vertex. Let $P_{x \to y}$ be the probability for going from a vertex $x \in O$ to a vertex $y \in P$, formally defined as follows:

$$P_{x \to y} = \begin{cases} \dfrac{1}{|R(x)|} & \text{if } (x, y) \in R \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

where $R(x)$ is the set of properties verified by $x$. Let $X(x)$ be the probability for a random walker to be in vertex $x \in O$ at a given step; the probability $X^P(y)$ to reach a vertex $y \in P$ at the next step is given by:

$$X^P(y) = \sum_{x \in O} X(x) \cdot P_{x \to y} \tag{11}$$

Similarly, when going from a property vertex to an object vertex we have the following equations:

$$P_{y \to x} = \begin{cases} \dfrac{1}{|R^{-1}(y)|} & \text{if } (x, y) \in R \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

$$Y^P(x) = \sum_{y \in P} Y(y) \cdot P_{y \to x} \tag{13}$$

where $R^{-1}(y)$ is the set of objects that verify $y$, and $Y(y)$ is the probability for a random walker to be in vertex $y \in P$ at a given step. Equations (11) and (13) correspond to a one-step walk between an object and a property. More generally, a walk with an odd number of steps links objects and properties. The probability to reach a property from an object (or conversely an object from a property) after an odd number $t$ of steps can be computed by composing $t$ times the operator $(\cdot)^P$. For $t$ an odd number, let $P^t_{x \to y}$ (resp. $P^t_{y \to x}$) be the probability to reach a property $y$ from an object $x$ (resp. an object $x$ from a property $y$) in $t$ steps.
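The one-step propagation of equations (11) and (13) can be sketched as follows. This is only an illustration under stated assumptions: the toy context `R`, and the helper names `inverse`, `transition` and `step`, are hypothetical and do not come from the paper. Exact fractions are used so that the probabilities can be checked by hand.

```python
from fractions import Fraction

# Hypothetical toy context: each object maps to the set of properties it satisfies.
R = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"b", "c", "d"}, 4: {"c", "d"}}

def inverse(relation):
    """Return the inverse relation: each property mapped to the objects having it."""
    inv = {}
    for x, props in relation.items():
        for y in props:
            inv.setdefault(y, set()).add(x)
    return inv

def transition(relation):
    """P[u][v] = 1/|relation(u)| for each neighbour v of u (zero entries are omitted)."""
    return {u: {v: Fraction(1, len(nbrs)) for v in nbrs}
            for u, nbrs in relation.items() if nbrs}

def step(dist, P):
    """Push a probability distribution through P, as in equations (11) and (13)."""
    out = {}
    for u, mass in dist.items():
        for v, p in P.get(u, {}).items():
            out[v] = out.get(v, 0) + mass * p
    return out

P_op = transition(R)            # object -> property probabilities
P_po = transition(inverse(R))   # property -> object probabilities

start = {3: Fraction(1)}              # the walker starts on object 3
on_props = step(start, P_op)          # distribution over properties after one step
back_on_objs = step(on_props, P_po)   # distribution over objects after two steps
```

In this toy context the walker starting from object 3 is on each of b, c and d with probability 1/3 after one step, and after a second step most of the mass is back on objects 3 and 4, which share two properties.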
One can show 26,27 that when $t$ tends to infinity, $P^t_{x \to y}$ no longer depends on the starting vertex $x$. However, the dynamics of the convergence towards this limit clearly depends on the starting node. Indeed, the trajectory of the random walker is completely governed by the topology of the graph: after $t$ steps, any vertex $y$ located at a distance of $t$ links can be reached. The probability of this event depends on the number of paths between $x$ and $y$, and the degree (i.e. number of neighbours) of each vertex along those paths. The more interconnections between these vertices, the higher the probability of reaching $y$ from $x$. Therefore, for a small $t$, $P^t_{x \to y}$ reveals "how far" $y$ is from $x$. This idea is used in the next section to compute a weight for each pair of object and property.
With regard to formal concept analysis, there is a noteworthy parallel between the "diffusion" operator at the basis of random walk methods and graded extensions of (the possibility theory reading of) formal concept analysis operators. Indeed equations (11) and (13) can formally be paralleled with the formula defining the operator at the basis of the definition of a formal concept 6:

$$X^{\Delta}(y) = \bigwedge_{x \in O} \big( X(x) \to R(x, y) \big)$$

and with the formula of the operator inducing independent sub-contexts 18:

$$X^{\Pi}(y) = \bigvee_{x \in O} \big( X(x) * R(x, y) \big)$$

where $R$ may be graded, as well as $X$, $X^{\Pi}$ and $X^{\Delta}$, and where a usual choice for $*$ is min, and a residuated implication for $\to$. This parallel between operators may be further extended to a parallel between the definitions of concepts and of communities as stable points for these operators. Some random walk approaches define clusters as sets of vertices that are almost stable, in the sense that a random walker tends to stay inside them 13,49. In formal concept analysis, a formal concept is also a stable set for the Galois connection operator ($X^{\Delta\Delta} = X$ and $Y^{-1\Delta\,-1\Delta} = Y$).
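To make the parallel concrete, here is a minimal sketch of the two graded operators, read as a min-implication aggregation for the concept-forming operator and a max-conjunction aggregation for the operator inducing sub-contexts. The choices made here are assumptions for illustration only: the conjunction is min, the implication is the Gödel one (the residuum of min), and the function names and the toy graded relation are hypothetical.

```python
def godel_implication(a, b):
    """Residuated implication associated with min: 1 if a <= b, else b."""
    return 1.0 if a <= b else b

def delta(X, R, properties):
    """Degree to which each property is shared by all the objects of the fuzzy set X."""
    return {y: min(godel_implication(deg, R.get((x, y), 0.0)) for x, deg in X.items())
            for y in properties}

def pi(X, R, properties):
    """Degree to which each property is possessed by at least one object of X."""
    return {y: max(min(deg, R.get((x, y), 0.0)) for x, deg in X.items())
            for y in properties}

# Hypothetical graded relation: missing pairs have degree 0, and (3, b) holds only to degree 0.4.
R = {(1, "a"): 1.0, (1, "b"): 1.0, (2, "a"): 1.0, (2, "b"): 1.0,
     (3, "b"): 0.4, (3, "c"): 1.0, (3, "d"): 1.0, (4, "c"): 1.0, (4, "d"): 1.0}
properties = ["a", "b", "c", "d"]

X = {1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0}   # the crisp set {1, 3} written as membership degrees
shared = delta(X, R, properties)
reachable = pi(X, R, properties)
```

Where the diffusion operator of equation (11) aggregates with a sum of products, these operators aggregate with min or max, so `shared` only keeps property b (to degree 0.4), while `reachable` contains every property linked to object 1 or object 3.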
Nevertheless, we do not intend in the following to compute a probabilistic substitute for the notion of formal concept. We rather compute a weighted counterpart of the considered formal context by using random walks for assessing the importance of links between objects and properties.

A two-step procedure
In the following, we propose a two-step procedure aiming at providing an approximate conceptual view of data. It takes advantage of several ideas coming from the previous comparative discussion of different research trends. Namely, the procedure first uses a random walk approach for providing a weighted counterpart to the formal context, which is the basis of a heuristic method for diminishing the number of formal concepts associated with the context. Then, in a second step, the procedure merges formal concepts which are sufficiently close.

Using random walks for assessing the importance of edges
We propose to use a short random walk to attribute a weight $W(x, y)$ to each pair $(x, y) \in O \times P$ of object and property:

$$W(x, y) = P^3_{x \to y}$$

The choice of a random walk of length three may be discussed. As seen in the previous section, we have to use short random walks anyway. Moreover, the number of steps $t$ has to be odd, and if $t = 1$ only pairs $(x, y)$ that are in $R$ will have a non-zero weight. This is why we use the smallest informative number of steps, which has obvious computational advantages.
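As a sketch of this weighting step, the following computes the three-step probabilities for every (object, property) pair of a small context. The toy context and the function names are hypothetical, and the code assumes that every object has at least one property; it illustrates the idea rather than reproducing the authors' implementation.

```python
from fractions import Fraction

# Hypothetical toy context: object -> set of properties it satisfies.
R = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"b", "c", "d"}, 4: {"c", "d"}}

def inverse(relation):
    """Return the inverse relation: property -> set of objects having it."""
    inv = {}
    for x, props in relation.items():
        for y in props:
            inv.setdefault(y, set()).add(x)
    return inv

def spread(dist, neighbours):
    """One walk step: each vertex shares its mass equally among its neighbours."""
    out = {}
    for u, mass in dist.items():
        for v in neighbours[u]:
            out[v] = out.get(v, 0) + mass / len(neighbours[u])
    return out

def walk_weights(relation, t=3):
    """W[(x, y)] = probability of being on property y after t steps from object x."""
    if t % 2 == 0:
        raise ValueError("t must be odd so that the walk ends on a property vertex")
    inv = inverse(relation)
    weights = {}
    for x in relation:
        dist = {x: Fraction(1)}
        for k in range(t):
            # Even-numbered steps go object -> property, odd-numbered ones go back.
            dist = spread(dist, relation if k % 2 == 0 else inv)
        for y in inv:
            weights[(x, y)] = dist.get(y, Fraction(0))
    return weights

W = walk_weights(R)
```

With `t=1` the weights are non-zero only on the pairs of the relation, whereas with `t=3` some "non-edges" receive a positive weight: in this toy context the pair (4, b) gets 1/6 while (4, a) stays at 0, which reflects that b is closer to object 4 than a.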
So the result of this computation is to substitute a weighted context $W$ for the original one $R$, where the weights account for a form of closeness. It is clear that given a threshold $s$, one can extract from $W$ a binary relation $R'$:

$$R' = \{(x, y) \in O \times P \mid W(x, y) \geq s\}$$

This new formal context $R'$ is associated with independent sub-contexts (if any) and with a lattice of formal concepts. See Figure 4, which gives the number of concepts as a function of the threshold, where in the vicinity of 0.100 the number of concepts goes from 8 to 10 and then to 9. We also observe in Figure 3 that when the threshold is decreased too much, the two independent sub-contexts no longer exist (since for $s = 0.10$, $(4, d)$ is no longer removed). Note that independent sub-contexts are easy to recognize in the lattice of concepts, since they correspond to families of paths between the top and the bottom formal concepts (with empty sets of properties and objects respectively), which are fully separated from the others.

In order to explore what values of the threshold may be of interest for looking for approximately independent sub-contexts, or for diminishing the number of formal concepts, two landmark values are of interest: let $m$ be the weight of the worst of the edges and $M$ be the weight of the best of the "non-edges" in the formal context. Namely,

$$m = \min_{(x, y) \in R} W(x, y), \qquad M = \max_{(x, y) \notin R} W(x, y)$$

One can note that if all edges have a larger weight than all "non-edges" (i.e. $M < m$) then the relation stays the same for any value of the threshold in $[M, m]$. When $m < M$, the idea is to remove some edges that have a weight lower than the best "non-edges", and to add some "non-edges" that have a better score than the worst edges, by choosing the threshold $s$ between $m$ and $M$. In practice, one may blindly take $s = \frac{m + M}{2}$, or explore different values. However, a better strategy seems to be to first look for approximately independent sub-contexts by taking a value close to $M$ (see Table 1(b)), and then, in each of the sub-contexts, to diminish the threshold progressively in order to increase the number of edges, while checking the corresponding number of formal concepts.
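The thresholding step and the two landmark values can be sketched as below. The weights are illustrative values chosen by hand, not computed from any data set of the paper, and the rule "keep a pair when its weight is at least $s$" is an assumption about how ties with the threshold are handled. The sketch also assumes that the context has at least one edge and one "non-edge".

```python
# Hypothetical weighted context: R is the original relation, W the weights.
R = {(1, "a"), (1, "b"), (2, "b")}
W = {(1, "a"): 0.5, (1, "b"): 0.2, (2, "a"): 0.3, (2, "b"): 0.7}

def landmarks(R, W):
    """Return (m, M): the weight of the worst edge and of the best non-edge."""
    m = min(w for pair, w in W.items() if pair in R)
    M = max(w for pair, w in W.items() if pair not in R)
    return m, M

def threshold(W, s):
    """Binary relation keeping the pairs whose weight is at least s (assumed rule)."""
    return {pair for pair, w in W.items() if w >= s}

m, M = landmarks(R, W)                # here m < M, so the relation will change
R_new = threshold(W, (m + M) / 2)     # the "blind" choice of threshold

removed = R - R_new   # edges judged too weak
added = R_new - R     # non-edges judged strong enough
```

In this example the weak edge (1, b) is suppressed and the strong "non-edge" (2, a) is added; exploring other values of `s` between `m` and `M` changes how many links are modified.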
As an illustration, Table 2 gives the result of the transformation on a relation that corresponds to the Southern Women Data Set 12, a bipartite graph between 18 women and 14 events that is a standard example in the social network analysis and community detection literature. The crosses in Table 2 (as well as in Table 1(b)) represent the new relation $R'$ computed from $W$, and the cells in gray are the ones that have been modified by the transformation (to get back to the original relation, one just has to swap cross and blank in each of these gray cells). In Table 2, there are 67 formal concepts in the original relation and only 22 with the new relation.
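Counting the formal concepts of a context, before and after such a transformation, can be done by brute force when the context is small. The sketch below closes every subset of objects and collects the distinct (extent, intent) pairs; it is exponential in the number of objects, so it is only meant for toy examples. The context and the function name are hypothetical.

```python
from itertools import combinations

# Hypothetical toy context: object -> set of properties it satisfies.
R = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"b", "c", "d"}, 4: {"c", "d"}}
PROPERTIES = {"a", "b", "c", "d"}

def formal_concepts(relation, properties):
    """Enumerate all formal concepts (extent, intent) by closing every object subset."""
    objects = list(relation)
    concepts = set()
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            # Intent: the properties shared by every object of the subset.
            intent = set(properties)
            for x in subset:
                intent &= relation[x]
            # Extent: all the objects that have every property of the intent.
            extent = {x for x in objects if intent <= relation[x]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

concepts = formal_concepts(R, PROPERTIES)
n_concepts = len(concepts)   # includes the top and bottom trivial concepts
```

Running the same function on the original relation and on the thresholded one gives the kind of comparison reported for Table 2.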

Merging concepts
At the end of the previous step, one expects to have a simplified view of the original context in terms of sub-contexts (when possible) and formal concepts. However, the number of formal concepts may still remain too high with respect to user needs. Indeed, in the example of Table 2 there are still 22 formal concepts after the first step. In order to further simplify the view, one may complete the previous procedure with a second step aiming at merging "close" formal concepts.
For two concepts $a = (X_a, Y_a)$ and $b = (X_b, Y_b)$, the inclusion value of $a$ in $b$ measures the proportion of $a$ that is included in $b$. This can be extended to a score $s(C)$ for a set $C$ of $n$ concepts, where $s(C) > \theta$ means that there is a concept $b \in C$ such that, for all concepts $a \in C$, a proportion $\theta$ of $a$ is included in $b$. In other words, there exists a concept in the set $C$ that almost includes all concepts of the set $C$.
In order to merge formal concepts, one may apply the following standard agglomerative algorithm:
1. each concept is in its own cluster,
2. for each pair of clusters, the score of their union is computed,
3. if the maximal score is larger than the threshold:
   (a) a pair having maximal score is selected and merged,
   (b) loop to 2.
However, note that it does not necessarily lead to a partition of the set of concepts that is optimal, in the sense that it does not necessarily maximize $\min_i (s(C_i))$ among all the $n$-partitions $\{C_1, C_2, \ldots, C_n\}$ of the set of formal concepts.
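The agglomerative loop above can be sketched as follows. Two points are assumptions made only for this illustration: the inclusion value of $a$ in $b$ is taken to be the share of the cells of the rectangle $X_a \times Y_a$ that also belong to $X_b \times Y_b$, and the score of a set of concepts is the best, over candidate concepts $b$, of the worst inclusion of any $a$ in $b$. The function names and the four toy concepts are hypothetical, and concepts are assumed to have non-empty extents and intents.

```python
from fractions import Fraction
from itertools import combinations

def inclusion(a, b):
    """Assumed inclusion value: share of the cells of a's rectangle that lie in b's."""
    (Xa, Ya), (Xb, Yb) = a, b
    return Fraction(len(Xa & Xb) * len(Ya & Yb), len(Xa) * len(Ya))

def score(cluster):
    """Best concept b of the cluster, judged by the worst inclusion of any a in b."""
    return max(min(inclusion(a, b) for a in cluster) for b in cluster)

def merge_concepts(concepts, theta):
    """Agglomerative merging: repeatedly merge the best pair while its score exceeds theta."""
    clusters = [frozenset([c]) for c in concepts]          # step 1
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),              # step 2
                   key=lambda pair: score(pair[0] | pair[1]))
        if score(best[0] | best[1]) <= theta:              # step 3
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters

# Hypothetical non-trivial concepts, each given as (extent, intent).
A = (frozenset({1, 2}), frozenset({"a", "b"}))
B = (frozenset({3}), frozenset({"b", "c", "d"}))
C = (frozenset({3, 4}), frozenset({"c", "d"}))
D = (frozenset({1, 2, 3}), frozenset({"b"}))

clusters = merge_concepts([A, B, C, D], theta=Fraction(3, 5))
```

With this threshold the four concepts end up in two groups, {A, D} and {B, C}; a larger `theta` leaves every concept alone, since no pair scores above 2/3 in this toy example.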
Table 3 provides the result of this second step on the women social network example. In the end, 6 non-trivial clusters are retained. Clearly, too small a threshold $\theta$ may lead to an oversimplification of the formal context, as in the initial example (see Table 4), where each of the two (approximately) independent sub-contexts becomes a unique cluster for $\theta = 0.4$. It is worth noticing that this second step still preserves a structured view, since one may keep the benefit of the lattice of concepts restricted to the "father" concepts of each group of concepts (which approximately include the other concepts in their group). Besides, other criteria (e.g., the relative size) may be used for a further selection in the set of formal concepts, if too many formal concepts remain in the result of this second step.

Conclusion
Starting with a view of a formal context as a bi-graph, the paper has shown that formal concepts correspond to the idea of maximal bi-cliques, whereas independent sub-contexts, obtained thanks to the introduction of another connection, correspond to disconnected subsets of vertices. Noticeably enough, these two constructs reflect two ideal views of the idea of graph cluster, namely a set of vertices with no link missing inside and a group of vertices with no link with outside. The last part of the paper, after a review of different ways of getting approximate structured views of a formal context, or equivalently of clustering a bipartite graph, has outlined a two-step procedure for laying bare approximately independent sub-contexts, if any, and simplifying the lattice of the formal concepts. The first step of the procedure takes advantage of random walk methods for introducing a closeness estimation of the vertices (or equivalently of the pairs (object, property) in the formal context); the second step merges formal concepts that are approximately included. Clearly, by bridging different areas, this overview paper has discussed several ideas that are worthy of further investigation, and the procedure that has been outlined for clustering bipartite graphs may still be refined and improved after a proper evaluation.

Figure 1: A formal context R and the corresponding bipartite graph.

Figure 2: R′: Relation R modified and the corresponding bi-graph.

: clusters become chains of adjacent $K_{a,b}$-bicliques. A $K_{a,b}$-biclique is a complete subgraph of $a$ o-vertices and $b$ p-vertices, and two $K_{a,b}$-bicliques are adjacent if they share $a - 1$ o-vertices and $b - 1$ p-vertices. It is worth noting that all maximal bicliques that count more than $a$ o-vertices and $b$ p-vertices are made of adjacent $K_{a,b}$-bicliques.

Figure 3: Lattices of formal concepts for different values of the threshold (example of Figure 2, with W given in Table 1(a)).

Figure 4: Number of formal concepts (including the bottom and top trivial ones) as a function of the threshold applied to the weighted relation W of the formal context in Figure 2.

Table 1: Modification of the relation, which permits going from 13 to 11 concepts.

Table 2: New relation from the Southern Women bipartite network after applying the random walk transformation. Cells in gray are the ones affected by the transformation.

Table 4: θ = 0.4, final clusters of the academic example of Figure 2:

X          Y
1.2.3.4    igh
5.6.7.8    abcdef