Clustering with noisy multiplex data

Claire Mathieu, Département d'Informatique de l'Ecole Normale Supérieure

In [1], Abraham, Chechik, Kempe and Slivkins analyze a social network that it the unlabeled union of several instances of Kleinberg's small world model, each instance representing a different category. In [2], Mathieu and Schudy analyze a network of similarity information that is the noisy version of a perfect ground truth partition, and give an algorithm to reconstruct the ground truth when each cluster is large enough. Is it possible to extend the analysis to a multi-category ground truth consisting of the superposition of several perfect partitions? [1] assumes small local correlation. If the sets in the two partitions have small intersection - for every set A in the first partition and B in the second partition, the intersection has size much less than min(|A|,|B|), then most points are either in exactly one of A,B, or in neither. This suggests an extension of the semi-definite programming relaxation analyzed in [1], whereby to each vertex i we associate not one but two vectors v(i) and w(i), and the similarity of two vertices i and j is measured by v(i).v(j)+w(i).w(j). Up to increasing the number of dimensions, the algorithm can thus be attempted in essentially the same form. Can the analysis be extended? Alternatively, can the ideas from [1] be adapted to the clustering setting?

[1] Ittai Abraham, Shiri Chechik, David Kempe, Aleksandrs Slivkins: Low-distortion Inference of Latent Similarities from a Multiplex Social Network. http://arxiv.org/abs/1202.0922

[2] Claire Mathieu, Warren Schudy: Correlation Clustering with Noisy Input. SODA 2010: 712-728