DivClust: Controlling Diversity in Deep Clustering
A method for controlling diversity between clusterings in deep clustering frameworks.
DivClust extends deep clustering frameworks so that they learn to cluster data in multiple ways with a controlled degree of diversity between the resulting clusterings. This yields a) multiple ways of partitioning the data and b) more robust single-clustering results obtained via consensus clustering.
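The sketch below illustrates the general idea of a diversity-controlling loss applied to the soft assignments of multiple clustering heads: inter-clustering similarity is penalized only when it exceeds a target threshold. This is a minimal sketch assuming PyTorch; the function name `diversity_loss`, the threshold `d`, and the specific similarity measure are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def diversity_loss(assignments, d=0.7):
    """Hedged sketch of a diversity-controlling loss (not the paper's exact loss).

    assignments: list of (batch, num_clusters) soft assignment matrices,
                 one per clustering head.
    d: upper bound on allowed inter-clustering similarity
       (d=1 permits identical clusterings; lower d enforces more diversity).
    """
    loss = assignments[0].new_zeros(())
    num_pairs = 0
    for i in range(len(assignments)):
        for j in range(i + 1, len(assignments)):
            # Column-normalize so each cluster is a unit vector over the batch,
            # then measure how strongly clusters of head i align with head j.
            a = F.normalize(assignments[i], dim=0)
            b = F.normalize(assignments[j], dim=0)
            sim = a.t() @ b  # (K_i, K_j) cluster-to-cluster similarity
            # For each cluster in head i, take its best match in head j and average.
            pair_sim = sim.max(dim=1).values.mean()
            # Penalize only the similarity that exceeds the threshold d.
            loss = loss + F.relu(pair_sim - d)
            num_pairs += 1
    return loss / max(num_pairs, 1)
```

In practice such a term would be added, with some weight, to the base framework's clustering loss, so that each head still optimizes the original objective while the pairwise similarity between heads is kept below the chosen threshold.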
Clustering has been a major research topic in machine learning, and one to which deep learning has recently been applied with significant success. However, one aspect of clustering that existing deep clustering methods do not address is the efficient production of multiple, diverse partitionings of a given dataset. This is particularly important, as a diverse set of base clusterings is necessary for consensus clustering, which has been found to produce better and more robust results than relying on a single clustering. DivClust addresses this gap: it is a diversity-controlling loss that can be incorporated into existing deep clustering frameworks to produce multiple clusterings with a desired degree of diversity. DivClust a) effectively controls diversity across frameworks and datasets at very small additional computational cost, b) learns sets of clusterings that include solutions which significantly outperform single-clustering baselines, and c) combined with off-the-shelf consensus clustering algorithms, produces consensus clustering solutions that consistently outperform single-clustering outcomes, effectively improving the performance of the base deep clustering framework.
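To make the consensus step concrete, the following is a hedged sketch of forming a single consensus clustering from the multiple clusterings produced by training, using a standard co-association approach with scikit-learn. The choice of consensus algorithm and the helper name `consensus_cluster` are assumptions for illustration; any off-the-shelf consensus method could be substituted.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def consensus_cluster(label_sets, num_clusters):
    """Hedged sketch: co-association consensus over multiple clusterings.

    label_sets: list of 1-D arrays of hard cluster labels (one per clustering),
                all over the same N samples.
    Returns a single consensus labeling with num_clusters clusters.
    """
    label_sets = [np.asarray(labels) for labels in label_sets]
    n = len(label_sets[0])
    # Co-association matrix: fraction of clusterings in which two samples
    # are assigned to the same cluster.
    coassoc = np.zeros((n, n))
    for labels in label_sets:
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= len(label_sets)
    # Treat 1 - co-association as a precomputed distance and cluster it.
    model = AgglomerativeClustering(
        n_clusters=num_clusters, metric="precomputed", linkage="average"
    )
    return model.fit_predict(1.0 - coassoc)
```

Usage would look like `consensus_labels = consensus_cluster([labels_a, labels_b, labels_c], num_clusters=10)`, where each `labels_*` array comes from one of the learned clusterings.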