**This is an old revision of the document!**

# Clustering

**Aliases**

*This identifies the pattern and should be representative of the concept that it describes. The name should be a noun that should be easily usable within a sentence. We would like the pattern to be easily referenceable in conversation between practitioners.
*

**Intent**

*Describes in a single concise sentence the meaning of the pattern.
*

**Motivation**

* This section describes the reason why this pattern is needed in practice. Other pattern languages indicate this as the Problem. In our pattern language, we express this in a question or several questions and then we provide further explanation behind the question.*

**Sketch**

*This section provides alternative descriptions of the pattern in the form of an illustration or alternative formal expression. By looking at the sketch a reader may quickly understand the essence of the pattern.
*

**Discussion**

*This is the main section of the pattern that goes in greater detail to explain the pattern. We leverage a vocabulary that we describe in the theory section of this book. We don’t go into intense detail into providing proofs but rather reference the sources of the proofs. How the motivation is addressed is expounded upon in this section. We also include additional questions that may be interesting topics for future research.*

**Known Uses**

*Here we review several projects or papers that have used this pattern.*

**Related Patterns**
*
In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be as precise or may be fuzzy, so we provide further explanation into the nature of the relationship. We also describe other patterns may not be conceptually related but work well in combination with this pattern.*

*Relationship to Canonical Patterns*

- Similarity and Distance Measure is required by clustering for measure inclusiveness.
- Structured Factorization have various methods that show equivalence to clustering.
- Random Projections is similar to clustering in that it captures similarities to random vectors in multi-dimensional spaces.
- Geometry provides generalized ways to partition information spaces.
- Ensembles is analogous in that each cluster may be considered as an expert and all the projections as being a mixture of experts.
- Disentangled Basis may be achievable through clustering where clusters are disjoint.

*Relationship to other Patterns*

**Further Reading**

*We provide here some additional external material that will help in exploring this pattern in more detail.*

**References**

http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf On Spectral Clustering

https://papers.nips.cc/paper/2566-neighbourhood-components-analysis.pdf

http://research.microsoft.com/en-us/um/people/dongyu/nips2009/papers/arel-destin_nips%20tpk2.pdf

A Deep Learning Architecture Comprising Homogeneous Cortical Circuits for Scalable Spatiotemporal Pattern Inference

At the core of the adaptation mechanism are two learned constructs, one of which relies on a fast and stable incremental clustering. Moreover, the proposed methodology does not require layer-by-layer training and lends itself naturally to massively-parallel processing platforms.

http://arxiv.org/pdf/1607.05002v1.pdf Geometric Mean Metric Learning

http://arxiv.org/pdf/1608.08792v1.pdf CliqueCNN: Deep Unsupervised Exemplar Learning

https://arxiv.org/pdf/1604.00734v1.pdf Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks

A key challenge in entity linking is making effective use of contextual information to disambiguate mentions that might refer to different entities in different contexts. We present a model that uses convolutional neural networks to capture semantic correspondence between a mention’s context and a proposed target entity. These convolutional networks operate at multiple granularities to exploit various kinds of topic information, and their rich parameterization gives them the capacity to learn which n-grams characterize different topics. We combine these networks with a sparse linear model to achieve state-of-the-art performance on multiple entity linking datasets, outperforming the prior systems of Durrett and Klein (2014) and Nguyen et al. (2014).1

https://arxiv.org/pdf/1611.04500v1.pdf DEEP LEARNING WITH SETS AND POINT CLOUDS

We study a simple notion of structural invariance that readily suggests a parameter-sharing scheme in deep neural networks. In particular, we define structure as a collection of relations, and derive graph convolution and recurrent neural networks as special cases. We study composition of basic structures in defining models that are invariant to more complex “product” structures such as graph of graphs, sets of images or sequence of sets. For demonstration, our experimental results are focused on the setting where the discrete structure of interest is a set. We present results on several novel and non-trivial problems on sets, including point-cloud classification, set outlier detection and semi-supervised learning using clustering information

https://arxiv.org/pdf/1701.06508v1.pdf The Impact of Random Models on Clustering Similarity

Here, we derive corrected variants of two clustering similarity measures (the Rand index and Mutual Information) in the context of two random clustering ensembles in which the number and sizes of clusters vary. In addition, we study the impact of one-sided comparisons in the scenario with a reference clustering. The consequences of different random models are illustrated using synthetic examples, handwriting recognition, and gene expression data. We demonstrate that the choice of random model can have a drastic impact on results and argue that the choice should be carefully justified.

https://openreview.net/forum?id=BJMO1grtl Neural Expectation Maximization

We introduce a novel framework for clustering that combines generalized EM with neural networks and can be implemented as an end-to-end differentiable recurrent neural network. It learns its statistical model directly from the data and can represent complex non-linear dependencies between inputs. We apply our framework to a perceptual grouping task and empirically verify that it yields the intended behavior as a proof of concept.

https://arxiv.org/abs/1704.01460 Comparison Based Nearest Neighbor Search

We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points i and j is smaller than the distance between the points i and k.

https://arxiv.org/abs/1704.02345 Fast Spectral Clustering Using Autoencoders and Landmarks

Spectral clustering is a powerful clustering algorithm that suffers from high computational complexity, due to eigen decomposition. We present a definition of the Laplacian matrix of the graph that enable us to perform eigen decomposition efficiently, using a deep autoencoder. The overall complexity of the algorithm for eigen decomposition is O(np), where n is the number of data points and p is the number of landmarks. At last, we evaluate the performance of the algorithm in different experiments.