Feature Patterns

This chapter explores recurring patterns that correspond to the features of the data. Data will consist of multi-dimensional vectors that are collections of features. We originally considered using the word 'representation'; however, to avoid confusion with other literature in deep learning, we use the word 'feature' instead. The term 'feature map' is already in use in the context of convolutional networks.

The feature space maintains a kind of duality with the model space. This arises from the observation that the model performs similarity operations between model vectors and feature vectors, so there should be some form of compatibility between the two spaces. However, with the exception of the inner product similarity operation found in traditional neural networks, we cannot assume that model vectors and feature vectors live in the same space. Many papers appear to assume that they do, and this is reflected in the ambiguous use of the word 'representation'. This is likely due to historical reasons, and old habits are hard to let go of.

Feature Patterns discusses the many ways that we can conform different kinds of data, such as images, speech, events, signals, language, categories, trees, graphs and other more complex forms, into inputs that a network can use. This subject is important for real-world practical applications. Most deep learning research involves the more common real-valued vector as input. This is, of course, too limiting a scope.

(TODO: Fix graph)

The best known of these techniques is Neural Embedding. An example of Neural Embedding is the Word2Vec method, introduced by Tomas Mikolov and colleagues at Google. Word2Vec learns to map words into a low-dimensional space by sampling a huge number of sentences, so that each word is represented by a vector in that space. The approach has been shown to be surprisingly effective at capturing the semantics of words. For example, if you take the vector representing 'king', subtract the vector representing 'man' and add the vector representing 'woman', the result is, surprisingly, close to the vector that represents 'queen'. This new space approximates a kind of metric space that captures the semantics of words. Neural Embedding is a recurring method, and many variations have been implemented since.
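The vector arithmetic described above can be sketched with a toy example. The three-dimensional vectors below are hand-picked assumptions chosen so that the analogy works; real Word2Vec vectors are learned from text and typically have hundreds of dimensions. The names EMBEDDINGS, cosine and analogy are hypothetical helpers, not part of any library.

```python
import math

# Toy embeddings, hand-picked for illustration only (an assumption).
# Hypothetical axes: [royalty, maleness, femaleness]
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z
              for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

result = analogy("king", "man", "woman")
print(result)
```

With these hand-picked vectors the nearest remaining word is 'queen'. Excluding the three query words from the candidates is a common convention when evaluating analogies, since the result is usually closest to one of the inputs.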

The pattern of Reusable Features is one of the most intriguing capabilities of deep learning systems. Reusable features are feature spaces that are trained by one network and reused in another network. They are a consequence of Unsupervised Pre-Training, which was first used to bootstrap training: the method of training Autoencoders and then stacking them to build a multi-layered solution was the key development that revealed the effectiveness of deep learning. Today, unsupervised pre-training is not practiced as often as in the past; however, most practice begins with a previously trained network as the initialization state for training. The fact that previously trained models can be used as the basis of a new solution is one of the big advantages of the deep learning approach.
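A minimal sketch of reusing features, under stated assumptions: the 'pretrained' layer below has hand-set weights standing in for a layer trained by an earlier network, and only a new logistic-regression head is trained on top of it. All names (PRETRAINED_WEIGHTS, frozen_features, train_head) are hypothetical, and the XOR task is chosen only because it is small.

```python
import math

# Stand-in for a hidden layer trained by an earlier network. The values are
# hand-set for illustration (an assumption); in practice they would be loaded
# from a previously trained model.
PRETRAINED_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
PRETRAINED_BIASES = [0.0, -1.0]

def frozen_features(x):
    """Apply the reused layer, ReLU(W x + b). These weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(PRETRAINED_WEIGHTS, PRETRAINED_BIASES)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(inputs, labels, steps=5000, lr=0.5):
    """Train only a new logistic-regression head on the frozen features."""
    feats = [frozen_features(x) for x in inputs]
    w = [0.0] * len(feats[0])
    b = 0.0
    n = len(feats)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for h, y in zip(feats, labels):
            err = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) - y
            grad_w = [g + err * hi for g, hi in zip(grad_w, h)]
            grad_b += err
        # Full-batch gradient descent step on the mean logistic loss.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(x, w, b):
    """Classify x using the frozen features and the newly trained head."""
    h = frozen_features(x)
    return int(sum(wi * hi for wi, hi in zip(w, h)) + b > 0)

# New task: XOR, which a linear model on the raw inputs cannot represent.
XOR_INPUTS = [[0, 0], [0, 1], [1, 0], [1, 1]]
XOR_LABELS = [0, 1, 1, 0]

head_w, head_b = train_head(XOR_INPUTS, XOR_LABELS)
predictions = [predict(x, head_w, head_b) for x in XOR_INPUTS]
print(predictions)
```

The design choice to note is that the new network never touches the reused layer: the earlier training supplies a feature space in which the new task becomes linearly separable, so a very small head is enough. Fine-tuning, where the reused weights are also updated, is the other common option and is not shown here.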

The grunt work of data science involves a lot of data preparation. This is the activity that practitioners detest the most. Unfortunately, it remains extremely important, despite the claim that DL systems require very little feature engineering. The data that a machine digests is not arbitrary; it needs to exist in a certain form. For example, if images are being processed, a 512 by 512 color image becomes a single vector with 3 x 512 x 512 dimensions, one per color channel per pixel. The key constraint for DL systems is that the input is a vector of fixed dimension and each element of the vector is a real number. Unfortunately, observations in the real world do not consist only of a fixed number of real values. Data that convey semantics, like text and graphs, do not fit so well within a fixed box. Therefore, one needs to rely on several techniques to massage the data into a suitable representation. This chapter covers the techniques found in practice for molding data into a representation that works well with neural networks. As we shall see, some of these techniques even exploit the use of artificial neural networks (ANNs) themselves.
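The fixed-dimension constraint can be illustrated with a short sketch. It flattens a tiny three-channel image into one real-valued vector, and it maps texts of different lengths onto a vector of the same size using word counts over a fixed vocabulary. The helper names, the scaling by 255 and the vocabulary are assumptions made for illustration; bag-of-words is only one of several ways to fit text into a fixed box.

```python
def flatten_image(image):
    """Flatten a channels x height x width nested list into one list of floats.

    Pixel values are assumed to be 8-bit (0..255) and are scaled to [0, 1].
    """
    return [pixel / 255.0 for channel in image for row in channel for pixel in row]

def bag_of_words(text, vocabulary):
    """Map text of any length to a count vector with one slot per vocabulary word."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]

# A tiny 3-channel, 2 x 2 image stands in for a 512 x 512 one.
tiny_image = [
    [[0, 255], [128, 64]],   # red channel
    [[255, 0], [32, 16]],    # green channel
    [[8, 4], [2, 1]],        # blue channel
]
image_vector = flatten_image(tiny_image)   # length 3 * 2 * 2 = 12

# The same flattening applied to a 512 x 512 color image yields this many inputs.
full_size_dimensions = 3 * 512 * 512       # 786432

# Hypothetical vocabulary; every text maps to a vector of this length.
VOCABULARY = ["deep", "learning", "feature", "graph"]
short_vector = bag_of_words("Deep learning", VOCABULARY)
long_vector = bag_of_words(
    "Feature learning is learning a feature map for deep networks", VOCABULARY)

print(len(image_vector), full_size_dimensions)
print(short_vector, long_vector)
```

Both texts end up as four-element vectors even though one has two words and the other has ten. The price is that word order and out-of-vocabulary words are discarded, which is one reason richer techniques are needed for data that convey semantics.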

Neural Embedding

Reusable Representation

Data Augmentation

Data Synthesis

Missing Values

Feature Normalization

Dimensionless Features

Quantized Data

Categorical Data

Textual Features

Vector Concatenation

Batch Normalization


Dimensional Reduction

Imputing (GLRM)


Graph Embedding

Parse Tree

Adversarial Features

Disentangled Representation

Propositionalization (move to collective learning)

User Embedding Contextual Embedding (This not data prep?)

Imperfect Information

Relational Semantic Network (ALVT-198)

Alternative Sensors

Natural Language

State Space Model

Label Smoothing


Human Behavior

Early Prediction




3D Shape Recognition




References:

Representation Learning: A Review and New Perspectives

Deep Learning of Representations: Looking Forward

Group Equivariant Convolutional Networks

We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers.

Natural Language Understanding with Distributed Representation

AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

This paper introduces AtomNet, the first structure-based, deep convolutional neural network designed to predict the bioactivity of small molecules for drug discovery applications. We demonstrate how to apply the convolutional concepts of feature locality and hierarchical composition to the modeling of bioactivity and chemical interactions. In further contrast to existing DNN techniques, we show that AtomNet’s application of local convolutional filters to structural target information successfully predicts new active molecules for targets with no previously known modulators.