https://docs.google.com/a/codeaudit.com/document/d/1lgOBtgCk0gi20G6f1DEepNgFPj_31Bt41Otesc5ytOs/edit?usp=sharing
  
======Feature Patterns======

This chapter explores recurring patterns that correspond to the features of the data. The data consists of multi-dimensional vectors that are collections of features. We had originally thought of using the word 'representation', but to avoid confusion with other literature in deep learning we use the word 'feature' instead. In fact, the term 'feature map' is already used in the subject of convolutional networks.

The feature space maintains a kind of duality with the model space. This arises from the observation that the model space performs similarity operations between model vectors and feature vectors. Therefore there should be some trait of compatibility between the model and feature spaces. However, with the exception of an inner product similarity operation found in traditional neural networks, we cannot assume that the model and feature vectors are in the same space. Many papers appear to assume that they are the same space, and this is reflected in the ambiguous use of the word 'representation'. This is likely due to historical reasons, and maybe bad habits are hard to let go of.
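
As a small illustration of this duality, here is a minimal NumPy sketch (the array names and numbers are illustrative, not from the original text) showing that a traditional dense layer's pre-activation is exactly an inner product similarity between each model vector (a row of the weight matrix) and a feature vector.

<code python>
import numpy as np

# A dense layer's pre-activation is the inner product of each model vector
# (a row of the weight matrix) with the input feature vector.
feature = np.array([0.2, -1.0, 0.5, 0.3])   # one 4-dimensional feature vector
model = np.random.randn(3, 4)               # three model vectors of the same dimension

similarities = model @ feature               # three inner-product similarity scores
print(similarities.shape)                    # (3,)
</code>
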

Feature Patterns discusses the many ways that we can conform different kinds of data, such as images, speech, events, signals, language, categories, trees, graphs, and other more complex forms, to the input a network expects. This is a subject that is important for real-world practical application. Most deep learning research involves the more common real-valued vector as input, but this is of course too limiting a scope.

{{http://main-alluviate.rhcloud.com/wp-content/uploads/2016/05/Preparation.png}}

(TODO: Fix graph)

The most well known of these techniques is Neural Embedding. An example of Neural Embedding is the Word2Vec method introduced by researchers at Google. Word2Vec learns to map words into a low-dimensional space by sampling a huge number of sentences, and the approach has been shown to be surprisingly effective at capturing the semantics of words. Vectors in the low-dimensional space represent words. In this space, if you take the vector representing 'king', subtract the vector representing 'man', and add the vector representing 'woman', the result is, surprisingly, close to the vector that represents 'queen'. This new space approximates a kind of metric space that captures word semantics. Neural Embedding is a recurring method, and many variants have been implemented since.
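
A minimal sketch of this analogy arithmetic, assuming the gensim library and its downloadable pretrained Word2Vec vectors are available (the model name and printed output are illustrative, not part of the original text):

<code python>
# Sketch only: requires gensim and downloads a large pretrained vector file.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # pretrained Word2Vec embeddings

# king - man + woman in the embedding space
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)   # the nearest word is typically 'queen'
</code>
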

The pattern of Reusable Features is one of the most intriguing capabilities of deep learning systems. Reusable features are feature spaces that are trained by one network and reused in another network. Reusable features are a consequence of Unsupervised Pre-Training, whose first use was for bootstrapping training: the method of training Autoencoders and then stacking them to build a multi-layered solution was the key development that revealed the effectiveness of deep learning. Today, unsupervised pre-training is not practiced as often as in the past; however, much practice begins with a previously trained network as an initialization state for training. The fact that previously trained models can be used as the basis of a new solution is one of the big advantages of the deep learning approach.
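
A minimal sketch of reusing previously trained features as an initialization, assuming PyTorch and torchvision; the pretrained ResNet-18 and the 10-class head are illustrative stand-ins, not part of the original text:

<code python>
import torch.nn as nn
from torchvision import models

# Load a network trained elsewhere and reuse its learned features.
base = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the reused feature extractor so only the new head is trained.
for param in base.parameters():
    param.requires_grad = False

# Replace the final layer with a task-specific head (10 classes, illustrative).
base.fc = nn.Linear(base.fc.in_features, 10)
# Training now starts from previously learned features rather than from scratch.
</code>
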

<del>The grunt work of data science involves a lot of data preparation. This is the activity that practitioners detest the most. Unfortunately, it is an extremely important area despite the claim that DL systems require very little feature engineering.

The data that a Machine digests are not arbitrary and need to exist in a certain form. For example, if images were being processed, the data of a 512 by 512 image would consist of a single vector with 3 x 512 x 512 dimensions. The key constraint for DL systems is that the input is a vector of fixed dimension and each element in the vector is a real number.

Unfortunately, observations in the real world do not consist only of fixed-dimension real numbers. Data that conveys semantics, like text and graphs, does not fit so well within a fixed box. Therefore, one will need to rely on several techniques to massage the data into a suitable Representation. This chapter covers many of these Representation techniques found in practice and focuses on the techniques required to mold data into a Representation that works well with neural networks. As we shall also see, some of these techniques will even exploit the use of ANNs.
</del>
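
The struck-through passage refers to the fixed-dimension constraint on inputs; the following is a minimal sketch of flattening an image into such a vector, assuming Pillow and NumPy are available (the file name is illustrative).

<code python>
import numpy as np
from PIL import Image

# Load an image (file name is illustrative), force RGB, and fix its size.
img = Image.open("example.png").convert("RGB").resize((512, 512))

# Scale pixel values to real numbers in [0, 1] and flatten to a fixed-length vector.
x = np.asarray(img, dtype=np.float32) / 255.0   # shape (512, 512, 3)
vector = x.reshape(-1)                           # 3 * 512 * 512 = 786,432 dimensions
print(vector.shape)
</code>
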

[[Neural Embedding]]

[[Reusable Representation]]

[[Data Augmentation]]

[[Data Synthesis]]

[[Missing Values]]

[[Feature Normalization]]

[[Dimensionless Features]]

[[Quantized Data]]

[[Categorical Data]]

[[Textual Features]]

[[Vector Concatenation]]

[[Batch Normalization]]

[[Sketching]]

[[Dimensional Reduction]]

Imputing (GLRM)

[[Fingerprinting]]

[[Graph Embedding]]

[[Parse Tree]]

[[Adversarial Features]]

[[Disentangled Representation]]

[[Propositionalization]] (move to collective learning)

[[User Embedding]] Contextual Embedding (Is this not data prep?)

[[Imperfect Information]]

[[Relational Semantic Network]] (ALVT-198)

[[Alternative Sensors]]

[[Natural Language]]

[[State Space Model]]

[[Label Smoothing]]

[[Handwriting]]

[[Human Behavior]]

[[Early Prediction]]

[[Audio]]

[[Video]]

[[Pose]]

[[3D Shape Recognition]]

[[Context]]

[[Reputation]]

[[Emotion]]

References:

http://arxiv.org/abs/1206.5538 Representation Learning: A Review and New Perspectives

http://arxiv.org/abs/1305.0445 Deep Learning of Representations: Looking Forward

http://arxiv.org/pdf/1602.07576v3.pdf Group Equivariant Convolutional Networks

We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers.

http://smerity.com/articles/2016/ml_not_magic.html

http://arxiv.org/abs/1511.07916 Natural Language Understanding with Distributed Representation

https://www.linkedin.com/pulse/feature-engineering-data-scientists-secret-sauce-ashish-kumar

http://arxiv.org/pdf/1510.02855.pdf AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

This paper introduces AtomNet, the first structure-based, deep convolutional neural network designed to predict the bioactivity of small molecules for drug discovery applications. We demonstrate how to apply the convolutional concepts of feature locality and hierarchical composition to the modeling of bioactivity and chemical interactions. In further contrast to existing DNN techniques, we show that AtomNet’s application of local convolutional filters to structural target information successfully predicts new active molecules for targets with no previously known modulators.