https://docs.google.com/a/codeaudit.com/document/d/1caZUtQXSluYRsQppO9lZrJhVHO80xfigXhf8ZxfBgAg/edit?usp=sharing

====== Learning Patterns ======

This chapter covers mechanisms that are known to lead to a trained model. Why are neural networks able to generalize? Why does back-propagation eventually lead to convergence? Many such questions still lack a good theoretical explanation. However, Deep Learning is an experimental science, and the simple method of back-propagation is known to be surprisingly effective.

An early objection to neural networks was that the equivalent optimization problem is non-convex, which implied it would be extremely difficult to train a model to convergence. However, recent research disproves this original intuition. In high-dimensional spaces, a critical point is far more likely to be a saddle point than a poor local minimum, so there is a high probability that gradient descent will find a direction along which it can continue to roll down the optimization landscape.

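As an illustration of this intuition, consider the toy surface f(x, y) = x^2 - y^2, which has a saddle point at the origin: the gradient there is zero, yet it is not a minimum. Below is a minimal NumPy sketch (the starting point and learning rate are illustrative choices, not taken from any particular paper) showing that gradient descent started slightly off the saddle keeps rolling downhill along the y direction.

<code python>
# Gradient descent near the saddle point of f(x, y) = x^2 - y^2.
# The saddle at (0, 0) has zero gradient, but any small perturbation in y
# gives gradient descent a direction in which to keep descending.
import numpy as np

def grad_f(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])   # gradient of x^2 - y^2

p = np.array([0.5, 1e-3])   # start near the saddle, slightly off-axis
lr = 0.1                    # illustrative learning rate

for _ in range(50):
    p = p - lr * grad_f(p)

# x has shrunk toward 0 while y has grown: the iterate escaped the saddle
print(p)
</code>
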
The requirements for back-propagation in Deep Learning are surprisingly simple. If one can calculate the derivative of each layer's output with respect to its model parameters, then one can apply it. Back-propagation works extremely well at discovering a convergence basin where a model has learned to generalize.

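The sketch below makes this requirement concrete for a tiny two-layer network: as long as each layer can supply the derivative of its output with respect to its parameters (and with respect to its input), the chain rule assembles the gradient of the loss for every parameter. This is a minimal NumPy sketch with made-up shapes and a squared-error loss, not a reference implementation.

<code python>
# Minimal back-propagation sketch: only local derivatives are needed per layer;
# the chain rule composes them into gradients for all parameters.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))           # batch of 4 examples, 3 features (illustrative)
t = rng.normal(size=(4, 1))           # regression targets

W1 = 0.1 * rng.normal(size=(3, 5))    # layer 1 parameters
W2 = 0.1 * rng.normal(size=(5, 1))    # layer 2 parameters

# Forward pass
h_pre = x @ W1                        # layer 1 pre-activation
h = np.tanh(h_pre)                    # layer 1 output
y = h @ W2                            # layer 2 output
loss = 0.5 * np.mean((y - t) ** 2)    # squared-error loss

# Backward pass: chain the local derivatives from the loss back to each layer
dy = (y - t) / len(x)                 # dLoss/dy
dW2 = h.T @ dy                        # dLoss/dW2
dh = dy @ W2.T                        # gradient flowing back into layer 1's output
dh_pre = dh * (1.0 - np.tanh(h_pre) ** 2)  # back through the tanh nonlinearity
dW1 = x.T @ dh_pre                    # dLoss/dW1

print(loss, dW1.shape, dW2.shape)
</code>
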
This chapter covers recurring learning patterns we find in different neural network architectures. In its most abstract form, learning is a credit assignment problem: given the observed data, which parts of a model do we need to change, and by how much? We will explore many of the techniques that have been shown to be effective in practice.

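Read as code, the credit assignment question has a simple gradient-based answer: each parameter's partial derivative says how much that parameter contributed to the error, and the learning rate says how much to change it. The fragment below is a plain stochastic-gradient-descent update over hypothetical parameter and gradient arrays (the names and values are made up for illustration).

<code python>
# Credit assignment via gradients: every parameter is adjusted in proportion
# to its own partial derivative of the loss, scaled by a learning rate.
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """In-place update: params[name] -= lr * grads[name] for every parameter."""
    for name in params:
        params[name] -= lr * grads[name]

# Hypothetical parameters and gradients, just to show the mechanics
params = {"W1": np.ones((3, 5)), "W2": np.ones((5, 1))}
grads = {"W1": np.full((3, 5), 0.2), "W2": np.full((5, 1), -0.5)}

sgd_step(params, grads)
print(params["W1"][0, 0], params["W2"][0, 0])  # 0.98 and 1.05: blame and credit applied
</code>
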
{{http://main-alluviate.rhcloud.com/wp-content/uploads/2016/06/learning.png}}

[[Relaxed Backpropagation]] = [[Credit Assignment]]

[[Stochastic Gradient Descent]]

[[Natural Gradient Descent]]

[[Random Orthogonal Initialization]]

[[Transfer Learning]]

[[Curriculum Training]]

[[DropOut]]

[[Domain Adaptation]]

[[Unsupervised Pretraining]]

[[Differential Training]]

[[Genetic Algorithm]]

[[Unsupervised Learning]]

[[Mutable Layer]]

[[Program Induction]]

[[Learning to Optimize]] (note: different from Meta-Learning)

[[Simulated Annealing]]

[[Meta-Learning]]

[[Continuous Learning]]

[[Feedback Network]]

[[Network Generation]]

[[Learning to Purpose]]

[[Planning to Learn]]

[[Exploration]]

[[Learning to Communicate]]

[[Predictive Learning]]

[[Temporal Learning]]

[[Intrinsic Decomposition]]

[[Herding]]

[[Active Learning]]

[[Primal Dual]]

[[Transport Related]]

[[Structure Evolution]]

[[Self-Supervised Learning]]

[[Knowledge Gradient]]

[[Option Discovery]]

[[Infusion Learning]]

[[Ensemble Reinforcement Learning]]

[[Learning from Demonstration]]

[[Egomotion]]

[[Iterative Teaching]]

[[Reasoning by Analogy]]

**References**

https://arxiv.org/pdf/1606.04838v1.pdf Optimization Methods for Large-Scale Machine Learning

The authors present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research: techniques that diminish noise in the stochastic directions, and methods that make use of second-order derivative approximations.

Recent Advances in Non-Convex Optimization and its Implications to Learning, Anima Anandkumar, ICML 2016 Tutorial

http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning