Wikipedia describes the field as:
Deep learning (also called deep structural learning or hierarchical learning[1]) is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.
Various deep learning architectures such as deep neural networks, convolutional deep neural networks, and deep belief networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, and music/audio signal recognition where they have been shown to produce state-of-the-art results on various tasks.
Alternatively, “deep learning” has been characterized as “just a buzzword for”,[5] or “largely a rebranding of”, neural networks.[6]
A comprehensive historical survey of methods can be found in Schmidhuber's Deep Learning in Neural Networks: An Overview (2014), which notes:
In the new millennium, deep NNs have finally attracted widespread attention, mainly by outperforming alternative machine learning methods such as kernel machines (Vapnik, 1995; Schölkopf et al., 1998) in numerous important applications.
The bias/variance dilemma is often addressed through strong prior assumptions. Weight decay encourages near-zero weights by penalizing large weights.
In a Bayesian framework, weight decay can be derived from Gaussian or Laplacian weight priors (Hinton and van Camp, 1993).
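As a rough illustration of the idea (not code from the survey), here is a minimal sketch in Lua/Torch of L2 weight decay folded into a gradient step; the weight matrix, decay strength, and learning rate are all made-up values for the example.

```lua
-- Minimal sketch of L2 weight decay: the regularizer adds lambda * W to the
-- gradient of the data term, pulling every weight toward zero.
-- All values here are illustrative.
require 'torch'

local lambda = 0.01                   -- decay strength (assumed)
local lr = 0.1                        -- learning rate (assumed)
local W = torch.randn(5, 5)           -- hypothetical weight matrix
local gradData = torch.randn(5, 5)    -- stand-in for dLoss/dW from backprop

local grad = gradData + W * lambda    -- data gradient plus decay term
W:add(-lr, grad)                      -- one SGD step: W <- W - lr * grad
```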
Many UL methods automatically and robustly generate distributed, sparse representations of input patterns through well-known feature detectors such as off-center-on-surround-like structures, as well as orientation-sensitive edge detectors and Gabor filters. They extract simple features related to those observed in the early visual pre-processing stages of biological systems (Jones and Palmer, 1987).
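To make that kind of feature detector concrete, the following is a small, hypothetical Lua/Torch sketch that builds a single orientation-sensitive Gabor-like kernel by hand; the kernel size and filter parameters are arbitrary choices for illustration.

```lua
-- Sketch of a Gabor-like, orientation-sensitive edge filter built by hand.
-- Parameters (size, sigma, theta, lambda) are illustrative, not from the text.
require 'torch'

local size, sigma, theta, lambda = 9, 2.0, math.pi / 4, 4.0
local kernel = torch.Tensor(size, size)
local c = (size + 1) / 2
for i = 1, size do
  for j = 1, size do
    local x, y = j - c, i - c
    -- rotate coordinates by theta, then apply Gaussian envelope * cosine carrier
    local xr = x * math.cos(theta) + y * math.sin(theta)
    local yr = -x * math.sin(theta) + y * math.cos(theta)
    kernel[i][j] = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
        * math.cos(2 * math.pi * xr / lambda)
  end
end
print(kernel)
```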
This is the problem of vanishing or exploding gradients (a.k.a. the long time lag problem).
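A tiny numerical sketch (in Lua/Torch, with made-up dimensions and scales) of why this happens: the backpropagated error is multiplied by a weight matrix at every layer or time step, so its norm shrinks or grows roughly exponentially with depth.

```lua
-- Sketch: repeated multiplication by the (transposed) weight matrix makes the
-- backpropagated error vanish when the weights are small and explode when
-- they are large. Dimensions and scales are illustrative.
require 'torch'

local depth = 50
local delta = torch.randn(10)            -- error signal at the top layer
local W = torch.randn(10, 10) * 0.1      -- small weights -> vanishing
-- try torch.randn(10, 10) * 2.0 instead to see the exploding case
for t = 1, depth do
  delta = W:t() * delta                  -- one step of backpropagation
end
print(delta:norm())                      -- typically astronomically small
```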
Hessian-free optimization (Sec. 5.6.2) can alleviate the problem for FNNs (Møller, 1993; Pearlmutter, 1994; Schraudolph, 2002; Martens, 2010) and RNNs (Martens and Sutskever, 2011) (Sec. 5.20).
The space of NN weight matrices can also be searched without relying on error gradients, thus avoiding the Fundamental Deep Learning Problem altogether. Random weight guessing sometimes works better than more sophisticated methods (Hochreiter and Schmidhuber, 1996). Certain more complex problems are better solved by using Universal Search (Levin, 1973b) for weight matrix-computing programs written in a universal programming language (Schmidhuber, 1997). Some are better solved by using linear methods to obtain optimal weights for connections to output events (Sec. 2), and evolving weights of connections to other events—this is called Evolino (Schmidhuber et al., 2007).
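As a toy illustration of the first of these gradient-free approaches, the sketch below (Lua/Torch; the task, dimensions, and trial count are all invented) does plain random weight guessing: sample candidate weight matrices and keep the best one found.

```lua
-- Sketch of random weight guessing on a toy linear regression problem:
-- sample candidate weight matrices and keep whichever gives the lowest error.
-- Problem setup, sizes, and number of trials are illustrative.
require 'torch'

local X = torch.randn(100, 5)            -- toy inputs
local Wtrue = torch.randn(5, 1)
local Y = X * Wtrue                      -- toy targets

local bestW, bestErr = nil, math.huge
for trial = 1, 1000 do
  local W = torch.randn(5, 1)            -- guess a whole weight matrix
  local err = (X * W - Y):pow(2):mean()  -- mean squared error of the guess
  if err < bestErr then bestW, bestErr = W, err end
end
print(bestErr)
```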
Torch7 is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and provides a very efficient implementation, thanks to an easy and fast scripting language, LuaJIT, and an underlying C implementation.
Among other things, it provides (a minimal usage sketch follows this list):
- a powerful N-dimensional array
- lots of routines for indexing, slicing, transposing, …
- amazing interface to C, via LuaJIT
- linear algebra routines
- neural network, and energy-based models
- numeric optimization routines
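A minimal, hypothetical usage sketch tying a few of these pieces together (tensors, slicing, a small nn model); it assumes only the standard torch and nn packages, and the layer sizes and data are arbitrary.

```lua
-- Sketch: an N-dimensional array, some slicing, and a tiny neural network.
-- Layer sizes and data are arbitrary; assumes the torch and nn packages.
require 'torch'
require 'nn'

local x = torch.randn(4, 10)             -- a 4x10 batch of random inputs
local firstRow = x[1]                     -- indexing: first example
local someCols = x[{ {}, {1, 3} }]        -- slicing: first three features

local model = nn.Sequential()
model:add(nn.Linear(10, 32))
model:add(nn.Tanh())
model:add(nn.Linear(32, 3))

local output = model:forward(x)           -- forward pass on the whole batch
print(output:size())                      -- 4x3
```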
Torch implementation of the softmax algorithm for convolutional neural networks in Lua.
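For reference, a numerically stable softmax over a vector of class scores looks roughly like the following (a generic Lua/Torch sketch, not code from the repository above):

```lua
-- Sketch of a numerically stable softmax: subtract the maximum score before
-- exponentiating, then normalize so the outputs sum to one.
require 'torch'

local function softmax(scores)
  local shifted = torch.add(scores, -scores:max())  -- stability: max becomes 0
  local exps = torch.exp(shifted)
  return torch.div(exps, exps:sum())
end

print(softmax(torch.Tensor{1.0, 2.0, 3.0}))
```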
Deep learning in Haskell.
Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp. 1–127, 2009.
Yoshua Bengio, Aaron Courville, Pascal Vincent, [Representation Learning: A Review and New Perspectives](http://arxiv.org/abs/1206.5538), arXiv, 2012.
The Deep Learning Tutorials are a walk-through with code for several important Deep Architectures (in progress; teaching material for Yoshua Bengio’s IFT6266 course).
Stanford’s Unsupervised Feature Learning and Deep Learning tutorial has wiki pages and MATLAB code examples for several basic concepts and algorithms used for unsupervised feature learning and deep learning.
Geoffrey Hinton’s Google Tech Talk, March 2010.
Learning Deep Hierarchies of Representations: a general presentation given by Yoshua Bengio in September 2009, also at Google.
Geoffrey Hinton’s December 2007 Google Tech Talk.
Geoffrey Hinton’s 2007 NIPS tutorial [updated 2009] on Deep Belief Networks (3-hour video, ppt, pdf, readings).
Geoffrey Hinton’s talk at Google about dropout, “Brains, Sex, and Machine Learning”.
[LeCun et al. 2006]. A Tutorial on Energy-Based Learning, in Bakir et al. (eds), “Predicting Structured Outputs”, MIT Press, 2006: a 60-page tutorial on energy-based learning, with an emphasis on structured-output models. The tutorial includes an annotated bibliography of discriminative learning, with a simple view of CRFs, maximum-margin Markov networks, and graph transformer networks.
A 2006 tutorial on Energy-Based Learning given at the 2006 CIAR Summer School: Neural Computation & Adaptive Perception. [Energy-Based Learning: Slides in DjVu (5.2MB), Slides in PDF (18.2MB)] [Deep Learning for Generic Object Recognition: Slides in DjVu (3.8MB), Slides in PDF (11.6MB)]
ECCV 2010 Tutorial
Feature learning for Image Classification (by Kai Yu and Andrew Ng): introducing a paradigm of feature learning from unlabeled images, with an emphasis on applications to supervised image classification.
NIPS 2010 Workshop
Deep Learning and Unsupervised Feature Learning: basic concepts about unsupervised feature learning and deep learning methods, with links to papers and code.
Geoffrey Hinton’s online Neural Networks for Machine Learning course on Coursera.