Deep learning

Idea

Wikipedia describes the field as:

Deep learning (also called deep structural learning or hierarchical learning) is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.

Various deep learning architectures such as deep neural networks, convolutional deep neural networks, and deep belief networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, and music/audio signal recognition where they have been shown to produce state-of-the-art results on various tasks.

Alternatively, “deep learning” has been characterized as “just a buzzword for”, or “largely a rebranding of”, neural networks.

Details

A comprehensive historical survey of methods can be found in Jürgen Schmidhuber’s Deep Learning in Neural Networks: An Overview (arXiv:1404.7828), which notes:

In the new millennium, deep NNs have finally attracted wide-spread attention, mainly by outperforming alternative machine learning methods such as kernel machines (Vapnik, 1995; Scholkopf et al., 1998) in numerous important applications.

Simple, low-complexity, problem-solving NNs

The bias/variance dilemma is often addressed through strong prior assumptions. Weight decay encourages near-zero weights by penalizing large weights.

In a Bayesian framework, weight decay can be derived from Gaussian or Laplacian weight priors (Hinton and van Camp, 1993).
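
To make this concrete, here is a minimal sketch in Python/NumPy (not taken from the survey; the learning rate and decay coefficient are arbitrary toy values). Weight decay simply adds a term proportional to the weight itself to each gradient step:

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad_loss, lr=0.1, lam=1e-3):
    """One gradient step on loss(w) + (lam/2) * ||w||^2.

    The extra lam * w term is the weight decay: it pulls every weight
    toward zero at each step, matching the effect of a Gaussian prior
    on the weights in the Bayesian view."""
    return w - lr * (grad_loss + lam * w)

# Toy quadratic loss L(w) = 0.5 * ||w - target||^2, so grad L = w - target.
target = np.array([3.0, -2.0])
w = np.zeros(2)
for _ in range(200):
    w = sgd_step_with_weight_decay(w, w - target)
print(w)  # close to target, but shrunk slightly toward zero by the decay term
```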

Many UL methods automatically and robustly generate distributed, sparse representations of input patterns through well-known feature detectors such as off-center-on-surround-like structures, as well as orientation-sensitive edge detectors and Gabor filters. They extract simple features related to those observed in early visual pre-processing stages of biological systems (Jones and Palmer, 1987).
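
The claim above concerns what such methods discover when trained on natural images; the mechanism itself can be sketched in a few lines. Below is a hypothetical toy example (plain Python/NumPy, not from the survey) of sparse coding via iterative shrinkage-thresholding: with a random dictionary and random input it will not produce Gabor-like filters, but it does show how an L1 penalty yields a sparse, distributed code.

```python
import numpy as np

def sparse_code_ista(x, D, lam=0.5, lr=0.05, n_steps=300):
    """Minimize 0.5*||x - D @ a||^2 + lam*||a||_1 over the code a
    by iterative shrinkage-thresholding (gradient step + soft threshold).
    The soft threshold is what drives most coefficients to exactly zero."""
    a = np.zeros(D.shape[1])
    for _ in range(n_steps):
        a = a - lr * (D.T @ (D @ a - x))                        # reconstruction gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)   # unit-norm dictionary atoms (128 of them, 64-dimensional)
x = rng.standard_normal(64)      # stand-in for an image patch
a = sparse_code_ista(x, D)
print(np.mean(a == 0))           # a large fraction of the code is exactly zero
```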

1991: The fundamental deep learning problem of gradient descent

This is the problem of vanishing or exploding gradients (a.k.a. the long time lag problem).
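
A minimal numerical illustration (not from the survey; the depth, width, and scales are arbitrary toy choices): backpropagation multiplies the error signal by one Jacobian per layer, so its norm shrinks or grows roughly geometrically depending on whether those factors are typically smaller or larger than one.

```python
import numpy as np

def backpropagated_norm(scale, depth=50, dim=20, seed=0):
    """Push a gradient back through `depth` random linear layers whose
    weight matrices are scaled by `scale`, and return its norm."""
    rng = np.random.default_rng(seed)
    g = np.ones(dim)
    for _ in range(depth):
        W = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
        g = W.T @ g              # chain rule through one linear layer
    return np.linalg.norm(g)

print(backpropagated_norm(scale=0.5))  # shrinks toward zero: vanishing gradient
print(backpropagated_norm(scale=2.0))  # grows without bound: exploding gradient
```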

Hessian-free optimization (Sec. 5.6.2) can alleviate the problem for FNNs (Møller, 1993; Pearlmutter, 1994; Schraudolph, 2002; Martens, 2010) and RNNs (Martens and Sutskever, 2011) (Sec. 5.20).
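
The point of Hessian-free methods is that conjugate gradient only needs Hessian-vector products, never the Hessian itself, and such products cost about as much as an extra gradient evaluation. A hedged sketch follows (not Martens’ actual implementation, which uses an exact R-operator rather than finite differences):

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    """Approximate H(w) @ v from two gradient evaluations:
    H v ~ (grad(w + eps*v) - grad(w - eps*v)) / (2*eps).
    Conjugate gradient only ever needs such products, so Hessian-free
    optimization never forms the full Hessian."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

# Sanity check on a quadratic loss 0.5 * w.T @ A @ w, whose Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_fn = lambda w: A @ w
print(hessian_vector_product(grad_fn, np.array([0.5, -1.0]), np.array([1.0, 0.0])))
# approximately A @ [1, 0] = [3, 1]
```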

The space of NN weight matrices can also be searched without relying on error gradients, thus avoiding the Fundamental Deep Learning Problem altogether. Random weight guessing sometimes works better than more sophisticated methods (Hochreiter and Schmidhuber, 1996). Certain more complex problems are better solved by using Universal Search (Levin, 1973b) for weight matrix-computing programs written in a universal programming language (Schmidhuber, 1997). Some are better solved by using linear methods to obtain optimal weights for connections to output events (Sec. 2), and evolving weights of connections to other events—this is called Evolino (Schmidhuber et al., 2007).
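
To make the random-weight-guessing idea concrete, here is a toy sketch (Python/NumPy, not the cited experiments): sample weight vectors uniformly at random and keep the best one on a trivial fitting task. No gradients are involved, so the vanishing-gradient problem simply does not arise.

```python
import numpy as np

def random_weight_guessing(loss_fn, shape, n_guesses=1000, seed=0):
    """Search weight space by pure random sampling, keeping the best guess."""
    rng = np.random.default_rng(seed)
    best_w, best_loss = None, np.inf
    for _ in range(n_guesses):
        w = rng.uniform(-1.0, 1.0, size=shape)
        loss = loss_fn(w)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

# Toy task: recover the weights of a noiseless 2-input linear map.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
w_true = np.array([0.3, -0.7])
y = X @ w_true
w, loss = random_weight_guessing(lambda w: np.mean((X @ w - y) ** 2), shape=(2,))
print(w, loss)  # the best of 1000 guesses is typically close to w_true
```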

Resources

Videos

Software

Torch7 is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to the fast scripting language LuaJIT and an underlying C implementation.

Among other things, it provides:

  • a powerful N-dimensional array
  • lots of routines for indexing, slicing, transposing, …
  • amazing interface to C, via LuaJIT
  • linear algebra routines
  • neural network and energy-based models
  • numeric optimization routines

Torch implementation of the softmax algorithm for convolutional neural networks in Lua.

Deep learning in Haskell.

References

Convolutional neural networks

  • Michael Mathieu, Mikael Henaff, Yann LeCun, Fast Training of Convolutional Networks through FFTs

Hessian-free networks

  • Martens, J. (2010). Deep learning via Hessian-free optimization. In Fürnkranz, J. and Joachims, T., editors, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 735–742, Haifa, Israel. Omnipress.
  • Martens, J. and Sutskever, I. (2011). Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11).

Hierarchical neural networks

  • Ranzato, M. A., Huang, F., Boureau, Y., and LeCun, Y. (2007). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proc. Computer Vision and Pattern Recognition Conference (CVPR’07), pages 1–8. IEEE Press.
  • Raiko, T., Valpola, H., and LeCun, Y. (2012). Deep learning made easier by linear transformations in perceptrons. In International Conference on Artificial Intelligence and Statistics, pages 924–932.
  • Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y. (2006). Efficient learning of sparse representations with an energy-based model. In Platt, J. et al., editors, Advances in Neural Information Processing Systems (NIPS 2006). MIT Press.

Monoidal neural networks

Survey Papers on Deep Learning

Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp. 1–127, 2009.

Yoshua Bengio, Aaron Courville, Pascal Vincent, [Representation Learning: A Review and New Perspectives](http://arxiv.org/abs/1206.5538), arXiv, 2012.

Deep Learning Code Tutorials

The Deep Learning Tutorials are a walk-through with code for several important Deep Architectures (in progress; teaching material for Yoshua Bengio’s IFT6266 course).

Unsupervised Feature and Deep Learning

Stanford’s Unsupervised Feature and Deep Learning tutorials have wiki pages and MATLAB code examples for several basic concepts and algorithms used in unsupervised feature learning and deep learning.

Videos

Deep Learning Representations

  • Yoshua Bengio’s Google Tech Talk on Deep Learning Representations (Google Montreal, 11/13/2012).
  • Deep Learning with Multiplicative Interactions: Geoffrey Hinton’s talk at the Redwood Center for Theoretical Neuroscience (UC Berkeley, March 2010).

Recent developments on Deep Learning

Geoffrey Hinton’s Google Tech Talk, March 2010.

Learning Deep Hierarchies of Representations

A general presentation given by Yoshua Bengio in September 2009, also at Google.

A New Generation of Neural Networks

Geoffrey Hinton’s December 2007 Google Tech Talk.

Deep Belief Networks

Geoffrey Hinton’s 2007 NIPS tutorial on Deep Belief Networks [updated 2009]: 3-hour video, ppt, pdf, readings.

Training deep networks efficiently

Geoffrey Hinton’s talk at Google about dropout, “Brains, Sex, and Machine Learning”.

Deep Learning and NLP

  • Yoshua Bengio and Richard Socher’s talk, “Deep Learning for NLP (without magic)”, at ACL 2012.
  • Tutorial on Learning Deep Architectures: Yoshua Bengio and Yann LeCun’s presentation at the ICML Workshop on Learning Feature Hierarchies, June 18th, 2009.

Energy-based Learning

  • LeCun et al. (2006), A Tutorial on Energy-Based Learning, in Bakir et al. (eds.), Predicting Structured Outputs, MIT Press, 2006: a 60-page tutorial on energy-based learning, with an emphasis on structured-output models. The tutorial includes an annotated bibliography of discriminative learning, with a simple view of CRFs, maximum-margin Markov nets, and graph transformer networks.

  • A 2006 tutorial on Energy-Based Learning given at the 2006 CIAR Summer School: Neural Computation & Adaptive Perception. [Energy-Based Learning: slides in DjVu (5.2MB), slides in PDF (18.2MB)] [Deep Learning for Generic Object Recognition: slides in DjVu (3.8MB), slides in PDF (11.6MB)]

  • ECCV 2010 Tutorial: Feature Learning for Image Classification (by Kai Yu and Andrew Ng), introducing a paradigm of feature learning from unlabeled images, with an emphasis on applications to supervised image classification.

  • NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning: basic concepts about unsupervised feature learning and deep learning methods, with links to papers and code.

Geoffrey Hinton’s online Neural Networks course on Coursera.