An artificial neural network (ANN) is a machine learning method used for regression and classification.
Artificial neurons were first proposed in 1943 by Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician. The concept of a neural network appears to have first been proposed by Alan Turing in his 1948 paper “Intelligent Machinery”.
“Connectionism” is a set of approaches in the fields of artificial intelligence, cognitive psychology, cognitive science, neuroscience and philosophy of mind, that models mental or behavioral phenomena as the emergent processes of interconnected networks of simple units.
In a neural network model, simple nodes, which can be called “neurons”, “processing elements” or “units”, are connected together to form a network. While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.
ANNs can be used to approximate a function $g:\mathbb{R}^n \to \mathbb{R}^m$. Here $g$ is unknown but there are training examples $x^{(i)} \in \mathbb{R}^n$ and $z^{(i)} = g(x^{(i)}) \in \mathbb{R}^m$. The approximation is a function $f(x;\theta)$ where $\theta$ is a vector or list of parameters. A training algorithm is used to choose $\theta$ by minimising a cost which measures how different $f$ and $g$ are on the training set.
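As a toy sketch of choosing $\theta$ by minimising a cost on the training set, the following (where the target $g$, the data, and the learning rate are all invented for illustration) fits a linear $f(x;\theta) = \theta \cdot x$ by gradient descent on the mean squared error:

```python
import numpy as np

# "Unknown" target g: R^2 -> R, known here only through its training samples.
rng = np.random.default_rng(0)
g = lambda x: 3.0 * x[..., 0] - 2.0 * x[..., 1]
X = rng.normal(size=(100, 2))          # training inputs x^(i)
z = g(X)                               # training targets z^(i) = g(x^(i))

theta = np.zeros(2)                    # parameters of f(x; theta) = x . theta
for _ in range(500):                   # gradient descent on the mean squared cost
    residual = X @ theta - z
    grad = 2.0 * X.T @ residual / len(X)
    theta -= 0.1 * grad

print(np.round(theta, 3))              # close to [ 3. -2.]
```

With noiseless data and a linear model the minimiser is exact, so the recovered $\theta$ matches the coefficients of $g$; a nonlinear $f$ would be trained the same way, just with a more involved gradient.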
The function $f(x;\theta)$ can be quite general; almost any structural form for $f$ that can be represented as a network could be called an ANN. A common form is

$$f(x;\theta) = V\,\alpha(Wx + w) + v,$$

where $W$ and $V$ are matrices, $w$ and $v$ are vectors, $\theta = (W, w, V, v)$, and the function $\alpha:\mathbb{R} \to \mathbb{R}$ is applied component-wise to vectors. The number of rows in $W$ (the number of hidden units) can be anything; the dimensions of the other quantities follow from this and from $n$ and $m$. A common choice for $\alpha$ is the logistic function. Any continuous $g$ can be approximated uniformly on compacta by such an $f$ (Proposition 5.1, Ripley’s book). The catch is that $W$ may need a huge number of rows.
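As a concrete sketch of this common form (writing it, in the usual convention, as $f(x;\theta) = V\,\alpha(Wx+w)+v$; the dimensions and random parameters below are invented for illustration):

```python
import numpy as np

def logistic(t):
    """A common choice for alpha, applied component-wise by NumPy."""
    return 1.0 / (1.0 + np.exp(-t))

def f(x, W, w, V, v):
    """Single-hidden-layer network: f(x; theta) = V alpha(W x + w) + v."""
    return V @ logistic(W @ x + w) + v

# Illustrative dimensions: n = 3 inputs, m = 2 outputs, h = 5 hidden units.
rng = np.random.default_rng(1)
n, m, h = 3, 2, 5
W, w = rng.normal(size=(h, n)), rng.normal(size=h)   # h is free; more rows = more capacity
V, v = rng.normal(size=(m, h)), rng.normal(size=m)

x = rng.normal(size=n)
print(f(x, W, w, V, v).shape)   # (2,) -- a point in R^m
```

The number of hidden units $h$ (the number of rows of $W$) is the only free dimension; everything else is fixed by $n$ and $m$, as in the text.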
Suppose we have a time series $s_1, s_2, \dots$. We could take $n$ consecutive values $s_i, \dots, s_{i+n-1}$ as our $x^{(i)}$ and the following value $s_{i+n}$ as our $z^{(i)}$ (so $m=1$). Note that $W$ could smooth the time series, integrate or differentiate it (as a discrete approximation), calculate a Fourier or wavelet coefficient and so on, or even do all these things simultaneously.
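The windowing construction above takes only a few lines; in this sketch the helper name `windows` and the toy series are invented for illustration:

```python
import numpy as np

def windows(s, n):
    """Turn a time series s into training pairs (x^(i), z^(i)) with window length n."""
    s = np.asarray(s, dtype=float)
    X = np.array([s[i:i + n] for i in range(len(s) - n)])   # n consecutive values
    z = s[n:]                                               # the value following each window
    return X, z

s = [1, 2, 3, 4, 5, 6]
X, z = windows(s, n=3)
print(X)   # rows [1 2 3], [2 3 4], [3 4 5]
print(z)   # [4. 5. 6.]
```

Each row of `X` is one $x^{(i)}$ and the corresponding entry of `z` is $z^{(i)}$, so a series of length $T$ yields $T - n$ training examples.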
In general, an attractor network is a network of nodes (i.e., neurons in a biological network), often recurrently connected, whose time dynamics settle to a stable pattern.
A Hopfield network is a recurrent neural network whose synaptic connection pattern is such that there is an underlying Lyapunov function for the activity dynamics. Started in any initial state, the state of the system evolves to a final state that is a (local) minimum of the Lyapunov function.
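The Lyapunov-function behaviour can be made concrete with a minimal sketch (all details here — the $\pm 1$ states, the two stored patterns, the Hebbian weights — are chosen for illustration): the energy $E(s) = -\tfrac12 s^\top W s$ never increases under sequential updates, so the state settles at a local minimum, here a stored pattern.

```python
import numpy as np

# Store two patterns with the Hebbian rule; zero the diagonal (no self-connections).
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = patterns.T @ patterns / len(patterns)
np.fill_diagonal(W, 0)

def energy(s):
    """Lyapunov function: sequential updates can only lower or keep this value."""
    return -0.5 * s @ W @ s

s = np.array([1, -1, 1, -1, 1, 1])   # first stored pattern with one bit flipped
for _ in range(5):                   # a few sequential update sweeps
    for i in range(len(s)):
        s[i] = 1 if W[i] @ s >= 0 else -1

print(s)   # [ 1 -1  1 -1  1 -1] -- the corrupted bit is repaired
```

Because each single-neuron update either lowers the energy or leaves it unchanged, the dynamics cannot cycle and must terminate in a fixed point, which is what makes such networks usable as content-addressable memories.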
Also see Deep learning.
This book is an introduction to the application of statistical and neural techniques to the problem of classification, and is particularly suitable for readers with a background in statistics:
Attractor networks:
Dr. Chris Eliasmith, Attractor networks, Scholarpedia.
Chris Eliasmith, A unified approach to building and controlling spiking attractor networks, 2004.
Hopfield networks:
Also see chapters 12 and 13 of this online book:
Here is a random sampling of recent (circa 2012) papers on ‘deep networks’:
Mallat, Classification with Deep Invariant Scattering Networks. http://nips.cc/Conferences/2012/Program/event.php?ID=3127
Dean et al., Large Scale Distributed Deep Networks. http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf
Krizhevsky, Sutskever, Hinton. ImageNet Classification with Deep Convolutional Neural Networks. http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf