The Azimuth Project
Neural network


An artificial neural network or ANN is a machine learning method used for regression and classification.


Artificial neurons were first proposed in 1943 by Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician. The concept of a neural network appears to have first been proposed by Alan Turing in his 1948 paper “Intelligent Machinery”.

“Connectionism” is a set of approaches in the fields of artificial intelligence, cognitive psychology, cognitive science, neuroscience and philosophy of mind, that models mental or behavioral phenomena as the emergent processes of interconnected networks of simple units.

In a neural network model, simple nodes, which can be called “neurons”, “processing elements” or “units”, are connected together to form a network. While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.



ANNs can be used to approximate a function $g:\mathbb{R}^n \to \mathbb{R}^m$. Here $g$ is unknown but there are training examples $x^{(i)} \in \mathbb{R}^n$ and $z^{(i)} = g(x^{(i)}) \in \mathbb{R}^m$. The approximation is a function $f(x;\theta)$ where $\theta$ is a vector or list of parameters. A training algorithm is used to choose $\theta$ by minimising a cost which measures how different $f$ and $g$ are on the training set.
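The text does not fix a particular cost. One common choice, assumed here purely for illustration, is the sum of squared errors over the training set, where $N$ (a symbol introduced for this sketch) is the number of training examples:

```latex
E(\theta) = \sum_{i=1}^{N} \left\| f(x^{(i)};\theta) - z^{(i)} \right\|^2
```

Since $z^{(i)} = g(x^{(i)})$, this cost is zero exactly when $f$ agrees with $g$ on every training example.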

The function $f(x;\theta)$ can be quite general. Almost any structural form for $f$ which can be represented as a network could be called an ANN. A common form is

$$f(x;W,w,V,v) = V \alpha(W x + w) + v$$

where $W$ and $V$ are matrices, $w$ and $v$ are vectors, and the function $\alpha:\mathbb{R} \to \mathbb{R}$ is applied component-wise to vectors. The number of rows in $W$ can be anything. The dimensions of the other quantities follow from this and $n$ and $m$. A common choice for $\alpha$ is the logistic function. Any continuous $g$ can be approximated uniformly on compacta by such an $f$ (Proposition 5.1, Ripley’s book). The catch is that $W$ may need a huge number of rows.
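As a minimal sketch of this form, the following plain Python evaluates $V \alpha(W x + w) + v$ with the logistic function as $\alpha$. The helper names (`logistic`, `affine`, `f`) and all numerical values are assumptions chosen for illustration; matrices are represented as lists of rows.

```python
import math

def logistic(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def affine(A, x, b):
    """Return A x + b for a matrix A (list of rows) and vectors x, b."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]

def f(x, W, w, V, v):
    """Evaluate V alpha(W x + w) + v with alpha the logistic function."""
    hidden = [logistic(t) for t in affine(W, x, w)]  # alpha applied component-wise
    return affine(V, hidden, v)

# Illustrative values only (assumptions): n = 2 inputs, 3 rows in W, m = 1 output.
W = [[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]]
w = [0.0, -1.0, 1.0]
V = [[1.0, -2.0, 0.5]]
v = [0.1]

y = f([0.3, 0.7], W, w, V, v)  # a list of length m = 1
```

Here $W$ has 3 rows and 2 columns, so $w$ has 3 entries, $V$ has 1 row and 3 columns, and $v$ has 1 entry, matching the remark that the other dimensions follow from the number of rows in $W$ together with $n$ and $m$.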

Example of time series

Suppose we have a time series $s_1, s_2, \dots$. We could take $n$ consecutive values $s_i, \dots, s_{i+n-1}$ as our $x^{(i)}$ and the following value $s_{i+n}$ as our $z^{(i)}$ (so $m=1$). Note that $W$ could smooth the time series, integrate or differentiate it (as a discrete approximation), calculate a Fourier or wavelet coefficient and so on, or even do all these things simultaneously.
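The windowing described above can be sketched in plain Python as follows. The function name `make_training_pairs`, the sample series, and the two example rows are assumptions made for illustration. Python lists are indexed from 0, whereas the series above is indexed from 1.

```python
def make_training_pairs(s, n):
    """Return inputs (n consecutive values) and targets (the value that follows)."""
    xs = [s[i:i + n] for i in range(len(s) - n)]
    zs = [s[i + n] for i in range(len(s) - n)]
    return xs, zs

def dot(row, x):
    """Inner product of one row of W with an input window x."""
    return sum(a * b for a, b in zip(row, x))

n = 3
s = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]  # illustrative series
xs, zs = make_training_pairs(s, n)
# xs == [[1.0, 2.0, 4.0], [2.0, 4.0, 7.0], [4.0, 7.0, 11.0]]
# zs == [7.0, 11.0, 16.0]

# Two possible rows of W (illustrative):
smoothing_row = [1.0 / n] * n                    # moving average of the window
difference_row = [0.0] * (n - 2) + [-1.0, 1.0]   # last value minus the one before

average = dot(smoothing_row, xs[0])
slope = dot(difference_row, xs[0])
```

Different rows of $W$ can hold different linear operations of this kind, which is how a single $W$ can smooth and differentiate the window at the same time.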

Types of ANN

Attractor network

In general, an attractor network is a network of nodes (i.e., neurons in a biological network), often recurrently connected, whose time dynamics settle to a stable pattern.

Hopfield network

A Hopfield network is a recurrent neural network having synaptic connection pattern such that there is an underlying Lyapunov function for the activity dynamics. Started in any initial state, the state of the system evolves to a final state that is a (local) minimum of the Lyapunov function.
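As a hedged sketch of this behaviour, the code below uses the classic binary Hopfield model, which is an assumption here since the text does not fix a particular model: states are $\pm 1$, the weights are symmetric with zero diagonal (built by a Hebbian rule), units are updated one at a time, and the Lyapunov function is the energy $E(s) = -\tfrac{1}{2} \sum_{i,j} W_{ij} s_i s_j$. All function names and the stored pattern are illustrative.

```python
def hebbian_weights(patterns):
    """Symmetric weights with zero diagonal built from +/-1 patterns."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def energy(W, s):
    """Lyapunov function E(s) = -1/2 * sum_ij W_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def settle(W, s, max_sweeps=100):
    """Update units one at a time until none changes.

    Returns the final state and the energy recorded after each change.
    """
    s = list(s)
    trace = [energy(W, s)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
                trace.append(energy(W, s))
        if not changed:
            break
    return s, trace

stored = [1, -1, 1, -1, 1, -1]
W = hebbian_weights([stored])
noisy = [1, -1, 1, -1, -1, -1]  # stored pattern with one unit flipped
final, trace = settle(W, noisy)
```

In this small example the noisy state settles back to the stored pattern, and the values in `trace` never increase, which is the Lyapunov property in action. With several stored patterns the final state is only guaranteed to be a local minimum of the energy, and it need not be one of the stored patterns.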

Deep network

Also see Deep learning.


This book is an introduction to the application of statistical and neural techniques to the problem of classification, which is particularly suitable if you have a background in statistics:

Attractor networks:

Hopfield networks:

Also see chapters 12 and 13 of this online book:

Here is a random sampling of recent (circa 2012) papers on ‘deep networks’: