Non-parametric density estimation (Rev #1)

**Non-parametric density estimation** is a class of methods for approximating a probability distribution *directly* in terms of a set of samples drawn from it.

Classical statistics represents *probability distributions* by *parametric* closed-form formulae, e.g., the normal distribution, and chooses the parameters which *best fit* the formula to a set of samples. The success of these methods depends primarily on whether a closed form close to the true distribution is in the statistician’s toolbox. They also do not distinguish between the parts of the fitted distribution that are supported by samples and the parts that follow purely from the chosen formula and its parameters.

In contrast, non-parametric density estimation constructs the density estimate $\hat{p}(x)$ directly from the samples $\{x_i\}_{i=1}^N$ drawn from $p(x)$. It is also generally possible to determine whether a query about $\hat{p}(x)$ falls in a region in which we have no samples, so that we may be more cautious about the reliability of the answer.
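One simple way to judge whether a query lies in a sampled region is to count the samples near it. The sketch below is only an illustration under assumptions: the function name `support_count`, the `radius` parameter and the toy data are hypothetical and not part of any particular method described here.

```python
import numpy as np

def support_count(query, samples, radius):
    """Count how many 1-D samples lie within `radius` of `query`."""
    samples = np.asarray(samples, dtype=float)
    return int(np.sum(np.abs(samples - query) <= radius))

# Toy data, invented for illustration only.
samples = np.array([0.1, 0.3, 0.35, 0.5, 0.9])

near = support_count(0.4, samples, radius=0.15)  # query inside the sampled region
far = support_count(5.0, samples, radius=0.15)   # query far from every sample
```

A count of zero (as for `far`) signals that any density value reported there rests on no nearby evidence, so it should be treated with caution.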

For example, the method of *kernel density estimation (KDE)* places a localised “probability mass” of fixed shape (called a *kernel*) at each sample location and takes the probability density at a point $x$ to be the sum of all the probability masses at $x$, each weighted by $1/N$, viz:

$\hat{p}(x)=\frac{1}{N} \sum_{i=1}^N K(|x-x_i|;\theta)$

where the kernel function $K$ is normalised so that $\int K(|x|;\theta)\,dx=1$. For example, in one dimension the *Epanechnikov kernel* has

$K(\delta;\theta)=\frac{3(\theta^2-\delta^2)}{4\theta^3} \qquad \text{if} \quad \delta\le\theta$

$K(\delta;\theta)=0 \qquad \text{if} \quad \delta\gt\theta$
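The estimator can be sketched in a few lines of NumPy. This is a minimal one-dimensional illustration, not a reference implementation: the function names, the sample values and the choice $\theta=0.5$ are assumptions made for the example. The kernel is normalised to integrate to one over the whole real line, which in one dimension gives the constant $3/(4\theta^3)$.

```python
import numpy as np

def epanechnikov(delta, theta):
    """1-D Epanechnikov kernel of half-width theta; integrates to 1 over the real line."""
    delta = np.abs(delta)
    inside = 3.0 * (theta**2 - delta**2) / (4.0 * theta**3)
    return np.where(delta <= theta, inside, 0.0)

def kde(x, samples, theta):
    """Average of the kernels centred on each sample, evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Pairwise distances |x - x_i|, shape (len(x), N).
    dist = np.abs(x[:, None] - samples[None, :])
    return epanechnikov(dist, theta).mean(axis=1)

# Toy samples and bandwidth, invented for illustration only.
samples = np.array([-1.0, 0.0, 0.2, 1.5])
theta = 0.5

grid = np.linspace(-3.0, 4.0, 7001)
density = kde(grid, samples, theta)

# Riemann-sum check that the estimate is a proper density.
total_mass = float(np.sum(density) * (grid[1] - grid[0]))
```

Here `total_mass` should come out very close to 1, and `kde(3.0, samples, theta)` is exactly zero because no sample lies within $\theta$ of the query point.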

This $\hat{p}(x)$ is clearly only an approximation, since any finite set of samples may, although this becomes unlikely as $N$ grows, be unrepresentative of the underlying distribution. One source of confusion is that a non-parametric model generally does have *some* parameters. These have to be set somehow, whether by manual experimentation or by a more principled method (for example, cross-validation), and they do affect the quality of the estimates. They are, however, generally less critical than the correct choice of parameters for a parametric distribution.

In this case $\theta$ determines the spread (the *bandwidth*) of $K$. In general, non-parametric density estimators can be shown to converge to the probability distribution generating the samples in the limit as the number of samples goes to infinity and the smoothing parameters tend to zero sufficiently slowly (for one-dimensional KDE, $\theta\to 0$ while $N\theta\to\infty$).
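The convergence behaviour can be illustrated numerically. The sketch below draws samples from a standard normal distribution and compares the KDE with the true density for a small and a large sample size. The bandwidth schedule $\theta = 2N^{-1/5}$, the random seed and the sample sizes are assumptions chosen for the example, not values prescribed by the method.

```python
import numpy as np

def kde_epanechnikov(x, samples, theta):
    """1-D KDE with an Epanechnikov kernel of half-width theta."""
    dist = np.abs(x[:, None] - samples[None, :])
    k = np.where(dist <= theta, 3.0 * (theta**2 - dist**2) / (4.0 * theta**3), 0.0)
    return k.mean(axis=1)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
grid = np.linspace(-3.0, 3.0, 61)
true_density = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

errors = {}
for n in (100, 20000):
    samples = rng.standard_normal(n)
    theta = 2.0 * n ** (-0.2)  # assumed schedule: shrinks with n, while n * theta grows
    estimate = kde_epanechnikov(grid, samples, theta)
    # Worst-case absolute error over the grid.
    errors[n] = float(np.max(np.abs(estimate - true_density)))
```

With more samples and a narrower kernel, the worst-case error over the grid is expected to shrink, so `errors[20000]` should be clearly smaller than `errors[100]`.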

- Non-parametric statistics, Wikipedia.

category: methodology