The Azimuth Project
Non-parametric density estimation


Non-parametric density estimator


Non-parametric density estimation is a class of methods for approximating a probability distribution directly in terms of a set of samples drawn from it.


Classical statistics represents probability distributions by parametric closed-form formulae (for example, the normal distribution) and chooses the parameters that best fit the formula to a set of samples. The success of these methods depends primarily on whether a closed form close to the true distribution is in the statistician’s toolbox. A further problem is that a fitted parametric model does not distinguish between the parts of the distribution that are supported by samples and the parts that follow purely from the functional form and its parameters.

In contrast, non-parametric density estimation constructs the density estimate $\hat{p}(x)$ directly from the samples $\{x_i\}_{i=1}^N$ drawn from $p(x)$. It is also generally possible to determine whether a query about $\hat{p}(x)$ falls in a region where there are no samples, so that we can be more cautious about the reliability of the answer.
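One simple way to make the "no samples nearby" check concrete is to count how many samples lie within some radius of the query point. The following is only a sketch: the function names, the `radius` and `min_count` parameters, and the data are hypothetical choices made for illustration and are not part of any particular estimator.

```python
def support_count(x, samples, radius):
    """Number of samples lying within `radius` of the query point x."""
    return sum(1 for xi in samples if abs(x - xi) <= radius)


def is_supported(x, samples, radius, min_count=1):
    """True if at least `min_count` samples lie within `radius` of x."""
    return support_count(x, samples, radius) >= min_count


samples = [0.1, 0.2, 0.25, 3.0]  # made-up one-dimensional data
near_cluster = is_supported(0.2, samples, radius=0.5)  # three samples nearby
in_gap = is_supported(1.5, samples, radius=0.5)        # no samples nearby
```

A query for which `is_supported` returns false can then be flagged as relying on extrapolation rather than on data.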

Example: kernel density estimation

For example, the method of kernel density estimation (KDE) places a localised “probability mass” of fixed shape (called a kernel) at each sample location and takes the probability density at a point $x$ to be the normalised sum of all these probability masses at $x$, viz:

$$\hat{p}(x)=\frac{1}{N} \sum_{i=1}^N K(|x-x_i|;\theta)$$

where the kernel function $K$ is normalised so that $\int K(|x|;\theta)\,dx=1$. For example, for one-dimensional $x$ the Epanechnikov kernel has

$$K(\delta;\theta)=\frac{3(\theta^2-\delta^2)}{4\theta^3} \qquad \text{if} \quad \delta\le\theta$$
$$K(\delta;\theta)=0 \qquad \text{if} \quad \delta\gt\theta$$
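The estimator can be sketched in a few lines of plain Python for one-dimensional data. This is a minimal illustration rather than a reference implementation: the sample values and the choice $\theta = 0.8$ are made up, and the `integrate` helper exists only to check numerically that the estimate has total mass one. The constant used is $3/(4\theta^3)$, which is the one for which a one-dimensional kernel integrates to one over the whole real line.

```python
def epanechnikov(delta, theta):
    """One-dimensional Epanechnikov kernel with half-width theta."""
    delta = abs(delta)
    if delta > theta:
        return 0.0
    return 3.0 * (theta**2 - delta**2) / (4.0 * theta**3)


def kde(x, samples, theta):
    """Average of the kernels centred on each sample, evaluated at x."""
    return sum(epanechnikov(x - xi, theta) for xi in samples) / len(samples)


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integral, used only to check normalisation."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width


samples = [-1.2, -0.4, 0.1, 0.3, 1.5]  # made-up data for illustration
theta = 0.8                            # arbitrary illustrative spread
density_at_zero = kde(0.0, samples, theta)
total_mass = integrate(lambda x: kde(x, samples, theta), -4.0, 4.0)
```

Because the kernel has compact support, `kde` returns exactly zero at any point further than `theta` from every sample.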

This $\hat{p}(x)$ is only an approximation, since any finite set of samples may, although this is unlikely, be unrepresentative of the underlying distribution. One source of confusion is that a non-parametric model generally does have some parameters. These have to be set somehow, whether by manual experimentation or by some more principled method, and they do affect the quality of the estimates, but setting them is generally less critical than setting the parameters of a parametric distribution correctly.
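As an illustration of a "more principled method", one common option is likelihood cross-validation: score each candidate $\theta$ by the log-density that the estimate built from all the other samples assigns to each held-out sample, and keep the best-scoring candidate. This technique is not prescribed by the text above; it is shown here only as a sketch, with made-up data, a hypothetical candidate grid, and a one-dimensional Epanechnikov kernel normalised to integrate to one.

```python
import math


def epanechnikov(delta, theta):
    """One-dimensional Epanechnikov kernel with half-width theta."""
    delta = abs(delta)
    if delta > theta:
        return 0.0
    return 3.0 * (theta**2 - delta**2) / (4.0 * theta**3)


def loo_log_likelihood(samples, theta):
    """Sum over i of the log-density at x_i of the estimate built without x_i."""
    n = len(samples)
    total = 0.0
    for i, xi in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        density = sum(epanechnikov(xi - xj, theta) for xj in others) / (n - 1)
        if density == 0.0:
            return float("-inf")  # x_i has no neighbour closer than theta
        total += math.log(density)
    return total


def choose_theta(samples, candidates):
    """Candidate with the highest leave-one-out log-likelihood."""
    return max(candidates, key=lambda theta: loo_log_likelihood(samples, theta))


samples = [-1.9, -1.1, -0.6, -0.2, 0.0, 0.3, 0.7, 1.0, 1.6, 2.4]  # made up
candidates = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]                      # hypothetical grid
best_theta = choose_theta(samples, candidates)
```

Very small values of $\theta$ score minus infinity here because some held-out sample falls outside every remaining kernel, while very large values are penalised for spreading the mass too thinly.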

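In this case $\theta$ determines the spread of $K$. In general, under mild regularity conditions, non-parametric density estimators can be shown to converge to the probability distribution generating the samples in the limit as the number of samples goes to infinity and the spread parameters tend to zero. For KDE the spread must shrink slowly enough that the number of samples falling under each kernel still grows without bound (in one dimension, $\theta \to 0$ with $N\theta \to \infty$).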
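The limiting behaviour can be observed numerically. The sketch below draws samples from a standard normal distribution, builds a one-dimensional Epanechnikov KDE, and compares it with the true density on a grid for a small and a large sample size. The rule $\theta = 2 N^{-1/5}$ is an assumption made for this illustration (it shrinks $\theta$ while letting $N\theta$ grow), as are the grid, the seed and the sample sizes.

```python
import math
import random


def epanechnikov(delta, theta):
    """One-dimensional Epanechnikov kernel with half-width theta."""
    delta = abs(delta)
    if delta > theta:
        return 0.0
    return 3.0 * (theta**2 - delta**2) / (4.0 * theta**3)


def kde(x, samples, theta):
    """Average of the kernels centred on each sample, evaluated at x."""
    return sum(epanechnikov(x - xi, theta) for xi in samples) / len(samples)


def standard_normal_pdf(x):
    """True density of the distribution the samples are drawn from."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mean_abs_error(n, rng, scale=2.0):
    """Mean absolute error on a grid, with theta = scale * n**(-1/5)."""
    theta = scale * n ** (-0.2)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    grid = [-3.0 + 0.25 * k for k in range(25)]  # points from -3 to 3
    errors = [abs(kde(x, samples, theta) - standard_normal_pdf(x)) for x in grid]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so that runs are repeatable
error_small = mean_abs_error(100, rng)
error_large = mean_abs_error(10000, rng)
```

With the larger sample and the correspondingly smaller $\theta$, the grid error should come out noticeably smaller than with the small sample.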

