Kernel density estimation (KDE)¶
Similar to a Gaussian mixture model, but instead of choosing K cluster centers we allocate one cluster center per data point, so \(\mu_i = x_i\). The model:
\[
p(x|D) = \frac{1}{N} \sum_{i=1}^N \mathcal{N}(x|x_i, \sigma^2 I)
\]
This can be generalized as:
\[
\hat{p}(x) = \frac{1}{N} \sum_{i=1}^N K_h(x - x_i)
\]
Here \(K_h\) is a smoothing kernel with bandwidth \(h\); the Gaussian model above is the special case where \(K_h\) is a Gaussian density with standard deviation \(h = \sigma\). This is called a Parzen window density estimator, or kernel density estimator (KDE), and is a simple non-parametric density model. The advantage over a parametric model is that no model fitting is required (except for tuning the bandwidth, usually done by cross-validation), and there is no need to pick K. The disadvantages are that the model takes a lot of memory to store and a lot of time to evaluate, and that it cannot be used for clustering tasks.
Essentially this model results in a smoothed histogram.
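The estimator above can be sketched directly in NumPy: one Gaussian kernel is centered at each data point, and the density at a query point is the average of the kernel values. This is a minimal 1-D illustration (the function name `gaussian_kde_1d` and the bandwidth value are mine, not from the text above):

```python
import numpy as np

def gaussian_kde_1d(x, data, h):
    """Evaluate a Gaussian KDE with bandwidth h at query points x (1-D)."""
    x = np.asarray(x, dtype=float)[:, None]        # shape (M, 1)
    data = np.asarray(data, dtype=float)[None, :]  # shape (1, N)
    z = (x - data) / h
    # Gaussian kernel centered at each data point, averaged over all N points.
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=500)
density = gaussian_kde_1d([0.0, 1.0, 3.0], samples, h=0.3)
```

Note that evaluating the density at M query points costs O(MN), which illustrates the evaluation-cost disadvantage mentioned above: the whole dataset must be stored and visited for every query.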