Lecture 10a: Nearest neighbor and kernel density


Some good and bad properties of histograms as density estimators

• There is no need to fit a model to the data.
  – We just compute some very simple statistics (the number of datapoints in each bin) and store them.

• The number of bins is exponential in the dimensionality of the dataspace, so high-dimensional data is tricky:
  – We must either use big bins or get lots of zero counts (or adapt the local bin-width to the density).

• The density has silly discontinuities at the bin boundaries.
  – We must be able to do better by some kind of smoothing.
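The three points above can be made concrete with a small sketch. It assumes equal-width bins over a fixed range in every dimension and uses NumPy; the names fit_histogram and density, and the toy data, are illustrative and not from the lecture. The estimate in a bin is its count divided by (number of datapoints × bin volume).

```python
import numpy as np

def fit_histogram(data, bins_per_dim, low, high):
    """Count the datapoints in each bin. `data` must have shape (N, D)."""
    n_dims = data.shape[1]
    counts, edges = np.histogramdd(
        data, bins=bins_per_dim, range=[(low, high)] * n_dims
    )
    return counts, edges

def density(x, counts, edges):
    """Piecewise-constant density estimate at point x (a length-D sequence)."""
    idx, volume = [], 1.0
    for d, e in enumerate(edges):
        if x[d] < e[0] or x[d] > e[-1]:
            return 0.0  # outside the histogram range
        # Index of the bin containing x[d]; the last bin includes its right edge.
        i = min(np.searchsorted(e, x[d], side="right") - 1, len(e) - 2)
        idx.append(i)
        volume *= e[i + 1] - e[i]
    return counts[tuple(idx)] / (counts.sum() * volume)

# 1-D toy data, two bins on [0, 1]: three points fall left of 0.5, one right.
data = np.array([[0.1], [0.2], [0.3], [0.9]])
counts, edges = fit_histogram(data, bins_per_dim=2, low=0.0, high=1.0)
print(density([0.49], counts, edges))  # just left of the bin boundary
print(density([0.51], counts, edges))  # just right of it: the estimate jumps

# With 10 bins per dimension the table has 10**D entries, so a fixed
# number of datapoints leaves more and more bins empty as D grows.
rng = np.random.default_rng(0)
for n_dims in (1, 2, 3):
    sample = rng.uniform(0.0, 1.0, size=(500, n_dims))
    c, _ = fit_histogram(sample, bins_per_dim=10, low=0.0, high=1.0)
    print(n_dims, c.size, float(np.mean(c == 0)))
```

In the toy example the estimate is 1.5 on [0, 0.5) and 0.5 on [0.5, 1], so two query points only 0.02 apart get very different densities purely because a bin boundary lies between them. The loop prints the number of bins and the fraction of empty bins for 1, 2 and 3 dimensions with the same 500 datapoints.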