demos I wrote

regular neural net

The demo below shows a pre-trained neural network in which two weights in the first layer are free while the rest are fixed. The left plot shows training loss, the right plot shows test loss. Experiment by moving the dot and observe overfitting when the training loss is minimized (at the reddest point).

[Interactive demo: network architecture options are Deep, Shallow, and Linear]
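What the demo plots can be sketched in a few lines: fix every weight of a small network except two, sweep those two over a grid, and record the loss at each grid point. The toy network, data, and names below are my own illustrative assumptions, not the demo's actual code.

```python
# Minimal sketch: evaluate a 2D slice of a loss surface by varying two
# weights of an otherwise fixed network. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-hidden-layer regression setup with fixed weights.
X = rng.normal(size=(100, 2))                      # inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)   # noisy targets

W1 = rng.normal(size=(2, 8))                       # first-layer weights (mostly fixed)
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

def loss(w_a, w_b):
    """MSE loss with two first-layer weights replaced by (w_a, w_b)."""
    W = W1.copy()
    W[0, 0], W[0, 1] = w_a, w_b                    # the two "free" weights
    h = np.tanh(X @ W + b1)                        # hidden layer
    pred = (h @ W2 + b2).ravel()
    return np.mean((pred - y) ** 2)

# Sweep the two free weights over a grid -> a 2D slice of the loss surface.
grid = np.linspace(-3, 3, 50)
surface = np.array([[loss(a, b) for b in grid] for a in grid])
print(surface.shape)  # (50, 50), ready for a contour/heatmap plot
```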

Now we train the network (lag warning): click the train button below. Observe how the loss surfaces change as training proceeds (each plot is a 2D slice of a D > 2 dimensional loss space). However, the left and right loss surfaces still don't match, which means overfitting remains.
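Training itself is ordinary gradient descent on that kind of toy setup. A minimal hand-rolled sketch, again with made-up data and layer sizes rather than the demo's actual code:

```python
# Minimal sketch: full-batch gradient descent with manual backprop on a
# tiny 1-hidden-layer tanh network. All data and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)
lr = 0.05

for step in range(500):
    # Forward pass
    h = np.tanh(X @ W1 + b1)                 # (100, 8)
    pred = (h @ W2 + b2).ravel()             # (100,)
    err = pred - y
    mse = np.mean(err ** 2)
    # Backward pass: MSE -> output layer -> tanh hidden layer
    dpred = 2 * err / len(y)                 # (100,)
    dW2 = h.T @ dpred[:, None]               # (8, 1)
    db2 = dpred.sum(keepdims=True)           # (1,)
    dh = dpred[:, None] @ W2.T               # (100, 8)
    dz = dh * (1 - h ** 2)                   # tanh derivative
    dW1 = X.T @ dz                           # (2, 8)
    db1 = dz.sum(axis=0)                     # (8,)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(mse)  # loss should decrease from its initial value as training proceeds
```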

Markov Chain Monte Carlo

Gaussian-proposal random-walk MCMC is a way to guarantee convergence to a stationary distribution (exact inference); in this case, the stationary distribution is over the parameters. Green = accepted, red = rejected.

Gaussian Random Walk Proposal on Training Data
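A bare-bones version of what the sampler above does, assuming some generic log-posterior over the parameters (the `log_post` placeholder and the step size are illustrative, not the demo's actual model):

```python
# Minimal sketch of Gaussian random-walk Metropolis-Hastings.
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Placeholder log-posterior: standard Gaussian over the parameters.
    return -0.5 * np.sum(theta ** 2)

def random_walk_mh(theta0, n_steps=5000, step=0.5):
    theta = np.asarray(theta0, dtype=float)
    samples, accepted = [], 0
    logp = log_post(theta)
    for _ in range(n_steps):
        proposal = theta + step * rng.normal(size=theta.shape)  # Gaussian proposal
        logp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop   # accepted (green in the demo)
            accepted += 1
        samples.append(theta.copy())            # a rejected move keeps the old state (red)
    return np.array(samples), accepted / n_steps

samples, accept_rate = random_walk_mh(np.zeros(2))
print(samples.mean(axis=0), accept_rate)
```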

The Kullback-Leibler divergence

Instead of guaranteeing convergence at the expense of time with exact inference, we could try to approximate the underlying distribution quickly (variational inference). But then we need a measure of distance between our approximate distribution and the true distribution; the Kullback-Leibler divergence is one such measure. Experiment with KL in the demo below and try to match the two distributions by dragging the sliders (a small numerical sketch follows the demo).

[Interactive demo: sliders for the mean μ and standard deviation σ of the approximating distribution; divergence options are KL, Reverse KL, and Jensen-Shannon]
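The KL divergence between densities q and p is KL(q || p) = ∫ q(x) log(q(x)/p(x)) dx, and it is asymmetric (reverse KL swaps the arguments). When both distributions are univariate Gaussians, as in the demo, it has a closed form; a small sketch of the forward direction (the formula is standard, the function name is mine):

```python
# Closed-form KL divergence between two univariate Gaussians:
# KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ).
import math

def kl_gaussians(mu_q, sig_q, mu_p, sig_p):
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2)
            - 0.5)

# Matching the two distributions drives the divergence to zero.
print(kl_gaussians(0.0, 1.0, 0.0, 1.0))   # 0.0
print(kl_gaussians(0.0, 1.0, 2.0, 0.5))   # asymmetric: compare with...
print(kl_gaussians(2.0, 0.5, 0.0, 1.0))   # ...the reverse direction
```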

Stochastic Variational Inference

Now, instead of a point estimate for each weight, we impose a Gaussian distribution on each parameter. With a uniform prior, we minimize the KL divergence to the true posterior by maximizing the Evidence Lower BOund (ELBO). Observe that we are now fitting a distribution to the training set.

[Interactive demo: left panel shows training loss, right panel shows test loss]
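As a rough sketch of the idea (not the demo's code): a single-weight linear model with a Gaussian variational posterior, trained by maximizing a one-sample Monte Carlo estimate of the ELBO via the reparameterization trick. I use a standard normal prior here rather than the uniform prior mentioned above, purely so the KL term has a closed form; the data and all names are illustrative.

```python
# Minimal sketch of stochastic variational inference (Bayes-by-Backprop style)
# for a single-weight linear model. Everything here is illustrative.
import torch

torch.manual_seed(0)

# Toy data: y = 2*x + noise
x = torch.linspace(-1, 1, 100)
y = 2.0 * x + 0.1 * torch.randn(100)

# Variational posterior q(w) = N(mu, sigma^2), with sigma = softplus(rho).
mu = torch.zeros(1, requires_grad=True)
rho = torch.tensor([-3.0], requires_grad=True)
prior = torch.distributions.Normal(0.0, 1.0)      # N(0, 1) prior on the weight
opt = torch.optim.Adam([mu, rho], lr=0.05)

for step in range(2000):
    sigma = torch.nn.functional.softplus(rho)
    q = torch.distributions.Normal(mu, sigma)
    w = q.rsample()                                # reparameterization trick
    # Negative ELBO = -E_q[log p(y|x,w)] + KL(q || prior), one-sample estimate.
    log_lik = torch.distributions.Normal(w * x, 0.1).log_prob(y).sum()
    kl = torch.distributions.kl_divergence(q, prior).sum()
    loss = -(log_lik - kl)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Posterior mean should land near 2 with a small standard deviation.
print(mu.item(), torch.nn.functional.softplus(rho).item())
```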