NOTES ON THE VERSION OF 1998-08-02

Changes in this version:

1) The documentation has been revised, and extended to cover the features
   described below.

2) Distributions can now be specified by giving an arithmetic formula
   for the "energy", and the various Markov chain methods can then be 
   applied to sample from this distribution.  Bayesian posterior 
   distributions can also be specified, by giving the prior and the
   likelihood.  This is meant primarily for demonstrating the Markov
   chain methods.  Some real problems might be solvable with these
   facilities, but many features that would often be needed have not
   been provided.  See the new introductory documentation, and dist.doc 
   for details.

3) A calculator program (see calc.doc) has been written, mainly as
   a demo and test of the routines for evaluating arithmetic formulas
   needed for the 'dist' programs (see formula.doc).

4) Annealed Importance Sampling (AIS) is now supported for neural network 
   models and for Bayesian models sampled from using the 'dist' module.  
   See the technical report available from my web page for a description 
   of Annealed Importance Sampling.  See mc-spec.doc for details of how
   to use AIS.

5) A 'Q' quantity is now provided for mc log files, allowing access
   to the importance weights when annealed importance sampling is used. 
   See mc-quantities.doc for details.

6) New "met-values" and "mh-values" operations have been implemented for 
   updating the latent values in a Gaussian process model.  See gp-mc.doc
   for details.

7) The matrix operations for Gaussian process prediction and energy
   computation have been improved, so that they now do not explicitly
   compute the inverse covariance matrix except when this is needed for
   calculating derivatives of the energy.  Forward and backward 
   substitution with the Cholesky decomposition is used instead.  This
   is faster and possibly more accurate.  Unfortunately, there is little
   speed improvement for hybrid Monte Carlo, since it needs derivatives.

8) Two new Markov chain update operations for mixture models have been
   added: gibbs1-indicators and met1-indicators.  These are mostly useful
   for models with an infinite number of components.  See mix-mc.doc for 
   details.

9) The energy for neural network models is now exactly minus the log of
   the probability of the training targets given the current weights and
   noise hyperparameters.  Previously, terms that were constant or that
   involved only the noise hyperparameters were omitted, as they weren't
   relevant for sampling the weights.  The full form is needed for 
   annealing/tempering schemes, and for computation of the marginal
   likelihood.

10)The energy for Gaussian process models is also now exactly minus the
   log of the probability of the targets or latent values given the
   current hyperparameters and case-specific noise-variances, plus minus
   the log of the prior for the hyperparameters.  This is for consistency 
   with the change above for neural network models, and makes the likelihood
   equal to minus the energy when the hyperparameters are all constant.

11)Options for finding 10% and 90% quantiles of predictive distributions 
   have been added.  See net-pred.doc and gp-pred.doc for details.

12)Components of additive models can now be examined using gp-pred.  See
   the documentation on the options 1-9 in gp-pred.doc.

13)Each application now has a "tbl" as well as a "plt" program.  The "tbl"
   programs are very similar to the "plt" programs, but output data in
   a different order when more than one quantity is "plotted", which is 
   better for manual examination, and for some plot programs and statistical 
   packages.  See xxx-tbl.doc for details.

14)The quantity 'E0' is now defined, giving the energy at inverse 
   temperature zero, along with 'E1', which is E minus E0.  See 
   mc-quantities.doc for details.  

15)Quantities 'F1' and 'F2' are now provided to allow ratios of 
   normalizing constants to be estimated using tempered transitions.  
   See mc-quantities.doc for details.  (The old meaning of 'F' no longer 
   exists.  It previously was the same as 'f' divided by two.  I must 
   once have thought this was convenient, but I can see no good use for 
   it now.)  

16)The mc-temp-sched command has been extended to allow arithmetic as 
   well as geometric sequences of inverse temperatures.

17)The maximum size of a tempering schedule has been increased to 2001
   from 1001.  This makes old log files that used the tempering features
   unreadable by the new version.

18)The heuristics for setting the stepsizes for the top-level linear
   and relevance hyperparameters for GP models were changed for the
   case where there are no lower-level hyperparameters (the stepsize
   now does not depend on the number of inputs).  This could possibly
   result in a high rejection rate if you use a stepsize designed for
   the previous version.  Reducing the stepsize by a factor of the
   square root of the number of inputs should fix any such problems.

19)A "Cauchy" style covariance function has been implemented.  It is
   obtained by setting the "power" to -1.  See gp-spec.doc for details.

20)The 'z' option for net-pred has been removed.  It was a fudge used
   for only one dataset, and is now inconvenient to retain.


Compatibility with past versions.

As far as I know, the only potential program compatibility problems
are the removal of two obscure options (see (15) and (20) above), and
the change in stepsize heuristics in gp-mc (see (18) above). 

Log files created with the previous version should be readable with
this version as long as they didn't contain a tempering schedule (the
maximum size of which was increased in this version).


Bug fixes.

Fixed error check on jitter prior in GP models.  Fixed bug in gp-gen
that in some circumstances prevented proper scaling of relevance
priors with "x" specified.  Fixed bug in net-gen that sometimes (when
"fix" was used) set hyperparameters that according to the spec were
supposed to have pre-determined values.  Fixed bug that caused the "l"
quantity for Gaussian process models of "class" data to be incorrect.
Fixed an error in the gradient calculation in GP models with
case-by-case variances.


Known bugs and other deficiencies.

1) The facility for plotting quantities using "plot" operations in xxx-mc
   doesn't always work for the first run of xxx-mc (before any
   iterations exist in the log file).  A work-around is to do a run of
   xxx-mc to produce just one iteration before attempting a run of
   xxx-mc that does any "plot" operations.

2) The CPU time features (eg, the "k" quantity) will not work correctly
   if a single iteration takes more than about 71 minutes.

3) The latent value update operations for Gaussian processes may recompute 
   the inverse covariance matrix even when an up-to-date version was 
   computed for the previous Monte Carlo operation.

4) Covariance matrices are stored in full, even though they are symmetric,
   which sometimes costs a factor of two in memory usage.