Some of you may have read a discussion in the README file (which was
uploaded on Friday) on determining a good number of samples for computing
the log likelihoods.  Unfortunately, it describes an unrealistically
stringent criterion for deciding when two log likelihoods are
different. If you relied on it, you may have been misled, so this email
is specifically for you.

Here is a much better way of addressing the issue. Suppose we have
collected a set of log likelihood samples, l_1,...,l_N. It can be shown,
under benign assumptions, that the true log likelihood, l (estimated by
the mean of the samples), lies in a so-called "confidence interval"
[l_low, l_high] with high probability (e.g., 90%; see
http://en.wikipedia.org/wiki/Confidence_interval#Theoretical_example),
which we can compute as follows:

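from scipy.stats import bayes_mvs
# bayes_mvs returns estimates for (mean, variance, std); each is a
# (center, (lower, upper)) pair, so [0][1] is the interval for the mean.
confidence_interval = bayes_mvs(data, alpha=0.9)[0][1]

where data is the list of log likelihoods l_1,...,l_N (at least two
values are required), and confidence_interval is the pair (l_low, l_high).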
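If SciPy is not available, or if you want to see what the interval amounts to, here is a minimal standard-library sketch. It swaps in a normal approximation: the interval is mean +/- z * s / sqrt(N), where s is the sample standard deviation of the log likelihood samples and z is the two-sided normal quantile for coverage alpha (about 1.645 for alpha=0.9). For large N this should closely match the mean interval from bayes_mvs, which is based on a Student t distribution; for small N it comes out somewhat too narrow. The helper name mean_confidence_interval and the synthetic data are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(data, alpha=0.9):
    """Approximate alpha-level confidence interval for the mean of data.

    Hypothetical helper: uses a normal approximation, so it is only
    reasonable when len(data) is large (e.g. hundreds of samples).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = fmean(data)
    # Standard error of the mean, using the (n - 1)-normalized sample std.
    sem = stdev(data) / math.sqrt(n)
    # Two-sided quantile: alpha=0.9 leaves 5% in each tail, so z is about 1.645.
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return mean - z * sem, mean + z * sem


# Usage with synthetic "log likelihood" samples, for illustration only.
random.seed(0)
data = [random.gauss(-120.0, 5.0) for _ in range(1000)]
l_low, l_high = mean_confidence_interval(data, alpha=0.9)
print(f"90% interval for the mean: [{l_low:.2f}, {l_high:.2f}]")
```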

Although we can be practically certain that any two log likelihoods
defined by an RNN will be different real numbers, it does not follow
that we should think of all of them as "different". Indeed, if we
can't certify their difference using 1000 random samples, then we
can reasonably declare the two log likelihoods indistinguishable.
More samples may eventually show that log likelihood A is greater
than log likelihood B, but the difference will then be so small as
to be meaningless. Thus, here is a sensible thing you could do:

Either show that the confidence intervals of the two log
likelihoods do not overlap for some N <= 1000 samples, in which case we
conclude that the log likelihoods really are different (and the one
with the higher interval is bigger),
or show that the confidence intervals still overlap at N=1000 samples,
in which case we conclude that the log likelihoods are indistinguishable
with respect to 1000 samples.
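The two-way decision above can be written as a small helper. This is a sketch under assumptions: it uses a normal-approximation interval in place of bayes_mvs (with SciPy available, bayes_mvs(samples, alpha=0.9)[0][1] could be substituted), and the function names and synthetic samples are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(data, alpha=0.9):
    # Normal-approximation interval for the mean (assumes len(data) is large).
    n = len(data)
    if n < 2:
        raise ValueError("need at least two samples")
    half_width = NormalDist().inv_cdf((1 + alpha) / 2) * stdev(data) / math.sqrt(n)
    mean = fmean(data)
    return mean - half_width, mean + half_width


def compare_log_likelihoods(samples_a, samples_b, alpha=0.9):
    """Return 'A > B', 'B > A', or 'indistinguishable' (hypothetical helper)."""
    a_low, a_high = mean_confidence_interval(samples_a, alpha)
    b_low, b_high = mean_confidence_interval(samples_b, alpha)
    if a_low > b_high:  # A's whole interval lies above B's
        return "A > B"
    if b_low > a_high:  # B's whole interval lies above A's
        return "B > A"
    return "indistinguishable"  # the intervals overlap


# Synthetic samples: b is a shifted down by 2.0, c by a negligible amount.
random.seed(1)
a = [random.gauss(-100.0, 5.0) for _ in range(1000)]
b = [x - 2.0 for x in a]
c = [x + 0.01 for x in a]
print(compare_log_likelihoods(a, b))  # A > B
print(compare_log_likelihoods(a, c))  # indistinguishable
```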

Some of you might find that 1000 samples per log likelihood
take too long to obtain, in which case you could use a smaller
number of samples, M. With M samples you would either find that
the log likelihoods differ, or that they are "indistinguishable
with respect to M samples".
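One way to sketch this budgeted variant is a loop that draws samples in batches and rechecks the intervals after each batch, stopping as soon as they separate or once M samples (max_samples here) have been drawn. The callables sample_a and sample_b stand in for whatever code produces one log likelihood sample from each model; they, the batch size check_every, and the function names are hypothetical, and the interval again uses a normal approximation rather than bayes_mvs.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(data, alpha=0.9):
    # Normal-approximation interval for the mean (assumes len(data) is large).
    half_width = NormalDist().inv_cdf((1 + alpha) / 2) * stdev(data) / math.sqrt(len(data))
    mean = fmean(data)
    return mean - half_width, mean + half_width


def compare_with_budget(sample_a, sample_b, max_samples=1000, check_every=100, alpha=0.9):
    """Draw samples until the intervals separate or the budget runs out.

    sample_a and sample_b are hypothetical zero-argument callables, each
    returning one log likelihood sample. Returns (verdict, samples_used).
    """
    if min(check_every, max_samples) < 2:
        raise ValueError("each interval needs at least two samples")
    a, b = [], []
    while len(a) < max_samples:
        # Draw the next batch, without exceeding the budget.
        for _ in range(min(check_every, max_samples - len(a))):
            a.append(sample_a())
            b.append(sample_b())
        a_low, a_high = mean_confidence_interval(a, alpha)
        b_low, b_high = mean_confidence_interval(b, alpha)
        if a_low > b_high:
            return "A > B", len(a)
        if b_low > a_high:
            return "B > A", len(a)
    return f"indistinguishable with respect to {max_samples} samples", len(a)


# Usage with synthetic samplers standing in for two models.
random.seed(2)
verdict, n = compare_with_budget(
    lambda: random.gauss(-100.0, 5.0),
    lambda: random.gauss(-103.0, 5.0),
    max_samples=500,
)
print(verdict, n)
```

Because the intervals are checked several times rather than once, the chance of a spurious "different" verdict is somewhat higher than a single check at a fixed N would give; checking less often, or using a larger alpha such as 0.95, makes the procedure more conservative.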

Finally, note that you *don't have* to follow the above approach. If your
previous approach was sensible, then you should keep using it. But if
you were unsure how to compare the log likelihoods, consider using
this suggestion.

Ilya