Some of you may have read the discussion in the README file (uploaded on Friday) on choosing a good number of samples for computing the log likelihoods. Unfortunately, it describes an unrealistically stringent criterion for deciding when two log likelihoods are different, and it may have misled you. This email is specifically for you.

Here is a much better way of addressing the issue. Suppose we have collected a set of log likelihoods, l_1,...,l_N. It can be shown, under benign assumptions, that the true log likelihood, l, lies in a so-called "confidence interval" [l_low, l_high] with high probability (e.g., 90%) [ http://en.wikipedia.org/wiki/Confidence_interval#Theoretical_example ], which we can compute as follows:

    from scipy.stats import bayes_mvs
    confidence_interval = bayes_mvs(data, alpha=.9)[0][1]

where the variable data is a list of numbers and confidence_interval is the pair (l_low, l_high).

Although we can be certain that any two log likelihoods defined by an RNN will be different real numbers, it does not follow that we should treat all log likelihoods as "different". Indeed, if we can't certify their difference using 1000 random samples, then we can reasonably declare the two log likelihoods indistinguishable. Although more samples would eventually determine that log likelihood A is greater than log likelihood B, the difference would be so small as to not be meaningful.

Thus, here is a sensible thing you could do. Either show that the confidence intervals of the two log likelihoods do not overlap for some N<=1000 samples, in which case we conclude that the log likelihoods really are different (and one is bigger than the other), or show that the confidence intervals still overlap at N=1000 samples, in which case we conclude that the log likelihoods are indistinguishable with respect to 1000 samples.

Some of you might find that 1000 samples per log likelihood take too long to obtain, in which case you could try using fewer samples.
Using these samples you would either find that the log likelihoods differ, or that they are "indistinguishable with respect to M samples", where M is the number of samples you used.

Finally, note that you *don't have* to follow the above approach. If your previous approach was sensible, then you should keep using it. But if you were unsure about how to compare the log likelihoods, consider using this suggestion.

Ilya
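
For concreteness, the overlap test described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the data are synthetic stand-ins for real log likelihood samples, and the interval is computed with a plain normal approximation from the Python standard library rather than with bayes_mvs. For large N the two should give similar intervals, and bayes_mvs can be substituted inside mean_confidence_interval.

```python
import random
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, alpha=0.9):
    """Approximate (low, high) interval for the true mean of `samples`.

    Assumption: a normal approximation stands in for bayes_mvs here.
    """
    n = len(samples)
    center = mean(samples)
    # Standard error of the mean; stdev() uses the unbiased (n - 1) estimator.
    stderr = stdev(samples) / n ** 0.5
    # Two-sided quantile: alpha=0.9 leaves 5% in each tail.
    z = NormalDist().inv_cdf(0.5 + alpha / 2)
    return center - z * stderr, center + z * stderr


def compare_log_likelihoods(samples_a, samples_b, alpha=0.9):
    """Return 'A > B', 'B > A', or 'indistinguishable' (hypothetical helper)."""
    low_a, high_a = mean_confidence_interval(samples_a, alpha)
    low_b, high_b = mean_confidence_interval(samples_b, alpha)
    if low_a > high_b:
        return "A > B"  # intervals are disjoint and A lies above B
    if low_b > high_a:
        return "B > A"  # intervals are disjoint and B lies above A
    return "indistinguishable"  # intervals overlap at this sample size


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data only; replace with your own log likelihood samples.
    samples_a = [rng.gauss(-100.0, 5.0) for _ in range(1000)]
    samples_b = [rng.gauss(-101.0, 5.0) for _ in range(1000)]
    print(compare_log_likelihoods(samples_a, samples_b))
```

If the answer is "indistinguishable" at M samples, that is exactly the "indistinguishable with respect to M samples" conclusion; if the intervals are already disjoint at a smaller sample size, there is no need to collect more.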