Maximum Likelihood and Maximum A-Posteriori

NLL <- function(x, mu, sigma){
  # How consistent are the observations in x
  # with x ~ N(mu, sigma)
  # Here we are computing the Negative Log-Likelihood (NLL)
  # lower NLL => higher likelihood => data is more consistent
  # with model
  -sum(log(dnorm(x, mu, sigma)))
}

This function itself returns a function which takes in params and returns the NLL of the data x (which is fixed once given to make.my.nll) under the parameters params.

make.my.nll <- function(x){
  function(params){
    NLL(x, params[1], params[2])
  }
}

nll <- make.my.nll(x) #deferred evaluation
nll <- function(params){
    NLL(x, params[1], params[2])
  }

Let’s now generate some data:

x <- rnorm(n = 5000, mean = 1, sd = 2)

Now, let’s find the parameters with which the data is the most consistent:

my.nll <- make.my.nll(x)
nlminb(c(0, 1), my.nll)$par

## [1] 0.9948869 2.0305297

This works, but doesn’t work great, because dnorm gives an imprecise probability density for very small densities. The following uses the log = TRUE argument to dnorm to get the log-probability.

NLL.c <- function(x, mu, sigma){
  # Like NLL, but computing log(dnorm)
  -sum(dnorm(x, mu, sigma, log = TRUE))
}
make.my.nll.c <- function(x){
  my.nll <- function(params){
    NLL.c(x, params[1], params[2])
  }
}

x <- rnorm(n = 1000, mean = 1, sd = 2)
my.nll <- make.my.nll.c(x)
nlminb(c(0, 1), my.nll)$par

## [1] 1.044025 1.949285

Now, let’s optimize \(P(\mu)P(x | \mu, \sigma)\), with the prior \(\mu \sim N(0, 1)\)

make.my.posterior <- function(x, mean.mu, sigma.mu){
  function(params){
    NLL.c(x, params[1], params[2]) - dnorm(params[1], mean = mean.mu, sd = sigma.mu, log = TRUE)
  }
}

x <- rnorm(n = 1000, mean = 1, sd = 2)
my.post <- make.my.posterior(x, -10, 1)
nlminb(c(0, 1), my.post)$par

## [1] 0.9180435 2.0343655

Prior-Data conflicts

First, let’s make a function that plots the data likelihood (for \(\sigma = 1\)) as well as the posterior, for different value of \(\mu\).

plot.lik.posterior <- function(x, mean.mu, sigma.mu){
  possible.mus <- seq(-10, 10, 0.01)
  sigmas <- rep(1, length(possible.mus))
  possible.params <- mapply(c, possible.mus, sigmas, SIMPLIFY = FALSE)
  my.log.post <- make.my.posterior(x, mean.mu, sigma.mu)
  my.nll <- make.my.nll.c(x)
  
  df <- data.frame(mu = possible.mus,
                   log.likelihood = -sapply(X = possible.params, FUN = my.nll),
                   log.posterior = -sapply(X = possible.params, FUN = my.log.post))
  
  
  df.long <- df %>% pivot_longer(cols = c(log.likelihood, log.posterior), names_to = "f", values_to = "log.p")
  
  
  point.ests <- df.long %>% group_by(f) %>% summarize(mu = mu[which.max(log.p)])
  
  ggplot(data = df.long, mapping = aes(color = f)) + 
    geom_line(mapping = aes(x = mu, y = log.p)) + 
    geom_vline(data = point.ests, mapping = aes(xintercept = mu, color = f), linetype = "dashed")
    
    
  
}

On one extreme, we have very little data, and a strong prior (a strong prior means that we’re pretty sure about the value of e.g. \(\mu\)).:

x <- rnorm(n = 2, mean = 1, sd = 2)
plot.lik.posterior(x = x, mean.mu = 10, sigma.mu = 0.1)

On another extreme, we have a lot of data, and a (relatively) weak prior:

x <- rnorm(n = 100, mean = 1, sd = 2)
plot.lik.posterior(x = x, mean.mu = 10, sigma.mu = 0.1)

An intermediate situation is when we have prior and data that are in conflict that is not resolved. This won’t happen with Gaussian priors and likelihoods, but will sometimes happen with other models.

lik.plot <- function(probs, lik){
  df <- data.frame(prob = probs,
                   lik = lik)
  ggplot(df, mapping = aes(x = prob, y = lik)) + 
    geom_line() + 
    geom_vline(mapping = aes(xintercept = probs[which.max(lik)]), color = "green")
}

probs <- seq(0.001, 1, 0.002)
prior <- dnorm(x = probs, mean = 0.5, sd = 0.05) + 0.00000001
prior <- prior/sum(prior)

posterior <- function(r, prob, prior){
  lik <- exp(log(probs)*sum(r) + log(1-probs)*(sum(1-r)))
  post <- lik * prior
  post/sum(post)
}

set.seed(100)
r <- rbinom(n = 50, size = 1, prob = 0.95) # mean(r) == 0.98
prior <- dnorm(x = probs, mean = 0.5, sd = 0.05) + 0.0000001
prior <- prior/sum(prior)

lik.plot(probs, posterior(r, probs, prior))