[R-sig-dyn-mod] Using negative log-likelihood in MCMC optimization (FME Package)

Thu Feb 2 19:29:50 CET 2017

Hi Folks

I would like to ask for your opinion and advice on the following modeling issue.

I am trying to evaluate the potential of different scenarios to explain/predict infection dynamics that has been observed in the wild. The scenarios reflect different mechanisms and are translated into SIR models. Some of those models are very complex and most parameters are unknown (only the approximate range) and therefore, I was planing to estimate the parameters using some sort of optimization routine. I first tried maximum likelihood, but failed, given the complex likelihood surface with many local optimums etc. I then tried to use the very neat inverse modeling approach described by Soetaert & Petzold (FME package). However, the modCost function can only handle gamma distributed on-informative error structures. Given my data, I would like to fit the model scenarios using a binomial error structure that incorporates different confidence intervals (e.g. different sampling effort over time).

I came up with a way that works really well; using the negative log-likelihood to describe the model costs. Yet, I am not entirely happy with this approach and I am not even sure whether it’s a legitimate way?

I am also wondering - given the above described approach is OK - how to present the results. I could use the negative log-likelihood for the best fit of each scenario to describe the relative suitability of the scenario (like in the code below; legend in final plot). However, I am not sure if people would be satisfied or whether they would rather see something like a BIC value?

Any thoughts are most welcome.

Thanks,
Simeon

Below, a very simple example code that reflects my ideas and how I modeled it:

library(deSolve)
library(FME)

### Model Scenarios ####
###
### 1st Model: Susceptible-Infectious Model
### 2nd Model: Susceptible-Infectious-Recovered Model
###
### Question: Which model fits better (unknown parameters)?
### In other words, the model reflect possible processes and 
### I am interested in evaluating their potential to generate
### the observed infection pattern.

### Step 1: Defining the models
si.mod <- function(t, x, parms) {
  S <- x[1]
  I <- x[2]
  with(as.list(parms), 
       {
         dS <- -parms[1]*S*I
         dI <- parms[1]*S*I
         dx <- c(dS, dI)
         list(dx)
         })
}

sir.mod <- function(t, x, parms) {
  S <- x[1]
  I <- x[2]
  R <- x[3]
  with(as.list(parms), 
       {
         dS <- -parms[1]*S*I
         dI <- parms[1]*S*I-parms[2]*I
         dR <- parms[2]*I
         dx <- c(dS, dI, dR)
         list(dx)
       })
}

### Step 2: Simulating observation data using sir.mod
sir <- ode(y = c(S = 100, I = 1, R = 0), times = 1:20, func = sir.mod, parms = c(beta = 6e-3, gamma = 0.175))

obs <- data.frame(day = 1:20, 
                  pos = round(sir[,3],0),
                  neg = round(c(sir[,2]+sir[,4]),0))

### plot prevalence
plot(obs$pos/(obs$pos+obs$neg), type = "b", pch = 16, cex = 2,
     xlab = "Time", ylab = "Prevalence", las = 2, ylim = c(0, 0.5), las = 1)
###

### Step 3: FME Inverse Modeling using MCMC optimization
### 2a - the cost-Function (binomial error does not work with modCost)
###      alternatively I thought you could use a probabilistic approach
###      and use the binomial density function one would use in
###      maximum likelihood approaches to calculate the log-likelihood.

nll.si <http://nll.si/> <- function(pars) {
  fit  <- as.data.frame(ode(func = si.mod, y = c(S = obs$neg[1], I = ifelse(obs$pos[1]==0,1,obs$pos[1])), 
                           times = obs$day, parms = pars))
  prev <- fit[,3]/(fit[,2]+fit[,3])
  -2*sum(dbinom(obs$pos, obs$neg+obs$pos, ifelse(prev>0.99, 0.99, prev), log = T), na.rm = T)
}

nll.sir <- function(pars) {
  fit  <- as.data.frame(ode(func = sir.mod, 
                            y = c(S = obs$neg[1], I = ifelse(obs$pos[1]==0,1,obs$pos[1]), R = 0), 
                            times = obs$day, parms = pars))
  prev <- fit[,3]/(fit[,2]+fit[,3]+fit[,4])
  -2*sum(dbinom(obs$pos, obs$neg+obs$pos, ifelse(prev>0.99, 0.99, prev), log = T), na.rm = T)
}

### Run MCMC optimization (and plot the results)
mcmc1 <- modMCMC(f = nll.si <http://nll.si/>, p = 2e-4, upper = 0.9, lower = 5e-10,
                niter = 15000, updatecov = 100, outputlength = 2000)

bfit1 <- ode(func = si.mod, y = c(S = obs$neg[1], I = ifelse(obs$pos[1]==0,1,obs$pos[1])), 
            times = obs$day, parms = mcmc1$bestpar)
lines(1:20, bfit1[,3]/(bfit1[,2]+bfit1[,3]), lwd = 2, lty = 4, col = "firebrick")

mcmc2 <- modMCMC(f = nll.sir, p = c(2e-4, 0.01), upper = c(0.9, 2), lower = c(5e-10, 5e-10),
                 niter = 15000, updatecov = 100, outputlength = 2000)

bfit2 <- ode(func = sir.mod, y = c(S = obs$neg[1], I = ifelse(obs$pos[1]==0,1,obs$pos[1]), R = 0), 
             times = obs$day, parms = mcmc2$bestpar)
lines(1:20, bfit2[,3]/(bfit2[,2]+bfit2[,3]+bfit2[,4]), lwd = 4, lty = 3, col = "cornflowerblue")

legend("topleft", paste(c("SI-Model; Neg. LogLik = ", "SIR-Model; Neg. LogLik = "), 
                         round(c(nll.si <http://nll.si/>(mcmc1$bestpar), nll.sir(mcmc2$bestpar)),2)), 
       lty = c(2, 3), lwd = 3, col = c("firebrick", "cornflowerblue"),
       pch = NA, bty = "n", y.intersp = 1.5)

——
Simeon Lisovski, Ph.D.
Postdoctoral Research Fellow
University of California, Davis
Department of Neurobiology, Physiology and Behavior

Mail: simeon.lisovski at gmail.com <mailto:simeon.lisovski at gmail.com>
	[[alternative HTML version deleted]]