[R-sig-dyn-mod] Using negative log-likelihood in MCMC optimization (FME Package)

Thu Feb 2 21:14:08 CET 2017

Dear Simeon,

some comments from my side:

1. I wonder if the binomial distribution you want to incorporate,
although being correct, could easily be approximated. I assume that
your model describes the number of susceptible, infected and recovered
persons. I would assume that these numbers follow a Poisson
distribution (which follows from the binomial distribution for small p
and large N). Poisson errors can be approximated by symmetric sqrt(N)
errors. That would take you back to weighted least squares estimation.

2. Your two models seem to be nested. In that case I would apply a
likelihood-ratio test to decide whether the more complex model is
justified.

Best,
Daniel

On Thu, 2017-02-02 at 10:29 -0800, Simeon Lisovski wrote:
> Hi Folks
> 
> I would like to ask for your opinion and advice on the following
> modeling issue.
> 
> I am trying to evaluate the potential of different scenarios to
> explain/predict infection dynamics that has been observed in the
> wild. The scenarios reflect different mechanisms and are translated
> into SIR models. Some of those models are very complex and most
> parameters are unknown (only the approximate range) and therefore, I
> was planing to estimate the parameters using some sort of
> optimization routine. I first tried maximum likelihood, but failed,
> given the complex likelihood surface with many local optimums etc. I
> then tried to use the very neat inverse modeling approach described
> by Soetaert & Petzold (FME package). However, the modCost function
> can only handle gamma distributed on-informative error structures.
> Given my data, I would like to fit the model scenarios using a
> binomial error structure that incorporates different confidence
> intervals (e.g. different sampling effort over time).
> 
> I came up with a way that works really well; using the negative log-
> likelihood to describe the model costs. Yet, I am not entirely happy
> with this approach and I am not even sure whether it’s a legitimate
> way?
> 
> I am also wondering - given the above described approach is OK - how
> to present the results. I could use the negative log-likelihood for
> the best fit of each scenario to describe the relative suitability of
> the scenario (like in the code below; legend in final plot). However,
> I am not sure if people would be satisfied or whether they would
> rather see something like a BIC value?
> 
> Any thoughts are most welcome.
> 
> Thanks,
> Simeon
> 
> Below, a very simple example code that reflects my ideas and how I
> modeled it:
> 
> 
> 
> library(deSolve)
> library(FME)
> 
> 
> ### Model Scenarios ####
> ###
> ### 1st Model: Susceptible-Infectious Model
> ### 2nd Model: Susceptible-Infectious-Recovered Model
> ###
> ### Question: Which model fits better (unknown parameters)?
> ### In other words, the model reflect possible processes and 
> ### I am interested in evaluating their potential to generate
> ### the observed infection pattern.
> 
> 
> ### Step 1: Defining the models
> si.mod <- function(t, x, parms) {
>   S <- x[1]
>   I <- x[2]
>   with(as.list(parms), 
>        {
>          dS <- -parms[1]*S*I
>          dI <- parms[1]*S*I
>          dx <- c(dS, dI)
>          list(dx)
>          })
> }
> 
> 
> sir.mod <- function(t, x, parms) {
>   S <- x[1]
>   I <- x[2]
>   R <- x[3]
>   with(as.list(parms), 
>        {
>          dS <- -parms[1]*S*I
>          dI <- parms[1]*S*I-parms[2]*I
>          dR <- parms[2]*I
>          dx <- c(dS, dI, dR)
>          list(dx)
>        })
> }
> 
> 
> ### Step 2: Simulating observation data using sir.mod
> sir <- ode(y = c(S = 100, I = 1, R = 0), times = 1:20, func =
> sir.mod, parms = c(beta = 6e-3, gamma = 0.175))
> 
> obs <- data.frame(day = 1:20, 
>                   pos = round(sir[,3],0),
>                   neg = round(c(sir[,2]+sir[,4]),0))
> 
> ### plot prevalence
> plot(obs$pos/(obs$pos+obs$neg), type = "b", pch = 16, cex = 2,
>      xlab = "Time", ylab = "Prevalence", las = 2, ylim = c(0, 0.5),
> las = 1)
> ###
> 
> 
> ### Step 3: FME Inverse Modeling using MCMC optimization
> ### 2a - the cost-Function (binomial error does not work with
> modCost)
> ###      alternatively I thought you could use a probabilistic
> approach
> ###      and use the binomial density function one would use in
> ###      maximum likelihood approaches to calculate the log-
> likelihood.
> 
> nll.si <http://nll.si/> <- function(pars) {
>   fit  <- as.data.frame(ode(func = si.mod, y = c(S = obs$neg[1], I =
> ifelse(obs$pos[1]==0,1,obs$pos[1])), 
>                            times = obs$day, parms = pars))
>   prev <- fit[,3]/(fit[,2]+fit[,3])
>   -2*sum(dbinom(obs$pos, obs$neg+obs$pos, ifelse(prev>0.99, 0.99,
> prev), log = T), na.rm = T)
> }
> 
> nll.sir <- function(pars) {
>   fit  <- as.data.frame(ode(func = sir.mod, 
>                             y = c(S = obs$neg[1], I =
> ifelse(obs$pos[1]==0,1,obs$pos[1]), R = 0), 
>                             times = obs$day, parms = pars))
>   prev <- fit[,3]/(fit[,2]+fit[,3]+fit[,4])
>   -2*sum(dbinom(obs$pos, obs$neg+obs$pos, ifelse(prev>0.99, 0.99,
> prev), log = T), na.rm = T)
> }
> 
> ### Run MCMC optimization (and plot the results)
> mcmc1 <- modMCMC(f = nll.si <http://nll.si/>, p = 2e-4, upper = 0.9,
> lower = 5e-10,
>                 niter = 15000, updatecov = 100, outputlength = 2000)
> 
> bfit1 <- ode(func = si.mod, y = c(S = obs$neg[1], I =
> ifelse(obs$pos[1]==0,1,obs$pos[1])), 
>             times = obs$day, parms = mcmc1$bestpar)
> lines(1:20, bfit1[,3]/(bfit1[,2]+bfit1[,3]), lwd = 2, lty = 4, col =
> "firebrick")
> 
> 
> mcmc2 <- modMCMC(f = nll.sir, p = c(2e-4, 0.01), upper = c(0.9, 2),
> lower = c(5e-10, 5e-10),
>                  niter = 15000, updatecov = 100, outputlength = 2000)
> 
> bfit2 <- ode(func = sir.mod, y = c(S = obs$neg[1], I =
> ifelse(obs$pos[1]==0,1,obs$pos[1]), R = 0), 
>              times = obs$day, parms = mcmc2$bestpar)
> lines(1:20, bfit2[,3]/(bfit2[,2]+bfit2[,3]+bfit2[,4]), lwd = 4, lty =
> 3, col = "cornflowerblue")
> 
> 
> legend("topleft", paste(c("SI-Model; Neg. LogLik = ", "SIR-Model;
> Neg. LogLik = "), 
>                          round(c(nll.si <http://nll.si/>(mcmc1$bestpa
> r), nll.sir(mcmc2$bestpar)),2)), 
>        lty = c(2, 3), lwd = 3, col = c("firebrick",
> "cornflowerblue"),
>        pch = NA, bty = "n", y.intersp = 1.5)
> 
> 
> ——
> Simeon Lisovski, Ph.D.
> Postdoctoral Research Fellow
> University of California, Davis
> Department of Neurobiology, Physiology and Behavior
> 
> Mail: simeon.lisovski at gmail.com <mailto:simeon.lisovski at gmail.com>
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-dynamic-models mailing list
> R-sig-dynamic-models at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-dynamic-models