[R] problem with convergence in mle2/optim function
Bert Gunter
gunter.berton at gene.com
Fri Oct 5 08:51:27 CEST 2012
Presumably you checked out the CRAN Optimization task view to see if
you could find any joy there, right? (I doubt there is, but ...)
-- Bert
On Thu, Oct 4, 2012 at 10:12 PM, Adam Zeilinger <zeil0006 at umn.edu> wrote:
> Hello R Help,
>
> I am trying solve an MLE convergence problem: I would like to estimate four
> parameters, p1, p2, mu1, mu2, which relate to the probabilities, P1, P2, P3,
> of a multinomial (trinomial) distribution. I am using the mle2() function
> and feeding it a time series dataset composed of four columns: time point,
> number of successes in category 1, number of successes in category 2, and
> number of success in category 3. The column headers are: t, n1, n2, and n3.
>
> The mle2() function converges occasionally, and I need to improve the rate
> of convergence when used in a stochastic simulation, with multiple
> stochastically generated datasets. When mle2() does not converge, it
> returns an error: "Error in optim(par = c(2, 2, 0.001, 0.001), fn = function
> (p) : L-BFGS-B needs finite values of 'fn'." I am using the L-BFGS-B
> optimization method with a lower box constraint of zero for all four
> parameters. While I do not know any theoretical upper limit(s) to the
> parameter values, I have not seen any parameter estimates above 2 when using
> empirical data. It seems that when I start with certain 'true' parameter
> values, the rate of convergence is quite high, whereas other "true"
> parameter values are very difficult to estimate. For example, the true
> parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001 causes convergence
> problems, but the parameter values p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 =
> 0.08 lead to high convergence rate. I've chosen these two sets of values
> because they represent the upper and lower estimates of parameter values
> derived from graphical methods.
>
> First, do you have any suggestions on how to improve the rate of convergence
> and avoid the "finite values of 'fn'" error? Perhaps it has to do with the
> true parameter values being so close to the boundary? If so, any
> suggestions on how to estimate parameter values that are near zero?
>
> Here is reproducible and relevant code from my stochastic simulation:
>
> ########################################################################
> library(bbmle)
> library(combinat)
>
> # define multinomial distribution
> dmnom2 <- function(x,prob,log=FALSE) {
> r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
> if (log) r else exp(r)
> }
>
> # vector of time points
> tv <- 1:20
>
> # Negative log likelihood function
> NLL.func <- function(p1, p2, mu1, mu2, y){
> t <- y$tv
> n1 <- y$n1
> n2 <- y$n2
> n3 <- y$n3
> P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
> mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))) -
> exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
> mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
> 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
> exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
> P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
> mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))) -
> exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
> mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
> 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
> exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
> P3 <- 1 - P1 - P2
> p.all <- c(P1, P2, P3)
> -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
> }
>
> ## Generate simulated data
> # Model equations as expressions,
> P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
> mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))) -
> exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
> mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
> 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
> exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
>
> P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
> mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))) -
> exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
> mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
> 2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
> exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
> 4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
> sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
>
> # True parameter values
> p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001
>
> # Function to calculate probabilities from 'true' parameter values
> psim <- function(x){
> params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
> eval.P1 <- eval(P1, params)
> eval.P2 <- eval(P2, params)
> P3 <- 1 - eval.P1 - eval.P2
> c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
> }
> pdat <- sapply(tv, psim, simplify = TRUE)
> Pdat <- as.data.frame(t(pdat))
> names(Pdat) <- c("time", "P1", "P2", "P3")
>
> # Generate simulated data set from probabilities
> n = rep(20, length(tv))
> p = as.matrix(Pdat[,2:4])
> y <- as.data.frame(rmultinomial(n,p))
> yt <- cbind(tv, y)
> names(yt) <- c("tv", "n1", "n2", "n3")
>
> # mle2 call
> mle.fit <- mle2(NLL.func, data = list(y = yt),
> start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t),
> control = list(maxit = 5000, factr = 1e-10, lmm = 17),
> method = "L-BFGS-B", skip.hessian = TRUE,
> lower = list(p1 = 0, p2 = 0, mu1 = 0, mu2 = 0))
>
> ###########################################################################
>
> I interpret the error as having to do with the finite difference
> approximation failing. If so, perhaps a gradient function would help? If
> you agree, I've described my unsuccessful attempt at writing a gradient
> function below. If a gradient function is unnecessary, ignore the remainder
> of this message.
>
> My gradient function: I derived the gradient function by taking the
> derivative of my NLL equation with respect to each parameter. My NLL
> equation is the probability mass function of the trinomial distribution.
> Thus the gradient equation for, say, parameter p1 would be:
>
> gr.p1 <- deriv(log(P1^n1), p1) + deriv(log(P2^n2), p1) + deriv(log(P3^n3),
> p1)
>
> This produces a very large equation, which I won't reproduce here. Let's say
> that the four gradient equations for the four parameters are defined as
> gr.p1, gr.p2, gr.mu1, gr.mu2, and all are derived as described above for
> gr.p1. These gradient equations are functions of p1, p2, mu1, mu2, t, n1,
> n2, and n3. My current gradient function is:
>
> grr <- function(p1, p2, mu1, mu2, y){
> t <- y[,1]
> n1 <- y[,2]
> n2 <- y[,3]
> n3 <- y[,4]
> gr.p1 <- .......
> gr.p2 <- .......
> gr.mu1 <- .......
> gr.mu2 <- .......
> c(gr.p1, gr.p2, gr.mu1, gr.mu2)
> }
>
> The problem is that I need to supply values for t, n1, n2, and n3 to the
> gradient function, which are from the dataset yt, above. When I supply the
> dataset yt, the function produces a vector of length 4*nrow(yt) = 80. When
> I include it in my mle2() function, I get an error that mle2 (optim)
> requires a vector of length 4. How do I write my gradient function to work
> in mle2()?
>
> Any help would be much appreciated.
>
> Adam Zeilinger
>
> --
> Adam Zeilinger
> Post Doctoral Scholar
> Department of Entomology
> University of California Riverside
> www.linkedin.com/in/adamzeilinger
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
More information about the R-help
mailing list