[R] Is profile.mle flexible enough?

Sun Aug 1 09:26:12 CEST 2010

Arnaud Le Rouzic wrote:
> Hi the list,
> 
> I am experiencing several issues with profile.mle (and consequently with 
> confint.mle) (stat4 version 2.9.2), and I have to spend a lot of time to 
> find workarounds to what looks like interface bugs. I would be glad to 
> get feedback from experienced users to know if I am really asking too 
> much or if there is room for improvement.
> 
> * Problem #1 with fixed parameters. I can't manage to get profile-based 
> confidence intervals when some parameters of the likelihood function are 
> fixed:
> 
> -----------8<--------------------------------
> library(stats4)
> 
> minusLogL1 <- function(mu, logsigma2) { N*log(2*pi*exp(logsigma2))/2 + 
> N*(var(x)+(mean(x)-mu)^2)/(2*exp(logsigma2)) }
> minusLogL2 <- function(mu) { logsigma2 <- 0; 
> N*log(2*pi*exp(logsigma2))/2 + 
> N*(var(x)+(mean(x)-mu)^2)/(2*exp(logsigma2)) }
> 
> N <- 100
> x <- rnorm(N, 0, 1)
> 
> fit <- mle(minusLogL1, start=list(mu=0, logsigma2=0))
> confint(fit)
> 
> fit2 <- mle(minusLogL1, start=list(mu=0), fixed=list(logsigma2=0))
> confint(fit2)
> 
> fit3 <- mle(minusLogL2, start=list(mu=0))
> confint(fit3)
> ----------->8--------------------------------
> 
> Is it unreasonable to expect an identical result with fit2 and fit3? 
> When looking into the code of the "profile" method for mle, one can find 
> something like call$fixed <- fix ; i.e. the fixed effects in the call 
> are completely replaced by the combination of fixed effects needed by 
> the profile function. Perhaps I am missing something, but I don't 
> understand why this is necessary.
> 
> * Problem #2 deals with the scope of the variables used in the "call" 
> function. I understand that there are technical constraints, and similar 
> remarks were previously answered on the Microsoft mode ("It's not a bug, 
> it's a feature and you have to cope with it"), but the direct 
> consequence is that the user has to understand the internals of the 
> functions provided by the stats4 package, which does not look like a 
> good idea to me. One of the simplest and most striking example is 
> something like:
> 
> -----------8<--------------------------------
> 
> fit3 <- mle(minusLogL2, start=list(mu=0))
> confint(fit3)
> x <- rnorm(N, 0, 1)
> confint(fit3)
> 
> ----------->8--------------------------------
> 
> Although I understand the technical reason why the result is different, 
> I think such side effects are catastrophic in terms of UI. The user 
> should never have to know whether the information he/she wants is 
> already stored in the object (vcov, coef) or if the function will 
> recompute something.
> 
> Incidentally, such side effects make it very complicated to write 
> wrappers for the mle (which is my actual problem). A straightforward way 
> to solve it would be to store more information, including the data, in 
> the mle object (as many others, e.g. lm, do), but if stats4 has been 
> included among the core packages, it is probably because it was 
> considered stable and flexible enough. Are there any tricks or 
> alternatives to wrap mle calls properly?
>

Well... Issue #1 looks like a programming oversight, the "fixed"
argument was mainly put there to enable profiling, not as a model
building tool. However, it sounds easy enough to maintain a larger set
of fixed variables. Patches will be considered...

Issue #2 is stickier. I think I must say that the idelology of mle() is
that the user passes a likelihood function. If the likelihood function
depends on global data, and the user changes the global data, the user
deserves what he or she gets... It is pretty much impossible for mle()
to take an arbitrary function and detect which data it might depend on.

You can go the bbmle route and pass in the data set as a data frame, but
then you lose the flexibility that a likelihood can depend on data that
doesn't fit within the single rectangular data frame. I think a better
way is just to create likelihood functions that do not depend on global
data, e.g. by keeping data in an environment that is within the scope of
the likelihood function, but not (easily) accessible from elsewhere. In
your case, maybe

minusLogL1 <- local(
{
  N <- 100
  x <- rnorm(N, 0, 1)

  function(mu, logsigma2) {
    N*log(2*pi*exp(logsigma2))/2 +
    N*(var(x)+(mean(x)-mu)^2)/(2*exp(logsigma2))
  }
}

Or maybe

NormalLike <- function(x) {
  force(x)
  N <- length(x)
  function(mu, logsigma2) {
    N*log(2*pi*exp(logsigma2))/2 +
    N*(var(x)+(mean(x)-mu)^2)/(2*exp(logsigma2))
  }
}

x <-rnorm(100, 0, 1)

minusLogL1 <- NormalLike(x)

(The force(x) is not strictly necessary here as the calculation of N
will automatically force evaluation of x, but forgetting to do it can
cause some rather magnificent grief otherwise because of R's lazy
evaluation mechanisms.)

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com