[Rd] overhaul of mle

Fri Jun 11 22:35:55 CEST 2004

Peter Dalgaard wrote:
> Ben Bolker <bolker at zoo.ufl.edu> writes:
> 
> 
>>*** Changed type of fullcoef from "numeric" to "list", and return
>>fullcoef rather than unlist(fullcoef) from mle [couldn't see a
>>rationale for this -- it destroys a lot of the information in fullcoef
>>*and* is a
>>pain, say, when the fixed arguments include a data frame with lots of
>>information in it]
> 
> 
> Wait a minute. How can a likelihood function have an argument that is
> a data frame? I think you're abusing the fixed arguments if you use it
> to pass in data. The natural paradigm for that would be to pass data
> via a closure, i.e. 
> 
> mll <- with(data,
>     function(lambda=1,theta=0)sum(dpois(y, lambda+theta*x, log=TRUE))
> )

>>*** Changed "coef" method to return object at coef, not object at fullcoef
>>[this really seems to be the better choice to me -- I normally want to
>>see the *fitted values* of the MLE, not all the other auxiliary
>>stuff.  Besides, object at fullcoef can be very long, and therefore a
>>nuisance to see in the default show(object) method]
> 
> 
> See above. This was never intended to contain auxiliary stuff (and
> AFAIR has already been changed once in the opposite direction, by Brian)

    OK, I want to hear about this.  My normal approach to writing 
likelihood functions that can be evaluated with more than one data
set is essentially

mll <- function(par1,par2,par3,X=Xdefault,Y=Ydefault,Z=Zdefault) { ... }

where X, Y, Z are the data values that may change from one fitting 
exercise to the next.  This seems straightforward to me, and I always 
thought it was the reason why optim() had a ... in its argument list,
so that one could easily pass these arguments down.  I have to confess 
that I don't quite understand how your paradigm with with() works: if
mll() is defined as you have it above, "data" is a data frame containing
$x and $y, right?  How do I run mle(minuslogl=mll,start=...) for 
different values of "data" (data1, data2, data3) in this case?  Does
it go in the call as mle(minuslogl=mll,start=...,data=...)?  Once I've
found my mle, how do I view/access these values when I want to see
what data I used to fit mle1, mle2, etc.?

   I'm willing to change the way I do things (and teach my students
differently), but I need to see how ... I don't see how what I've
written is an "abuse" of the fixed arguments [I'm willing to believe
that it is, but just don't know why]

>>added a cor method for mle objects -- which just normalizes the
>>variance-covariance matrix to a correlation matrix.  Is this a bad
>>idea/perversion of the cor method?
> 
> 
> Yes, I think so. cov2cor(vcov(ml.obj)) is easy enough.

   OK.  I wasn't aware of cov2cor().

> 
>>changed
>>call$fixed <- fix
>>to
>>call$fixed <- c(fix,eval(call$fixed))
>>for cases where there are non-trivial fixed arguments
> 
> 
> Which there shouldn't be...
>  
> 
>>added "follow" argument to profile: this makes profiling use a
>>continuation method where the starting point for each profile
>>optimization is the previous best-fit solution, rather than the
>>overall MLEs of the parameters.  Actually fairly easy to implement (I
>>think: I haven't really tested that it works on anything hard, just
>>that it doesn't seem to break profiling) -- requires pfit to be
>>assigned globally within onestep() and a few lines of code further
>>down.
> 
> 
> Sounds nice, but surely you don't need a global assignment there? A
> superassign ("<<-") perhaps, but that doesn't need to go to
> .GlobalEnv. 

   OK -- I think that's just my ignorance showing.

>>
>>Added code that allows (1) default arguments (evaluated
>>in the frame of the full coefficient list, with fixed values
>>and starting values substituted and (2) arguments specified in the
>>start list in arbitrary order (which seems like a reasonable
>>expectation since
>>it *is* specified as a list).  The fundamental problem is that optim()
>>loses names
>>of the parameter vector somewhere.
>>Example:
>>
>>x = runif(200)
>>y = 1+x+x^2+rnorm(200,sd=0.05)
>>fn <- function(a,b,z=2,c,d) {
>>    -sum(dnorm(y,mean=a+c*x+d*x^2,sd=exp(b),log=TRUE))
>>}
>>
>>m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1))
>>## fails with "missing argument" warning, about wrong argument
>>m1 = mle(minuslogl=fn,start=list(a=1,b=1,c=1,d=1),fixed=list(z=2))
>>## works
>>m2 = mle(minuslogl=fn,start=list(a=1,d=1,c=1,b=1),fixed=list(z=2))
>>## fails -- coeffs returned in wrong order
> 
> 
> Hmm.. I see the effect with the current version too. Depending on
> temperament, it is the labels rather than the order that is wrong...  

   Hmmm.  By "current version" do you mean you can still
find ways to get wrong answers with my modified version?
Can you send me an example?  Can you think of a better way to fix this?

>>
>>allow for fitting of transformed parameters (exp/log, tanh/atanh =
>>logistic/logit)
> 
> 
> The last one should be trivial, no?
> 
> mll2 <- function(a,b,c,d) mll1(log(a),atan(b),c,d)

   Yes, but I was thinking of something convenient for my students --
syntactic sugar, if you like -- so that fitted values, confidence
intervals, etc., would automatically be displayed on the
original (non-transformed) scale

   cheers,
     Ben Bolker