[Rd] suggested modification to the 'mle' documentation?

Fri Dec 7 21:46:37 CET 2007

On Fri, 7 Dec 2007, Duncan Murdoch wrote:

> On 12/7/2007 8:10 AM, Peter Dalgaard wrote:
>> Ben Bolker wrote:
>>>   At this point I'd just like to advertise the "bbmle" package
>>> (on CRAN) for those who respectfully disagree, as I do, with Peter over
>>> this issue.  I have added a data= argument to my version
>>> of the function that allows other variables to be passed
>>> to the objective function.  It seems to me that this is perfectly
>>> in line with the way that other modeling functions in R
>>> behave.
>>>
>> This is at least cleaner than abusing the "fixed" argument. As you know,
>> I have reservations, one of which is that it is not a given that I want
>> it to behave just like other modeling functions, e.g. a likelihood
>> function might refer to more than one data set, and/or data that are not
>> structured in the traditional data frame format. The design needs more
>> thought than just adding arguments.
>
> We should allow more general things to be passed as data arguments in
> cases where it makes sense.  For example a list with names or an
> environment would be a reasonable way to pass data that doesn't fit into
> a data frame.
>
>> I still prefer a design based on a plain likelihood function. Then we can
>> discuss how to construct such a function so that  the data are
>> incorporated in a flexible way.  There are many ways to do this, I've
>> shown one, here's another:
>>
>>> f <- function(lambda) -sum(dpois(x, lambda, log=T))
>>> d <- data.frame(x=rpois(10000, 12.34))
>>> environment(f)<-evalq(environment(),d)
>
> We really need to expand as.environment, so that it can convert data
> frames into environments.  You should be able to say:
>
> environment(f) <- as.environment(d)
>
> and get the same result as
>
> environment(f)<-evalq(environment(),d)
>
> But I'd prefer to avoid the necessity for users to manipulate the
> environment of a function.  I think the pattern
>
> model( f, data=d )

For working at the level of general likelihoods I think it is better to
encourage the approach of defining likelihood constructor functions.
The problem with using f, data is that you need to match the names
used in f and in data, so either you have to explicitly write out f
with the names you have in data or you have to modify data to use the
names f likes -- in the running example think

     f <- function(lambda) -sum(dpois(x, lambda, log=TRUE))
     d <- data.frame(y=rpois(10000, 12.34))

Somebody has to connect up the x in f with the y in d. With a negative
log likelihood constructor defined, for example, as

     makePoisonNegLogLikelihood <- function(x)
         function(lambda) -sum(dpois(x, lambda, log=TRUE))

this happens naturally with

     makePoisonNegLogLikelihood(d$y)
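
As a minimal sketch of how the constructor approach plays out end to end, the following restates the constructor above and passes the resulting closure to mle() from the stats4 package. The seed, the sample size and the starting value are arbitrary choices made only for illustration.

```r
library(stats4)  # provides mle()

# Constructor from the text above: the data vector x is captured in the
# closure, so the returned function has lambda as its only argument.
makePoisonNegLogLikelihood <- function(x)
    function(lambda) -sum(dpois(x, lambda, log = TRUE))

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(y = rpois(10000, 12.34))

# The column is called y in d, but no renaming is needed: the constructor
# binds it to x when it is called.
nll <- makePoisonNegLogLikelihood(d$y)
fit <- mle(nll, start = list(lambda = 10))
coef(fit)
```

For a Poisson sample the estimate should agree closely with mean(d$y), which gives a quick sanity check on the fit.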

>
> being implemented internally as
>
> environment(f) <- as.environment(d, parent = environment(f))
>
> is very nice and general.  It makes things like cross-validation,
> bootstrapping, etc. conceptually cleaner:  keep the same
> formula/function f, but manipulate the data and see what happens.
> It does have problems when d is an environment that already has a
> parent, but I think a reasonable meaning in that case would be to copy
> its contents into a new environment with the new parent set.

Both (simple) bootstrapping and (simple leave-one-out) cross-validation
require a data structure with a notion of cases, which is much more
restrictive than the context in which mle can be used.  A more generic
approach to bootstrapping that might fit closer to the level of
generality of mle might be parameterized in terms of a negative log
likelihood constructor, a starting value constructor, and a resampling
function, with a single iteration implemented something like

     mleboot1 <- function(nllmaker, start, resample)  {
         newdata <- resample()    # resampled data, as a named list
         newstart <- do.call(start, newdata)
         nllfun <- do.call(nllmaker, newdata)
         mle(nllfun, start = newstart)
     }

This would leave decisions on the resampling method and data structure
up to the user. Something similar could be done with K-fold CV.
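
To make the three ingredients concrete, here is a rough sketch for a plain Poisson sample. The single-iteration function is restated so the block is self-contained. The helpers makeNll, makeStart and resampleY are hypothetical names invented for this illustration, and simple case resampling is just one possible choice of resampling method.

```r
library(stats4)  # provides mle()

# One bootstrap iteration, parameterized as described above.
mleboot1 <- function(nllmaker, start, resample) {
    newdata  <- resample()                  # must return a named list
    newstart <- do.call(start, newdata)     # starting values for this sample
    nllfun   <- do.call(nllmaker, newdata)  # negative log likelihood closure
    mle(nllfun, start = newstart)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
y <- rpois(200, 12.34)

# Hypothetical user-supplied pieces for this example:
makeNll   <- function(x) function(lambda) -sum(dpois(x, lambda, log = TRUE))
makeStart <- function(x) list(lambda = mean(x))
resampleY <- function() list(x = sample(y, replace = TRUE))

# Repeat the single iteration and collect the estimates.
boot <- replicate(50, coef(mleboot1(makeNll, makeStart, resampleY))[["lambda"]])
sd(boot)  # rough bootstrap standard error of the estimate
```

The list returned by the resampling function is matched by name to the arguments of the two constructors through do.call, so the user controls both the data structure and the naming.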

luke

>
> Duncan Murdoch
>
>
>>> mle(f, start=list(lambda=10))
>>
>> Call:
>> mle(minuslogl = f, start = list(lambda = 10))
>>
>> Coefficients:
>>  lambda
>> 12.3402
>>
>> It is not at all an unlikely design to have mle() as a generic function
>> which works on many kinds of objects, the default method being
>> function(object,...) mle(minuslogl(object)) and minuslogl is an extractor
>> function returning (tada!) the negative log likelihood function.
>>>   (My version also has a cool formula interface and other
>>> bells and whistles, and I would love to get feedback from other
>>> useRs about it.)
>>>
>>>    cheers
>>>     Ben Bolker
>>>
>>>
>>
>>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu