[R] The 'data' argument and scoping in nls

Fri Sep 26 18:01:27 CEST 2008

Hi Everyone,

I'm looking for guidance so that I don't waste a lot of time and do things 
badly. Several times I've solved my problems, only to find that my solutions 
were clumsy and not robust (see "nested" getInitial calls; variable scoping 
problems: Solved?? 
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/139943.html for one truly 
horrible approach). I'm sure that I'm not the first to address these issues, 
but I haven't found clear guidance. I've read lots of relevant help pages, 
but found nothing on a strategic level: what general approach should I use? 
Can someone point me in the right direction?

I'm using nls to fit models where some non-optimised parameters have 
lengths different from the data (often they are matrices), something like 
this ...
----------
SSfunc <- selfStart(
   model = function(x, Coeff, A)
   {
   },
   initial = function(mCall, data, LHS)
   {
   },
   parameters = c("Coeff")
                )

y <- ...
x <- ...
A <- ...
nls(y ~ SSfunc(x, Coeff, A), data=...)
--------------------
... where A may be a matrix. This means that A cannot be stored in a 
data.frame with y and x, so I can't use (for example)
model.frame(y ~ SSfunc(x, Coeff, A), data=...  )

I've found one solution by noticing that (in nls, lm, ...) 'data' can be a 
list, so I can store objects of different lengths. That leads to my first 
question(s):

Q1) ?getInitial and ?sortedXyData suggest that 'data' must be "a data 
frame". I think this limitation isn't real. Can I safely use a list (or an 
environment or ??) for 'data'? Or is this going to "break" something (e.g. 
if I end up passing this data onto selfStart functions provided with R)?
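
To make Q1 concrete, here is a minimal sketch of the kind of call I mean. 
The model function `decay`, the simulated data and the start value are all 
made up for illustration; `A` is a 2 x 2 matrix, so it could not sit in a 
data.frame alongside `x` and `y`. As far as I can tell from ?nls, 'data' may 
be a list there, but that says nothing about getInitial or sortedXyData.

```r
# Hypothetical model: A is a fixed (non-optimised) 2 x 2 matrix, k is fitted.
decay <- function(x, k, A) A[1, 1] * exp(-k * x) + A[2, 2]

set.seed(42)
x <- seq(0, 5, length.out = 25)
A <- matrix(c(2, 0, 0, 0.5), nrow = 2)
y <- decay(x, k = 0.8, A = A) + rnorm(length(x), sd = 0.01)

# Store objects of different lengths together in a plain list.
dat <- list(x = x, y = y, A = A)
rm(x, y, A)  # make sure nls can only find them through 'data'

fit <- nls(y ~ decay(x, k, A), data = dat, start = list(k = 0.5))
coef(fit)
```

Here the variables are removed from the workspace before the call, so the 
fit can only succeed if nls really takes them from the list.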

I have more scoping problems. I've got selfStart functions whose initial 
functions call nls or getInitial on other selfStart functions (and so on). 
I'm having trouble making sure everything (e.g. 'A' in the example) gets 
passed on. Now I could add things to the data list as it is passed on, 
something like this...
-----------------------
initial = function(mCall, data, LHS)
{
  # identify formula variables other than parameters (Coeff in this example)
  allV <- all.vars(as.call(mCall))
  Vnames <- allV[!(allV %in% as.character(mCall[["Coeff"]]))]
  # list their values, checking first in data then in parent.frame
  evaln <- function(x,...) eval(as.name(x), ...)
  data <- lapply(Vnames, evaln, envir=data, enclos=parent.frame())
  names(data) <- Vnames
     :
  # other processing ending up in a call to another selfStart
  # e.g.  getInitial( .. ~ ssB( ....), data = data)

},
-----------------
... where if a variable isn't in the data I look in parent.frame() and add 
it to the data. The problem is, parent.frame() may not be the right place to 
look, especially if the initial function has been called via getInitial or 
nls.
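
One way to avoid relying on parent.frame() would be to hand the fallback 
environment to the lookup explicitly. This is only a sketch: `collect_vars` 
is a hypothetical helper, the formula and data are made up, and how to get 
the right environment into the initial function is exactly the part I'm 
unsure about.

```r
# Hypothetical helper: look each name up first in 'data' (a list or
# environment), then in an explicitly supplied fallback environment.
collect_vars <- function(vnames, data, env) {
  vals <- lapply(vnames, function(v) eval(as.name(v), data, env))
  names(vals) <- vnames
  vals
}

# A formula created where A is defined; SSfunc is never evaluated here.
f <- local({
  A <- diag(2)
  y ~ SSfunc(x, Coeff, A)
})

dat <- list(x = 1:3, y = c(2, 4, 6))        # A is deliberately not in 'data'
vars <- setdiff(all.vars(f), "Coeff")       # drop the parameter name
full <- collect_vars(vars, dat, environment(f))
str(full)
```

The resulting list holds x and y from the data and A from the fallback 
environment, so it could be passed on as 'data' to the next call.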

Now, I've recently noticed in ?lm that "If not found in data, the variables 
are taken from environment(formula), typically the environment from which lm 
is called." I hadn't been aware that a formula had an environment :-0, which 
leads me to more questions:

Q2) Does nls automagically check environment(formula) in the same way as lm? 
I guess getInitial doesn't, because the initial function doesn't have the 
formula available (except as LHS and mCall).
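
For reference, this is the lm behaviour I mean, as a small sketch. The 
function name `make_formula` and the data are made up; the point is only that 
the formula remembers where it was created and lm looks there when no 'data' 
is given.

```r
# Hypothetical example: x and y exist only inside the function's environment.
make_formula <- function() {
  x <- 1:10
  y <- 3 + 2 * x
  y ~ x            # the formula captures this function's environment
}

f <- make_formula()
ls(environment(f))   # the variables travel with the formula

fit <- lm(f)         # no 'data': variables come from environment(f)
coef(fit)
```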

Q3) Would it be better to manipulate environments rather than lists?
    e.g. I could pass environment(formula) as data to make it available to 
the initial function
           nls(formula,  data=environment(formula), ...)
    getInitial(formula,  data=environment(formula), ...)
making sure that any variables needed later in the chain were created in, or 
copied into, environment(formula) rather than a list.
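
A minimal sketch of what I have in mind for Q3, assuming that nls accepts an 
environment as 'data' (which is how I read ?nls). The model function `decay`, 
the simulated values and the start value are invented for illustration, and I 
haven't checked how getInitial would cope with the same arrangement.

```r
# Hypothetical model: A is a fixed 2 x 2 matrix, k is the fitted parameter.
decay <- function(x, k, A) A[1, 1] * exp(-k * x) + A[2, 2]

# Create the variables inside a dedicated environment instead of a list.
env <- new.env()
local({
  set.seed(42)
  x <- seq(0, 5, length.out = 25)
  A <- matrix(c(2, 0, 0, 0.5), nrow = 2)
  y <- A[1, 1] * exp(-0.8 * x) + A[2, 2] + rnorm(length(x), sd = 0.01)
}, envir = env)

# Create the formula in that environment so environment(f) is 'env'.
f <- evalq(y ~ decay(x, k, A), env)

fit <- nls(f, data = environment(f), start = list(k = 0.5))
coef(fit)
```

Anything needed further down the chain could then be assigned into 
environment(f) before the next call, rather than appended to a list.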

I'm getting into waters which are a bit too deep for me. Can anyone point me 
in the right direction?

Thanks in advance,

Keith Jewell