[R] The 'data' argument and scoping in nls
Keith Jewell
k.jewell at campden.co.uk
Fri Sep 26 18:01:27 CEST 2008
Hi Everyone,
I seek guidance to avoid wasting a lot of time and doing things badly.
Several times I've solved my problems, only to find that my solutions were
clumsy and not robust. (see "nested" getInitial calls; variable scoping
problems: Solved??
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/139943.html for one truly
horrible approach). I'm sure that I'm not the first to address these issues,
but I haven't found clear guidance. I've read lots of relevant help pages,
but nothing on a strategic level. What general approach should I use? Can
someone point me in the right direction?
I'm using nls to fit models where some non-optimised parameters are
different lengths from the data, often matrices, something like this ...
----------
SSfunc <- selfStart(
  model = function(x, Coeff, A)
  {
  },
  initial = function(mCall, data, LHS)
  {
  },
  parameters = c("Coeff")
)
y <- ...
x <- ...
A <- ...
nls(y ~ SSfunc(x, Coeff, A), data=...)
--------------------
... where A may be a matrix. This means that A cannot be stored in a
data.frame with y and x, so I can't use (for example)
model.frame(y ~ SSfunc(x, Coeff, A), data=... )
I've found one solution by noticing that (in nls, lm, ...) 'data' can be a
list, so I can store objects of different lengths. That leads to my first
question(s):
Q1) ?getInitial and ?sortedXyData suggest that 'data' must be "a data
frame". I think this limitation isn't real. Can I safely use a list (or an
environment or ??) for 'data'? Or is this going to "break" something (e.g.
if I end up passing this data onto selfStart functions provided with R)?
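
A minimal sketch of the list approach, to make Q1 concrete. The model function f, the parameter k and the simulated data are hypothetical stand-ins for my real selfStart model; recent versions of ?nls do say that 'data' "can also be a list or an environment", but I can't say whether every selfStart function shipped with R copes with that.

```r
# Hypothetical model: the matrix A enters only through sum(A)
f <- function(x, k, A) sum(A) * exp(-k * x)

set.seed(1)
x <- 1:9
A <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)   # 4 elements, y has 9
y <- f(x, 0.3, A) + rnorm(length(x), sd = 0.01)

# A list can hold components of different lengths; a data.frame cannot
dat <- list(y = y, x = x, A = A)
rm(x, y, A)   # make sure nls can only find the variables in 'dat'

fit <- nls(y ~ f(x, k, A), data = dat, start = list(k = 0.2))
coef(fit)
```

Here start is supplied explicitly, so no initial function is involved; the open question is what happens once getInitial enters the picture.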
I've more scoping problems. I've got selfStart functions whose initial
functions call nls or getInitial on other selfStart functions (and so on).
I'm having trouble making sure everything (e.g. 'A' in the example) gets
passed on. Now I could add things to the data list as it is passed on,
something like this...
-----------------------
initial = function(mCall, data, LHS)
{
  # identify formula variables other than parameters (Coeff in this example)
  Vnames <- all.vars(as.call(mCall))[!(all.vars(as.call(mCall)) %in%
                as.character(mCall[["Coeff"]]))]
  # list their values, checking first in data then in parent.frame
  evaln <- function(x,...) eval(as.name(x), ...)
  data <- lapply(Vnames, evaln, envir=data, enclos=parent.frame())
  names(data) <- Vnames
  :
  # other processing ending up in a call to another selfStart
  # e.g. getInitial( .. ~ ssB( ....), data = data)
},
-----------------
... where if a variable isn't in the data I look in parent.frame() and add
it to the data. The problem is, parent.frame() may not be the right place to
look, especially if the initial function has been called via getInitial or
nls.
Now, I've recently noticed in ?lm that "If not found in data, the variables
are taken from environment(formula), typically the environment from which lm
is called." I hadn't been aware that a formula had an environment :-0, which
leads me to more questions:
Q2) Does nls automagically check environment(formula) in the same way as lm?
I guess getInitial doesn't because the initial function doesn't have formula
available (except as LHS and mCall)
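
A small sketch of what I mean by a formula having an environment. The names make_formula, fo and d are hypothetical, and the data are simulated. In the version of R I tried, nls does seem to pick up A from environment(formula) when it is not in 'data', but I don't know whether that is guaranteed.

```r
make_formula <- function() {
  A <- 4                   # exists only in this function's frame
  y ~ A * exp(-k * x)      # the formula remembers that frame
}
fo <- make_formula()

# A is reachable through the formula, although it is not a global variable
get("A", envir = environment(fo))

set.seed(1)
d <- data.frame(x = 1:9)
d$y <- 4 * exp(-0.3 * d$x) + rnorm(nrow(d), sd = 0.01)

# 'd' holds x and y only; A has to come from environment(fo)
fit <- nls(fo, data = d, start = list(k = 0.2))
coef(fit)
```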
Q3) Would it be better to manipulate environments rather than lists?
e.g. I could pass environment(formula) as data to make it available to
the initial function
nls(formula, data=environment(formula), ...)
getInitial(formula, data=environment(formula), ...)
making sure that any variables needed later in the chain were created/copied
into environment(formula) rather than a list.
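
A rough sketch of the environment variant, under the same assumptions as before (f, k, x_obs, A_mat and the simulated data are all hypothetical). Instead of writing into environment(formula) itself, it builds a child environment with list2env whose parent is environment(formula), so anything not copied in should still be found further up the chain.

```r
# Hypothetical model: the matrix A enters only through sum(A)
f <- function(x, k, A) sum(A) * exp(-k * x)
fo <- y ~ f(x, k, A)

set.seed(1)
x_obs <- 1:9
A_mat <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)

# Child environment holding the variables; lookups that fail here
# fall through to environment(fo), where f is defined
env <- list2env(
  list(x = x_obs,
       A = A_mat,
       y = f(x_obs, 0.3, A_mat) + rnorm(length(x_obs), sd = 0.01)),
  parent = environment(fo)
)

fit <- nls(fo, data = env, start = list(k = 0.2))
coef(fit)
```

Whether this is any more robust than a list once nested getInitial calls are involved is exactly what I'm unsure about.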
I'm getting into waters which are a bit too deep for me. Can anyone point me
in the right direction?
Thanks in advance,
Keith Jewell