[R] Using $ accessor in GAM formula
Berwin A Turlach
Berwin.Turlach at gmail.com
Fri May 6 11:53:33 CEST 2011
G'day Rolf,
On Fri, 06 May 2011 09:58:50 +1200
Rolf Turner <rolf.turner at xtra.co.nz> wrote:
> but it's strange that the dodgey code throws an error with gam(dat1$y
> ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s))
> Something a bit subtle is going on; it would be nice to be able to
> understand it.
Well,
R> traceback()
3: eval(expr, envir, enclos)
2: eval(inp, data, parent.frame())
1: gam(dat$y ~ s(dat$x))
So the lines leading up to the problem seem to be the following from
the gam() function:
    vars <- all.vars(gp$fake.formula[-2])
    inp <- parse(text = paste("list(", paste(vars, collapse = ","), ")"))
    if (!is.list(data) && !is.data.frame(data))
        data <- as.data.frame(data)
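
As a rough sketch of what these lines do with a formula that uses "$", the following uses base R only. It assumes a plain one-sided formula in place of gp$fake.formula[-2] (which gam() builds internally), so it illustrates the mechanism rather than reproducing gam() exactly:

```r
# Base R only; the formulas are never evaluated, so mgcv is not needed.
f1 <- ~ s(dat1$x)   # stand-in for gp$fake.formula[-2] in the first example
f2 <- ~ s(dat2$s)   # stand-in for the second example

# all.vars() does not treat `$` specially: both of its arguments are
# reported as variable names.
vars1 <- all.vars(f1)
vars2 <- all.vars(f2)

# Same construction as in the lines quoted from gam()
inp1 <- parse(text = paste("list(", paste(vars1, collapse = ","), ")"))
inp2 <- parse(text = paste("list(", paste(vars2, collapse = ","), ")"))

print(vars1)
print(vars2)
print(inp1)
print(inp2)
```

The data frame name and the column name end up as two separate entries, which is why the column is later looked up on its own.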
Setting
R> options(error=recover)
running the code until the error occurs, and then examining the frame
number for the gam() call shows that "inp" is
"expression(list( dat1,x ))" in your first example and
"expression(list( dat2,s ))" in your second example. In both
examples, "data" is "list()" (unsurprisingly, since no data argument
was supplied). When
    dl <- eval(inp, data, parent.frame())
is executed, it tries to evaluate "inp". In both cases "dat1" and "dat2"
are found, obviously, in the parent frame. In your first example "x" is
(typically) not found and an error is thrown; in your second example an
object with name "s" is found in "package:mgcv" and the call to eval
succeeds. "dl" then becomes a list with two components, the first being
"dat2" and the second the function "s" itself. (To verify that, you
should probably issue the command "debug(gam)" and step through those
first few lines of the function until you reach the above command.)
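
The lookup can be imitated outside gam() as well. The sketch below is only an approximation: eval_vars() is a hypothetical helper that mirrors the quoted lines, and the local function s() is a stand-in for what R would find in package:mgcv once mgcv is attached. It also assumes that no object called "x" exists in your workspace:

```r
dat1 <- data.frame(x = 1:10, y = rnorm(10))
dat2 <- data.frame(s = 1:10, cf = rnorm(10))

# Hypothetical stand-in; with library(mgcv) attached, mgcv's s() would be found instead.
s <- function(...) NULL

# Hypothetical helper mirroring the lines quoted from gam(); not part of mgcv.
eval_vars <- function(formula, data = list()) {
  vars <- all.vars(formula)
  inp <- parse(text = paste("list(", paste(vars, collapse = ","), ")"))
  eval(inp, data, parent.frame())
}

# First example: "x" is not in data and not visible from the caller, so eval() fails.
res1 <- tryCatch(eval_vars(~ s(dat1$x)), error = function(e) e)

# Second example: "s" is not in data either, but a function of that name is found.
res2 <- eval_vars(~ s(dat2$s))

print(conditionMessage(res1))
str(res2, max.level = 1)
```

Here res2 holds the whole data frame "dat2" and a function, not the column dat2$s.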
The corollary is that you can use the name of any object that R will
find when it searches from the parent frame. If it is another data set,
then that data set will become the second component of "dl". E.g.:
R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$min))
Family: gaussian
Link function: identity
Formula:
dat$cf ~ s(dat$min)
Estimated degrees of freedom:
3.8925 total = 4.892488
GCV score: 0.002704789
Or
R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(100,0,.05))
R> gam(dat$cf ~ s(dat$BOD))
Family: gaussian
Link function: identity
Formula:
dat$cf ~ s(dat$BOD)
Estimated degrees of freedom:
3.9393 total = 4.939297
GCV score: 0.002666985
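
For contrast, here is a small base-R sketch of why bare column names together with a supplied data frame avoid the accidental match. build_inp() is a hypothetical helper that copies the construction quoted from gam(), and the one-sided formulas again stand in for gp$fake.formula[-2]. eval() consults its second argument first and only then the enclosing environment:

```r
dat <- data.frame(min = 1:100, cf = sin(1:100 / 50) + rnorm(100, 0, 0.05))

# Hypothetical helper copying the construction quoted from gam()
build_inp <- function(vars) {
  parse(text = paste("list(", paste(vars, collapse = ","), ")"))
}

# Formula written with `$` and no data: "min" is looked up outside dat
# and resolves to the base function min().
dl_dollar <- eval(build_inp(all.vars(~ s(dat$min))), list(), environment())

# Formula written with bare names and the data frame supplied:
# "min" is found in dat first, so the column is used.
dl_data <- eval(build_inp(all.vars(~ s(min))), dat, environment())

print(class(dl_dollar[[2]]))
print(head(dl_data[[1]]))
```

With mgcv this would correspond to writing gam(cf ~ s(min), data = dat) rather than gam(dat$cf ~ s(dat$min)).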
> Just out of pure academic interest. :-)
Hope your academic curiosity is now satisfied. :)
HTH.
Cheers,
Berwin
========================== Full address ============================
A/Prof Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019) +61 (8) 6488 3383 (self)
The University of Western Australia FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009 e-mail: Berwin.Turlach at gmail.com
Australia http://www.maths.uwa.edu.au/~berwin