[Rd] model frames and update()

William Dunlap wdunlap at tibco.com
Thu Apr 23 22:24:16 CEST 2015


> "Save the model frame in case you need to refit something next month"
> does not appear to be a safe approach to reproducible research.

Is this a standard recommendation?  It will not work in many cases.  E.g.,
if you use lm() to model the sum of some variables, the model frame contains
only the sum, not the addends, so you cannot later change an addend and
refit the model.
  > d <- data.frame(y1=1:5,y2=sin(1:5),x1=log(1:5))
  > fit <- lm(y1+y2 ~ x1, data=d, model=TRUE)
  > fit$model
     y1 + y2        x1
  1 1.841471 0.0000000
  2 2.909297 0.6931472
  3 3.141120 1.0986123
  4 3.243198 1.3862944
  5 4.041076 1.6094379
(The same happens if you use a function like abs(x) on
the right side of the formula.)
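
A minimal R sketch of the point above, using the same data: the saved frame
keeps only the evaluated columns ("y1 + y2", "abs(x1)"), so y1, y2 and the
untransformed x1 are not available for a later refit.  Changing an addend
therefore means refitting from the original data frame.  The names fit_abs,
d2 and fit_new are illustrative only.

```r
d <- data.frame(y1 = 1:5, y2 = sin(1:5), x1 = log(1:5))
fit <- lm(y1 + y2 ~ x1, data = d, model = TRUE)
mf <- fit$model

# Only the evaluated response "y1 + y2" is stored; y1 and y2 are not.
print(names(mf))
print(c("y1", "y2") %in% names(mf))

# Same on the right side: abs(x1) is stored, x1 itself is not.
fit_abs <- lm(y1 ~ abs(x1), data = d, model = TRUE)
print(names(fit_abs$model))

# Changing an addend requires the original data, not the saved frame.
d2 <- transform(d, y2 = 2 * y2)
fit_new <- update(fit, data = d2)
```

In other words, the model frame is a record of what was fitted, not of the
raw inputs it was computed from.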


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Apr 23, 2015 at 9:58 AM, Therneau, Terry M., Ph.D.
<therneau at mayo.edu> wrote:

> This issue has arisen within my anova.coxph routine, but is as easily
> illustrated with glm.
>
> testdata <- data.frame(y= 1:5,
>                        n= c(8,10,6,20,14),
>                        sex = c(0,1,0,1,1),
>                        age = c(30,20,35,25,40))
>
> fit <- glm(cbind(y,n) ~ age + sex, binomial, data=testdata, model=TRUE)
> saveit <- fit$model
>
> update(fit, .~. - age, data=saveit)
> Error in cbind(y, n) : object 'y' not found
>
> One would hope that a saved model frame is precisely the thing that would
> work best.  The issue, of course, is that "cbind(y, n)" is the name of the
> first variable in saveit, and it is not being properly quoted somewhere
> down the line.  The same issue can occur on the right-hand side.  "Save the
> model frame in case you need to refit something next month" does not
> appear to be a safe approach to reproducible research.
>
> fit2 <- glm(y ~ sex + log(age), poisson, testdata)
> save2 <- fit2$model
> update(fit2, . ~ . - sex, data=save2)  # fails
> glm(y ~ log(age), poisson, save2)      # fails
>
>
> I can work around this in my anova, but I wanted to avoid rebuilding the
> frame if it is already present.  It looks like it is time for model.matrix
> plus attr(x, 'assign'): a bit harder to read, but that looks like what
> anova.glm is doing.  Is there a way to make update work?
>
> The current code, BTW, starts by building its own frame using results of
> terms.inner, which solves the above issue nicely, and update() works as
> expected.  But it isn't robust to scoping issues.  (As a user pointed out
> yesterday, calling via lapply a function that contains coxph followed by
> anova gives a "variable not found" error.)  It also ignores saved model
> frames; hence the rewrite.
>
> Terry T
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
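
One possible workaround, sketched here as an assumption rather than anything
proposed in the thread: build the new formula from the saved frame's own
column names, backquoted, so that glm() looks up the stored column named
"cbind(y, n)" or "log(age)" instead of re-evaluating the call on y, n or age.
frame_formula() is a hypothetical helper, not part of base R, and it assumes
the response is the first column of the model frame, which is where
model.frame() puts it.

```r
testdata <- data.frame(y = 1:5, n = c(8, 10, 6, 20, 14),
                       sex = c(0, 1, 0, 1, 1), age = c(30, 20, 35, 25, 40))

# Hypothetical helper (not base R): build "`<response column>` ~ `<col>` + ..."
# from a saved model frame.  Backquoting makes glm() look up the stored column
# instead of re-evaluating cbind(y, n) or log(age).
# Assumes the response is the first column, as model.frame() arranges it.
frame_formula <- function(mf, rhs) {
  bq <- function(x) paste0("`", x, "`")
  as.formula(paste(bq(names(mf)[1L]), "~", paste(bq(rhs), collapse = " + ")),
             env = parent.frame())
}

# Binomial example: drop age, refit from the saved frame only.
fit <- glm(cbind(y, n) ~ age + sex, binomial, data = testdata, model = TRUE)
saveit <- fit$model
refit1 <- glm(frame_formula(saveit, "sex"), binomial, data = saveit)

# Poisson example with a transformed predictor: drop sex.
fit2 <- glm(y ~ sex + log(age), poisson, testdata)
save2 <- fit2$model
refit2 <- glm(frame_formula(save2, "log(age)"), poisson, data = save2)

# Both refits should agree with fitting the smaller model to the raw data.
print(all.equal(unname(coef(refit1)),
                unname(coef(glm(cbind(y, n) ~ sex, binomial, testdata)))))
print(all.equal(unname(coef(refit2)),
                unname(coef(glm(y ~ log(age), poisson, testdata)))))
```

This only reuses columns already in the frame, so it cannot help with the
lm() sum example where an addend changes.  Coefficient names may pick up the
backquotes, which is why the comparison uses unname().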

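The model.matrix plus attr(x, 'assign') route mentioned in the post can be
sketched as follows, assuming a glm fit that kept its frame (model = TRUE,
the glm default).  Because the saved frame carries a "terms" attribute,
model.matrix() uses its columns as they are, and attr(X, "assign") records
which term each column belongs to, so a term can be dropped and the model
refitted with glm.fit() without re-evaluating the formula.  This illustrates
the idea only; it is not the code of anova.glm or anova.coxph, and drop_term
and keep are illustrative names.

```r
testdata <- data.frame(y = 1:5, n = c(8, 10, 6, 20, 14),
                       sex = c(0, 1, 0, 1, 1), age = c(30, 20, 35, 25, 40))
fit <- glm(cbind(y, n) ~ age + sex, binomial, data = testdata, model = TRUE)
mf <- fit$model

# The saved frame has a "terms" attribute, so model.matrix() takes its
# columns as-is; cbind(y, n) is never re-evaluated.
tt <- terms(mf)
X  <- model.matrix(tt, mf)
Y  <- model.response(mf, "any")   # 2-column matrix for the binomial fit

# attr(X, "assign") maps each column of X to a term (0 = intercept).
drop_term <- match("age", attr(tt, "term.labels"))
keep <- attr(X, "assign") != drop_term
refit <- glm.fit(X[, keep, drop = FALSE], Y, family = binomial())

# Compare with fitting the smaller model directly to the raw data.
direct <- glm(cbind(y, n) ~ sex, binomial, data = testdata)
print(all.equal(coef(refit), coef(direct)))

# Change in deviance from dropping age, as an anova table would report it.
print(refit$deviance - fit$deviance)
```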

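The scoping failure Terry mentions is not spelled out in the thread.  The
sketch below is only a generic illustration, under the assumption that the
problem is a stored call being re-evaluated in the wrong environment: a
function applied with lapply() fits a model on its local argument dat, and
re-evaluating fit$call from another function cannot find dat, whereas
evaluating it in the environment of the model formula can.  refit_bad(),
refit_good() and fit_one() are hypothetical.

```r
# Hypothetical helpers showing where a stored call gets re-evaluated.
refit_bad <- function(fit) {
  # Evaluated in this function's frame: 'dat' from fit_one() is not visible.
  eval(fit$call)
}
refit_good <- function(fit) {
  # Evaluated where the formula was created, i.e. inside fit_one().
  eval(fit$call, environment(formula(fit)))
}

fit_one <- function(dat) {
  fit <- glm(y ~ x, poisson, data = dat)
  bad <- tryCatch(refit_bad(fit), error = function(e) conditionMessage(e))
  good <- refit_good(fit)
  list(bad = bad, good = good)
}

datasets <- list(a = data.frame(y = c(2, 3, 6, 7, 8), x = 1:5),
                 b = data.frame(y = c(1, 1, 2, 4, 9), x = 1:5))
res <- lapply(datasets, fit_one)
print(res$a$bad)   # the error message caught from refit_bad()
```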

