[R] Issue with step(): Fails to look for object$model

Bill.Venables at csiro.au
Sun Feb 15 05:28:35 CET 2009


It's pretty simple.  In R, objects are stored in memory.  If the objects are big, they take up a lot of memory.  If your data frame is big, the fitted model objects that keep a copy of it are big too.

The problem really only arises commonly if you are fitting many models simultaneously and holding them all in memory at once, but it can arise if your data frame is truly humongous, and you are unnecessarily holding multiple copies of it in several fitted model objects. 
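
As a rough illustration of the size difference, here is a minimal sketch that fits the same model with and without the stored model frame.  The data frame 'big' and its size are made up for the example, and object.size() measures each object on its own without accounting for any memory R may share between objects, so treat the numbers as a guide only.

```r
# Hypothetical data, purely for illustration
set.seed(1)
n <- 1e5
big <- data.frame(y = rnorm(n), x = rnorm(n))

fit_with    <- lm(y ~ x, data = big)                 # default: model = TRUE
fit_without <- lm(y ~ x, data = big, model = FALSE)  # no model frame kept

# The model frame is present in one fit and absent in the other
is.null(fit_with$model)
is.null(fit_without$model)

# The estimates are the same either way
all.equal(coef(fit_with), coef(fit_without))

# Approximate size of each fitted object
print(object.size(fit_with), units = "MB")
print(object.size(fit_without), units = "MB")
```

Even with model = FALSE the fitted object still carries residuals, fitted values and the QR decomposition, so it is smaller but not small.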


Bill Venables
http://www.cmis.csiro.au/bill.venables/ 


-----Original Message-----
From: Juliet Hannah [mailto:juliet.hannah at gmail.com] 
Sent: Sunday, 15 February 2009 2:25 PM
To: Venables, Bill (CMIS, Cleveland)
Cc: adik at ilovebacon.org; r-help at r-project.org
Subject: Re: [R] Issue with step(): Fails to look for object$model

Could you explain how memory problems may arise? I understand that by
using model=FALSE, we can reduce the memory required. But if we did
not, what kinds of problems may arise? I have run into a possible
memory leak that I have not been able to work around (using yags, not
lm). I am trying to gain insight into this problem. Thanks, Juliet

On Sat, Feb 14, 2009 at 10:18 PM,  <Bill.Venables at csiro.au> wrote:
> Without arguing your case I would point out that the data need not be there in l$model:
>
>> l <- lm(y ~ x, data=ex, model = FALSE)
>> l
>
> Call:
> lm(formula = y ~ x, data = ex, model = FALSE)
>
> Coefficients:
> (Intercept)            x
>     0.1310      -0.1736
>
>> l$model
> NULL
>
> Model objects are often huge simply because by default they squirrel away a copy of the original data inside themselves.  One way to avoid memory problems is to keep this backup copy within the fitted model object only when you really need to, which is not all that often, in fact.
>
> I can see why step() (and allies) do not work the way you think they should.  These functions use the call component of the fitted model object and modify that.  And if the call component says your data are in the data frame 'ex', they take you seriously.  They take your word for it.  We're not dealing with Microsoft software here, you know.
>
> Bill Venables
> http://www.cmis.csiro.au/bill.venables/
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Adam D. I. Kramer
> Sent: Sunday, 15 February 2009 12:44 PM
> To: r-help at r-project.org
> Subject: [R] Issue with step(): Fails to look for object$model
>
> Hi,
>
>        I'm playing around with stepwise regression, using the step
> function, and believe I have found a bug (or at least, a strong case for
> improvement):
>
>> ex <- data.frame(y=rnorm(100),x=rnorm(100))
>> l <- lm(y ~ x, data=ex)
>> step(l)
> [output is correct]
>> rm(ex)
>> step(l)
> Start:  AIC=11.79
> y ~ x
>
>        Df Sum of Sq     RSS     AIC
> - x     1     0.120 108.221   9.900
> <none>              108.100  11.789
> Error in inherits(x, "data.frame") : object "ex" not found
>
> ...ex is not found, so step fails. However, all of the necessary data to run
> the step function is present in l$model.
>
> I would also argue that step() *should* use l$model if at all possible, as
> it seems reasonable to expect that ex may undergo changes. Further, step()
> does not appear to test any other columns present in ex (even if
> direction="both" is specified), unless they are specified in scope.
>
> If I am misunderstanding step() or if there is a good reason why it operates
> this way, please let me know!
>
> --Adam
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
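
To illustrate the point made earlier in the thread, that step() and its allies work from the call component of the fitted object, here is a minimal sketch using update(), which step() relies on to refit models.  It reuses the names 'ex' and 'l' from the original example.  Restoring 'ex' from l$model at the end is only an illustrative workaround assumed for this sketch, not something recommended in the thread.

```r
set.seed(1)
ex <- data.frame(y = rnorm(100), x = rnorm(100))
l <- lm(y ~ x, data = ex)

l$call        # the call as it was typed, with data = ex
l$call$data   # just the name 'ex', not the data themselves

# Once 'ex' is gone, re-evaluating the modified call cannot find it
rm(ex)
res <- try(update(l, . ~ . - x), silent = TRUE)
inherits(res, "try-error")

# Illustrative workaround (an assumption of this sketch): put the stored
# model frame back under the name the call expects, then refit
ex <- l$model
refit <- update(l, . ~ . - x)
formula(refit)
```

This workaround only helps when the model frame was kept, that is, when the model was not fitted with model = FALSE.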



