[R] More compact form of lm object that can be used for prediction?

Fri Jul 11 21:02:31 CEST 2008

> From: Marc Schwartz [mailto:marc_schwartz at comcast.net]
> Sent: Friday, July 11, 2008 12:14 PM
> 
> on 07/11/2008 10:50 AM Woolner, Keith wrote:
> > Hi everyone,
> >
> > Is there a way to take an lm() model and strip it to a minimal form (or
> > convert it to another type of object) that can still be used to predict
> > the dependent variable?
> 
> <snip>
> 
> Depending upon how much memory you need to conserve and what else you
> may need to do with the model object:
> 
> 1. lm(YourFormula, data = YourData, model = FALSE)
> 
> 'model = FALSE' will result in the model frame not being retained.
> 
> 2. lm(YourFormula, data = YourData, model = FALSE, x = FALSE)
> 
> 'x = FALSE' will result in the model matrix not being retained.
> 
> See ?lm for more information.

Marc, 

Thank you for the suggestions.  Though I neglected to mention it, I had
already consulted ?lm and was using model=FALSE.  x=FALSE is the default
setting and I had left it unchanged.
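
For reference, a small sketch of what those two arguments control.  The
built-in mtcars data and the formula are stand-ins for the real data, and
the object names are illustrative only.  Component presence is checked
with names() rather than `$`, because `$` matches partially on lists.

```r
# mtcars stands in for the real data; the formula is illustrative only.
fit_full  <- lm(mpg ~ wt + hp, data = mtcars)                 # defaults
fit_small <- lm(mpg ~ wt + hp, data = mtcars, model = FALSE)  # drop model frame
fit_x     <- lm(mpg ~ wt + hp, data = mtcars, x = TRUE)       # keep model matrix

# Use names() rather than `$`: `$` partially matches, so fit$x would
# silently return fit$xlevels when no "x" component exists.
"model" %in% names(fit_full)    # TRUE:  model frame kept by default
"model" %in% names(fit_small)   # FALSE: dropped by model = FALSE
"x"     %in% names(fit_full)    # FALSE: x = FALSE is the default
"x"     %in% names(fit_x)       # TRUE:  kept only when requested

# Compare overall footprints in bytes.
size_full  <- as.numeric(object.size(fit_full))
size_small <- as.numeric(object.size(fit_small))
```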

The problem I still face is that the memory usage is dominated by the
"qr" component of the model, consuming nearly 80% of the total
footprint.  Using model=FALSE and x=FALSE saves a little over 4% of
model size, and if I deliberately clobber some other components, as
shown below, I can boost that to about 20% savings while still
being able to use predict().

	lm.1$fitted.values <- NULL
	lm.1$residuals     <- NULL
	lm.1$weights       <- NULL
	lm.1$effects       <- NULL
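
A self-contained version of the same stripping, as a sketch on the
built-in mtcars data (a stand-in for the real data; `fit` and the formula
are illustrative).  It checks that predict() on new data gives the same
answers before and after, and lists what each remaining component costs.
On large data the "qr" component holds an n-by-p matrix, which is why it
dominates.

```r
# mtcars and the formula are placeholders for the real model.
fit     <- lm(mpg ~ wt + hp, data = mtcars, model = FALSE)
newdata <- mtcars[1:5, ]

pred_before <- predict(fit, newdata = newdata)
size_before <- as.numeric(object.size(fit))

# Drop components that point prediction on new data does not use.
fit$fitted.values <- NULL
fit$residuals     <- NULL
fit$weights       <- NULL   # absent for an unweighted fit; harmless
fit$effects       <- NULL

pred_after <- predict(fit, newdata = newdata)
size_after <- as.numeric(object.size(fit))

# Predictions should be unchanged by the stripping.
same_predictions <- isTRUE(all.equal(pred_before, pred_after))

# Per-component sizes in bytes, largest first.
component_sizes <- sort(
  vapply(fit, function(x) as.numeric(object.size(x)), numeric(1)),
  decreasing = TRUE
)
```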

The lm() object after doing so is still around 52 megabytes
(object.size(lm.1) = 51,611,888), with 99.98% of it being used by
lm.1$qr.  That was the motivation behind my original question, which was
whether there's a way to get predictions from a model without keeping
the "qr" component around.  Especially since I want to create and use
six of these models simultaneously.
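
One possible workaround, offered as a sketch and not as an established
recipe: if only point predictions are needed, they can be computed from
the coefficients and the terms object alone, without "qr".  The helper
names slim_lm() and predict_slim() are hypothetical, mtcars is a stand-in
for the real data, and the sketch assumes no offsets in the model.
Standard errors and intervals are not available this way.

```r
# Hypothetical helper: keep only what point prediction needs.
slim_lm <- function(fit) {
  list(
    coefficients = coef(fit),
    terms        = delete.response(terms(fit)),  # predictors only
    xlevels      = fit$xlevels,                  # factor levels seen in fitting
    contrasts    = fit$contrasts                 # contrast coding, if any
  )
}

# Hypothetical helper: rebuild the design matrix for new data and
# multiply by the coefficients. Assumes the model has no offset.
predict_slim <- function(slim, newdata) {
  mf <- model.frame(slim$terms, newdata,
                    na.action = na.pass, xlev = slim$xlevels)
  X    <- model.matrix(slim$terms, mf, contrasts.arg = slim$contrasts)
  beta <- slim$coefficients
  keep <- !is.na(beta)   # skip aliased (NA) coefficients from a rank-deficient fit
  drop(X[, keep, drop = FALSE] %*% beta[keep])
}

# Usage on placeholder data.
fit     <- lm(mpg ~ wt + factor(cyl), data = mtcars, model = FALSE)
slim    <- slim_lm(fit)
newdata <- mtcars[c(1, 3, 5), ]

p_slim <- predict_slim(slim, newdata)
p_full <- predict(fit, newdata = newdata)
```

The slim list can be stored with save() or saveRDS() like any other
object; comparing object.size(slim) with object.size(fit) on the real
models would show how much is actually gained.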

My hope is to save and deploy the models in a reporting system to
generate predictions on a daily basis as new data comes in, while the
model itself would change only infrequently.  Hence, I am more concerned
with being able to retain the predictive portion of the models in a
concise format, and less concerned with keeping the supporting
analytical detail around for this application.

The answer may be that what I'm seeking to do isn't possible with the
currently available R+packages, although I'd be mildly surprised if
others haven't run into this situation before.  I just wanted to make
sure I wasn't missing something obvious.

Many thanks,
Keith