[R] Saving fits (glm, nls) without data

Tue Sep 7 17:39:51 CEST 2010

On Sep 7, 2010, at 11:02 AM, Johann Hibschman wrote:

> Is there any package that assists in saving and reconstituting glm and
> nls fits without bringing along the accompanying data?  A quick search
> on CRAN didn't turn up anything.
>
> If not, how do other people deal with saving the coefficients of model
> fits?
>
> For example, I've run a glm fit that has 23 coefficients on a data set
> that had 193,008 rows by the time the fit was called.  When I save the
> resulting fit object, I get a 491 MB object, which suggests that it's
> pulling along all sorts of junk in the environment, as 23*193k*8 is only
> 34 MB.  Even so, I would prefer to only save the coefficients

Have you read through the Value section of glm's help page?

...and

?coef
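
As a small illustration of those two pointers, the sketch below pulls the
coefficients out of a fit with coef() and, as a stand-in for the Hessian,
the estimated covariance matrix of the coefficients with vcov(). The data
are the toy example from the glm help page, and the object names (d, fit,
beta, V) are mine, not anything from the original post.

```r
# Toy data from the first example on the glm help page
d <- data.frame(counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
                outcome   = gl(3, 1, 9),
                treatment = gl(3, 3))
fit <- glm(counts ~ outcome + treatment, family = poisson(), data = d)

beta <- coef(fit)   # named numeric vector of estimated coefficients
V    <- vcov(fit)   # estimated covariance matrix of those coefficients

# These two objects are tiny compared with the full fit
print(object.size(beta))
print(object.size(V))
```

Both objects grow with the number of coefficients, not with the number of
rows in the data.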

> and the
> Hessian, not the fit data set.

I'm not sure whether there will be a Hessian in a glm object.
Have you run str() on your objects? It's likely that the residuals,
fitted.values, weights, prior.weights, and linear.predictors are going
to be fairly large. You could use lapply to run object.size on each
component to see whether I have missed any. When I do that on the first
help page example, the model component is the second largest, but its
inclusion is optional (see the model argument to glm). The largest
component is "family", but I suspect that is a set of functions and
would not increase in size with larger models.
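
A minimal sketch of that size check, again on the help page example (the
names d, fit and sizes are illustrative only):

```r
# Toy data from the first example on the glm help page
d <- data.frame(counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
                outcome   = gl(3, 1, 9),
                treatment = gl(3, 3))
fit <- glm(counts ~ outcome + treatment, family = poisson(), data = d)

# Size in bytes of each component of the fit, largest first
sizes <- sapply(fit, object.size)
print(sort(sizes, decreasing = TRUE))
```

One caveat, as far as I know: object.size does not count the environments
attached to functions or formulas, so this listing can understate what
save() actually writes to disk if the fit was created inside a function
with large objects lying around.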
>
> Is there anything I can do?  If I want to save several fits, 490 MB a
> shot starts to add up very quickly.  If I just save the coefficients, I
> have to manually hack up an object that I can then run 'predict' on when
> I want to evaluate the model, and that feels very error-prone.

The predict.glm function is visible so you can just type its name to  
see the code. It appears that the section of the code that does the  
work is fairly short. This is my nomination for what happens in most  
cases:

     if (!se.fit) {   # the default; se.fit=TRUE is handled separately
         if (missing(newdata)) {
             # ... returns the stored linear.predictors / fitted.values
         }
         else {
             pred <- predict.lm(object, newdata, se.fit, scale = 1,
                 type = ifelse(type == "link", "response", type),
                 terms = terms, na.action = na.action)
             switch(type, response = {
                 pred <- family(object)$linkinv(pred)
             }, link = , terms = )
         }
     }

So maybe you should write a predict function that would work on a  
reduced glm object that has a class name of your choosing.
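
Here is one way that could look. Treat it as a rough sketch and not a
tested replacement for predict.glm: the helper slim_glm() and the class
name "slimglm" are made up for this example, it ignores offsets, weights
and standard errors, it assumes no coefficient is NA (no aliased terms),
and it only handles links that make.link() knows by name.

```r
# Hypothetical helper: keep only what is needed to predict on new data.
slim_glm <- function(fit) {
  tt <- delete.response(terms(fit))
  # Point the terms at the global environment so the saved object does
  # not drag along the environment in which the model was fitted.
  environment(tt) <- globalenv()
  structure(list(coefficients = coef(fit),
                 vcov         = vcov(fit),
                 terms        = tt,
                 xlevels      = fit$xlevels,
                 contrasts    = fit$contrasts,
                 link         = family(fit)$link),
            class = "slimglm")
}

# Hypothetical predict method for the reduced object.
predict.slimglm <- function(object, newdata,
                            type = c("link", "response"), ...) {
  type <- match.arg(type)
  # Rebuild the design matrix the same way predict.lm does
  mf  <- model.frame(object$terms, newdata, xlev = object$xlevels)
  X   <- model.matrix(object$terms, mf, contrasts.arg = object$contrasts)
  eta <- drop(X %*% object$coefficients)
  if (type == "response") eta <- make.link(object$link)$linkinv(eta)
  eta
}

# Compare against the full fit on the glm help page example
d <- data.frame(counts    = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
                outcome   = gl(3, 1, 9),
                treatment = gl(3, 3))
fit  <- glm(counts ~ outcome + treatment, family = poisson(), data = d)
slim <- slim_glm(fit)
agree <- all.equal(unname(predict(slim, d, type = "response")),
                   unname(predict(fit,  d, type = "response")))
print(agree)
print(object.size(slim))
```

Storing the link by name instead of the whole family object keeps the
saved object free of function closures, at the cost of not supporting
custom link functions.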

-- 

David Winsemius, MD
West Hartford, CT