[R] Large file size while persisting rpart model to disk

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Feb 3 12:56:58 CET 2009


On Tue, 3 Feb 2009, tan wrote:

> I am using rpart to build a model for later predictions. To keep the
> model across restarts and share it across nodes I have been using
> "save" to persist the result of rpart to a file and "load" to read it
> back later. But the saved file was becoming unusually large (even in
> binary, compressed mode). Its size was also proportional to the amount
> of data that was used to create the model.
>
> After tinkering a bit, I figured out that most of the size comes from
> the "functions" component of the rpart object (fit$functions). If I
> set it to NULL, the size drops dramatically. This can be seen with the
> following lines of R code, where there is a difference, though it is
> small. The difference is more pronounced with large datasets.
>
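> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> save(fit, file="fit1.sav")    # full fitted object
> fit$functions <- NULL         # drop the functions component
> save(fit, file="fit2.sav")
> # compare the sizes (in bytes) of the two files
> file.info(c("fit1.sav", "fit2.sav"))$size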
>
> What is the reason behind it? The functions themselves seem small, so
> where is all the bulk coming from?

Their environments.
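
A minimal sketch of the effect in base R, not rpart's actual code:
make_fit() is a hypothetical stand-in for a fitting routine, and "big"
is made-up data. It assumes only that a function created inside another
function's call keeps that call's frame as its environment, and that
serialization (which save() relies on) writes that environment out
together with the function.

```r
# Hypothetical stand-in for a fitting routine: it returns a small function
# that was created inside a call frame also holding the (large) data.
make_fit <- function(n) {
  big <- rnorm(n)                        # data living in the call frame
  list(mean = mean(big),
       functions = list(print = function(x) cat("mean:", x, "\n")))
}

fit <- make_fit(1e5)

# The tiny print function carries the whole frame of make_fit() with it.
env_names <- ls(environment(fit$functions$print))   # includes "big"

# serialize() writes a closure together with its enclosing environment.
size_with <- length(serialize(fit, NULL))

# Dropping the functions drops the captured frame as well.
fit$functions <- NULL
size_without <- length(serialize(fit, NULL))

# Alternative: keep the function but point it at the global environment.
# Only safe when the function does not use variables from its old frame.
fit2 <- make_fit(1e5)
environment(fit2$functions$print) <- globalenv()
size_reset <- length(serialize(fit2, NULL))

c(with = size_with, without = size_without, reset = size_reset)
```

With 1e5 doubles in the frame, the first size should be dominated by
the roughly 800 kB of data, while the other two stay small. Whether
resetting the environment is safe for the functions stored in a real
rpart object depends on whether they refer to anything in their
original frame, which this sketch does not check.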

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
