[R] Large file size while persisting rpart model to disk

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Feb 4 15:38:04 CET 2009


You need to set the environment to the rpart namespace, at least in 
the print etc functions attached to the return object.  One reason is 
that formatg needs to be found, and in principle that has to be right 
as you are just going a couple of steps up the environment chain.

I've not had time to look further.
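
A minimal sketch of what setting the environment to a namespace looks 
like.  It uses the stats namespace as a stand-in for rpart so that it 
runs in a bare R session, and printfun is a hypothetical name, not 
rpart's actual code:

```r
# Hypothetical print helper of the kind attached to a fitted object.
printfun <- function(x) cat("median:", median(x), "\n")

# Re-home the closure in a package namespace instead of the frame
# in which it happened to be defined ("stats" stands in for "rpart").
environment(printfun) <- asNamespace("stats")

environmentName(environment(printfun))   # name of the new enclosure
printfun(c(1, 5, 9))                     # median() is found in the namespace
```

A namespace is saved by reference rather than by value, so save() does 
not write out its contents, and unexported helpers (formatg in rpart's 
case) are found directly.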

On Wed, 4 Feb 2009, Terry Therneau wrote:

>  In R, functions remember the environment in which they were defined, along
> with its chain of enclosing environments.  The good thing about this is that
> they can find variables further up in the nested context, i.e.,
>    myfun <- function(x) { x+y}
> will look for 'y' in the function in which myfun was defined, then in the
> function enclosing that one, .... on up and then through the search() list.
> This makes life easier for certain things such as minimizers.
>
>  The bad thing is that to make this work R has to remember all of the variables
> that were available up the entire chain, and 99-100% of them aren't necessary.
> (Because of constructs like get(varname) a parser can't read the code to decide
> what might be needed).

Actually, it does almost no work to remember them.  The work comes 
only when searches fail (more to search) and in save(), the issue 
here.
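
A rough way to see the save() cost, assuming serialize() is an 
acceptable proxy for what save() writes.  make_obj and size are 
hypothetical names used only for this sketch:

```r
# The returned list holds only a closure, but the closure's environment
# is the frame of make_obj(), which still contains the temporary X.
make_obj <- function(n) {
  X <- matrix(0, n, 100)                        # temporary work, n x 100 doubles
  list(printfun = function(obj) cat("an object\n"))
}

# Number of bytes in the serialized form of an object.
size <- function(obj) length(serialize(obj, NULL))

small <- make_obj(1)
big   <- make_obj(1000)

size(small)
size(big)      # larger by roughly the size of the extra rows of X
```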

>  This is an issue with embedded functions.  I recently noticed an extreme case
> of it in the pspline routine and made changes to fix it.  The short version
>  	pspline <- function(x, ...) {    # ... = other args
>  		# some computations to define an X matrix, which can be large
>  		# define a print function, printfun
>  		# ...
>  		# return a list holding X, printfun, and other stuff
>  		}
> It's even worse in the frailty functions, where X can be VERY large.
> The print function's environment wanted to 'remember' all of the temporary work
> that went into defining X, plus X itself, and so would be huge.  My solution was
> to add the line
> 	environment(printfun) <- new.env(parent=baseenv())
> which marks the function as not needing anything from the local environment,
> only the base R definitions.  This would probably be a good addition to rpart,
> but I need to look closer.
>   My first cut was to use emptyenv(), but that wasn't so smart.  It leaves
> everything undefined, like "+" for instance. :-)
>
>   	Terry Therneau
>
>
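
The fix described above can be sketched as follows.  make_fit and its 
contents are hypothetical and only mimic the shape of the pspline 
case.  The last lines illustrate the emptyenv() pitfall:

```r
make_fit <- function(n) {
  X <- matrix(0, n, 100)                        # large temporary
  printfun <- function(obj) cat("coef:", obj$coef, "\n")
  # Detach the closure from this frame: it needs only base R definitions.
  environment(printfun) <- new.env(parent = baseenv())
  list(coef = 1, printfun = printfun)
}

fit <- make_fit(1000)
ls(environment(fit$printfun))     # the new environment holds no variables
length(serialize(fit, NULL))      # X is no longer dragged along
fit$printfun(fit)                 # cat() and `$` are still found via baseenv()

# With emptyenv() nothing at all can be looked up, not even "+".
f <- function(a) a + 1
environment(f) <- emptyenv()
res <- tryCatch(f(1), error = function(e) "error")
res
```

Note that a function treated this way can no longer see unexported 
helpers from its package, which is why the namespace may be the better 
choice of environment for rpart.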

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



