[R] Large file size while persisting rpart model to disk
Terry Therneau
therneau at mayo.edu
Wed Feb 4 20:27:40 CET 2009
Lots of interesting comments while I was off in meetings. (Some days I wonder
why they pay me - with so many meetings I certainly don't accomplish any work.)
Some responses:
1. To Brian: I think that there is another issue outside of save(). Use the
frailty.gamma function as a thought example. It's about 3 pages long with lots
and lots of temporary variables and computations, at the end of which it returns
an X matrix of data and a stack of attributes. One of these is a print
function. Some of the temp objects can be really large, large enough that
memory recovery may be important. Does not the reference of these in an
environment prevent R from reclaiming that memory during the session?
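A minimal sketch of the situation in point 1, assuming a hypothetical make_term() standing in for frailty.gamma (the names make_term, big_temp and printfun are invented for illustration). It suggests the answer is yes: a function created inside another function keeps that call's whole evaluation environment reachable, temporaries included.

```r
# Hypothetical stand-in for a long modeling function; not the real frailty.gamma.
make_term <- function(n) {
  big_temp <- numeric(n)              # large temporary, needed only during setup
  x <- matrix(0, nrow = 2, ncol = 2)  # the data that is actually returned
  # The print function is a closure: its environment is this call's frame.
  attr(x, "printfun") <- function(obj) cat("a term\n")
  x
}

term <- make_term(1e6)
pf_env <- environment(attr(term, "printfun"))

# big_temp is still bound in the closure's environment, so it stays reachable
# (and cannot be garbage collected) for as long as 'term' exists.
still_there <- exists("big_temp", envir = pf_env, inherits = FALSE)
temp_length <- length(get("big_temp", envir = pf_env))
```

Printing still_there and temp_length shows whether the temporary survived the call and how large it is.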
2. Duncan: You objected to my phrase
myfun <- function(x) { x+y }
will look for 'y' in the function that called myfun, then in the function that
called the function, .... on up and then through the search() list. This makes
life easier for certain things such as minimizers.
I was writing for ordinary mortals, reading code. The distinction you raise
between the code and the "current instance of memory objects when the code was
being executed" is opaque to many. At least it's tricky for me.
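A small sketch of the distinction in point 2, using made-up functions (caller_lexical, caller_dynamic and dynfun are names invented here). R resolves a free variable such as 'y' in the environment where the function was defined, not in the function that called it. Looking in the caller has to be requested explicitly, for example with parent.frame().

```r
y <- "outer"

# 'y' is a free variable; R looks it up where myfun was DEFINED.
myfun <- function(x) paste(x, y)

caller_lexical <- function() {
  y <- "caller"          # local to the caller; myfun does not see it
  myfun("y is")
}

# Explicitly asking for the caller's frame gives the "look in the caller" behaviour.
dynfun <- function(x) paste(x, get("y", envir = parent.frame()))

caller_dynamic <- function() {
  y <- "caller"
  dynfun("y is")
}

res_lexical <- caller_lexical()   # uses the y visible where myfun was defined
res_dynamic <- caller_dynamic()   # uses the y in the calling function
```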
3. On removing variables: I don't like that idea, and think it is much much
clearer to explicitly refer to what you do want than to remove what you don't. I
never liked the m$x <- m$y <- m$whozit....... <- NULL construct for that reason,
which was once found in most of the modeling functions.
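For readers who have not met the construct in point 3, a sketch of the two styles on a made-up fit object (the component names are invented for illustration). Both give the same result here; the second states what is kept rather than what is dropped.

```r
# Hypothetical fitted object with some components we do not want to keep.
fit <- list(coef = c(a = 1, b = 2), x = matrix(0, 3, 2), y = 1:3,
            call = quote(f(y ~ x)))

# Removal style: null out everything that is not wanted.
small_removed <- fit
small_removed$x <- small_removed$y <- NULL

# Explicit style: name the components that are wanted.
small_kept <- fit[c("coef", "call")]

same_result <- identical(small_removed, small_kept)
```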
4. Luke: I've read your code suggestion thrice now, and I understand what you
are doing less on each pass.
Now, two questions for the pros:
a. I like Brian's suggestion of using asNamespace('survival'), other than the
help page that explicitly states that I should never ever call said function. If
I don't use any non-exported-from-the-package functions, it seems that
globalenv() is the most clear construct, however.
How do I know what gets saved and what doesn't? We don't want all the
survival functions to be saved on disk with my object, like local variables
would be.
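One way to check question (a) empirically is to compare serialized sizes. This is only a sketch: make_printfun and big_temp are invented names, and the stats namespace is used as a stand-in for survival so that the code runs without extra packages. As far as the serialization format goes, the global environment and package namespaces are written as references, while an ordinary environment is written out with its contents.

```r
# Hypothetical print function created inside a function with a large temporary.
make_printfun <- function() {
  big_temp <- numeric(1e5)          # about 800 KB of doubles
  function(obj) cat("a term\n")
}
pf <- make_printfun()

# Size when the closure still carries the local environment.
size_local <- length(serialize(pf, NULL))

# Size after pointing the function at the global environment.
environment(pf) <- globalenv()
size_global <- length(serialize(pf, NULL))

# Size after pointing it at a namespace (stats as a stand-in for survival).
environment(pf) <- asNamespace("stats")
size_ns <- length(serialize(pf, NULL))

sizes <- c(local = size_local, global = size_global, namespace = size_ns)
```

Printing sizes shows whether the large temporary, or the namespace's functions, travel with the object.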
b. Is there any difference or preference for
environment(printfun) <- asNamespace('survival')
environment(printfun) <- new.env(parent = asNamespace('survival'))
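A sketch of how the two forms in (b) differ, again with the stats namespace standing in for survival and with f1, f2 and note as invented names. Both let the function find the package's functions. The first makes the locked namespace itself the function's environment; the second puts a fresh, empty, writable environment in between, which can hold per-object values.

```r
ns <- asNamespace("stats")            # stand-in for asNamespace('survival')

f1 <- function(x) median(x)
environment(f1) <- ns                 # enclosure is the namespace itself

f2 <- function(x) median(x)
environment(f2) <- new.env(parent = ns)   # fresh environment whose parent is the namespace

# Both resolve median() through the namespace.
r1 <- f1(c(1, 5, 9))
r2 <- f2(c(1, 5, 9))

# The namespace is locked; the intermediate environment is not,
# so only the second form can carry extra bindings.
ns_locked  <- environmentIsLocked(ns)
new_locked <- environmentIsLocked(environment(f2))
assign("note", "per-object state", envir = environment(f2))
has_note <- exists("note", envir = environment(f2), inherits = FALSE)
```

Whatever is placed in the intermediate environment is saved along with the object, so it should be kept small.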
Terry T.