[Rd] options(keep.source = TRUE) -- also for "library(.)" ?
Martin Maechler <maechler@stat.math.ethz.ch>
Fri, 28 Apr 2000 18:19:50 +0200 (CEST)
I'm replying to myself once more :
[and this gets more and more involved, please "d" if you're not interested ..]
>>>>> "MM" == Martin Maechler <maechler@stat.math.ethz.ch> writes:
MM> (and I haven't seen more feedback..)
>>>>> "PD" == Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk> writes:
........
MM> Of course we now could even make
MM> keep.source = getOption("keep.source")
MM> an argument to library(), being propagated to sys.source(..).
MM> I'm considering to commit the necessary changes and add the following to
MM> NEWS [for "R-devel"]
MM>
MM> o library(), require(), and sys.source() have a new argument
MM> ` keep.source = getOption("keep.source") '.
MM>
MM> Hence, by default, functions from all packages (not just base)
MM> `keep their source'.
MM>
MM> Is this okay for everyone ?
Now, I still haven't committed the new code, but I have been using it
myself, together with gc(), to compile a "big picture" statistic over many
packages (actually I've done this for all CRAN packages and more),
to find out how much memory is "spilled" by keep.source = TRUE.
Here are the results :
I show the difference in memory usage {Vcells & Ncells, see ?gc & ?Memory}
for interesting packages, only using R builtin and CRAN (non-Devel) packages:
Package      Bytes used additionally   Ncells   Vcells
             with keep.source = TRUE
nlme                       2'305'364    19023   107659   (actually nlme + nls)
survival5                  1'066'776     8867    49792
MASS                         631'628     5186    29507
mclust                       493'512     4349    22936
boot                         456'944     3833    21314
ctest                        309'288     2406    14502
ts                           297'368     2311    13944
cluster                      244'120     2270    11298
nls                          236'668     1871    11085
wavethresh                   218'624     1878    10180
mda                          215'944     1878    10046
rpart                        203'892     1654     9533
chron                        194'640     1735     9038
tseries                      183'360     1505     8566
locfit                       176'416     1632     8168
tree                         166'844     1248     7843
modreg                       116'752      989     5442
nnet                          98'124      838     4571
splines                       85'112      769     3948
mva                           79'280      710     3680
lqs                           34'116      292     1589
eda                           10'860      105      501
zmatrix                        7'196       82      327
Devore5                            0        0        0   [took this to "test"
I.e., for nlme (together with nls) one needs an extra 2.3 MBytes of memory
just for "keep.source = TRUE".
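For reference, a measurement of this kind can be sketched roughly as follows.
This is not the code used for the table above: cells_used_by() and load_defs()
are hypothetical helpers, the function definitions are made up, and the sketch
assumes a version of R whose parse() accepts a keep.source argument (the
keep.source argument to library() is only a proposal at this point).

```r
# Hypothetical helper: change in used Ncells/Vcells caused by evaluating `expr`.
cells_used_by <- function(expr) {
  before <- gc()[, 1]   # column 1 of gc() is "used"; rows are Ncells and Vcells
  result <- expr        # force the argument; `result` keeps the objects alive
  after <- gc()[, 1]
  after - before
}

# Made-up source text: 200 small functions, each with a comment.
src <- sprintf("f%d <- function(x) {\n  # add %d to x\n  x + %d\n}",
               1:200, 1:200, 1:200)

# Load the same definitions into a fresh environment, with or without source.
load_defs <- function(keep) {
  env <- new.env()
  eval(parse(text = src, keep.source = keep), env)
  env
}

with_src    <- cells_used_by(load_defs(TRUE))
without_src <- cells_used_by(load_defs(FALSE))
with_src - without_src   # extra cells attributable to keeping the source
```

The differences are noisy for small inputs, since gc() counts everything
allocated in the session, so only large differences should be trusted.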
I further investigated how much the "keep.source" of base ``costs''
memory-wise.
Note that I still don't know how to turn it off easily for base (Peter ?).
However, I just counted how much "source" is in base :
> length(ob <- ls(pos= match("package:base",search()), all.nam = TRUE))
[1] 1193
> length(fns <- ob[sapply(ob, function(n)is.function(get(n)))])
[1] 1169
> stem(len.src <- sapply(fns, function(n)sum(nchar(attr(get(n),"source")))))
The decimal point is 3 digit(s) to the right of the |
0 | 00000000000000000000000000000000000000000000000000000000000000000000+980
1 | 00000000000112222233333333334444555555666667777778888899
2 | 00000012333334444555556666777788899
3 | 12234477
4 | 15669
5 | 34
6 |
7 | 35
8 |
9 |
10 | 2
(guess *which* is the outlier ;-)
> sum(len.src)
[1] 359964
i.e., only ~360'000 characters.
Now compare this with survival5 which was scoring pretty high above :
> library(survival5, keep.source = TRUE)
> length(ob <- ls(pos= match("package:survival5",search()), all.nam = TRUE))
[1] 117
> length(fns <- ob[sapply(ob, function(n)is.function(get(n)))])
[1] 116
> stem(len.src <- sapply(fns, function(n)sum(nchar(attr(get(n),"source")))))
The decimal point is 3 digit(s) to the right of the |
0 | 00000000011111111111111112222222233344444455555789
1 | 0001122334445567777899
2 | 0001134555555789
3 | 02233445567
4 | 12368
5 | 2478
6 | 14799
7 |
8 | 0
9 |
10 |
11 |
12 |
13 |
14 |
15 | 4
16 |
17 | 3
> sum(len.src)
[1] 235633
i.e. about 2/3 of "base".
(but then base has "source" attributes for many more objects)
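The same three commands were applied to base and to survival5, so they can be
wrapped in one function. The following is only a sketch: src_chars() is a
hypothetical name, and the fallback to a "srcref" attribute is an assumption
about R versions that no longer set "source" on functions.

```r
# Hypothetical helper: characters of kept source per function in `env`.
src_chars <- function(env) {
  ob <- ls(env, all.names = TRUE)
  is_fn <- vapply(ob, function(n) is.function(get(n, envir = env)), logical(1))
  vapply(ob[is_fn], function(n) {
    f <- get(n, envir = env)
    src <- attr(f, "source")
    # Assumption: where "source" is absent, a "srcref" may hold the text instead.
    if (is.null(src)) src <- as.character(attr(f, "srcref"))
    sum(nchar(src))
  }, numeric(1))
}

# Usage, assuming the package is attached:
# len.src <- src_chars(as.environment("package:base"))
# sum(len.src)
```

Functions without any kept source simply count as 0 characters, which matches
the long row of zeros in the stem() displays.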
A very crude extrapolation suggests that turning off "keep.source" for
"base" would save about 1.5 MBytes of RAM {I'd guess even more..}
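One way to arrive at a figure of this order, assuming (crudely) that the
memory cost is proportional to the number of source characters, is sketched
below. The variable names are made up; the numbers are the ones reported above.

```r
chars_base      <- 359964    # sum(len.src) for base
chars_survival5 <- 235633    # sum(len.src) for survival5
bytes_survival5 <- 1066776   # extra bytes measured for survival5

# Cost per source character, taken from the one package measured both ways.
bytes_per_char <- bytes_survival5 / chars_survival5   # roughly 4.5

# Scale up to the amount of source text held by base.
est_bytes_base <- chars_base * bytes_per_char         # roughly 1.6e6
```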
After all this testing, I think what we really want is
"keep.source = FALSE" (including for "base" !)
WHEN working with large data, working on smallish machines,
or for all "batch" processing.
Hence I'd propose
 1. options(keep.source = interactive())
    in the default profile
 2. {as proposed earlier today -- see below}
    provide a command line option to turn it on or off.
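The effect of proposal 1 can be illustrated with a small sketch. It assumes an
R version in which parse() takes its keep.source default from the option and
records kept source in a "srcref" attribute; neither detail is part of the
proposal itself.

```r
# As it might appear in a profile: keep source only at an interactive prompt.
options(keep.source = interactive())

# A function parsed after the option is set.
f <- eval(parse(text = "function(x) x + 1  # increment"))

# TRUE when no source was kept, i.e. in batch runs under this setting.
is.null(attr(f, "srcref"))
```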
------------
PD> The real question is whether we want to have a different mechanism
PD> for controlling whether keep.source is set or not.
MM> right.
PD> Originally it was FALSE for the base library to save space, and
PD> accordingly the same setting was used for other libraries since some
PD> of them are rather large, but later it got flipped to TRUE for
PD> base,
MM> (yes, I'm still wondering...)
PD> and then there is little point in setting it FALSE for packages.
PD> Question is whether anyone would want the old behaviour
PD> back to get more space for analyses?
would be nice if it *was* configurable for base as well;
possibly both via cmd line option
(something like --keepsource / --no-keepsource )
and a setting in Rprofile..
MM> From grepping through the source code, I don't see how it was turned off
MM> for base...
anyone [R-core] ?