[R] Issue with gc() on Ubuntu 20.04
John Logsdon
j@|og@don @end|ng |rom qu@ntex-re@e@rch@com
Sun Aug 27 20:54:23 CEST 2023
Folks
I have come across an issue with gc() hogging the processor according to
Rprof.
Platform: Ubuntu 20.04, fully up to date
R version 4.3.1
Packages: survival, MASS, gtools and openxlsx.
With the default gc.auto option, the profiler reports the garbage
collector at a self.pct of 99.39%.
So I tried switching it off using options(gc.auto=Inf) in the R
session before running my program with source().
This lowered self.pct to 99.36%. Not much there.
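For anyone wanting to reproduce this kind of figure, here is a minimal sketch of how a GC share can be obtained with Rprof()'s gc.profiling argument and summaryRprof(). The workload (growing a vector with c()) is a hypothetical, allocation-heavy stand-in for the real program, and the exact percentages will vary by machine.

```r
# Profile a toy allocation-heavy workload with GC sampling switched on.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01, gc.profiling = TRUE)

x <- numeric(0)
for (i in seq_len(30000)) x <- c(x, i)  # grows x by copying: many allocations

Rprof(NULL)                             # stop profiling
prof <- summaryRprof(prof_file)

# Time sampled while the collector was running should show up as a "<GC>" entry.
head(prof$by.self)
```

Comparing the "<GC>" row against the rows for one's own functions gives the kind of self.pct breakdown quoted above.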
After some pondering, I added options(gc.auto=Inf) at the beginning
of each function, without resetting it on exit, expecting the offending
function(s) to plead guilty.
Not so, although it did lower the gc() time to 95.84%.
This was on a 16-core Threadripper 1950X box, so I was intending to use
the parallel package, but when I tried it on my lowly, years-old Windows
box the gc() share came down to 88.07%.
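As a cross-check on Rprof's sampled figures across the two machines, gc.time() reports the cumulative time spent in garbage collection directly. A minimal sketch, again with a hypothetical allocation-heavy workload standing in for the real code. gcFirst = FALSE stops system.time() from running an extra collection that gc.time() would count but the wall-clock figure would not.

```r
gc_before <- gc.time()   # enables GC timing; returns cumulative GC times so far

# Hypothetical workload: many small allocations.
wall <- system.time(
  res <- lapply(seq_len(2e5), function(i) rnorm(20)),
  gcFirst = FALSE
)[["elapsed"]]

gc_after <- gc.time()
gc_secs  <- gc_after[3] - gc_before[3]   # element 3 is elapsed time spent in GC

cat(sprintf("elapsed %.2fs, of which GC %.2fs (%.0f%%)\n",
            wall, gc_secs, 100 * gc_secs / wall))
```

Running the same script on both the Linux and Windows boxes would give directly comparable GC totals.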
The only thing I can think of is that there are quite a lot of cases
where a function is generated on the fly as in:
eval(parse(text = paste("dprob <- function(x,l,s){",
                        dist.functions[2,][dist.functions[1,]==distn],
                        "(x,l,s)}", sep = "")))
I haven't added the options to any of these.
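If the on-the-fly definitions are a suspect, one alternative to eval(parse()) is a small function factory that looks the density function up once with match.fun() and returns a closure. This sketch assumes dist.functions is a character matrix with distribution names in row 1 and density-function names in row 2, as the indexing above suggests; the table contents and make_dprob() are hypothetical. It avoids re-parsing source text, but it is not shown here to change the gc() share.

```r
# Hypothetical lookup table: row 1 = distribution name, row 2 = density function name.
dist.functions <- rbind(c("normal", "logistic"),
                        c("dnorm",  "dlogis"))

# Hypothetical factory: resolve the density function once, return a closure.
make_dprob <- function(distn) {
  fname <- dist.functions[2, ][dist.functions[1, ] == distn]
  stopifnot(length(fname) == 1)   # exactly one matching distribution
  f <- match.fun(fname)           # e.g. "dnorm" -> the dnorm function itself
  function(x, l, s) f(x, l, s)
}

dprob <- make_dprob("normal")
dprob(0, 0, 1)   # same as dnorm(0, 0, 1), about 0.3989423
```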
The largest share taken by any of my own functions is 0.05%; the rest
is dominated by gc().
There may not be much point in parallelising the code until I can reduce
the garbage collection.
I am not short of memory and would like to disable it fully, but despite
adding the option to all routines, I haven't managed to do this yet.
Can anyone advise me?
And why is the Linux version so much worse than Windows?
TIA
--
John Logsdon
Quantex Research Ltd
m:+447717758675/h:+441614454951