[R-SIG-Mac] How to overcome Java error?
simon.urbanek at r-project.org
Thu Oct 13 17:05:14 CEST 2011
On Oct 13, 2011, at 5:45 AM, Luca Meyer wrote:
> I have to load data from more than 200 separate Excel sheets, and I am using the read.xlsx function in the xlsx package. Each sheet has a complex layout (more than one table plus extra data), and I need to load the different elements I find on each sheet into an R data frame. This procedure has to be repeated several times, i.e. on more than one Excel file.
> Initially I had no problems and the script ran just fine, producing the desired data frame. After processing a few hundred sheets, I now get this error each time I load a sheet:
> Errore in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
> java.lang.OutOfMemoryError: Java heap space
> Does this have to do with some sort of cache that is filling up? Can anyone suggest a solution to this error?
Well, you're running out of memory on the Java side. First, did you run gc() in R? You should make sure that you don't keep any references to Java objects around unnecessarily, since Java cannot release their memory until R has collected those references.
I did run a quick test with rJava memory profiling enabled and I see no leaks in rJava:
ginaz:~$ R -e 'library(xlsx); d=read.xlsx("/Users/urbanek/Downloads/Acme Coffin Company.xls",1); gc()' | ./mem.match
Loading required package: xlsxjars
Loading required package: rJava
SUM[Leaked objects]: 0
SUM[Used objects]: 253
So if you see any issues even after running gc(), it will be hard to trace. It could be in the Java code used by xlsx (which I don't know how to trace) or, in theory, in some local places in the infrastructure that are not covered by the memory traces (the above counts Java objects referenced from R). You may have some luck using java.lang.management to look into Java memory usage. You can also force Java to run its own garbage collector, but I doubt that it will help, since Java would have run it before running out of memory.
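As a rough sketch of the two suggestions above (inspecting heap usage through java.lang.management and forcing a Java collection), something like the following could be tried. It assumes rJava is installed; the helper name java_heap_used_mb is made up for illustration, and the code has not been run against the poster's setup.

```r
library(rJava)
.jinit()  # starts the JVM if it is not already running

# Hypothetical helper: current Java heap usage in MB, via java.lang.management.
java_heap_used_mb <- function() {
  bean <- .jcall("java/lang/management/ManagementFactory",
                 "Ljava/lang/management/MemoryMXBean;", "getMemoryMXBean")
  usage <- .jcall(bean, "Ljava/lang/management/MemoryUsage;",
                  "getHeapMemoryUsage")
  .jcall(usage, "J", "getUsed") / 1024^2
}

before <- java_heap_used_mb()
invisible(gc())                       # R's collector: releases unreachable Java references
.jcall("java/lang/System", "V", "gc") # ask the JVM to run its own collector
after <- java_heap_used_mb()
cat(sprintf("Java heap used: %.1f MB before, %.1f MB after\n", before, after))
```

If the "after" figure keeps climbing from sheet to sheet even with both collectors run, that would point at something holding on to memory on the Java side.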
Finally, you could increase the Java heap if you have enough memory, for example to 2 GB, but that's just delaying the problem.
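One way to raise the heap limit, assuming the JVM is started by rJava (as it is when xlsx is loaded), is the java.parameters option. The 2 GB value below simply mirrors the figure mentioned above.

```r
# Must run in a fresh R session, before rJava or xlsx is loaded:
# the JVM reads this option only once, when it starts.
options(java.parameters = "-Xmx2g")  # 2 GB maximum Java heap
library(xlsx)
```

Setting the option after the JVM is already running has no effect, so it belongs at the very top of the script.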
So, again, make sure you run gc() after you're done with each worksheet and see if the problem persists.
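A minimal sketch of such a loop, assuming the sheets are addressed by index. The helper read_sheets and its extract argument are hypothetical names, not part of xlsx; extract stands in for whatever code pulls the needed tables out of a raw sheet.

```r
# Hypothetical helper: read the given sheets of one workbook, running gc()
# after each one so that references to Java objects are released promptly.
read_sheets <- function(path, sheets, reader = xlsx::read.xlsx,
                        extract = identity) {
  result <- vector("list", length(sheets))
  for (i in seq_along(sheets)) {
    raw <- reader(path, sheets[i])  # read.xlsx(file, sheetIndex)
    result[[i]] <- extract(raw)     # keep only what is needed
    rm(raw)                         # drop the temporary before collecting
    invisible(gc())                 # let R finalize unreachable Java references
  }
  result
}
```

It could be called as read_sheets("workbook.xlsx", 1:200), where the file name is a placeholder.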