[R] Segmentation Fault with large dataframes and packages using rJava
Sebastian Salentin
sebastian.salentin at biotec.tu-dresden.de
Thu May 26 11:49:09 CEST 2016
Dear all,
I have been trying to perform machine learning/feature selection tasks
in R using various packages (e.g. mlr and FSelector).
However, when I pass larger data frames to these functions, I get a
segmentation fault (memory not mapped).
This first happened when using the mlr benchmark function with data
frames on the order of 200 rows x 10,000 columns (all integer values).
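To give some context, the mlr setup was roughly along these lines (only
a sketch, not my exact configuration; the classif.rpart learner, the
information.gain filter wrapper and the 3-fold CV are placeholders):

library(mlr)

# 200 rows x 10,000 integer columns plus a factor target
df <- data.frame(replicate(10000, sample(0:1, 200, replace = TRUE)))
df$class <- factor(sample(c("yes", "no"), 200, replace = TRUE))

task <- makeClassifTask(data = df, target = "class")

# Placeholder learner wrapped with an FSelector-based filter,
# which is where rJava gets involved
lrn <- makeFilterWrapper(learner = "classif.rpart",
                         fw.method = "information.gain", fw.perc = 0.1)

rdesc <- makeResampleDesc("CV", iters = 3)
bmr <- benchmark(learners = lrn, tasks = task, resamplings = rdesc)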
I prepared a minimal working example in which I get a segmentation
fault when calculating the information gain with the FSelector package:
require("FSelector")
# Random dataframe 200 rows * 25,000 cols
large.df <- data.frame(replicate(25000,sample(0:1,200,rep=TRUE)))
weights <- information.gain(X24978~., large.df)
print(weights)
I am using R version 3.3.0 (64-bit) on Ubuntu 14.04.4 LTS with FSelector
v0.20 and rJava v0.9.8, on a machine with a 32-core Intel i7 and 250 GB
RAM. Java is OpenJDK 1.7, 64-bit.
I would highly appreciate any hints on how to solve this problem.
Best
ssalentin
--
Sebastian Salentin, PhD student
Bioinformatics Group
Technische Universität Dresden
Biotechnology Center (BIOTEC)
Tatzberg 47/49
01307 Dresden, Germany