[Rd] strange apparently data-dependent crash with large data (PR#6955)
tplate at blackmesacapital.com
Mon Jun 7 18:59:27 CEST 2004
I'm consistently seeing R crash with a particular large data set. What's
strange is that although the crash seems related to running out of memory,
I'm unable to construct a pseudo-random data set of the same size that also
causes the crash. Further adding to the strangeness is that the crash only
happens if the dataset goes through a save()/load() cycle -- without that,
the command in question just gives an out-of-memory error, but does not crash.
To make this clear, three different versions of the same data consistently
produce very different behavior:
(1) original data read with read.table: memory error; fails to allocate
164062 Kb
(2) original data through a save()/load() cycle: memory error; fails to
allocate 82031 Kb, followed by a crash
(3) pseudo-random data of the same size and similar characteristics: works
without problem
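As a rough check on the two allocation sizes reported above, the sketch below works out how large a vector of 30000 x 700 elements would be (700 because the hist() calls in the transcripts drop the first two of the 702 columns). With 8-byte doubles this gives 164062.5 Kb, and with 4-byte elements 82031.25 Kb, which match the two failed allocations once truncated. Which step inside hist() requests the smaller vector is not established here; that it is an integer or logical vector is only an assumption based on the element size.

```r
# Number of elements in X1[, -(1:2)]: 30000 rows, 700 of the 702 columns.
n_elem <- 30000 * 700

# Size in Kb of a double vector of that length (8 bytes per element).
kb_double <- n_elem * 8 / 1024

# Size in Kb of a vector with 4-byte elements of that length
# (assumption: an integer or logical vector).
kb_4byte <- n_elem * 4 / 1024

print(c(double = kb_double, four_byte = kb_4byte))
```

So the failure in case (1) happens while allocating a full double copy of the data, and the failure in case (2) happens while allocating something half that size.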
This is with R-1.9.0 under Windows 2000. I'm not loading any optional
packages. I get the same crash behavior with R-1.9.0 patched and R-2.0.0
alpha, but I didn't test success with the pseudo-random data under those
versions. (In case it matters, I got R-1.9.0 patched and R-2.0.0 alpha as
pre-compiled Windows binaries from http://cran.us.r-project.org/ at 9:30am
MDT on Jun 7, 2004.) Unfortunately, I don't have sufficient knowledge of
how to debug memory problems in R to make further progress than I've made
here, but maybe the following will provide some clues for someone else.
All the following transcripts are from Rgui.exe, with a new run starting at
each comment beginning with "###".
### Read in the data and get an out-of-memory error (but no crash)
> # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> X <- read.table("ClassifyTrain.txt", skip=2)
> X1 <- as.matrix(X)
> hist(log(X1[,-(1:2)]+1))
Error: cannot allocate vector of size 164062 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)
>
### Read in the data and save it as a .RData file for faster runs (I initially
### did this for speed, but this seems to be essential to causing the crash)
> # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> X <- read.table("ClassifyTrain.txt", skip=2)
> X1 <- as.matrix(X)
> c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000" "702"
> save(list="X1", file="X1.RData")
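One way to check whether a save()/load() cycle changes an object at the R level is to compare the reloaded copy with the original using identical(). The following is only a minimal sketch on a small stand-in matrix (the name X_small and its size are hypothetical, chosen so it runs quickly); it does not reproduce the crash, and identical() compares values and attributes, not how the object is laid out in memory.

```r
# Small stand-in for X1; name and dimensions are hypothetical.
X_small <- matrix(as.double(1:12), nrow = 3)

# Save to a temporary file rather than the working directory.
tmp <- tempfile(fileext = ".RData")
save(list = "X_small", file = tmp)

# Load into a separate environment so the original is not overwritten.
e <- new.env()
load(tmp, envir = e)

# TRUE if the reloaded object has the same values and attributes.
same <- identical(X_small, get("X_small", envir = e))
print(same)

unlink(tmp)
```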
### Produce the crash
> version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 1
minor 9.0
year 2004
month 04
day 12
language R
>
> load("X1.RData")
> c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000" "702"
> # each of the following 3 commands consistently causes a crash
> hist(log(X1[,-(1:2)]+1))
> hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5))
> hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5), plot=F)
Error: cannot allocate vector of size 82031 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)
[message that comes in a Windows dialog box after a wait of many seconds:]
R Console: Rgui.exe - Application Error
The exception unknown software exception (0xc00000fd) occured in the
application at location 0x6b5b0a53
#### The following is a failed attempt to reproduce the crash with pseudo-random
#### data, i.e., R functions correctly (even when X1 is in memory)
>
> # Look at some characteristics of the original data in
> # order to produce a matrix of similar pseudo-random numbers.
> load("X1.RData")
> dim(X1)
[1] 30000 702
> class(X1)
[1] "matrix"
> storage.mode(X1)
[1] "double"
> table(is.na(X1))
FALSE
21060000
> table(X1==0)
FALSE TRUE
2284455 18775545
> exp(diff(log(table(X1==0))))
TRUE
8.218829
> table(X1>=0)
TRUE
21060000
> range(X1)
[1] 0 326022
> memory.limit()
[1] 1073741824
> memory.limit()/2^20
[1] 1024
> object.size(X1)/2^20
[1] 161.0267
>
> set.seed(1)
> X <- matrix(rexp(30000 * 702, 5e-5) * rbinom(30000 * 702, 1, 1/8), ncol=702)
> range(X)
[1] 3.615044e-04 3.249415e+05
>
> # Both of these commands seem to work without problems
> hist(log(X[,-(1:2)]+1))
> hist(log(X[,-(1:2)]+1), breaks=seq(0,13,0.5))
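To see how closely the simulated data matches the original in its share of zeros, the sketch below repeats the same rexp() * rbinom() construction at a reduced size and compares the zero fraction with the one implied by the table(X1==0) counts above. The reduced dimensions and the names X_small, zero_frac_sim and zero_frac_orig are hypothetical, introduced only for illustration.

```r
set.seed(1)

# Reduced size (hypothetical): 3000 x 70 instead of 30000 x 702.
n <- 3000 * 70
X_small <- matrix(rexp(n, 5e-5) * rbinom(n, 1, 1/8), ncol = 70)

# Fraction of exact zeros in the simulated data (about 7/8 by construction).
zero_frac_sim <- mean(X_small == 0)

# Fraction of zeros in the original data, from the table(X1==0) counts.
zero_frac_orig <- 18775545 / 21060000

print(c(simulated = zero_frac_sim, original = zero_frac_orig))
```

The original data is slightly sparser (roughly 89% zeros against roughly 87.5% simulated), so the two data sets are similar but not identical in this respect.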