(PR#6955) Re: [Rd] strange apparently data-dependent crash with large data

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jun 7 19:40:17 CEST 2004


It is not very surprising that the R process might crash once the maximum 
memory limit is reached.  View anything done in a session after that as 
suspect.  (The Unix equivalent is often to crash without even telling you 
that you are out of memory.)
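Both failed allocation sizes in the report below match temporaries the size of X1[,-(1:2)], which has 30000 x 700 elements. This assumes R's usual storage sizes: 8 bytes per element for a double vector and 4 bytes per element for an integer or logical vector. A quick check, as a sketch under those assumptions:

```r
# Sketch: reproduce the two failed allocation sizes, assuming 8-byte
# doubles and 4-byte integers/logicals.
n_rows <- 30000
n_cols <- 702 - 2                      # X1[, -(1:2)] drops two columns
n_elem <- n_rows * n_cols              # 21,000,000 elements

double_kb <- n_elem * 8 / 1024         # a double vector of that length
int_kb    <- n_elem * 4 / 1024         # an integer or logical vector

floor(double_kb)   # 164062, as in the first error message
floor(int_kb)      # 82031, as in the second error message
```

Read this way, the first failure is a full-size double temporary, such as the result of log() or of the addition. The second is an integer or logical vector of the same length. Which function requested it is not established here.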

On Mon, 7 Jun 2004 tplate at blackmesacapital.com wrote:

> I'm consistently seeing R crash with a particular large data set.  What's 
> strange is that although the crash seems related to running out of memory, 
> I'm unable to construct a pseudo-random data set of the same size that also 
> causes the crash.  Further adding to the strangeness is that the crash only 
> happens if the dataset goes through a save()/load() cycle -- without that, 
> the command in question just gives an out-of-memory error, but does not crash.
> 
> To make this clear, three different versions of the same data consistently 
> produce very different behavior:
> 
> (1) original data read with read.table: memory error; failed to allocate
> 164062 Kb
> (2) original data through a save()/load() cycle: memory error; failed to
> allocate 82031 Kb, followed by a crash
> (3) pseudo-random data of the same size and similar characteristics: works
> without problem
> 
> This is with R-1.9.0 under Windows 2000.  I'm not loading any optional 
> packages.  I get the same crash behavior with R-1.9.0 patched, and R-2.0.0 
> alpha, but I didn't check whether the pseudo-random data also works under
> those versions.  (In case it matters, I got R-1.9.0 patched and R-2.0.0
> alpha as pre-compiled Windows binaries from http://cran.us.r-project.org/
> at 9:30am MDT on Jun 7, 2004.)  Unfortunately, I don't know enough about
> debugging memory problems in R to get any further than this, but maybe
> the following will provide some clues for someone else.
> 
> All the following transcripts are from Rgui.exe, with a new run starting
> at each comment beginning with "###".
> 
> ### Read in the data and get an out-of-memory error (but no crash)
>  > # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
>  > X <- read.table("ClassifyTrain.txt", skip=2)
>  > X1 <- as.matrix(X)
>  > hist(log(X1[,-(1:2)]+1))
> Error: cannot allocate vector of size 164062 Kb
> In addition: Warning message:
> Reached total allocation of 1024Mb: see help(memory.size)
>  >
> 
> ### Read in the data and save it as a .RData file for faster runs
> ### (I initially did this for speed, but this seems to be essential
> ### to causing the crash)
>  > # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
>  > X <- read.table("ClassifyTrain.txt", skip=2)
>  > X1 <- as.matrix(X)
>  > c(class(X1), storage.mode(X1), dim(X1))
> [1] "matrix" "double" "30000"  "702"
>  > save(list="X1", file="X1.RData")
> 
> ### Produce the crash
>  > version
>           _
> platform i386-pc-mingw32
> arch     i386
> os       mingw32
> system   i386, mingw32
> status
> major    1
> minor    9.0
> year     2004
> month    04
> day      12
> language R
>  >
>  > load("X1.RData")
>  > c(class(X1), storage.mode(X1), dim(X1))
> [1] "matrix" "double" "30000"  "702"
>  > # all of the following 3 commands consistently cause a crash
>  > hist(log(X1[,-(1:2)]+1))
>  > hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5))
>  > hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5), plot=F)
> Error: cannot allocate vector of size 82031 Kb
> In addition: Warning message:
> Reached total allocation of 1024Mb: see help(memory.size)
> 
> [message that comes in a Windows dialog box after a wait of many seconds:]
> 
> R Console: Rgui.exe - Application Error
> The exception unknown software exception (0xc00000fd) occurred in the
> application at location 0x6b5b0a53
> 
> #### The following is a failed attempt to reproduce the crash with
> #### pseudo-random data, i.e., R functions correctly (even when X1 is
> #### in memory)
>  >
>  > # Look at some characteristics of the original data in
>  > # order to produce a matrix of similar pseudo-random numbers.
>  > load("X1.RData")
>  > dim(X1)
> [1] 30000   702
>  > class(X1)
> [1] "matrix"
>  > storage.mode(X1)
> [1] "double"
>  > table(is.na(X1))
> 
>     FALSE
> 21060000
>  > table(X1==0)
> 
>     FALSE     TRUE
>   2284455 18775545
>  > exp(diff(log(table(X1==0))))
>      TRUE
> 8.218829
>  > table(X1>=0)
> 
>      TRUE
> 21060000
>  > range(X1)
> [1]      0 326022
>  > memory.limit()
> [1] 1073741824
>  > memory.limit()/2^20
> [1] 1024
>  > object.size(X1)/2^20
> [1] 161.0267
>  >
>  > set.seed(1)
>  > X <- matrix(rexp(30000 * 702, 5e-5) * rbinom(30000 * 702, 1, 1/8), ncol=702)
>  > range(X)
> [1] 3.615044e-04 3.249415e+05
>  >
>  > # Both of these commands seem to work without problems
>  > hist(log(X[,-(1:2)]+1))
>  > hist(log(X[,-(1:2)]+1), breaks=seq(0,13,0.5))
> 
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> 
> 
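If only the histogram counts are needed, one way to stay well below the allocation limit is to tabulate one column at a time. That way no temporary the size of the whole 30000 x 700 block is ever created. Below is a minimal sketch. chunked_hist_counts is a hypothetical helper, not an R function. It assumes the breaks span the range of log(X + 1), which holds for seq(0, 13, 0.5) given range(X1) above.

```r
# Hypothetical helper (not part of R): accumulate histogram counts one
# column at a time, so temporaries are only one column long.
chunked_hist_counts <- function(X, cols, breaks) {
  counts <- numeric(length(breaks) - 1)
  for (j in cols) {
    # hist() stops with an error if any value falls outside 'breaks'
    h <- hist(log(X[, j] + 1), breaks = breaks, plot = FALSE)
    counts <- counts + h$counts
  }
  counts
}

# Usage with the data from the report (assumes X1 is loaded):
# br  <- seq(0, 13, 0.5)
# cnt <- chunked_hist_counts(X1, 3:ncol(X1), br)
# barplot(cnt, names.arg = head(br, -1), space = 0)
```

Because each bin count is a sum over elements, the column-wise totals equal the counts from a single hist() call on the whole submatrix. Only the peak memory use differs.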

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


