(PR#6955) Re: [Rd] strange apparently data-dependent crash with large data
Prof Brian Ripley
ripley at stats.ox.ac.uk
Mon Jun 7 19:40:17 CEST 2004
It is not very surprising that the R process might crash once the maximum
memory limit is reached. View anything done in a session after that as
suspect. (The Unix equivalent is often to crash without even telling you
that you are out of memory.)
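One way to stay well below the limit for a call like the hist(log(X1[,-(1:2)]+1)) in the report quoted below is to accumulate the counts over blocks of columns, so that only one block's temporaries exist at a time. The following is only a rough sketch under stated assumptions: chunked_hist_counts is a hypothetical helper (not part of R), the breaks are assumed to be fixed in advance and to span the data, and the small matrix is a stand-in for the real 30000 x 702 data.

```r
# Hypothetical helper (not part of R): accumulate histogram counts over
# blocks of columns so that only one block of temporaries is alive at once.
chunked_hist_counts <- function(x, breaks, cols = seq_len(ncol(x)), block = 50L) {
  counts <- numeric(length(breaks) - 1L)
  for (start in seq(1L, length(cols), by = block)) {
    idx <- cols[start:min(start + block - 1L, length(cols))]
    # Only this block is subset, shifted and logged.
    vals <- log(x[, idx, drop = FALSE] + 1)
    counts <- counts + hist(vals, breaks = breaks, plot = FALSE)$counts
  }
  counts
}

# Small stand-in matrix (assumed data, far smaller than the original).
set.seed(1)
X <- matrix(rexp(200 * 20, 5e-5) * rbinom(200 * 20, 1, 1/8), ncol = 20)
breaks <- seq(0, 13, 0.5)

# Skip the first two columns, as in the original call.
counts <- chunked_hist_counts(X, breaks, cols = 3:ncol(X), block = 5L)

# Reference result computed in one go, for comparison on the small matrix.
full <- hist(log(X[, -(1:2)] + 1), breaks = breaks, plot = FALSE)$counts
```

On the small matrix the blockwise counts should agree with the one-shot counts. Whether this would have avoided the crash in the report is untested; it only reduces the size of the intermediate copies.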
On Mon, 7 Jun 2004 tplate at blackmesacapital.com wrote:
> I'm consistently seeing R crash with a particular large data set. What's
> strange is that although the crash seems related to running out of memory,
> I'm unable to construct a pseudo-random data set of the same size that also
> causes the crash. Further adding to the strangeness is that the crash only
> happens if the dataset goes through a save()/load() cycle -- without that,
> the command in question just gives an out-of-memory error, but does not crash.
>
> To make this clear, three different versions of the same data consistently
> produce very different behavior:
>
> (1) original data read with read.table: memory error; fail to allocate
> 164062 Kb
> (2) original data through save()/load() cycle: memory error; fail to
> allocate 82031 Kb, followed by crash
> (3) pseudo-random data of same size and similar characteristics: works
> without problem
>
> This is with R-1.9.0 under Windows 2000. I'm not loading any optional
> packages. I get the same crash behavior with R-1.9.0 patched, and R-2.0.0
> alpha, but I didn't test success with the pseudo-random data under those
> programs. (In case it matters, I got R-1.9.0 patched and R-2.0.0 alpha as
> pre-compiled Windows binaries from http://cran.us.r-project.org/ at 9:30am
> MDT on Jun 7, 2004.) Unfortunately, I don't have sufficient knowledge of
> how to debug memory problems in R to make further progress than I've made
> here, but maybe the following will provide some clues for someone else.
>
> All the following transcripts are from Rgui.exe, with new runs at each
> comment beginning with "###"
>
> ### Read in the data and get an out-of-memory error (but no crash)
> > # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> > X <- read.table("ClassifyTrain.txt", skip=2)
> > X1 <- as.matrix(X)
> > hist(log(X1[,-(1:2)]+1))
> Error: cannot allocate vector of size 164062 Kb
> In addition: Warning message:
> Reached total allocation of 1024Mb: see help(memory.size)
> >
>
> ### Read in the data and save it as a .RData file for faster runs
> ### (I initially did this for speed, but this seems to be essential
> ### to causing the crash)
> > # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
> > X <- read.table("ClassifyTrain.txt", skip=2)
> > X1 <- as.matrix(X)
> > c(class(X1), storage.mode(X1), dim(X1))
> [1] "matrix" "double" "30000" "702"
> > save(list="X1", file="X1.RData")
>
> ### Produce the crash
> > version
> _
> platform i386-pc-mingw32
> arch i386
> os mingw32
> system i386, mingw32
> status
> major 1
> minor 9.0
> year 2004
> month 04
> day 12
> language R
> >
> > load("X1.RData")
> > c(class(X1), storage.mode(X1), dim(X1))
> [1] "matrix" "double" "30000" "702"
> > # all of the following 3 commands consistently cause a crash
> > hist(log(X1[,-(1:2)]+1))
> > hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5))
> > hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5), plot=F)
> Error: cannot allocate vector of size 82031 Kb
> In addition: Warning message:
> Reached total allocation of 1024Mb: see help(memory.size)
>
> [message that comes in a Windows dialog box after a wait of many seconds:]
>
> R Console: Rgui.exe - Application Error
> The exception unknown software exception (0xc00000fd) occurred in the
> application at location 0x6b5b0a53
>
> #### The following is a failed attempt to reproduce the crash with
> #### pseudo-random data, i.e., R functions correctly (even when X1 is in memory)
> >
> > # Look at some characteristics of the original data in
> > # order to produce a matrix of similar pseudo-random numbers.
> > load("X1.RData")
> > dim(X1)
> [1] 30000 702
> > class(X1)
> [1] "matrix"
> > storage.mode(X1)
> [1] "double"
> > table(is.na(X1))
>
> FALSE
> 21060000
> > table(X1==0)
>
> FALSE TRUE
> 2284455 18775545
> > exp(diff(log(table(X1==0))))
> TRUE
> 8.218829
> > table(X1>=0)
>
> TRUE
> 21060000
> > range(X1)
> [1] 0 326022
> > memory.limit()
> [1] 1073741824
> > memory.limit()/2^20
> [1] 1024
> > object.size(X1)/2^20
> [1] 161.0267
> >
> > set.seed(1)
> > X <- matrix(rexp(30000 * 702, 5e-5) * rbinom(30000 * 702, 1, 1/8), ncol=702)
> > range(X)
> [1] 3.615044e-04 3.249415e+05
> >
> > # Both of these commands seem to work without problems
> > hist(log(X[,-(1:2)]+1))
> > hist(log(X[,-(1:2)]+1), breaks=seq(0,13,0.5))
>
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
>
>
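The two failed allocation sizes reported above can be checked with a little arithmetic. This is only a sketch: the 8-byte and 4-byte element sizes are the usual sizes of R doubles and integers/logicals, and reading the 82031 Kb request as a 4-byte-per-element vector of the same length is an inference from the numbers, not something the report states.

```r
n_rows <- 30000
n_cols <- 702
bytes_double <- 8   # size of one R double
bytes_small  <- 4   # size of one R integer or logical (assumed element type)

# Whole matrix as doubles, in Mb (data only, no attributes).
full_mb <- n_rows * n_cols * bytes_double / 2^20

# X1[, -(1:2)] drops two columns, leaving 30000 * 700 elements.
sub_kb_double <- n_rows * (n_cols - 2) * bytes_double / 1024
sub_kb_small  <- n_rows * (n_cols - 2) * bytes_small / 1024

print(c(full_mb = full_mb,
        sub_kb_double = sub_kb_double,
        sub_kb_small = sub_kb_small))
```

The double-sized figure is 164062.5 Kb and the 4-byte figure is 82031.25 Kb, which line up with the 164062 Kb and 82031 Kb in the two error messages. The data alone come to about 160.7 Mb, a little under the 161.0267 Mb from object.size(X1); the difference is presumably attributes such as dimnames. So in both cases the failing request appears to be one more full-length copy of the 30000 x 700 submatrix, on top of the copies already held.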
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595