[R] Error: cannot allocate vector of size...
Larry Hotchkiss
larryh at udel.edu
Wed Nov 11 17:14:33 CET 2009
Hi,
I'm responding to the question about the storage error when trying to read a 3000000 x 100 dataset into a data.frame.
I wonder whether you can read the data as strings. If the numbers are all one digit, each cell would require just 1 byte instead of 8, so the data would take about 300 MB instead of 2.4 GB. You can run crosstabs on the character values just as easily as if they were numeric. If you need numeric values, convert them a few at a time using as.numeric() (a short sketch of that follows the example). Here's an example --
# Generate some data and write it to a text file
library(MASS)                               # mvrnorm() comes from MASS
v <- rnorm(5, 0, 0.7)
C_xx <- diag(v^2) + v %o% v                 # a positive-definite covariance matrix
C_xx
mu <- rep(5, 5)
X.dat <- data.frame(round(mvrnorm(250, mu, C_xx)))
head(X.dat)
write.table(X.dat, "X.dat")
# Read the data back as character using scan(), then convert it to a data.frame
# (skip=1 drops the header line; column 1 holds the row names written by write.table)
Xstr.dat <- matrix(scan("X.dat", what = "character", skip = 1), 250, byrow = TRUE)
Xstr.dat <- as.data.frame(Xstr.dat[, 2:6], stringsAsFactors = FALSE)
head(Xstr.dat)
# Run a crosstab
attach(Xstr.dat)
table(V1, V2)
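If you do need numbers for particular variables, here is a minimal sketch of the as.numeric() step mentioned above, converting just a couple of columns (V3 and V4 are simply names from the example data):
# Convert only the columns you need, so most of the table stays character
Xstr.dat$V3 <- as.numeric(Xstr.dat$V3)
Xstr.dat$V4 <- as.numeric(Xstr.dat$V4)
mean(Xstr.dat$V3)    # ordinary numeric operations now work on those columns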
You probably do not need the "stringsAsFactors=FALSE" option. Without it, the strings are converted to factors, which probably does not change the amount of storage required.
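If you want to check how much storage each representation actually takes on your machine, object.size() gives a rough comparison (just a sketch; the exact numbers depend on platform and R version):
# Compare storage of the same single-digit values as double, character, and factor
x <- sample(0:9, 1e6, replace = TRUE)
object.size(as.numeric(x))      # stored as double
object.size(as.character(x))    # stored as character strings
object.size(factor(x))          # stored as a factor (integer codes plus levels)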
Larry Hotchkiss
------------------------------------------------------------------------------------
Message: 6
Date: Tue, 10 Nov 2009 04:10:07 -0800 (PST)
From: maiya <maja.zaloznik at gmail.com>
Subject: [R] Error: cannot allocate vector of size...
To: r-help at r-project.org
Message-ID: <26282348.post at talk.nabble.com>
Content-Type: text/plain; charset=us-ascii
I'm trying to import a table into R; the file is about 700MB. Here's my first try:
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
In addition: Warning messages:
1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
Then I tried
> memory.limit(size=4095)
and got
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 11.3 Mb
but no additional errors. Then optimistically to clear up the workspace:
> rm()
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
Can anyone help? I'm even confused by the values: 15.6 Mb, 1535 Mb, 11.3 Mb?
I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable
memory is usually 2Gb. Surely they mean GB?
The file I'm importing has about 3 million cases with 100 variables, and I want to cross-tabulate each variable against each of the others. Is this completely unrealistic?
Thanks!
Maja