[R] Error: cannot allocate vector of size...

"Jens Oehlschlägel" oehl_list at gmx.de
Wed Nov 11 14:23:16 CET 2009


For me with ff - on a 3 GB notebook - 3e6 x 100 works out of the box even without compression: the doubles consume 2.2 GB on disk, but the R process remains under 100 MB; the rest of the RAM is used by the file-system cache.
If you are on Windows, you can create the ffdf files in a compressed folder. For the random doubles this reduces the size on disk to 230 MB - which should work even on a 1 GB notebook.
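A minimal sketch (the folder path is hypothetical): point ff's temporary directory at a folder with NTFS compression enabled before creating any ff objects.

library(ff)
# hypothetical path: a folder with NTFS compression enabled
# (Explorer: Properties > Advanced > "Compress contents to save disk space")
options(fftempdir = "D:/ff_compressed")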
BTW: the most compressed datatype (vmode) that can handle NAs is "logical": it consumes 2 bits per tri-valued logical (TRUE/FALSE/NA). The next most compressed is "byte", covering c(NA, -127:127) and consuming - as its name says - one byte per element on disk and in the fs-cache.
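For illustration, a sketch of creating such compressed vmodes (lengths chosen arbitrarily):

# "logical" stores TRUE/FALSE/NA in 2 bits per element
b <- ff(vmode="logical", length=1e6)
# "byte" stores NA and -127:127 in 1 byte per element
y <- ff(vmode="byte", length=1e6)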

The code below should give an idea of how to do pairwise stats on columns where each pair easily fits into RAM. In the real world you would not create the data but import it using read.csv.ffdf (expect that reading your csv file takes longer than reading/writing the ffdf).
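A rough sketch of such an import (file name and chunk size are placeholders):

# read the csv chunkwise into an ffdf; "mydata.csv" is a placeholder
d <- read.csv.ffdf(file="mydata.csv", header=TRUE, next.rows=100000)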

Regards


Jens Oehlschlägel



library(ff)
k <- 100
n <- 3e6

# create an ffdf data frame of the required size
l <- vector("list", k)
for (i in 1:k)
  l[[i]] <- ff(vmode="double", length=n, update=FALSE)  # update=FALSE skips writing initial values
names(l) <- paste("c", 1:k, sep="")
d <- do.call("ffdf", l)

# writing 100 columns of 3e6 random doubles takes ~90 sec
system.time(
for (i in 1:k){
  cat(i, " ")
  print(system.time(d[,i] <- rnorm(n))["elapsed"])
  }
)["elapsed"]


# matrix to collect the pairwise correlations (lower triangle)
m <- matrix(as.double(NA), k, k)

# correlating one column pairwise against all others takes ~17.5 sec
# correlating all pairwise combinations takes ~15 min
system.time(
for (i in 2:k){
  cat(i, " ")
  print(system.time({
    x <- d[[i]][]  # read column i completely into RAM once
    for (j in 1:(i-1)){
      m[i,j] <- cor(x, d[[j]][])  # read column j and correlate
    }
  })["elapsed"])
}
)["elapsed"]
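If you want the full correlation matrix afterwards - and want to release the disk files once done - a sketch (assuming the ffdf is no longer needed):

# complete the matrix: unit diagonal, mirror the lower triangle up
diag(m) <- 1
m[upper.tri(m)] <- t(m)[upper.tri(m)]

# close the ffdf and delete its backing files from disk
close(d)
delete(d)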

