[R] converting large dataframes to matrix (was: large dataframes in ASCII)
Ott Toomet
siim at obs.ee
Sun Aug 11 17:16:35 CEST 2002
Hi,
True, write.matrix does quite a good job if the data are already in matrix
form. The problem arises with real data (the labour force survey in my case),
which include variables of different storage modes. The data frame I used
contains mostly integers and factors in character form (most of the data
frame consists of NAs, however).
My computer has 128MB of memory; R (1.5.1) took 52MB once the data frame
e2000 (7500x1200) was loaded. Trying to transform it to a matrix with

f2000 <- as.matrix(e2000)

R grew to 155MB, after which I killed the process. So in this case the
block size does not help much, since the memory is used up already in the
conversion to a matrix, before any writing starts.
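Why does the conversion grow so much? When a data frame mixes numeric columns with factors, as.matrix() cannot return a numeric matrix, so every column, numbers included, is turned into character strings. The following is a small sketch on a toy data frame (the object and column names are made up for illustration); the size estimate at the end assumes 8-byte doubles.

```r
# Toy data frame with mixed storage modes (hypothetical columns):
# an integer column and a factor, both containing NAs, as in survey data.
mixed <- data.frame(
  age    = c(25L, NA, 41L),
  region = factor(c("north", "south", NA))
)
m_mixed <- as.matrix(mixed)
typeof(m_mixed)   # "character": every cell, numbers included, is a string

# With homogeneous (all numeric) columns the result stays numeric.
numeric_only <- data.frame(a = 1:3, b = c(1.5, 2, 3))
m_num <- as.matrix(numeric_only)
typeof(m_num)     # "double"

# Rough size of a 7500 x 1200 matrix of doubles, 8 bytes per cell, in MB.
7500 * 1200 * 8 / 1e6   # 72
```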
Best wishes,
Ott
On Sun, 11 Aug 2002 ripley at stats.ox.ac.uk wrote:
> The sort of `large' here is 7500x1200. That's 72Mb if real numbers, so
> let's assume you have at least 256Mb to use. I ran the following on
> Windows with a 256Mb limit (and I had to use R-devel to do so). I actually
> found it difficult to create a data frame of that size in 256Mb, and
> resorted to
>
> A1 <- vector("list", 1000)
> for(i in 1:1000) A1[[i]] <- rnorm(8000)
> class(A1) <- "data.frame"
> row.names(A1) <- 1:8000
>
> which took 15 secs and 140Mb as an underhand way to make a data frame.
> (1.5.1 took too much memory here.)
>
> Then
>
> A2 <- as.matrix(A1)
>
> took 1.8secs (hardly slow) and an additional 64Mb to hold the object A2.
> I then deleted A1. Running
>
> write.table(A2, "foo.dat", blocksize=1000)

You mean write.matrix?

> used about 150Mb in about four minutes. That is formatting 8 million
> numbers, and 85% of the time was spent in the system calls, as one should
> expect. (I suspect I did not need to delete A1, but didn't want to wait
> around to find out.)
>
> So
>
> 1) you could have checked your claims by some simple experiments.
>
> 2) as claimed, write.matrix does indeed do the job.

Agreed, given that there is sufficient memory and/or the data are of
homogeneous storage mode.
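If the aim is only to get a mixed-type data frame into an ASCII file, one way to avoid holding a full-size character matrix is to write the data frame in row blocks and append each block to the file. This is a minimal sketch, assuming write.table() is acceptable as the formatter; the helper name write_in_blocks and its block_size default are hypothetical, not part of R or MASS. Only one block is formatted at a time, so peak memory should depend on the block size rather than on the whole table, although this has not been measured on a 7500x1200 case.

```r
# Hypothetical helper: write a data frame to a text file in blocks of rows,
# so that only `block_size` rows are converted for output at any one time.
write_in_blocks <- function(df, file, block_size = 1000L, ...) {
  n <- nrow(df)
  if (n == 0L) {                      # nothing to chunk: write the header only
    write.table(df, file = file, row.names = FALSE, ...)
    return(invisible(file))
  }
  starts <- seq(1L, n, by = block_size)
  for (k in seq_along(starts)) {
    rows <- starts[k]:min(starts[k] + block_size - 1L, n)
    write.table(df[rows, , drop = FALSE], file = file,
                append    = k > 1L,   # overwrite on the first block, then append
                col.names = k == 1L,  # write the header line only once
                row.names = FALSE, ...)
  }
  invisible(file)
}

# Usage on a small mixed-type data frame with NAs.
df <- data.frame(id  = 1:5,
                 grp = factor(c("a", "b", NA, "a", "b")),
                 x   = c(1.5, NA, 3, 4, 5))
tmp <- tempfile(fileext = ".dat")
write_in_blocks(df, tmp, block_size = 2L)
back <- read.table(tmp, header = TRUE)
```

The design choice is simply to trade one large conversion for many small ones; the header is written with the first block and later blocks are appended without it.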
.......................................
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/RFAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch