[R] Slow reshape from 5x600000 to 6311 x 132
Christopher Austin-Lane
lanstin at aol.net
Fri Mar 5 05:31:02 CET 2004
I have a dataset of a few hundred thousand rows read from a database
(via dbReadTable). The resulting data frame looks like this:
> str(measures)
'data.frame': 609363 obs. of 5 variables:
 $ vih.id   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ vi.id    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ vih.value: chr  "0" "1989" "0" "N/A" ...
 $ vih.date : chr  "20040226012314" "20040226012315" "20040226012315" "20040226012315" ...
 $ vih.run.n: int  1 1 1 1 1 1 1 1 1 1 ...
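(The load step itself is quick; it's essentially just the following,
with `con` standing in for my actual open DBI connection and
"measures" for the real table name:)

library(DBI)
## con: an already-open DBI connection (stand-in name)
measures <- dbReadTable(con, "measures")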
I'm reshaping it into a wide format that looks like this:
> str(better)
'data.frame': 132 obs. of 6311 variables:
$ vih.run.n : int 1 2 4 5 6 7 8 9 10 11 ...
$ vih.value.1 : chr "0" "0" "0" "0" ...
$ vih.value.2 : chr "1989" "1989" "1989" "1989" ...
$ vih.value.3 : chr "0" "0" "0" "0" ...
$ vih.value.4 : chr "N/A" "N/A" "N/A" "N/A" ...
$ vih.value.5 : chr "3163979" "3163979" "3163979" "3163979" ...
$ vih.value.6 : chr "5500073" "5500073" "5500073" "5500073" ...
(etc., etc.)
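The reshape call is roughly the following; taking vih.run.n as idvar
and vi.id as timevar is inferred from the wide names above, and the
columns not wanted in the wide form get dropped:

better <- reshape(measures,
                  direction = "wide",
                  idvar     = "vih.run.n",  # one wide row per run
                  timevar   = "vi.id",      # yields vih.value.1, vih.value.2, ...
                  v.names   = "vih.value",
                  drop      = c("vih.id", "vih.date"))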
This takes about 4-8 hours to accomplish. Should I

a) build the wide format row by row as I fetch the data from the DB,
instead of using dbReadTable (see the sketch after option b), or

b) try to tune something in R? (I'm trying it now with R
--min-vsize=600M --min-nsize=6M, though it doesn't seem to be any
faster; I won't know for a while.)
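For (a), I'm picturing something along these lines: one query per
run, filling a preallocated character matrix, so the 609363-row long
frame never has to be built and reshaped at all. The SQL column names
(vih_run_n, vi_id, vih_value) are guesses at the underlying table's
names, `con` is an open DBI connection, and it assumes each run's
values come back in vi_id order:

library(DBI)
## one wide row per run; 6310 value columns, per the str() above
runs <- dbGetQuery(con,
    "SELECT DISTINCT vih_run_n FROM measures ORDER BY vih_run_n")$vih_run_n
wide <- matrix(as.character(NA), nrow = length(runs), ncol = 6310)
for (i in seq(along = runs)) {
    v <- dbGetQuery(con, sprintf(
        "SELECT vih_value FROM measures WHERE vih_run_n = %d ORDER BY vi_id",
        runs[i]))$vih_value
    wide[i, seq(along = v)] <- v  # positional fill; assumes no gaps in vi_id
}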
(I'm using home-compiled R 1.8.1 on Mac OS X 10.3.2, under emacs/ESS;
my R 1.8.1 on Solaris 2.8 has also been churning for a few hours, on a
split of the data that is 630 variables by 1000 obs.)
--Chris