[R] Slow reshape from 5x600000 to 6311 x 132

Christian Schulz ozric at web.de
Fri Mar 5 08:28:43 CET 2004


Hi,

My reshape from ~1.4 million obs. to ~150.00 obs. and 50 attributes runs
surprisingly fast (1-2 minutes), but it is less complex than yours. Perhaps
yours would be faster with no character strings as values, if that's
possible for your data?
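
A minimal sketch of that idea, assuming the values are numeric apart from
the "N/A" marker seen in the quoted str() output. The vector below is
made-up toy data, not the real table, and whether the conversion actually
speeds up reshape for the full dataset is untested here:

```r
# Toy stand-in for the vih.value column (hypothetical values).
value_chr <- c("0", "1989", "0", "N/A", "3163979")

# Mark "N/A" as missing first, so that as.numeric() has nothing
# unparseable left to coerce.
value_chr[value_chr == "N/A"] <- NA
value_num <- as.numeric(value_chr)

str(value_num)
```

If some values turn out to be genuinely non-numeric text, this conversion
would lose them, so it only applies when "N/A" is the sole exception.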

Reshaping in the database is possible with inner selects, but I prefer
reshape because in the db it takes a really long time.
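
For reference, a small sketch of a long-to-wide reshape() call on toy data.
The column names are taken from the quoted str(measures) output; the values
are invented, vih.id and vih.date are left out for brevity, and this is not
necessarily the call used in the quoted post:

```r
# Toy long-format data: 3 runs x 4 measured items (hypothetical values).
measures <- data.frame(
  vi.id     = rep(1:4, times = 3),
  vih.value = c(0, 1989, 0, NA,
                0, 1989, 0, NA,
                0, 1990, 0, NA),
  vih.run.n = rep(1:3, each = 4)
)

# One row per run, one vih.value.<vi.id> column per measured item.
better <- reshape(measures,
                  idvar     = "vih.run.n",
                  timevar   = "vi.id",
                  v.names   = "vih.value",
                  direction = "wide")

dim(better)    # 3 rows (one per run), 5 columns
names(better)
```

Columns that vary within a run and are not listed in v.names (such as
vih.date in the real table) would need to be dropped or handled before
reshaping.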

christian
 

On Friday, 5 March 2004 05:31, Christopher Austin-Lane wrote:
> I have a dataset that's a few hundred thousand rows from a database
> (read in via dbReadTable).  The database is like:
>
>  > str(measures)
> `data.frame':   609363 obs. of  5 variables:
>   $ vih.id   : int  1 2 3 4 5 6 7 8 9 10 ...
>   $ vi.id    : int  1 2 3 4 5 6 7 8 9 10 ...
>   $ vih.value: chr  "0" "1989" "0" "N/A" ...
>   $ vih.date : chr  "20040226012314" "20040226012315" "20040226012315" "20040226012315" ...
>   $ vih.run.n: int  1 1 1 1 1 1 1 1 1 1 ...
>
> I'm reshaping it to be like
>
>  > str(better)
> `data.frame':   132 obs. of  6311 variables:
>   $ vih.run.n     : int  1 2 4 5 6 7 8 9 10 11 ...
>   $ vih.value.1   : chr  "0" "0" "0" "0" ...
>   $ vih.value.2   : chr  "1989" "1989" "1989" "1989" ...
>   $ vih.value.3   : chr  "0" "0" "0" "0" ...
>   $ vih.value.4   : chr  "N/A" "N/A" "N/A" "N/A" ...
>   $ vih.value.5   : chr  "3163979" "3163979" "3163979" "3163979" ...
>   $ vih.value.6   : chr  "5500073" "5500073" "5500073" "5500073" ...
>
> (etc., etc.)
>
> This takes about 4-8 hours to accomplish.  Should I
>
> a) try to put it into the wide format row by row as I get the data from
> the DB instead of using dbReadTable,
>
> or
>
> b) try to tune something in R?  (I'm trying it now with  R
> --min-vsize=600M --min-nsize=6M although it's not seeming fast; I won't
> know if it's faster for a while).
>
> (Using home-compiled R 1.8.1 on Mac OS X 10.3.2, under emacs/ESS,
> although my R 1.8.1 on Solaris 2.8 has been churning for a few hours as
> well, on a split of the data that is 630 variables by 1000 obs.)
>
> --Chris



