[R] reshape to wide format takes extremely long
Coen van Hasselt
coenvanhasselt at gmail.com
Thu Sep 2 09:24:21 CEST 2010
Hello,
I have a data.frame with the following format:
> head(clin2)
    Study Subject Type      Obs Cycle Day       Date  Time
1 A001101   10108  ALB 44.00000    98   1 2004-03-11 14:26
2 A001101   10108  ALP 95.00000    98   1 2004-03-11 14:26
3 A001101   10108  ALT 61.00000    98   1 2004-03-11 14:26
5 A001101   10108  AST 33.00000    98   1 2004-03-11 14:26
I want to transform this data.frame so that I have "Obs" columns for
each "Type". The full dataset is 45000 rows long. For a short subset
of 100 rows, reshaping takes 0.2 seconds, and produces what I want.
All columns are either numeric or character format (incl. date/time).
> reshape(clin2, v.names="Obs", timevar="Type", direction="wide", idvar=c("Study","Subject","Cycle","Day","Date","Time"))
Study Subject Cycle Day Date Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
1 A001101 10108 98 1 2004-03-11 14:26 44 95 61 33
11 A001101 10108 1 1 2004-03-12 14:01 41 85 39 33
21 A001101 10108 1 8 2004-03-22 10:34 40 90 70 34
30 A001101 10108 1 15 2004-03-29 09:56 45 97 66
48 [........]
However, when using the same reshape command for the full data.frame
of 45000 rows, it still wasn't finished when run overnight (8 GB RAM +
8 GB swap in use).
Going from a 100-row subset to a 1000-row subset, the processing time
increases from 0.2 sec to 60 sec.
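A minimal sketch of how such a timing comparison could be reproduced on
synthetic data. The helper make_long() and all values it generates are
hypothetical stand-ins for clin2; only the column names and the reshape()
arguments are taken from this post, and timings will differ by machine
and R version.

```r
# Hypothetical stand-in for clin2: same column names, made-up values.
make_long <- function(n_visits) {
  types <- c("ALB", "ALP", "ALT", "AST")
  n <- n_visits * length(types)
  data.frame(
    Study   = "A001101",
    Subject = 10108,
    Type    = rep(types, times = n_visits),
    Obs     = round(runif(n, 30, 100)),
    Cycle   = rep(seq_len(n_visits), each = length(types)),
    Day     = 1,
    Date    = "2004-03-11",
    Time    = "14:26",
    stringsAsFactors = FALSE
  )
}

id_cols <- c("Study", "Subject", "Cycle", "Day", "Date", "Time")

# Time the same reshape() call on increasingly large subsets.
set.seed(1)
for (n_visits in c(25, 50, 100)) {
  long <- make_long(n_visits)
  elapsed <- system.time(
    wide <- reshape(long, v.names = "Obs", timevar = "Type",
                    direction = "wide", idvar = id_cols)
  )["elapsed"]
  cat(nrow(long), "rows:", elapsed, "sec\n")
}
```

Doubling the number of rows at each step makes it easy to see whether the
elapsed time grows roughly linearly or much faster.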
I would greatly appreciate advice on why the time for reshaping
increases so much faster than linearly with the number of rows (a
300-fold increase in time for a 10-fold increase in rows), and how I
can do this in an elegant way.
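One direction that might be worth trying, shown only as a sketch: collapse
the six id columns into a single key and fill a visit-by-type matrix by
index, without calling reshape(). It assumes (not verified here) that the
slowdown is related to handling several idvar columns. The data frame
clin_toy and the names key, ukey, types and obs are illustrative; the
values are copied from the output shown above.

```r
# Toy data with the same columns as clin2 (two visits, four types each).
clin_toy <- data.frame(
  Study   = "A001101",
  Subject = 10108,
  Type    = rep(c("ALB", "ALP", "ALT", "AST"), times = 2),
  Obs     = c(44, 95, 61, 33, 41, 85, 39, 33),
  Cycle   = rep(c(98, 1), each = 4),
  Day     = 1,
  Date    = rep(c("2004-03-11", "2004-03-12"), each = 4),
  Time    = rep(c("14:26", "14:01"), each = 4),
  stringsAsFactors = FALSE
)
id_cols <- c("Study", "Subject", "Cycle", "Day", "Date", "Time")

# One string key per row; rows of the same visit share the same key.
key   <- do.call(paste, c(clin_toy[id_cols], sep = "\r"))
ukey  <- unique(key)
types <- sort(unique(clin_toy$Type))

# Visit x type matrix, NA where a type was not measured at a visit.
# If a key/type pair occurs more than once, the last value wins.
obs <- matrix(NA_real_, nrow = length(ukey), ncol = length(types),
              dimnames = list(NULL, paste("Obs", types, sep = ".")))
obs[cbind(match(key, ukey), match(clin_toy$Type, types))] <- clin_toy$Obs

# Attach the id columns of the first row of each visit.
wide <- cbind(clin_toy[match(ukey, key), id_cols], obs)
rownames(wide) <- NULL
print(wide)
```

The work here is a few vectorised match() calls and one indexed
assignment, so the cost should grow roughly in proportion to the number
of rows.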
Thanks!
Coen.