[R] reshape is re-ordering my variables
Kevin E. Thorpe
kevin.thorpe at utoronto.ca
Tue Sep 21 21:01:11 CEST 2010
Is it an undocumented feature of the reshape function (or at least one
I missed in the documentation) to place numeric variables ahead of
character variables? I ask because that seems to be the case below.
> str(rcw)
'data.frame': 23 obs. of 21 variables:
$ ICU : int 1 18 17 9 22 19 6 16 25 26 ...
$ Q6.RC.1 : chr "SM" "JF" "IW" "MS" ...
$ Q6.FT.RC.1.years : int 0 8 12 3 9 1 5 16 5 5 ...
$ Q6.FT.RC.1.months: int 0 0 0 0 0 0 0 6 0 0 ...
$ Q6.PT.RC.1.years : int 2 0 0 1 2 0 0 0 0 0 ...
$ Q6.PT.RC.1.months: int 0 0 0 0 0 0 0 0 0 0 ...
$ Q6.RC.2 : chr "BA" "ML" "TM" "YL" ...
$ Q6.FT.RC.2.years : int 0 0 7 3 0 99999 0 0 0 0 ...
$ Q6.FT.RC.2.months: int 0 0 0 0 0 99999 0 0 0 0 ...
$ Q6.PT.RC.2.years : int 2 10 2 0 0 99999 0 5 0 0 ...
$ Q6.PT.RC.2.months: int 0 0 0 0 8 99999 1 0 6 6 ...
$ Q6.RC.3 : chr "LL" "TM" "99999" "99999" ...
$ Q6.FT.RC.3.years : int 6 0 99999 99999 99999 99999 0 99999 0 0 ...
$ Q6.FT.RC.3.months: int 0 0 99999 99999 99999 99999 0 99999 0 0 ...
$ Q6.PT.RC.3.years : int 0 8 99999 99999 99999 99999 0 99999 0 0 ...
$ Q6.PT.RC.3.months: int 0 0 99999 99999 99999 99999 1 99999 4 4 ...
$ Q6.RC.4 : chr "99999" "IW" "99999" "99999" ...
$ Q6.FT.RC.4.years : int 99999 0 99999 99999 99999 99999 99999 99999 99999 99999 ...
$ Q6.FT.RC.4.months: int 99999 0 99999 99999 99999 99999 99999 99999 99999 99999 ...
$ Q6.PT.RC.4.years : int 99999 12 99999 99999 99999 99999 99999 99999 99999 99999 ...
$ Q6.PT.RC.4.months: int 99999 0 99999 99999 99999 99999 99999 99999 99999 99999 ...
This data frame needs to be converted to long format with 5 variables
repeating over 4 observations.
> rcl <- reshape(rcw, idvar="ICU", varying=2:21, direction="long",
+                v.names=c("init","FTy","FTm","PTy","PTm"))
> str(rcl)
'data.frame': 92 obs. of 7 variables:
$ ICU : int 1 18 17 9 22 19 6 16 25 26 ...
$ time: int 1 1 1 1 1 1 1 1 1 1 ...
$ init: int 0 0 0 0 0 0 0 6 0 0 ...
$ FTy : int 0 8 12 3 9 1 5 16 5 5 ...
$ FTm : int 0 0 0 0 0 0 0 0 0 0 ...
$ PTy : int 2 0 0 1 2 0 0 0 0 0 ...
$ PTm : chr "SM" "JF" "IW" "MS" ...
- attr(*, "reshapeLong")=List of 4
..$ varying:List of 5
.. ..$ FTm : chr "Q6.FT.RC.1.months" "Q6.FT.RC.2.months" "Q6.FT.RC.3.months" "Q6.FT.RC.4.months"
.. ..$ FTy : chr "Q6.FT.RC.1.years" "Q6.FT.RC.2.years" "Q6.FT.RC.3.years" "Q6.FT.RC.4.years"
.. ..$ PTm : chr "Q6.PT.RC.1.months" "Q6.PT.RC.2.months" "Q6.PT.RC.3.months" "Q6.PT.RC.4.months"
.. ..$ PTy : chr "Q6.PT.RC.1.years" "Q6.PT.RC.2.years" "Q6.PT.RC.3.years" "Q6.PT.RC.4.years"
.. ..$ init: chr "Q6.RC.1" "Q6.RC.2" "Q6.RC.3" "Q6.RC.4"
.. ..- attr(*, "v.names")= chr "init" "FTy" "FTm" "PTy" ...
.. ..- attr(*, "times")= int 1 2 3 4
..$ v.names: chr "init" "FTy" "FTm" "PTy" ...
..$ idvar : chr "ICU"
..$ timevar: chr "time"
In the result, the values of the first of the varying variables go
into the last variable, while the other values are shifted left. The
attributes in the result are correct, but the contents of rcl$PTm are
what I expected in rcl$init.
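For anyone hitting the same thing, one way to avoid depending on how reshape() pairs a vector-valued varying with v.names is to pass varying as a list, with one character vector of column names per long-format variable, in the same order as v.names. The following is only a minimal sketch on a made-up two-repeat subset: rcw_toy, rcl_toy and their values are illustrative and are not the real data. Note also that the "varying" attribute in the str(rcl) output above lists the groups as FTm, FTy, PTm, PTy, init, that is, sorted by name rather than in the order given in v.names. That may be related to the mismatch, but it is only a guess from the output.

```r
# Hypothetical toy data with the same naming pattern as rcw,
# cut down to two repeats and two variables per repeat.
rcw_toy <- data.frame(
  ICU              = c(1L, 18L),
  Q6.RC.1          = c("SM", "JF"),
  Q6.FT.RC.1.years = c(0L, 8L),
  Q6.RC.2          = c("BA", "ML"),
  Q6.FT.RC.2.years = c(0L, 0L),
  stringsAsFactors = FALSE
)

# varying as a list: element k holds the wide columns that feed v.names[k],
# so the pairing is stated explicitly instead of being inferred.
rcl_toy <- reshape(
  rcw_toy,
  idvar     = "ICU",
  direction = "long",
  varying   = list(
    c("Q6.RC.1", "Q6.RC.2"),                    # -> init
    c("Q6.FT.RC.1.years", "Q6.FT.RC.2.years")   # -> FTy
  ),
  v.names   = c("init", "FTy")
)

str(rcl_toy)
```

With the full data, the same idea would mean building five name vectors (for example with grep() on names(rcw)) and checking each one by eye before calling reshape().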
> sessionInfo()
R version 2.11.1 Patched (2010-07-21 r52598)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=en_US
[7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.11.1
--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016