[R] Long to wide format without time variable
Rolf Turner
r.turner at auckland.ac.nz
Wed Jun 24 04:02:35 CEST 2009
On 24/06/2009, at 9:52 AM, Alan Cohen wrote:
> Hi all,
>
> I am trying to convert a data set of physician death codings (each
> individual's cause of death is coded by multiple physicians) from
> long to wide format, but the "reshape" function doesn't seem to
> work because it requires a "time" variable to identify the sequence
> among the repeated observations within individuals. My data set
> has no order, and different numbers of physicians code each death,
> up to 23. It is also quite large, so for-loops are very slow, and
> I'll need to repeat the procedure multiple times. So I'm looking
> for a processor-efficient way to replicate "reshape" without a time
> variable.
Basically your data ***should*** have a ``time variable''. To me
it looks perilous not to have one. Since you haven't got one, create
one:
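
make.time <- function(a) {
    # For each group in a, number its row positions 1, 2, ..., n,
    # keeping the original row positions as names.
    u <- tapply(seq_along(a), a, function(x) {
        y <- seq_along(x)
        names(y) <- x
        y
    })
    v <- unlist(u)                             # within-group sequence numbers
    w <- as.numeric(unlist(lapply(u, names)))  # original row positions
    z <- numeric(length(a))
    z[w] <- v                                  # put them back in row order
    z
}
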
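The same within-id counter can probably be had more compactly with base R's ave(). What follows is a minimal sketch, assuming the ids sit in a plain vector; the helper name seq.within is made up for illustration and is not part of the original reply.

```r
# seq.within is a hypothetical helper name, not from the original post.
# ave() splits the row positions by group, applies seq_along() to each
# piece, and puts the results back in the original row order.
seq.within <- function(a) ave(seq_along(a), a, FUN = seq_along)

id <- rep(1:5, 2)
seq.within(id)
# 1 1 1 1 1 2 2 2 2 2

# Groups need not be sorted or of equal size:
seq.within(c("b", "a", "b", "b", "a"))
# 1 1 2 3 2
```

Because ave() is vectorised over the groups, this avoids an explicit loop over ids, which is the part that was slow in the original for-loop.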
Now try the following:
id <- rep(1:5,2)
COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
MDid <- c(1:6,3,5,7,2)
data <- as.data.frame(cbind(id,COD,MDid))
data$time <- make.time(data$id)
wide <- reshape(data, timevar="time", v.names=c("COD","MDid"),
                direction="wide")

Except for the order of the columns (which you can easily rearrange
if it matters, which it doesn't), the result appears to be what you want.
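
Since the real data have different numbers of coders per death, here is a sketch of how the same approach might behave in that case, and of one way to rearrange the columns. The data below are invented for illustration. The counter is built with ave() as a stand-in for make.time, and data.frame() is used instead of as.data.frame(cbind(...)) so that id and MDid stay numeric rather than being coerced to character.

```r
# Hypothetical data: death 1 has three coders, death 2 has one,
# death 3 has two. Rows are in no particular order.
long <- data.frame(
  id   = c(1, 2, 1, 3, 1, 3),
  COD  = c("A01", "A02", "B01", "A03", "C01", "B03"),
  MDid = c(1, 2, 6, 3, 9, 5),
  stringsAsFactors = FALSE
)

# Artificial "time": the order in which each id's rows happen to appear.
long$time <- ave(seq_along(long$id), long$id, FUN = seq_along)

wide <- reshape(long, idvar = "id", timevar = "time",
                v.names = c("COD", "MDid"), direction = "wide")

# reshape() interleaves the columns (COD.1, MDid.1, COD.2, ...).
# Group them by variable instead; k is the largest number of coders.
k <- max(long$time)
wide <- wide[, c("id",
                 paste("COD",  1:k, sep = "."),
                 paste("MDid", 1:k, sep = "."))]
wide
```

Deaths with fewer than k coders get NA in the unused COD.j and MDid.j columns.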
cheers,
Rolf Turner
> Thanks in advance for any help you can provide. A worked example
> and some code I've tried are below. I'm working with R v2.8.1 on
> Windows XP Professional.
>
> Cheers,
> Alan Cohen
>
> Here's what my data look like now:
>
>> id <- rep(1:5,2)
>> COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
>> MDid <- c(1:6,3,5,7,2)
>> data <- as.data.frame(cbind(id,COD,MDid))
>> data
>    id COD MDid
> 1   1 A01    1
> 2   2 A02    2
> 3   3 A03    3
> 4   4 A04    4
> 5   5 A05    5
> 6   1 B01    6
> 7   2 A02    3
> 8   3 B03    5
> 9   4 B04    7
> 10  5 A05    2
>
> And here's what I'd like them to look like:
>
>> id2 <- 1:5
>> COD.1 <- c("A01","A02","A03","A04","A05")
>> COD.2 <- c("B01","A02","B03","B04","A05")
>> MDid.1 <- 1:5
>> MDid.2 <-c(6,3,5,7,2)
>> data.wide <- as.data.frame(cbind(id2,COD.1,COD.2,MDid.1,MDid.2))
>> data.wide
>   id2 COD.1 COD.2 MDid.1 MDid.2
> 1   1   A01   B01      1      6
> 2   2   A02   A02      2      3
> 3   3   A03   B03      3      5
> 4   4   A04   B04      4      7
> 5   5   A05   A05      5      2
>
> Here's the for-loop that's very slow (with or without the
> if-clauses activated):
>
> ids<-unique(data$id)
> ct<-length(ids)
> codes<-matrix(0,ct,11)
> colnames(codes)<-c("ID","ICD1","Coder1","ICD2","Coder2","ICD3","Coder3",
>                    "ICD4","Coder4","ICD5","Coder5")
> j<-0
> for (i in 1:ct){
> kkk <- ids[i]
> rpt<-data[data$id==kkk,]
> j<-max(j,nrow(rpt))
> codes[i,1]<-kkk
> codes[i,2]<-rpt$ICDCode[1]
> codes[i,3]<-rpt$T_Physician_ID[1]
> #if (nrow(rpt)>=2){
> codes[i,4]<-rpt$ICDCode[2]
> codes[i,5]<-rpt$T_Physician_ID[2]
> #if (nrow(rpt)>=3) {
> codes[i,6]<-rpt$ICDCode[3]
> codes[i,7]<-rpt$T_Physician_ID[3]
> #if (nrow(rpt)>=4) {
> codes[i,8]<-rpt$ICDCode[4]
> codes[i,9]<-rpt$T_Physician_ID[4]
> #if (nrow(rpt)>=5) {
> codes[i,10]<-rpt$ICDCode[5]
> codes[i,11]<-rpt$T_Physician_ID[5]
> #}}}}
> }
>