[R] Long to wide format without time variable

Wed Jun 24 04:02:35 CEST 2009

On 24/06/2009, at 9:52 AM, Alan Cohen wrote:

> Hi all,
>
> I am trying to convert a data set of physician death codings (each  
> individual's cause of death is coded by multiple physicians) from  
> long to wide format, but the "reshape" function doesn't seem to  
> work because it requires a "time" variable to identify the sequence  
> among the repeated observations within individuals.  My data set  
> has no order, and different numbers of physicians code each death,  
> up to 23.  It is also quite large, so for-loops are very slow, and  
> I'll need to repeat the procedure multiple times.  So I'm looking  
> for a processor-efficient way to replicate "reshape" without a time  
> variable.

	Basically your data ***should*** have a "time variable".  To me
	it looks perilous not to have one.  Since you haven't got one, create
	one:

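	make.time <- function(a) {
		# For each distinct value of a, number its occurrences 1, 2, ...
		# and keep the original positions as names.
		u <- tapply(1:length(a), a, function(x) {
			y <- 1:length(x)
			names(y) <- x
			y
		})
		v <- unlist(u)                             # within-group counters
		w <- as.numeric(unlist(lapply(u, names)))  # original positions
		z <- numeric(length(a))
		z[w] <- v                                  # restore original order
		z
	}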
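
A shorter way to build the same counter, as a sketch rather than a drop-in replacement, is base R's ave(), which applies a function within each group and returns the results in the original row order. The name make_time_ave is hypothetical and not part of the original post.

```r
# Sketch: number the rows within each id, in order of appearance.
# make_time_ave is a hypothetical helper name.
make_time_ave <- function(a) {
  # ave() splits seq_along(a) by the values of a, applies FUN to each
  # piece, and puts the results back in the original positions.
  ave(seq_along(a), a, FUN = seq_along)
}

# The ids need not be sorted or balanced.
id <- c(3, 1, 3, 2, 1, 3)
make_time_ave(id)
```

Because the grouping is handled inside ave(), there is no explicit loop over individuals, which should matter for a large data set.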

	Now try the following:

	id <- rep(1:5,2)
	COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
	MDid <- c(1:6,3,5,7,2)
	data <- as.data.frame(cbind(id,COD,MDid))
	data$time <- make.time(data$id)
	wide <- reshape(data,timevar="time",v.names=c("COD","MDid"),direction="wide")

	Except for the order of the columns (which you can easily rearrange
	if it matters, which it doesn't) the result appears to be what you want.
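
The original question mentions that different numbers of physicians code each death. The sketch below uses made-up data (not from the original post) to illustrate how reshape() pads the shorter individuals with NA, and one way to rearrange the columns afterwards. It builds the data with data.frame() rather than as.data.frame(cbind(...)), on the assumption that keeping id and MDid numeric is preferable; cbind() would coerce every column to character.

```r
# Hypothetical data: death 1 is coded by three physicians,
# death 2 by one, death 3 by two.
dat <- data.frame(
  id   = c(1, 2, 3, 1, 3, 1),
  COD  = c("A01", "A02", "A03", "B01", "B03", "C01"),
  MDid = c(1, 2, 3, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Row counter within each id, standing in for the missing time variable.
dat$time <- ave(seq_along(dat$id), dat$id, FUN = seq_along)

wide <- reshape(dat, idvar = "id", timevar = "time",
                v.names = c("COD", "MDid"), direction = "wide")

# Optional: put all COD columns before all MDid columns.
cod_cols  <- grep("^COD\\.",  names(wide), value = TRUE)
mdid_cols <- grep("^MDid\\.", names(wide), value = TRUE)
wide <- wide[, c("id", cod_cols, mdid_cols)]
wide
```

Individuals with fewer codings get NA in the trailing COD.k and MDid.k columns.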

		cheers,

			Rolf Turner

> Thanks in advance for any help you can provide.  A worked example  
> and some code I've tried are below.  I'm working with R v2.8.1 on  
> Windows XP Professional.
>
> Cheers,
> Alan Cohen
>
> Here's what my data look like now:
>
>> id <- rep(1:5,2)
>> COD <- c("A01","A02","A03","A04","A05","B01","A02","B03","B04","A05")
>> MDid <- c(1:6,3,5,7,2)
>> data <- as.data.frame(cbind(id,COD,MDid))
>> data
>    id COD MDid
> 1   1 A01    1
> 2   2 A02    2
> 3   3 A03    3
> 4   4 A04    4
> 5   5 A05    5
> 6   1 B01    6
> 7   2 A02    3
> 8   3 B03    5
> 9   4 B04    7
> 10  5 A05    2
>
> And here's what I'd like them to look like:
>
>> id2 <- 1:5
>> COD.1 <- c("A01","A02","A03","A04","A05")
>> COD.2 <- c("B01","A02","B03","B04","A05")
>> MDid.1 <- 1:5
>> MDid.2 <-c(6,3,5,7,2)
>> data.wide <- as.data.frame(cbind(id2,COD.1,COD.2,MDid.1,MDid.2))
>> data.wide
>   id2 COD.1 COD.2 MDid.1 MDid.2
> 1   1   A01   B01      1      6
> 2   2   A02   A02      2      3
> 3   3   A03   B03      3      5
> 4   4   A04   B04      4      7
> 5   5   A05   A05      5      2
>
> Here's the for-loop that's very slow (with or without the if-clauses
> activated):
>
> ids<-unique(data$id)
> ct<-length(ids)
> codes<-matrix(0,ct,11)
> colnames(codes)<-c("ID","ICD1","Coder1","ICD2","Coder2","ICD3","Coder3","ICD4","Coder4",
>                    "ICD5","Coder5")
> j<-0
> for (i in 1:ct){
>   kkk <- ids[i]
>   rpt<-data[data$id==kkk,]
>   j<-max(j,nrow(rpt))
>   codes[i,1]<-kkk
>   codes[i,2]<-rpt$ICDCode[1]
>   codes[i,3]<-rpt$T_Physician_ID[1]
>   #if (nrow(rpt)>=2){
>    codes[i,4]<-rpt$ICDCode[2]
>    codes[i,5]<-rpt$T_Physician_ID[2]
>     #if (nrow(rpt)>=3) {
>      codes[i,6]<-rpt$ICDCode[3]
>      codes[i,7]<-rpt$T_Physician_ID[3]
>       #if (nrow(rpt)>=4) {
>        codes[i,8]<-rpt$ICDCode[4]
>        codes[i,9]<-rpt$T_Physician_ID[4]
>           #if (nrow(rpt)>=5) {
>            codes[i,10]<-rpt$ICDCode[5]
>            codes[i,11]<-rpt$T_Physician_ID[5]
> #}}}}
> }
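
For comparison with the loop above, here is a loop-free sketch that fills the wide layout by matrix indexing. It is one possible approach under stated assumptions, not the method proposed in the reply: it uses the example columns id, COD and MDid rather than ICDCode and T_Physician_ID, and the names r_idx, pos, icd and coder are hypothetical.

```r
id   <- rep(1:5, 2)
COD  <- c("A01", "A02", "A03", "A04", "A05", "B01", "A02", "B03", "B04", "A05")
MDid <- c(1:6, 3, 5, 7, 2)

ids   <- unique(id)
r_idx <- match(id, ids)                            # target row for each record
pos   <- ave(seq_along(id), id, FUN = seq_along)   # 1st, 2nd, ... coding per death
k     <- max(pos)                                  # widest individual

# One matrix per variable, pre-filled with NA so shorter individuals stay padded.
icd   <- matrix(NA_character_, length(ids), k,
                dimnames = list(NULL, paste0("ICD", seq_len(k))))
coder <- matrix(NA_real_, length(ids), k,
                dimnames = list(NULL, paste0("Coder", seq_len(k))))

# Assign every record in one step using (row, column) index pairs.
icd[cbind(r_idx, pos)]   <- COD
coder[cbind(r_idx, pos)] <- MDid

codes <- data.frame(ID = ids, icd, coder, stringsAsFactors = FALSE)
codes
```

Keeping the codes and the physician ids in separate matrices avoids coercing the numeric ids to character, which a single mixed matrix would do.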