[R] More efficient data-block processing

arun smartpink111 at yahoo.com
Fri Mar 7 00:21:39 CET 2014


Hi Craig,
Assuming that this is similar to what you tried:

dat <- read.table(text="member code
1 A
1 C
1 F
2 B
2 E
3 D
3 A
3 B
3 D
4 G
4 A",sep="",header=TRUE,stringsAsFactors=FALSE)
code.list <- LETTERS[1:5]
n.mbr <- 4
mbr.list <- 1:4
matrix.mat <- matrix(0,ncol=length(code.list),nrow=length(unique(dat$member)),dimnames=list(NULL,code.list))
for(i in 1:n.mbr){
  mbr.i <- dat[dat$member==mbr.list[i],]
  matrix.mat[i,unique(match(mbr.i$code,code.list))] <- 1
}
matrix.mat1 <- cbind(member=mbr.list,matrix.mat)
matrix.mat1
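
## The loop can probably be avoided altogether with matrix indexing. The
## lines below are only a sketch on the same toy data (rebuilt here so the
## block runs on its own); idx, vec.mat and vec.mat1 are names made up for
## this illustration, and I have not timed it on a large file.

dat <- data.frame(member=c(1,1,1,2,2,3,3,3,3,4,4),
                  code=c("A","C","F","B","E","D","A","B","D","G","A"),
                  stringsAsFactors=FALSE)
code.list <- LETTERS[1:5]
mbr.list <- unique(dat$member)

# Row and column position of every claim line; the column is NA when the
# code is not in code.list (here F and G).
idx <- cbind(match(dat$member,mbr.list),match(dat$code,code.list))
idx <- idx[!is.na(idx[,2]),,drop=FALSE]

vec.mat <- matrix(0,nrow=length(mbr.list),ncol=length(code.list),dimnames=list(NULL,code.list))
vec.mat[idx] <- 1   # a repeated (row, column) pair just sets the same cell again
vec.mat1 <- cbind(member=mbr.list,vec.mat)
vec.mat1

## Each row of idx addresses one cell, so all members are filled in a single
## assignment instead of one subset per member.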

## This could also be done by:
library(reshape2)
res <- dcast(dat,member~code,length,value.var="code")
res1 <- res[,c(TRUE,names(res)[-1] %in% code.list)]
res1[,-1][res1[,-1]>1] <- 1
res1
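
## If you would rather not load a package, table() in base R should give the
## same indicators. Again only a sketch on the toy data (rebuilt so it runs
## on its own); tab and res2 are names made up for this illustration.

dat <- data.frame(member=c(1,1,1,2,2,3,3,3,3,4,4),
                  code=c("A","C","F","B","E","D","A","B","D","G","A"),
                  stringsAsFactors=FALSE)
code.list <- LETTERS[1:5]

# Restricting the factor levels to code.list turns the other codes (F, G)
# into NA, which table() leaves out by default.
tab <- table(member=dat$member,code=factor(dat$code,levels=code.list))
res2 <- (unclass(tab) > 0) + 0   # counts -> 0/1 indicators
res2

## The member ids end up as row names rather than as a first column.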

A.K.

I have a medical insurance data file divided into blocks by member, with
multiple lines per member. I am processing these into a one-line-per-member
model matrix. Member block sizes vary from 1 to 50+. I am
matching attributes in claims data to columns in the model matrix and
have been getting by with a for loop, but for large files it takes much
too long. Is there a vectorized or apply-based method to do this more
efficiently?

Thanks in advance, Craig McKinstry

input data: 
member	code 
1	A 
1	C 
1	F 
2	B 
2	E 
3	D 
3	A 
3	B 
3	D 
4	G 
4	A 

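code.list <- c("A","B","C","D","E")
mbr.list <- unique(dat$member)
n.mbr <- length(mbr.list)
matrix.mat <- matrix(0,nrow=n.mbr,ncol=length(code.list),dimnames=list(NULL,code.list))
for(i in 1:n.mbr){
  mbr.i <- dat[dat$member==mbr.list[i],]	# extract block of member claims
  matrix.mat[i,unique(match(mbr.i$code,code.list))] <- 1	# codes not in code.list match to NA and are skipped
}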

output model.matrix 
Member	A	B	C	D	E 
1	1	0	1	0	0 
2	0	1	0	0	1 
3	1	1	0	1	0 
4	1	0	0	0	0


