[R] More efficient data-block processing
arun
smartpink111 at yahoo.com
Fri Mar 7 00:21:39 CET 2014
Hi Craig,
Assuming that this is similar to what you tried:
dat <- read.table(text="member code
1 A
1 C
1 F
2 B
2 E
3 D
3 A
3 B
3 D
4 G
4 A",sep="",header=TRUE,stringsAsFactors=FALSE)
code.list <- LETTERS[1:5]
n.mbr <- 4
mbr.list <- 1:4
matrix.mat <- matrix(0,ncol=length(code.list),nrow=length(unique(dat$member)),dimnames=list(NULL,code.list))
for(i in 1:n.mbr){
  mbr.i <- dat[dat$member==mbr.list[i],]
  matrix.mat[i,unique(match(mbr.i$code,code.list))] <- 1
}
matrix.mat1 <- cbind(member=mbr.list,matrix.mat)
matrix.mat1
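The loop can probably be dropped altogether by computing a (row, column) position for every claim line and filling the matrix in one assignment. A minimal sketch in base R, assuming the same example data as above; the names i, j, keep and mat are only illustrative, and I have not timed it on a large file:

```r
# Same example data as above, rebuilt so the snippet runs on its own
dat <- data.frame(member=c(1,1,1,2,2,3,3,3,3,4,4),
                  code=c("A","C","F","B","E","D","A","B","D","G","A"),
                  stringsAsFactors=FALSE)
code.list <- LETTERS[1:5]
mbr.list <- unique(dat$member)

# Row (member) and column (code) position of every claim line
i <- match(dat$member, mbr.list)
j <- match(dat$code, code.list)   # NA for codes outside code.list (F, G)
keep <- !is.na(j)

mat <- matrix(0, nrow=length(mbr.list), ncol=length(code.list),
              dimnames=list(NULL, code.list))
# Indexing with a two-column matrix sets all (row, column) cells at once;
# repeated pairs (member 3 has D twice) simply set the same cell again
mat[cbind(i[keep], j[keep])] <- 1
cbind(member=mbr.list, mat)
```

The work is done by two match() calls over the whole file, so the cost should grow with the number of claim lines rather than with repeated subsetting of dat for each member.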
## This could also be done with reshape2:
library(reshape2)
res <- dcast(dat,member~code,length,value.var="code")
res1 <- res[,c(TRUE,names(res)[-1] %in% code.list)]
res1[,-1][res1[,-1]>1] <- 1
res1
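If you would rather not load a package, something along these lines with table() should give the same 0/1 matrix. Again only a sketch on the example data; tab and res2 are illustrative names:

```r
dat <- data.frame(member=c(1,1,1,2,2,3,3,3,3,4,4),
                  code=c("A","C","F","B","E","D","A","B","D","G","A"),
                  stringsAsFactors=FALSE)
code.list <- LETTERS[1:5]

# Restricting the factor levels to code.list turns other codes (F, G) into NA,
# and table() leaves NA out of the counts by default
tab <- table(dat$member, factor(dat$code, levels=code.list))

# Convert counts to 0/1 indicators; row names carry the member ids
res2 <- (unclass(tab) > 0) * 1
res2
```

A member whose codes all fall outside code.list still gets a row (all zeros), because the member ids are counted as their own factor.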
A.K.
I have a medical insurance datafile divided into blocks by member, with
multiple lines per member. I am processing these into a one line per
member model matrix. Member block sizes vary from 1 to 50+. I am
matching attributes in claims data to columns in the model matrix and
have been getting by with a for loop, but for large files it takes much
too long. Is there a vectorized/apply-based method to do this more
efficiently?
thanks in advance, Craig McKinstry
input data:
member code
1 A
1 C
1 F
2 B
2 E
3 D
3 A
3 B
3 D
4 G
4 A
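code.list <- c("A","B","C","D","E")
mbr.list <- unique(dat$member)
n.mbr <- length(mbr.list)
matrix.mat <- matrix(0,nrow=n.mbr,ncol=length(code.list),dimnames=list(NULL,code.list))
for(i in 1:n.mbr){
  mbr.i <- dat[dat$member==mbr.list[i],] # extract block of member claims
  matrix.mat[i,unique(match(mbr.i$code,code.list))] <- 1
}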
output model.matrix
Member A B C D E
1 1 0 1 0 0
2 0 1 0 0 1
3 1 1 0 1 0
4 1 0 0 0 0