[R] rle on large data . . . without a for loop!
Justin Haynes
jtor14 at gmail.com
Sat Jun 18 00:55:10 CEST 2011
I think I need to do something like this:
# id belongs in data.frame(), not inside sample()
dat<-data.frame(id=rep(1:5,each=200),
state=sample(1:3,1000,replace=TRUE,prob=c(0.7,0.05,0.25)),
V1=runif(1000,10,1000),V2=rnorm(1000)) # was runif(1,...); n=1000 assumed
rle.dat<-rle(dat$state)
temp<-1 # first row of the current run
out<-data.frame(id=seq_along(rle.dat$lengths))
for(i in seq_along(rle.dat$lengths)){
  temp2<-temp+rle.dat$lengths[i]-1 # last row of the current run
  out$V1[i]<-mean(dat$V1[temp:temp2])
  out$V2[i]<-sum(dat$V2[temp:temp2])
  out$state[i]<-rle.dat$values[i]
  temp<-temp2+1 # next run starts on the following row
}
to a very large dataset. I want to apply a few summary functions to
some variables within a data.frame for each consecutive run of a given
state. To complicate things, I'd like to use plyr and split on the id
variable before I do any of this...
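loop.func<-function(dat){
  rle.dat<-rle(dat$state)
  temp<-1 # first row of the current run
  # named run, not id, so it cannot collide with the grouping column id
  out<-data.frame(run=seq_along(rle.dat$lengths))
  for(i in seq_along(rle.dat$lengths)){
    temp2<-temp+rle.dat$lengths[i]-1 # last row of the current run
    out$V1[i]<-mean(dat$V1[temp:temp2])
    out$V2[i]<-sum(dat$V2[temp:temp2])
    out$state[i]<-rle.dat$values[i]
    temp<-temp2+1 # next run starts on the following row
  }
  return(out)
}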
library(plyr) # provides ddply() and .()
out<-ddply(dat,.(id),loop.func)
Mostly, I just don't understand how to use a list (especially in this
instance) in a plyr/apply statement...
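For what it is worth, one loop-free way to express the same per-run
summaries might look like the sketch below. It is untested on anything
large. The helper name run.summary, the extra column n, and the toy data
are assumptions made for illustration. The idea is to expand the rle()
result back into a per-row run index with rep() and let tapply() do the
grouping.

```r
# Hypothetical helper (an illustration, not a tested solution): summarise
# each run of identical `state` values without an explicit for loop.
run.summary <- function(d) {
  r <- rle(d$state)
  # Label every row with the index of its run: 1,1,2,3,3,3,...
  run <- rep(seq_along(r$lengths), times = r$lengths)
  data.frame(run   = seq_along(r$lengths),
             state = r$values,
             n     = r$lengths,                          # rows in the run
             V1    = as.vector(tapply(d$V1, run, mean)),
             V2    = as.vector(tapply(d$V2, run, sum)))
}

# Toy data in the shape described above (assumed, for illustration only).
set.seed(1)
dat <- data.frame(id    = rep(1:5, each = 200),
                  state = sample(1:3, 1000, replace = TRUE,
                                 prob = c(0.7, 0.05, 0.25)),
                  V1    = runif(1000, 10, 1000),
                  V2    = rnorm(1000))

# Base R split-apply-combine on id: each piece is an ordinary data.frame,
# so runs never cross an id boundary.
res <- do.call(rbind, lapply(split(dat, dat$id), function(d) {
  cbind(id = d$id[1], run.summary(d))
}))
```

With plyr loaded, the same helper should drop into the call above as
ddply(dat, .(id), run.summary), since ddply hands each id's rows to the
function as a data.frame and binds the returned pieces together.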
Thanks,
Justin