[R] rle on large data . . . without a for loop!
    Justin Haynes 
    jtor14 at gmail.com
       
    Sat Jun 18 00:55:10 CEST 2011
    
    
  
I think I need to apply something like this:
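# id is a column of dat (it is not an argument to sample());
# runif() needs n=1000 to give one V1 value per row
dat<-data.frame(id=rep(1:5,each=200),
  state=sample(1:3,1000,replace=TRUE,prob=c(0.7,0.05,0.25)),
  V1=runif(1000,10,1000),V2=rnorm(1000))
rle.dat<-rle(dat$state)
n<-length(rle.dat$lengths)
# run index is named "run" because dat already has an id column
out<-data.frame(run=seq_len(n),V1=NA_real_,V2=NA_real_,state=NA_integer_)
temp<-1   # first row of the current run
for(i in seq_len(n)){
	temp2<-temp+rle.dat$lengths[i]-1   # last row of the current run
	out$V1[i]<-mean(dat$V1[temp:temp2])
	out$V2[i]<-sum(dat$V2[temp:temp2])
	out$state[i]<-rle.dat$values[i]
	temp<-temp2+1   # first row of the next run
}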
to a very large dataset.  I want to apply a few summary functions to
some variables within a data.frame for each run of a given state. To
complicate things, I'd like to use plyr and split on the id variable
before I do any of this:
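loop.func<-function(dat){
  rle.dat<-rle(dat$state)
  n<-length(rle.dat$lengths)
  # run index is named "run" so it does not mask the id column ddply adds
  out<-data.frame(run=seq_len(n),V1=NA_real_,V2=NA_real_,state=NA_integer_)
  temp<-1   # first row of the current run
  for(i in seq_len(n)){
	temp2<-temp+rle.dat$lengths[i]-1   # last row of the current run
	out$V1[i]<-mean(dat$V1[temp:temp2])
	out$V2[i]<-sum(dat$V2[temp:temp2])
	out$state[i]<-rle.dat$values[i]
	temp<-temp2+1   # first row of the next run
  }
  return(out)
}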
library(plyr)
out<-ddply(dat,.(id),loop.func)
Mostly, I just don't understand how to use a list (especially in this
instance) in a plyr/apply statement...
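One loop-free direction that might work: the list returned by rle() can be
turned back into an ordinary per-row grouping vector with rep(), and then
tapply() can summarise on that vector. The sketch below is only an
illustration under assumptions: run.summary is a made-up helper name (not
part of plyr or base R), the seed is arbitrary, and V1 is assumed to be
meant as 1000 uniform draws between 10 and 1000.

```r
library(plyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(id = rep(1:5, each = 200),
                  state = sample(1:3, 1000, replace = TRUE,
                                 prob = c(0.7, 0.05, 0.25)),
                  V1 = runif(1000, 10, 1000),  # assumed intent: one draw per row
                  V2 = rnorm(1000))

# run.summary is a hypothetical helper, not an existing plyr/base function
run.summary <- function(d) {
  r <- rle(d$state)
  # label every row with the index of the run it belongs to
  run <- rep(seq_along(r$lengths), times = r$lengths)
  data.frame(run   = seq_along(r$lengths),
             state = r$values,
             n     = r$lengths,
             V1    = as.vector(tapply(d$V1, run, mean)),  # mean of V1 per run
             V2    = as.vector(tapply(d$V2, run, sum)))   # sum of V2 per run
}

# runs are computed separately within each id, so no run spans two ids
out <- ddply(dat, .(id), run.summary)
```

The only place the rle() list is touched is inside the helper; what ddply
sees is a plain function from a data.frame to a data.frame. Calling
run.summary(dat) directly gives the same summary over the whole data set
without splitting on id.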
Thanks,
Justin
    
    