[R] ragged data.frame? using plyr
Justin Haynes
jtor14 at gmail.com
Fri Jun 3 03:03:11 CEST 2011
I have a dataset that looks like:
library(plyr)     # ddply(), dlply(), ldply()
library(reshape)  # melt(); reshape2 also provides it
set.seed(144)
sam<-sample(1000,100)
dat<-data.frame(id=letters[1:10],value=rnorm(1000),day=c(rep(1,100),rep(2,100),rep(3,100),rep(4,100),rep(5,100)))
I want to "normalise" it using the following function (unless you have
a better idea...):
adj.values<-function(dframe){
value_mean<-mean(dframe$value)
value_sd<-sd(dframe$value)
norm_value<-(dframe$value-value_mean)/value_sd
score_scale<-100
score_offset<-1000
scaled_value<-norm_value*score_scale+score_offset
names(scaled_value)<-dframe$id
return(scaled_value)
}
score_out<-ddply(dat,.(day),adj.values)
This gives me my data.frame, all nice and pretty, ready for the following:
score_out.melt<-melt(score_out,id='day')
names(score_out.melt)<-c('day','id','score')
tblscore_mean<-tapply(score_out.melt$score,INDEX=score_out.melt$id,mean)
tblscore_iqr<-tapply(score_out.melt$score,INDEX=score_out.melt$id,IQR)
score_mean_iqr<-data.frame(id=names(tblscore_iqr),mean=tblscore_mean,iqr=tblscore_iqr)
However, as it turns out, my data look more like:
dat<-dat[-sam,]
ldply(dlply(dat,.(id,day),adj.values),length)
So on different days I only have data for some of the id variables,
which leads to a "ragged" data.frame:
ddply(dat,.(id,day),adj.values)
Can I do something like
ldply(dlply(dat,.(id,day),adj.values), function(x){put in an NA for the
places where data is missing?})
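To make the padding idea concrete, here is a minimal base R sketch. It assumes that "ragged" means groups of unequal length, and it uses a small made-up data frame (toy) and a hypothetical helper (pad_na) rather than the dat and adj.values above. It relies on the fact that assigning a longer length to a vector fills the new positions with NA.

```r
# Toy data: three groups of unequal size (hypothetical, not the dat above)
toy <- data.frame(id    = c("a", "a", "a", "b", "b", "c"),
                  value = c(1, 2, 3, 4, 5, 6))

# Split the values by id; the pieces have lengths 3, 2 and 1
pieces <- split(toy$value, toy$id)

# Hypothetical helper: pad a vector with NA up to length n
# (assigning a longer length fills the new positions with NA)
pad_na <- function(x, n) {
  length(x) <- n
  x
}

# Pad every piece to the longest length, then bind into one row per id
n_max  <- max(sapply(pieces, length))
padded <- t(sapply(pieces, pad_na, n = n_max))
```

Here padded is a rectangular matrix with one row per id and NA wherever a group ran short. A helper like this might also work as the function passed to ldply, but that is untested here.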
To give you a sense of where this is going, I'm eventually going to
plot the mean of each id variable over the time period vs. its IQR
(again unless you have a better idea...).
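One possible alternative, sketched under assumptions: if the data stay in long form (one row per observation), no rectangular day-by-id table is needed, so missing id/day pairs never have to be filled in. The sketch uses base R only, a made-up toy data frame, and a hypothetical helper scale_score that applies the same scaling as adj.values (scale 100, offset 1000).

```r
# Toy long-format data (hypothetical): 4 ids over 5 days
set.seed(1)
toy <- data.frame(id    = rep(letters[1:4], times = 5),
                  day   = rep(1:5, each = 4),
                  value = rnorm(20))
toy <- toy[-c(3, 8, 14), ]  # drop some rows so the data are ragged

# Hypothetical helper: standardise, then rescale to mean 1000 and sd 100
scale_score <- function(x) (x - mean(x)) / sd(x) * 100 + 1000

# ave() applies the helper within each day and returns a vector
# aligned with the rows of toy
toy$score <- ave(toy$value, toy$day, FUN = scale_score)

# Per-id summaries; a missing id/day pair simply contributes no row
score_mean <- tapply(toy$score, toy$id, mean)
score_iqr  <- tapply(toy$score, toy$id, IQR)
score_mean_iqr <- data.frame(id   = names(score_mean),
                             mean = as.vector(score_mean),
                             iqr  = as.vector(score_iqr))
```

A call such as plot(score_mean_iqr$mean, score_mean_iqr$iqr) should then give the mean versus IQR scatter, one point per id.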
As always,
thanks for your help!
Justin