[R] Standardizing the number of records by group

Peter Ehlers ehlers at ucalgary.ca
Mon Jul 25 21:43:54 CEST 2011


On 2011-07-25 12:24, Sam Albers wrote:
> Hello R-help,
>
> I have some data collected at regular intervals but for a varying
> length of time. I would like to standardize the length of time
> collected and I can do this by standardizing the number of records I
> use for my analysis.
>
> Take for example the data set below:
>
>
> library(plyr)
> x <- runif(18, 10, 15)
> df <- as.data.frame(x)
> df$fac<- factor(c("Test1","Test1","Test1","Test1","Test1","Test1","Test1",
>                   "Test2","Test2","Test2","Test2","Test2",
>                   "Test3","Test3","Test3","Test3","Test3","Test3"))
>
> ## Here is where I would like to standardize the number of records
>
> df.avg<- ddply(df, c("fac"), function(df) return(c(x.avg=mean(df$x),
> n=length(df$x))))
> df.avg
>
> Here there is a different number of records for each factor level. Say
> I only wanted to use the first 4 records at each factor level. Prior
> to taking the mean of these values how might I drop all the records
> after 4? Can anyone suggest a good way to do this?

Just subset in your function definition:

   mean(df$x) --> mean(df$x[1:4])

But I would use summarize:

   ddply(df, .(fac), summarize, x.avg = mean(x[1:4]), n = length(x))
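Two things to keep in mind with the calls above. If a group has fewer than 4 records, x[1:4] pads with NA and the mean comes out NA. Also, n = length(x) still reports the full group size, not the number of records used. A minimal sketch of one way around both, assuming head(x, 4) is an acceptable substitute for x[1:4]. The seed, the name k, and the base R variant are illustrative additions, not part of the original question:

```r
set.seed(1)  # seed added only to make the sketch reproducible
df <- data.frame(
  x   = runif(18, 10, 15),
  fac = factor(rep(c("Test1", "Test2", "Test3"), times = c(7, 5, 6)))
)

k <- 4  # number of leading records to keep per group (illustrative name)

# Base R: split x by group, keep at most the first k values, then summarise
res <- do.call(rbind, lapply(split(df$x, df$fac), function(v) {
  kept <- head(v, k)   # head() returns fewer values rather than padding with NA
  data.frame(x.avg = mean(kept), n = length(kept))
}))
res$fac <- rownames(res)

# Same idea with plyr, if it is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  res_plyr <- plyr::ddply(df, "fac", plyr::summarize,
                          x.avg = mean(head(x, 4)),
                          n     = length(head(x, 4)))
}
```

Here n counts only the records that went into the mean, so every group reports 4 for this data set. Keep n = length(x) if the original group size is what you want.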

Peter Ehlers

>
> I am using R 2.12.1 and Emacs + ESS.
>
> Thanks so much in advance.
>
> Sam
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


