[R] How to calculate means for multiple variables in samples with different sizes
Matthew Dowle
mdowle at mdowle.plus.com
Fri Mar 11 13:53:42 CET 2011
Hi,
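For anyone who wants to reproduce the output below, here is a minimal sketch that rebuilds the data from the original question. It assumes x.dt is simply those 12 rows as a data.table; how x.dt was actually created is not shown in this thread.

```r
library(data.table)

# Data copied from the original question: 4 samples with 3, 2, 4 and 3 replicates.
# Assumption: x.dt in this thread holds exactly these rows.
x.dt <- data.table(
  sample    = rep(c("A", "B", "C", "D"), times = c(3, 2, 4, 3)),
  replicate = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 3),
  height    = c(12.0, 12.2, 12.4, 12.7, 12.8, 11.9, 11.84, 11.43, 10.24,
                14.2, 15.67, 15.11),
  weight    = c(0.64, 0.38, 0.49, 0.65, 0.78, 0.45, 0.44, 0.32, 0.84,
                0.54, 0.67, 0.81),
  age       = c(6, 6, 6, 4, 5, 6, 2, 3, 4, 2, 7, 7)
)

# Mean of every non-grouping column within each sample.
means <- x.dt[, lapply(.SD, mean), by = sample]
print(means)
```

Unequal group sizes need no special handling here, because mean() is applied to whatever rows fall into each sample.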
One-liners in data.table are:
> x.dt[,lapply(.SD,mean),by=sample]
sample replicate height weight age
[1,] A 2.0 12.20000 0.5033333 6.000000
[2,] B 1.5 12.75000 0.7150000 4.500000
[3,] C 2.5 11.35250 0.5125000 3.750000
[4,] D 2.0 14.99333 0.6733333 5.333333
without the replicate column :
> x.dt[,lapply(list(height,weight,age),mean),by=sample]
sample V1 V2 V3
[1,] A 12.20000 0.5033333 6.000000
[2,] B 12.75000 0.7150000 4.500000
[3,] C 11.35250 0.5125000 3.750000
[4,] D 14.99333 0.6733333 5.333333
one (long) way to retain the column names :
> x.dt[,lapply(list(height=height,weight=weight,age=age),mean),by=sample]
sample height weight age
[1,] A 12.20000 0.5033333 6.000000
[2,] B 12.75000 0.7150000 4.500000
[3,] C 11.35250 0.5125000 3.750000
[4,] D 14.99333 0.6733333 5.333333
>
or this is shorter :
> ans = x.dt[,lapply(.SD,mean),by=sample]
> ans$replicate = NULL
> ans
sample height weight age
[1,] A 12.20000 0.5033333 6.000000
[2,] B 12.75000 0.7150000 4.500000
[3,] C 11.35250 0.5125000 3.750000
[4,] D 14.99333 0.6733333 5.333333
>
or another way :
> mycols = c("height","weight","age")
> x.dt[,lapply(.SD[,mycols,with=FALSE],mean),by=sample]
sample height weight age
[1,] A 12.20000 0.5033333 6.000000
[2,] B 12.75000 0.7150000 4.500000
[3,] C 11.35250 0.5125000 3.750000
[4,] D 14.99333 0.6733333 5.333333
>
or another way :
> x.dt[,lapply(.SD[,list(height,weight,age)],mean),by=sample]
sample height weight age
[1,] A 12.20000 0.5033333 6.000000
[2,] B 12.75000 0.7150000 4.500000
[3,] C 11.35250 0.5125000 3.750000
[4,] D 14.99333 0.6733333 5.333333
>
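A further variant, sketched here on the assumption that your data.table release supports the .SDcols argument (it exists in later releases; whether the version used for the output above has it is not confirmed here). It restricts .SD to the named columns, so the replicate column never enters the calculation. The data is rebuilt from the original question so the snippet runs on its own.

```r
library(data.table)

# Data copied from the original question (assumed to match x.dt above).
x.dt <- data.table(
  sample    = rep(c("A", "B", "C", "D"), times = c(3, 2, 4, 3)),
  replicate = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 3),
  height    = c(12.0, 12.2, 12.4, 12.7, 12.8, 11.9, 11.84, 11.43, 10.24,
                14.2, 15.67, 15.11),
  weight    = c(0.64, 0.38, 0.49, 0.65, 0.78, 0.45, 0.44, 0.32, 0.84,
                0.54, 0.67, 0.81),
  age       = c(6, 6, 6, 4, 5, 6, 2, 3, 4, 2, 7, 7)
)

# .SDcols limits .SD to the listed columns before lapply() sees it.
mycols <- c("height", "weight", "age")
ans <- x.dt[, lapply(.SD, mean), by = sample, .SDcols = mycols]
print(ans)
```

With 600 condition columns, building mycols programmatically, for example with setdiff(names(x.dt), c("sample", "replicate")), avoids typing the names out.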
The way Jim showed :
> x.dt[, list(height = mean(height)
+ , weight = mean(weight)
+ , age = mean(age)
+ ), by = sample]
is the more flexible syntax: it makes it easy to apply different functions
to different columns and, as a bonus, it is fast.
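To illustrate that flexibility, here is a small sketch applying a different function to each column. The choice of median, max and the row count .N is purely illustrative and is not something the original poster asked for. The data is again rebuilt from the original question.

```r
library(data.table)

# Data copied from the original question (assumed to match x.dt above).
x.dt <- data.table(
  sample    = rep(c("A", "B", "C", "D"), times = c(3, 2, 4, 3)),
  replicate = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 3),
  height    = c(12.0, 12.2, 12.4, 12.7, 12.8, 11.9, 11.84, 11.43, 10.24,
                14.2, 15.67, 15.11),
  weight    = c(0.64, 0.38, 0.49, 0.65, 0.78, 0.45, 0.44, 0.32, 0.84,
                0.54, 0.67, 0.81),
  age       = c(6, 6, 6, 4, 5, 6, 2, 3, 4, 2, 7, 7)
)

# One summary function per column, plus the number of replicates per sample.
res <- x.dt[, list(height = mean(height),
                   weight = median(weight),
                   age    = max(age),
                   n      = .N),
            by = sample]
print(res)
```

The n column is a quick check that the unequal replicate counts were picked up as expected.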
Matthew
"Dennis Murphy" <djmuser at gmail.com> wrote in message
news:AANLkTimxXL8BqTaYKUb=sAEE2CrA9fOSfuAp4QZkX8fe at mail.gmail.com...
> Hi:
>
> Here are a few one-liners. Calling your data frame dd,
>
> aggregate(cbind(height, weight, age) ~ sample, data = dd, FUN = mean)
> sample height weight age
> 1 A 12.20000 0.5033333 6.000000
> 2 B 12.75000 0.7150000 4.500000
> 3 C 11.35250 0.5125000 3.750000
> 4 D 14.99333 0.6733333 5.333333
>
> With package doBy:
>
> library(doBy)
> summaryBy(height + weight + age ~ sample, data = dd, FUN = mean)
> sample height.mean weight.mean age.mean
> 1 A 12.20000 0.5033333 6.000000
> 2 B 12.75000 0.7150000 4.500000
> 3 C 11.35250 0.5125000 3.750000
> 4 D 14.99333 0.6733333 5.333333
>
> With package plyr:
>
> library(plyr)
> ddply(dd, .(sample), colwise(mean, .(height, weight, age)))
> sample height weight age
> 1 A 12.20000 0.5033333 6.000000
> 2 B 12.75000 0.7150000 4.500000
> 3 C 11.35250 0.5125000 3.750000
> 4 D 14.99333 0.6733333 5.333333
>
> Dennis
>
> On Fri, Mar 11, 2011 at 1:32 AM, Aline Santos <alinexss at gmail.com> wrote:
>
>> Hello R-helpers:
>>
>> I have data like this:
>>
>> sample replicate height weight age
>> A 1.00 12.0 0.64 6.00
>> A 2.00 12.2 0.38 6.00
>> A 3.00 12.4 0.49 6.00
>> B 1.00 12.7 0.65 4.00
>> B 2.00 12.8 0.78 5.00
>> C 1.00 11.9 0.45 6.00
>> C 2.00 11.84 0.44 2.00
>> C 3.00 11.43 0.32 3.00
>> C 4.00 10.24 0.84 4.00
>> D 1.00 14.2 0.54 2.00
>> D 2.00 15.67 0.67 7.00
>> D 3.00 15.11 0.81 7.00
>>
>> Now, how can I calculate the mean for each condition (height, weight,
>> age)
>> in each sample, considering the samples have different number of
>> replicates?
>>
>>
>> The final matrix should look like:
>>
>> sample height weight age
>> A 12.20 0.50 6.00
>> B 12.75 0.72 4.50
>> C 11.35 0.51 3.75
>> D 14.99 0.67 5.33
>>
>> This is a simplified version of my dataset, which consists of 100 samples
>> (unequally distributed across 530 replicates) for 600 different conditions.
>>
>> I appreciate all the help.
>>
>> A.S.
>>