[R] Odp: question about "mean"
Allan Engelhardt
allane at cybaea.com
Tue Jun 15 18:08:20 CEST 2010
The split/sapply solution below also seems to be the fastest of the
proposed options for this data set:
library("rbenchmark")
benchmark(columns = c("test", "elapsed", "relative"), order = "elapsed",
          apply     = apply(iris[, -5], 2, tapply, iris$Species, mean),
          with      = with(iris, rowsum(iris[, -5], Species) / table(Species)),
          aggregate = aggregate(iris[, -5], list(iris[, 5]), mean),
          sapply    = sapply(split(iris[, 1:4], iris$Species), mean))
#        test elapsed relative
# 4    sapply   0.148 1.000000
# 1     apply   0.248 1.675676
# 2      with   0.310 2.094595
# 3 aggregate   0.313 2.114865
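
Before comparing speed it is worth confirming that the variants agree on the
group means. The following is only a sketch and rests on two assumptions that
are not part of the benchmark above: it targets a current R (3.0.0 or later),
where mean() no longer accepts a data frame, so colMeans() stands in for
mean() in the split/sapply variant; and it divides by as.vector(table(...))
rather than the table itself, to keep the division a plain vector operation.
The names by_rowsum, by_sapply and by_aggregate are illustrative.

```r
# Group means via rowsum(): per-species column sums divided by group counts.
# as.vector() is an assumption of this sketch, not in the original call.
by_rowsum <- with(iris, rowsum(iris[, -5], Species) / as.vector(table(Species)))

# Group means via split/sapply. colMeans() replaces mean(), which no longer
# works on data frames in R >= 3.0.0. t() puts species in rows.
by_sapply <- t(sapply(split(iris[, 1:4], iris$Species), colMeans))

# Group means via aggregate(); the first column holds the grouping variable.
by_aggregate <- aggregate(iris[, -5], list(Species = iris[, 5]), mean)

# Both comparisons should report TRUE.
all.equal(as.matrix(by_rowsum), by_sapply)
all.equal(as.matrix(by_aggregate[, -1]), by_sapply, check.attributes = FALSE)
```

Swapping mean() for colMeans() changes the cost of the sapply variant, so
timings taken with this version need not match the figures above.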
However, the 'with/rowsum/table' option proposed by Bill Venables
appears to scale better:
i <- rbind(iris, iris, iris, iris, iris)
i <- rbind(i, i, i, i, i)
i <- rbind(i, i, i, i, i)
i <- rbind(i, i, i, i, i)
NROW(i)
# [1] 93750
benchmark(columns = c("test", "elapsed", "relative"), order = "elapsed",
          apply     = apply(i[, -5], 2, tapply, i$Species, mean),
          with      = with(i, rowsum(i[, -5], Species) / table(Species)),
          aggregate = aggregate(i[, -5], list(i[, 5]), mean),
          sapply    = sapply(split(i[, 1:4], i$Species), mean))
#        test elapsed  relative
# 2      with   2.708  1.000000
# 4    sapply   5.189  1.916174
# 3 aggregate  15.990  5.904727
# 1     apply  31.646 11.686115
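
For anyone who wants to rerun the scaling comparison without installing
rbenchmark, here is a rough base-R sketch. It is not the code that produced
the numbers above: big and time_it are made-up names, the enlarged data set
is built by row indexing instead of repeated rbind(), and colMeans() replaces
mean() in the split/sapply variant on the assumption of a current R (3.0.0 or
later). Timings depend on the machine and the R version, so the ranking may
differ from the one reported here.

```r
# Replicate the 150 rows of iris 625 times (5^4), giving 93750 rows.
big <- iris[rep(seq_len(nrow(iris)), times = 625), ]
stopifnot(nrow(big) == 93750)

# Hypothetical helper: elapsed seconds for `reps` calls of a zero-argument
# function. A function is used so that the work is redone on every call.
time_it <- function(f, reps = 5) {
  system.time(for (k in seq_len(reps)) f())[["elapsed"]]
}

timings <- c(
  rowsum    = time_it(function()
    rowsum(big[, -5], big$Species) / as.vector(table(big$Species))),
  sapply    = time_it(function()
    sapply(split(big[, 1:4], big$Species), colMeans)),
  aggregate = time_it(function()
    aggregate(big[, -5], list(big[, 5]), mean))
)

# Fastest first.
sort(timings)
```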
(Because I care about these things...)
Allan
On 10/06/10 09:44, Petr PIKAL wrote:
> Hi
>
> split/sapply can be used besides other options
>
> sapply(split(iris[,1:4], iris$Species), mean)
>
> Regards
> Petr
>
> r-help-bounces at r-project.org wrote on 10.06.2010 00:43:29:
>
>
>> Hi there:
>> I have a question about computing the mean values of a data.frame. Take
>> the iris data for example, where I have a data.frame that looks like the
>> following:
>> ---------------------
>>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
>> 1          5.1         3.5          1.4         0.2  setosa
>> 2          4.9         3.0          1.4         0.2  setosa
>> 3          4.7         3.2          1.3         0.2  setosa
>> .            .           .            .           .       .
>> .            .           .            .           .       .
>> .            .           .            .           .       .
>> -----------------------
>> There are three different species in this table. I want to make a table
>> and calculate the mean value for each species, as in the following table:
>>
>> -----------------
>>                 Sepal.Length Sepal.Width Petal.Length Petal.Width
>> mean.setosa            5.006       3.428        1.462       0.246
>> mean.versicolor        5.936       2.770        4.260       1.326
>> mean.virginica         6.588       2.974        5.552       2.026
>> -----------------
>> Is there any shorter syntax that can do it? I mean shorter than the code
>> I wrote, as follows:
>>
>> attach(iris)
>> mean.setosa<-mean(iris[Species=="setosa", 1:4])
>> mean.versicolor<-mean(iris[Species=="versicolor", 1:4])
>> mean.virginica<-mean(iris[Species=="virginica", 1:4])
>> data.mean<-rbind(mean.setosa, mean.versicolor, mean.virginica)
>> detach(iris)
>> ------------------
>>
>> Thanks a million!!!
>>
>>
>> --
>> =====================================
>> Shih-Hsiung, Chou
>> System Administrator / PH.D Student at
>> Department of Industrial Manufacturing
>> and Systems Engineering
>> Kansas State University
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>