[R] Re : Re : descriptive statistics
Matthieu Lesnoff
matthieu.lesnoff at gmail.com
Mon Dec 13 20:29:25 CET 2010
You could also use aggstat() from the tdisplay package (available at
http://forums.cirad.fr/logiciel-R/viewtopic.php?t=3367). See its help
page.
> mydata <- data.frame(
+ y1 = c(NA, rnorm(n = 8, mean = 10, sd = 5), NA),
+ y2 = c(rep(NA, 2), rnorm(n = 6, mean = 10, sd = 5), rep(NA, 2)),
+ y3 = rnorm(n = 10, mean = 10, sd = 5),
+ y4 = rnorm(n = 10, mean = 10, sd = 5),
+ f1 = rep(c("a", NA, "b"), times = c(3, 1, 6)),
+ f2 = rep(c("c", "d", NA), times = c(5, 3, 2)),
+ f3 = rep(c("e", "f", "g"), times = c(3, 3, 4))
+ )
> mydata
           y1        y2        y3        y4   f1   f2 f3
1          NA        NA 11.277582 13.120160    a    c  e
2  -0.7843488        NA 18.633881  9.095533    a    c  e
3  11.6555526 15.563409  9.433654 16.062916    a    c  e
4  12.2523768  5.567119 19.381132 13.734706 <NA>    c  f
5  11.4456084  8.170626  5.039419  7.135086    b    c  f
6  16.1444098  2.518970  7.468279  5.441936    b    d  f
7   9.4774380  5.114297 14.777489  8.884707    b    d  g
8  13.9189684 13.090211 17.060803 12.467241    b    d  g
9  12.0196222        NA  4.551620  9.506194    b <NA>  g
10         NA        NA  8.377446  6.572499    b <NA>  g
> aggstat(formula = cbind(y1, y2, y3) ~ f1 + f2, data = mydata, FUN = mean)
aggstat(formula = cbind(y1, y2, y3) ~ f1 + f2, data = mydata,
FUN = mean)
  f1 f2        y1        y2        y3
1  a  c  5.435602 15.563409 13.115039
2  b  c 11.445608  8.170626  5.039419
3  b  d 13.180272  6.907826 13.102191
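
For readers who do not have tdisplay installed, roughly the same table can be obtained with aggregate() from base R. This is only a sketch, not a description of how aggstat() works internally. It assumes the values printed above (typed in by hand, since rnorm() gives different numbers on every run) and it assumes that missing values should be dropped separately for each variable, which is what the aggstat() result above suggests.

```r
# Values copied from the printout above; the original data came from rnorm().
mydata <- data.frame(
  y1 = c(NA, -0.7843488, 11.6555526, 12.2523768, 11.4456084,
         16.1444098, 9.4774380, 13.9189684, 12.0196222, NA),
  y2 = c(NA, NA, 15.563409, 5.567119, 8.170626,
         2.518970, 5.114297, 13.090211, NA, NA),
  y3 = c(11.277582, 18.633881, 9.433654, 19.381132, 5.039419,
         7.468279, 14.777489, 17.060803, 4.551620, 8.377446),
  f1 = rep(c("a", NA, "b"), times = c(3, 1, 6)),
  f2 = rep(c("c", "d", NA), times = c(5, 3, 2))
)

# na.action = na.pass keeps rows with a missing response in the model frame;
# na.rm = TRUE is handed on to mean(), so NAs are dropped per variable.
# Rows with NA in a grouping variable are omitted by aggregate() itself.
res <- aggregate(cbind(y1, y2, y3) ~ f1 + f2, data = mydata,
                 FUN = mean, na.rm = TRUE, na.action = na.pass)
print(res)
```

Without na.action = na.pass, the formula method would first discard every row that has an NA in any of y1, y2 or y3, so the group means would be computed on fewer rows.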
See also the function univar():
> mydata <- data.frame(
+ f1 = c(NA, rep("a", 2), rep("b", 5), NA, "a", "a"),
+ f2 = rep(c("c", "d"), times = c(5, 6)),
+ f3 = rep(c("e", NA, "f"), times = c(4, 1, 6)),
+ y1 = c(rnorm(n = 9, mean = 10, sd = 5), NA, 2.1)
+ )
> mydata
     f1 f2   f3        y1
1  <NA>  c    e  6.948897
2     a  c    e 20.115954
3     a  c    e 13.569935
4     b  c    e 12.159732
5     b  c <NA> 11.862606
6     b  d    f 21.610803
7     b  d    f 10.820413
8     b  d    f 13.200561
9  <NA>  d    f  9.694245
10    a  d    f        NA
11    a  d    f  2.100000
> univar(formula = y1 ~ f1, data = mydata)
  f1 NA's n   min    q25 median   mean    q75    max    sd   iqr  range    cv
1  a    1 3  2.10  7.835  13.57 11.929 16.843 20.116 9.119 9.008 18.016 0.764
2  b    0 5 10.82 11.863  12.16 13.931 13.201 21.611 4.376 1.338 10.790 0.314
> univar(formula = y1 ~ f1 + f2, data = mydata)
  f1 f2 NA's n    min    q25 median   mean    q75    max    sd   iqr  range    cv
1  a  c    0 2 13.570 15.206 16.843 16.843 18.479 20.116 4.629 3.273  6.546 0.275
3  a  d    1 1  2.100  2.100  2.100  2.100  2.100  2.100    NA 0.000  0.000    NA
2  b  c    0 2 11.863 11.937 12.011 12.011 12.085 12.160 0.210 0.149  0.297 0.017
4  b  d    0 3 10.820 12.010 13.201 15.211 17.406 21.611 5.669 5.395 10.790 0.373
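
A comparable per-group summary can also be put together in base R with split() and a small helper. The sketch below is not the univar() implementation; describe() is a hypothetical helper written for this illustration, the data are the f1 and y1 columns typed in from the printout above, and only a subset of the univar() columns is computed.

```r
# f1 and y1 copied from the printout above.
mydata <- data.frame(
  f1 = c(NA, "a", "a", "b", "b", "b", "b", "b", NA, "a", "a"),
  y1 = c(6.948897, 20.115954, 13.569935, 12.159732, 11.862606,
         21.610803, 10.820413, 13.200561, 9.694245, NA, 2.1)
)

# Hypothetical helper: counts the NAs, then summarises the non-missing values.
describe <- function(x) {
  ok <- x[!is.na(x)]
  c(nas = sum(is.na(x)), n = length(ok), min = min(ok),
    median = median(ok), mean = mean(ok), max = max(ok), sd = sd(ok))
}

# split() drops the observations whose group (f1) is NA;
# rbind() stacks one row of statistics per group.
res <- do.call(rbind, lapply(split(mydata$y1, mydata$f1), describe))
print(round(res, 3))
```

The same helper can be passed as FUN to aggregate() (again with na.action = na.pass) when the grouping is given as a formula.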
--
Matthieu Lesnoff
CIRAD
Bamako, Mali
On 13 December 2010 17:04, Ivan Calandra <ivan.calandra at uni-hamburg.de> wrote:
> Do it with aggregate(), something like this should do:
> aggregate(.~cluster, FUN=summary, data=data)
>
> Now if you don't want to run summary(), replace it with the function you'd
> like.
>
> HTH,
> Ivan
>
> On 12/13/2010 17:17, effeesse wrote:
>>
>> what am I supposed to put into function(x)? The indicator for extracting
>> the subgroups?
>> data is the df. cluster={1,...,14}.
>>
>> This is how I was compiling:
>>
>> "for (i in 1:14) {
>> my.summary<-data$cluster==i c(mean(?),var(?))
>>
>> summary(var_A~cluster, fun=my.summary,data=data)
>> summary(var_B~cluster, fun=my.summary,data=data)
>> summary(var_C~cluster, fun=my.summary,data=data)
>> summary(var_D~cluster, fun=my.summary,data=data)
>> summary(var_E~cluster, fun=my.summary,data=data)
>> summary(var_F~cluster, fun=my.summary,data=data)
>> summary(var_G~cluster, fun=my.summary,data=data)
>> }"
>>
>> thanks for your patience.
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calandra at uni-hamburg.de
>
> **********
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>