[R] descriptive stats by cells in factorial design
David Carlson
dcarlson at tamu.edu
Mon Aug 5 16:36:55 CEST 2013
This is a bit simpler. The function quantile() labels the
output whereas fivenum() does not:
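# One row per non-empty cell; Age comes back as a matrix column
# holding mean, sd, 0%, 25%, 50%, 75%, 100%
aggregate(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus,
          data = x,
          FUN = function(v) c(mean = mean(v), sd = sd(v), quantile(v)))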
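The statistics from aggregate() print like separate columns, but they are stored as one matrix-valued column named Age, so something like res$Age.mean returns NULL. A minimal sketch of a common way to flatten it follows. The data frame `d` and its values are simulated stand-ins for the real ID_data.txt, and only three of the five factors are used to keep it short.

```r
# Simulated stand-in for the real data: 2 x 2 x 2 design, 10 obs per cell
set.seed(1)
d <- expand.grid(Generation = c("Offspring", "Parent"),
                 Zygosity   = c("DZ", "MZ"),
                 Sex        = c("Female", "Male"),
                 rep        = 1:10)
d$Age <- rnorm(nrow(d), mean = 30, sd = 5)

res <- aggregate(Age ~ Generation + Zygosity + Sex, data = d,
                 FUN = function(v) c(mean = mean(v), sd = sd(v), quantile(v)))

# 'Age' is a single matrix column whose colnames come from the named vector
print(colnames(res$Age))

# do.call(data.frame, ...) expands the matrix into ordinary columns;
# check.names = TRUE (the default) turns "Age.25%" into "Age.25."
flat <- do.call(data.frame, res)
print(names(flat))
```

After flattening, flat$Age.mean and friends behave like any other column.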
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Mike Miller
Sent: Sunday, August 4, 2013 3:15 AM
To: R-Help List
Subject: Re: [R] descriptive stats by cells in factorial design
Summary of my question:
"I have a 5-way factorial design, two levels per factor, so 32 cells, and
I mostly just want the means and standard deviations for the contents of
every cell. Similarly, it would be nice to also have the range and maybe
some percentiles, if there is a function that would just pump them out."
I received three answers:
On Sat, 3 Aug 2013, Søren Højsgaard wrote:
> The summaryBy function in the doBy package may help you.
On Sat, 3 Aug 2013, Jim Lemon wrote:
> You may find that the barNest function in plotrix is useful for showing
> the means and standard deviations of nested designs.
On Sat, 3 Aug 2013, David Winsemius wrote:
> 'tapply' lets one apply a function to tabulated items. There are
> 'describe' functions in a variety of packages.
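As a rough illustration of the tapply() suggestion: passing a list of factors as INDEX returns an array with one dimension per factor, so a five-way design gives a 2 x 2 x 2 x 2 x 2 array per statistic, with NA for any empty cell. This sketch uses simulated ages; only the factor names and levels come from the post.

```r
# Simulated stand-in for the real data (values are made up)
set.seed(42)
d <- expand.grid(Generation = c("Offspring", "Parent"),
                 Zygosity   = c("DZ", "MZ"),
                 Sex        = c("Female", "Male"),
                 Cohort     = c("11", "17"),
                 ESstatus   = c("ES", "notES"),
                 rep        = 1:5)
d$Age <- rnorm(nrow(d), mean = 30, sd = 5)

factors <- d[c("Generation", "Zygosity", "Sex", "Cohort", "ESstatus")]

# One array per statistic, indexed by the five factors
cell_means <- tapply(d$Age, factors, mean)
cell_sds   <- tapply(d$Age, factors, sd)

# ftable() lays a multi-way array out as a readable 2-D table
print(ftable(round(cell_means, 2),
             row.vars = c("Generation", "Zygosity", "Sex")))
```

The array form is handy for indexing a single cell by level names, e.g. cell_means["Parent", "MZ", "Male", "17", "notES"], but it needs one call per statistic, which is why a summaryBy()-style table is more convenient here.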
I'll try to study the other two a bit more eventually, but the first
answer solved my problem perfectly. I wanted it to give the 25% and 75%
quantiles, so I wrote functions for those and then did what you see
below. (Code and output at the end.)
Note that the neat fivenum() function would provide min, q25, median, q75
and max (strictly, its second and fourth values are Tukey's hinges, which
can differ slightly from quantile()'s default 25% and 75% quantiles), so I
wouldn't need to create functions for q25 and q75. But having one function
pump out a vector instead of a scalar seems to mess up the column naming
scheme. Using this function list...
FUN=c(mean, sd, min, q25, median, q75, max, length)
...gave me these column names:
Age.mean Age.sd Age.min Age.q25 Age.median Age.q75 Age.max Age.length
These are what I want, but using this function list...
FUN=c(mean, sd, length, fivenum)
...gave me these much less descriptive numbered column names:
Age.FUN1 Age.FUN2 Age.FUN3 Age.FUN4 Age.FUN5 Age.FUN6 Age.FUN7
That is, it probably looks at the total length of the combined output
from all of the functions and then creates labels. If that length equals
the number of functions in the list, it uses the function names.
Otherwise it doesn't know which elements came from which function, so it
falls back to a numbering scheme.
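One way around the numbering is to wrap every statistic in a single function that returns a named vector, so the names themselves become the labels. The sketch below uses base R's aggregate() so it runs without doBy; the helper name `cell_stats` and the simulated data are hypothetical. The same wrapper should also work as summaryBy()'s FUN, since doBy's examples use functions returning named vectors, but that is not tested here.

```r
# Hypothetical helper: one function returning one named vector per cell
cell_stats <- function(v) {
  f <- fivenum(v)  # unnamed: min, lower hinge, median, upper hinge, max
  c(mean = mean(v), sd = sd(v), n = length(v),
    min = f[1], lower_hinge = f[2], median = f[3],
    upper_hinge = f[4], max = f[5])
}

# Simulated stand-in data: 2 x 2 design, 8 obs per cell
set.seed(7)
d <- expand.grid(Generation = c("Offspring", "Parent"),
                 Sex        = c("Female", "Male"),
                 rep        = 1:8)
d$Age <- runif(nrow(d), 16, 60)

res  <- aggregate(Age ~ Generation + Sex, data = d, FUN = cell_stats)
flat <- do.call(data.frame, res)  # expand the matrix column into Age.* columns
print(flat)
```

Because the element names are set explicitly, the columns read Age.mean, Age.sd, Age.n, Age.min, and so on, rather than Age.FUN1 to Age.FUN7.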
My code:
> x <- read.delim("ID_data.txt",
> str(x)
'data.frame': 4434 obs. of 7 variables:
 $ ID        : chr  "200" "201" "211" "2000" ...
 $ Cohort    : Factor w/ 2 levels "11","17": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age       : num  18.1 18.1 49.2 18 18 ...
 $ Sex       : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Zygosity  : Factor w/ 2 levels "DZ","MZ": 2 2 2 1 1 1 1 2 2 2 ...
 $ Generation: Factor w/ 2 levels "Offspring","Parent": 1 1 2 1 1 1 1 1 1 2 ...
 $ ESstatus  : Factor w/ 2 levels "ES","notES": 2 2 2 2 2 2 2 2 2 2 ...
> install.packages("doBy")
> library(doBy)
> q25 <- function(x){quantile(x, .25, names=FALSE)}
> q75 <- function(x){quantile(x, .75, names=FALSE)}
> summaryBy(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x,
+   FUN=c(mean, sd, min, q25, median, q75, max, length))
   Generation Zygosity    Sex Cohort ESstatus Age.mean    Age.sd Age.min Age.q25 Age.median Age.q75 Age.max Age.length
 1  Offspring       DZ Female     11       ES 17.78528 0.3535863   16.93 17.6000     17.775 17.9650   18.92
 2  Offspring       DZ Female     11    notES 18.13679 0.5555968   16.76 17.8525     18.190 18.4575   19.50
 3  Offspring       DZ Female     17    notES 17.47529 0.4569588   16.56 17.0700     17.590 17.8700   18.29
 4  Offspring       DZ   Male     11       ES 17.76149 0.3467540   17.18 17.5150     17.715 18.0000   18.71
 5  Offspring       DZ   Male     11    notES 17.87667 0.5187333   16.83 17.4600     17.860 18.2400   19.02
 6  Offspring       DZ   Male     17    notES 17.50418 0.3915823   16.73 17.1900     17.530 17.8300   18.52
 7  Offspring       MZ Female     11       ES 17.87628 0.4506530   16.86 17.6775     17.805 18.1000   19.12
 8  Offspring       MZ Female     11    notES 18.05739 0.6103713   16.76 17.6300     18.050 18.4200   19.70
 9  Offspring       MZ Female     17    notES 17.41061 0.4956190   16.55 16.9700     17.340 17.8200   18.45
10  Offspring       MZ   Male     11       ES 17.77174 0.3236917   16.84 17.5800     17.790 17.9700   19.02
11  Offspring       MZ   Male     11    notES 17.87718 0.6472397   16.56 17.3300     17.855 18.2100   20.01
12  Offspring       MZ   Male     17    notES 17.49114 0.3961757   16.65 17.1775     17.500 17.8100   18.35
13     Parent       DZ Female     11       ES 44.61512 5.1246314   32.17 41.3400     44.680 48.2800   57.95
14     Parent       DZ Female     11    notES 42.54346 4.3670998   34.03 39.3450     42.110 45.5500   57.06
15     Parent       DZ Female     17    notES 46.30559 4.9177705   36.10 42.7275     45.765 48.3350   62.69
16     Parent       DZ   Male     11       ES 44.60206 4.5605484   34.31 41.4475     44.890 47.4975   58.75
17     Parent       DZ   Male     11    notES 42.71121 4.9600561   32.05 39.2400     42.760 45.2700   58.20
18     Parent       DZ   Male     17    notES 46.77458 4.0226198   40.18 44.1250     46.000 48.8200   61.12
19     Parent       MZ Female     11       ES 44.23476 5.0214627   29.55 40.6925     44.125 47.7300   56.73
20     Parent       MZ Female     11    notES 42.31988 5.3622671   30.31 38.6050     41.835 46.0175   56.58
21     Parent       MZ Female     17    notES 46.36490 5.1770435   34.88 42.4200     45.950 49.4950   63.18
22     Parent       MZ   Male     11       ES 43.40787 5.3507439   31.28 39.9700     43.440 46.4800   64.65
23     Parent       MZ   Male     11    notES 41.56363 4.6564818   32.10 38.0250     41.390 44.6450   65.29
24     Parent       MZ   Male     17    notES 46.69298 5.2421896   34.45 43.1500     45.890 49.0050   63.80
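One caveat about swapping fivenum() in for the q25() and q75() helpers: fivenum() reports Tukey's hinges, while quantile() defaults to type 7 interpolation, and the two can disagree. A tiny check on a made-up vector shows the difference; the helpers are the ones from the transcript, written with names = FALSE.

```r
v <- 1:6

fivenum(v)                   # 1.0 2.0 3.5 5.0 6.0  (hinges are 2 and 5)
quantile(v, c(0.25, 0.75))   # 25%: 2.25, 75%: 4.75 (type 7 default)

# Helpers as in the post; names = FALSE returns bare numbers
q25 <- function(x) quantile(x, 0.25, names = FALSE)
q75 <- function(x) quantile(x, 0.75, names = FALSE)
c(q25(v), q75(v))            # 2.25 4.75
```

So a table built from fivenum() would not necessarily match the Age.q25 and Age.q75 columns above exactly.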
Thanks very much.
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota