[R] Automatically Extracting F- and P- vals from ANOVA

Tue Feb 1 00:18:22 CET 2005

On Mon, 2005-01-31 at 16:06 -0500, Uri wrote:
> Dear R community,
> 
> I'm currently using R to analyze functional Magnetic Resonance Imaging
> data.  Each analysis involves running ~120,000 repeated-measures
> ANOVAs.
> 
> I would like to know if there is any automatic way to access the F-
> and P-value data that are associated with each of these 120,000
> ANOVAs.
> 
> For example, if the summary output (for the 1st ANOVA of 120,000)
> shows the following value for factor "C", then I would like to
> automatically extract the F value (4.1) and P value (0.13) from the
> summary() output:
> 
> > summary(anova.1)
> 
> Error: S
>           Df Sum Sq Mean Sq F value Pr(>F)
> Residuals  1   12.5    12.5
> 
> Error: S:C
>           Df  Sum Sq Mean Sq F value Pr(>F)
> C          3 128.500  42.833  4.2131 0.1341
> Residuals  3  30.500  10.167
> 
> Of course, I would be interested only in factor "C".
> 
> I would then push these values to a hash of hashes (i.e, multi-dim
> array) that summarizes the results of all the ANOVAs.  {Anova1,
> (F=4.21), (P=0.13)} etc'.
> 
> I "dumped" a sample aov object, and my impression was that the F and P
> values for each effect are not hard coded, but constructed on the fly
> by the summary() function.
> 
> Is there a script or function that already does what I am looking for?
>  If not, my second option would be to sink the summary of each anova
> to a text file, and then post-process it, but this latter option is
> sub-optimal.
> 
> Best,
> Uri Hasson

It is actually a little complicated, but you are "most" of the way
there, in that your impression is correct. If you read through the code
for summary.aov, you will see how the output is generated.

Briefly, (since I do not have your data), let me use the output from the
final example in ?aov:

> summary(npk.aovE)

Error: block
          Df  Sum Sq Mean Sq F value Pr(>F)
N:P:K      1  37.002  37.002  0.4832 0.5252
Residuals  4 306.293  76.573

Error: Within
          Df  Sum Sq Mean Sq F value   Pr(>F)
N          1 189.282 189.282 12.2587 0.004372 **
P          1   8.402   8.402  0.5441 0.474904
K          1  95.202  95.202  6.1657 0.028795 *
N:P        1  21.282  21.282  1.3783 0.263165
N:K        1  33.135  33.135  2.1460 0.168648
P:K        1   0.482   0.482  0.0312 0.862752
Residuals 12 185.287  15.441
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

If you look at the structure of the summary, you get:

> str(summary(npk.aovE))
List of 2
 $ Error: block :List of 1
  ..$ :Classes anova  and `data.frame': 2 obs. of  5 variables:
  .. ..$ Df     : num [1:2] 1 4
  .. ..$ Sum Sq : num [1:2]  37 306
  .. ..$ Mean Sq: num [1:2] 37.0 76.6
  .. ..$ F value: num [1:2] 0.483    NA
  .. ..$ Pr(>F) : num [1:2] 0.525    NA
  ..- attr(*, "class")= chr [1:2] "summary.aov" "listof"
 $ Error: Within:List of 1
  ..$ :Classes anova  and `data.frame': 7 obs. of  5 variables:
  .. ..$ Df     : num [1:7] 1 1 1 1 1 1 12
  .. ..$ Sum Sq : num [1:7] 189.3   8.4  95.2  21.3  33.1 ...
  .. ..$ Mean Sq: num [1:7] 189.3   8.4  95.2  21.3  33.1 ...
  .. ..$ F value: num [1:7] 12.259  0.544  6.166  1.378  2.146 ...
  .. ..$ Pr(>F) : num [1:7] 0.00437 0.47490 0.02880 0.26317 0.16865 ...
  ..- attr(*, "class")= chr [1:2] "summary.aov" "listof"
 - attr(*, "class")= chr "summary.aovlist"

Note that it is composed of two lists, each component of which is in
turn a list, containing a data frame.  Confused yet?  ;-)

Thus, you need to get down a couple of list levels, before you can
extract the elements in the data frame. Thus:

> summary(npk.aovE)[["Error: Within"]][[1]]

          Df  Sum Sq Mean Sq F value   Pr(>F)
N          1 189.282 189.282 12.2587 0.004372 **
P          1   8.402   8.402  0.5441 0.474904
K          1  95.202  95.202  6.1657 0.028795 *
N:P        1  21.282  21.282  1.3783 0.263165
N:K        1  33.135  33.135  2.1460 0.168648
P:K        1   0.482   0.482  0.0312 0.862752
Residuals 12 185.287  15.441
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Gives you just the second part of the output as a data frame. Now, use:

> summary(npk.aovE)[["Error: Within"]][[1]]["N:P" , "F value"]
[1] 1.378297

and

> summary(npk.aovE)[["Error: Within"]][[1]]["N:P" , "Pr(>F)"]
[1] 0.2631653

To get the F statistic and p value, respectively.

So, probably in your aov models, you could use something like:

summary(anova.1)[["Error: S:C""][[1]]["C", "F value"]

and 

summary(anova.1)[["Error: S:C""][[1]]["C", "Pr(>F)"]

to get what you want.

If it helps to keep things straight, you could assign the intermediate
extraction to a data frame and then subset if from there. So something
like:

df <- summary(anova.1)[["Error: S:C""][[1]]
df["C", "F value"]

should probably work also.

HTH,

Marc Schwartz
< Now I need a wee bit of single malt after that... ;-) >