[R] How to read ANOVA output

Thu Aug 19 11:45:32 CEST 2010

----- Original Message ----

From: "Ted.Harding at manchester.ac.uk" <Ted.Harding at manchester.ac.uk>
To: r-help at r-project.org
Cc: Stephen Liu <satimis at yahoo.com>
Sent: Wed, August 18, 2010 4:41:11 PM
Subject: RE: [R] How to read ANOVA output

Hi Ted,

Thanks for your advice.

- snip -

>You need to understand how that works (basic
>statistical theory) before even thinking of looking at the
>Tukey thing (omitted in this reply).

I have been googling for a while and found many documents, but I am not sure 
where to start or which direction to take.  Could you please give me some 
hints?  TIA

I found the following:

Basic Inferential Statistics: Theory and Application
http://owl.english.purdue.edu/owl/resource/672/05/

Basic Statistics-I
http://works.bepress.com/durgesh_chandra_pathak/10/
(file download: basic_Statistics-I-fulltext.pdf)

>The following is an explanation of your 1-way ANOVA written
>entirely in R (preceded by a duplicate of your ANOVA output):

I performed the following steps:

## anova(lm(values ~ ind, data = tablets))
## Analysis of Variance Table
## Response: values
##          Df    Sum Sq   Mean Sq   F value      Pr(>F)
## ind       2   2.05787   1.02893    45.239   2.015e-05 ***
## Residuals 9   0.20470   0.02274
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

tabA = c(5.67, 5.67, 5.55, 5.57)
tabB = c(5.75, 5.47, 5.43, 5.45)
tabC = c(4.74, 4.45, 4.65, 4.94)

nA <- length(tabA) ; nB <- length(tabB) ; nC <- length(tabC)
nG <- nA + nB + nC
nG
[1] 12

mG <- mean(c(tabA,tabB,tabC))
mA <- mean(tabA) ; mB <- mean(tabB) ; mC <- mean(tabC)
SSres <- sum((tabA-mA)^2) + sum((tabB-mB)^2) + sum((tabC-mC)^2)
SSres # = 0.2047
[1] 0.2047

(I suppose ^2 here means raised to the power of 2?)
(Is SSres the residual sum of squares, sometimes called the error sum of 
squares, i.e. the variation in the dependent variable that is not predicted by 
the model?  And does adding SSreg to SSres give SStotal, which represents how 
much variation there is in the data overall?)
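
A minimal sketch that may help with the two questions above. It assumes the
data frame is called tablets with columns values and ind, as in the model call
quoted earlier; fit is only an illustrative name. In R, ^ is the power operator
and it works element by element, so (tabA-mA)^2 squares every deviation. The
residual sum of squares can then be compared with the residuals of the fitted
model:

```r
tabA <- c(5.67, 5.67, 5.55, 5.57)
tabB <- c(5.75, 5.47, 5.43, 5.45)
tabC <- c(4.74, 4.45, 4.65, 4.94)

# Assumed layout of the data behind lm(values ~ ind, data = tablets)
tablets <- data.frame(
  values = c(tabA, tabB, tabC),
  ind    = factor(rep(c("tabA", "tabB", "tabC"), each = 4))
)

# ^ raises each element to the given power
c(2, 3)^2

# Residual sum of squares computed by hand: squared deviations
# of each observation from its own group mean
SSres <- sum((tabA - mean(tabA))^2) +
         sum((tabB - mean(tabB))^2) +
         sum((tabC - mean(tabC))^2)

# The same quantity taken from the fitted model
fit <- lm(values ~ ind, data = tablets)
SSres
sum(resid(fit)^2)
deviance(fit)
```

All three of the last expressions should print the same value as the Residuals
row of the Sum Sq column.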

SSeff <- nA*(mA-mG)^2 + nB*(mB-mG)^2 + nC*(mC-mG)^2
SSeff # = 2.057867
[1] 2.057867

(What does SSeff refer to here?)
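
A hedged sketch of how SSeff fits in: it appears to be the between-group
("effect") sum of squares, i.e. the variation of the group means around the
grand mean, which matches the ind row of the ANOVA table. Under that reading it
plays the role that regression texts call SSreg, and adding it to SSres should
reproduce the total sum of squares. SStot is a name introduced here only for
illustration:

```r
tabA <- c(5.67, 5.67, 5.55, 5.57)
tabB <- c(5.75, 5.47, 5.43, 5.45)
tabC <- c(4.74, 4.45, 4.65, 4.94)

allvals <- c(tabA, tabB, tabC)
mG <- mean(allvals)
mA <- mean(tabA); mB <- mean(tabB); mC <- mean(tabC)

# Within-group (residual) sum of squares
SSres <- sum((tabA - mA)^2) + sum((tabB - mB)^2) + sum((tabC - mC)^2)

# Between-group (effect) sum of squares: each group mean's squared
# distance from the grand mean, weighted by the group size
SSeff <- length(tabA) * (mA - mG)^2 +
         length(tabB) * (mB - mG)^2 +
         length(tabC) * (mC - mG)^2

# Total sum of squares: every observation against the grand mean
SStot <- sum((allvals - mG)^2)

SStot
SSeff + SSres
```

The last two lines should print the same number.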

## Number of groups = 3 hence df.groups = (3-1) = 2

(?df
Description:

     Density, distribution function, quantile function and random
     generation for the F distribution with ‘df1’ and ‘df2’ degrees of
     freedom (and optional non-centrality parameter ‘ncp’).

What does df refer to here?)
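
A short sketch of the two different meanings of "df" that seem to be mixed up
here. The Df column of the ANOVA table and the variable df.groups are degrees
of freedom, whereas the help page ?df describes the R function df(), which is
the density of the F distribution. The data frame layout is assumed as in the
quoted model call:

```r
tablets <- data.frame(
  values = c(5.67, 5.67, 5.55, 5.57,
             5.75, 5.47, 5.43, 5.45,
             4.74, 4.45, 4.65, 4.94),
  ind    = factor(rep(c("tabA", "tabB", "tabC"), each = 4))
)
fit <- lm(values ~ ind, data = tablets)

# Degrees of freedom as reported in the ANOVA table:
# (number of groups - 1) for ind, and
# (number of observations - number of groups) for the residuals
anova(fit)$Df
df.residual(fit)

# df() is something else: the density of the F distribution,
# here evaluated at x = 1 with 2 and 9 degrees of freedom
df(1, df1 = 2, df2 = 9)
```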

df.groups <- 2
meanSSeff <- SSeff/df.groups
meanSSeff # = 1.028933
[1] 1.028933

## df for residuals in each group = (n.group - 1):
df.res <- (nA-1) + (nB-1) + (nC-1)  ## = 3 + 3 + 3 = 9
meanSSres <- SSres/df.res
meanSSres # = 0.02274444
[1] 0.02274444

## Fisher's F-ratio statistic = meanSSeff/meanSSres:
F <- meanSSeff/meanSSres
F         # = 45.23889
[1] 45.23889

(Is Fisher's F-ratio the statistic used in the F-test?
http://en.wikipedia.org/wiki/F-test
)
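
As a cross-check, a sketch showing that the hand-computed ratio of mean squares
is the F statistic that R's own one-way ANOVA routines report. The data frame
layout is assumed as in the quoted model call; Fratio and ow are illustrative
names. oneway.test() is called with var.equal = TRUE so that it performs the
classical equal-variance F-test rather than the Welch version:

```r
tablets <- data.frame(
  values = c(5.67, 5.67, 5.55, 5.57,
             5.75, 5.47, 5.43, 5.45,
             4.74, 4.45, 4.65, 4.94),
  ind    = factor(rep(c("tabA", "tabB", "tabC"), each = 4))
)

# Group means and the grand mean
gm <- tapply(tablets$values, tablets$ind, mean)
mG <- mean(tablets$values)

# Mean squares: sums of squares divided by their degrees of freedom
meanSSeff <- sum(4 * (gm - mG)^2) / 2
meanSSres <- sum((tablets$values - gm[tablets$ind])^2) / 9

# F ratio by hand
Fratio <- meanSSeff / meanSSres
Fratio

# The same statistic from the ANOVA table and from oneway.test()
anova(lm(values ~ ind, data = tablets))[["F value"]][1]
ow <- oneway.test(values ~ ind, data = tablets, var.equal = TRUE)
ow$statistic
```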

## P-value for F as test of difference between group means
## relative to within-group residuals (upper tail):
Pval <- pf(F, df.groups, df.res, lower.tail=FALSE)
Pval      # = 2.015227e-05
[1] 2.015227e-05

(Is this the kind of P-value described in "The P-values for the Popular 
Distributions"?
http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/pvalues.htm
)
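
A small sketch of what the P-value step computes, using the F value and degrees
of freedom from the output above. pf() is the cumulative distribution function
of the F distribution, so lower.tail=FALSE gives the probability of an F at
least as large as the observed one when the group means are in fact equal.
Fobs, p_upper and Fcrit are illustrative names, and the 5% level is only an
assumed example threshold:

```r
Fobs      <- 45.23889   # observed F ratio from the ANOVA output
df.groups <- 2          # degrees of freedom for ind
df.res    <- 9          # degrees of freedom for the residuals

# Upper-tail probability, two equivalent ways
p_upper <- pf(Fobs, df.groups, df.res, lower.tail = FALSE)
p_upper
1 - pf(Fobs, df.groups, df.res)

# Critical value at an assumed 5% level: the observed F is
# significant at that level if it exceeds this quantile
Fcrit <- qf(0.95, df.groups, df.res)
Fobs > Fcrit
```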

If I'm wrong, please correct me.  TIA

B.R.
Stephen