[R] Interpreting summary.lm for a 2 factor anova

Fox, John jfox at mcmaster.ca
Sat Dec 3 14:45:55 CET 2016

```Dear Ashim,

Sorry to chime in late, and my apologies if someone has already pointed this out, but here's the relationship between the cell means and the model coefficients, using a row basis of the model matrix (its six distinct rows, one per wool-by-tension cell):

-------------------------- snip ------------------------

> means <- with( warpbreaks, tapply( breaks, interaction(wool, tension), mean ) )
> x.A <- rep(c(0, 1), 3)
> x.B1 <- rep(c(0, 1, 0), each=2)
> x.B2 <- rep(c(0, 0, 1), each=2)
> x.AB1 <- x.A*x.B1
> x.AB2 <- x.A*x.B2
> X.basis <- cbind(1, x.A, x.B1, x.B2, x.AB1, x.AB2)
> X.basis
       x.A x.B1 x.B2 x.AB1 x.AB2
[1,] 1   0    0    0     0     0
[2,] 1   1    0    0     0     0
[3,] 1   0    1    0     0     0
[4,] 1   1    1    0     1     0
[5,] 1   0    0    1     0     0
[6,] 1   1    0    1     0     1
> solve(X.basis, means)
                x.A      x.B1      x.B2     x.AB1     x.AB2
 44.55556 -16.33333 -20.55556 -20.00000  21.11111  10.55556
> coef(aov(breaks ~ wool * tension, data = warpbreaks))
   (Intercept)          woolB       tensionM       tensionH woolB:tensionM
      44.55556      -16.33333      -20.55556      -20.00000       21.11111
woolB:tensionH
      10.55556

-------------------------- snip ------------------------
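
-------------------------- snip ------------------------

# A minimal sketch, not part of the session above. It assumes R's default
# treatment contrasts; the object names (cell.means, fit, b, dd.M, dd.H,
# cells, X.cells, fitted.cells) are illustrative only.
# With those contrasts each interaction coefficient is a difference of
# differences between cell means, not a difference from the A.L base cell.
cell.means <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
fit <- aov(breaks ~ wool * tension, data = warpbreaks)
b <- coef(fit)

# woolB:tensionM = (B.M - A.M) - (B.L - A.L)
dd.M <- (cell.means["B", "M"] - cell.means["A", "M"]) -
        (cell.means["B", "L"] - cell.means["A", "L"])
# woolB:tensionH = (B.H - A.H) - (B.L - A.L)
dd.H <- (cell.means["B", "H"] - cell.means["A", "H"]) -
        (cell.means["B", "L"] - cell.means["A", "L"])
all.equal(dd.M, unname(b["woolB:tensionM"]))
all.equal(dd.H, unname(b["woolB:tensionH"]))

# Going the other way: one model-matrix row per cell, times the coefficients,
# gives back the six cell means, since the model has six parameters.
cells <- expand.grid(wool = levels(warpbreaks$wool),
                     tension = levels(warpbreaks$tension))
X.cells <- model.matrix(~ wool * tension, data = cells)
fitted.cells <- drop(X.cells %*% b)
all.equal(unname(fitted.cells), as.vector(cell.means))

-------------------------- snip ------------------------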

I hope this helps,
John

-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashim Kapoor
> Sent: December 3, 2016 12:19 AM
> To: David Winsemius <dwinsemius at comcast.net>
> Cc: r-help at r-project.org
> Subject: Re: [R] Interpreting summary.lm for a 2 factor anova
>
> Please allow me to rephrase my query.
>
> > model.tables(model,"m")
> Tables of means
> Grand mean
>
> 28.14815
>
>  wool
> wool
>      A      B
> 31.037 25.259
>
>  tension
> tension
>     L     M     H
> 36.39 26.39 21.67
>
>  wool:tension
>     tension
> wool L     M     H
>    A 44.56 24.00 24.56
>    B 28.22 28.78 18.78
> >
>
>
> The above is the same as :
>
> with( warpbreaks, tapply( breaks, interaction(wool, tension), mean ) )
>      A.L      B.L      A.M      B.M      A.H      B.H
> 44.55556 28.22222 24.00000 28.77778 24.55556 18.77778
>
> For reference:
>
> > model <- aov(breaks ~ wool * tension, data = warpbreaks)
> > summary.lm(model)
>
> Call:
> aov(formula = breaks ~ wool * tension, data = warpbreaks)
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -19.5556  -6.8889  -0.6667   7.1944  25.4444
>
> Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
> (Intercept)      44.556      3.647  12.218 2.43e-16 ***
> woolB           -16.333      5.157  -3.167 0.002677 **
> tensionM        -20.556      5.157  -3.986 0.000228 ***
> tensionH        -20.000      5.157  -3.878 0.000320 ***
> woolB:tensionM   21.111      7.294   2.895 0.005698 **
> woolB:tensionH   10.556      7.294   1.447 0.154327
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 10.94 on 48 degrees of freedom
> Multiple R-squared:  0.3778,    Adjusted R-squared:  0.3129
> F-statistic: 5.828 on 5 and 48 DF,  p-value: 0.0002772
>
>
> Now I'll explain what is confusing me in the output of summary.lm.
>
> Coeff of Intercept = 44.556  = cell mean for A.L. This is the base.
>
> Coeff of woolB:L = -16.333 = 28.22222 - 44.556. This is the difference of this
> cell mean(B:L) from the base.
>
> Coeff of woolA:tensionM = -20.556  = 24.000- 44.556. This is the difference of
> this cell mean (A:M)  from the base.
>
> Coeff of woolA:tensionH = -20.000  = 24.55556 - 44.556. This is the difference
> of this cell mean(A:H) from the base.
>
> This is where it stops being the difference from the base.
>
> Coeff of woolB:tensionM = 21.111 should turn out to be 28.77778 - 44.556 but
> this is -15.77822
>
> Coeff of woolB:tensionH = 10.556 should turn out to be  18.77778 - 44.556 but
> this is -25.77822
>
> In the above 2 cases, we can't say that the coefficient = cell mean - base case.
> Can you tell me what should be the statement to be made ?
>
>
> Best Regards,
> Ashim
>
> PS: My apologies for emailing my query to this list. Can you tell me the names
> of a few (active) statistics help lists?
>
> On Sat, Dec 3, 2016 at 1:33 AM, David Winsemius <dwinsemius at comcast.net>
> wrote:
>
> >
> > > On Dec 2, 2016, at 9:09 AM, David Winsemius <dwinsemius at comcast.net>
> > wrote:
> > >
> > >>
> > >> On Dec 2, 2016, at 6:16 AM, Ashim Kapoor <ashimkapoor at gmail.com>
> wrote:
> > >>
> > >> Dear Pikal,
> > >>
> > >> All levels except the interactions are compared to the Intercept.
> > >> I'm a little confused as to what's going on in interaction terms
> > >> e.g. the cell wool B : tension M. Its mean is:
> > >> 28.78 and 28.78 - 44.56 = -15.78 != 21.111.
> > >>
> > >> It's something like 44.56 (intercept) - 16.333 (wool B) - 20.556
> > >> (tension M) + 21.111 (woolB:tensionM) = 28.782.
> > >>
> > >> I don't know how to sum up the above line in terms of differences
> > >> succinctly.
> > >
> > > The aov estimate will not exactly equal the observed mean (this is
> > _statistics_ after all). You should be comparing the mean of that cell
> > to the estimate:
> > >
> > > 44.556 + (-16.33) +(-20.556) + (21.11)
> >
> > A respected participant advised me to look at this more closely. In
> > this case (and I think in most such cases) where there are the same
> > number of parameters as there are means, the model is "saturated" and
> > there is no difference:
> >
> >  with( warpbreaks, tapply( breaks, interaction(wool, tension), mean ) )
> >      A.L      B.L      A.M      B.M      A.H      B.H
> > 44.55556 28.22222 24.00000 28.77778 24.55556 18.77778
> >
> > So the B:M estimate is identical up to rounding with the observed mean:
> >
> >  44.556 + (-16.33) +(-20.556) + (21.11)
> > [1] 28.78
> >
> >
> >
> > >
> > > The difference between the observed mean and the estimated mean is
> known
> > as a 'residual'
> >
> > I've also been privately but gently chided for this misstatement.
> > Residuals are the difference between data and estimates.
> >
> > > and the sum of the squared residuals is what is being minimized
> > > ... over all the cells including the one implicitly associated with the
> > > Intercept.
> > >
> > > This isn't really on-topic for R-help since you are not having difficulty
> > > in getting the R program to perform its duties, but are rather in need of
> > > statistical education. That is not what this mailing list is set up for.
> > >
> > > --
> > > David.
> > >
> > >>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashim
> > >>>> Kapoor
> > >>>> Sent: Thursday, December 1, 2016 2:48 PM
> > >>>> To: r-help at r-project.org
> > >>>> Subject: [R] Interpreting summary.lm for a 2 factor anova
> > >>>>
> > >>>> Dear all,
> > >>>>
> > >>>> Here is a small example : -
> > >>>>
> > >>>>> model <- aov(breaks ~ wool * tension, data = warpbreaks)
> > >>>>> summary.lm(model)
> > >>>>
> > >>>> Call:
> > >>>> aov(formula = breaks ~ wool * tension, data = warpbreaks)
> > >>>>
> > >>>> Residuals:
> > >>>>    Min       1Q   Median       3Q      Max
> > >>>> -19.5556  -6.8889  -0.6667   7.1944  25.4444
> > >>>>
> > >>>> Coefficients:
> > >>>>              Estimate Std. Error t value Pr(>|t|)
> > >>>> (Intercept)      44.556      3.647  12.218 2.43e-16 ***
> > >>>> woolB           -16.333      5.157  -3.167 0.002677 **
> > >>>> tensionM        -20.556      5.157  -3.986 0.000228 ***
> > >>>> tensionH        -20.000      5.157  -3.878 0.000320 ***
> > >>>> woolB:tensionM   21.111      7.294   2.895 0.005698 **
> > >>>> woolB:tensionH   10.556      7.294   1.447 0.154327
> > >>>> ---
> > >>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >>>>
> > >>>> Residual standard error: 10.94 on 48 degrees of freedom
> > >>>> Multiple R-squared:  0.3778,    Adjusted R-squared:  0.3129
> > >>>> F-statistic: 5.828 on 5 and 48 DF,  p-value: 0.0002772
> > >>>>
> > >>>>> model.tables(model,"e")
> > >>>> Tables of effects
> > >>>>
> > >>>> wool
> > >>>> wool
> > >>>>     A       B
> > >>>> 2.8889 -2.8889
> > >>>>
> > >>>> tension
> > >>>> tension
> > >>>>    L      M      H
> > >>>> 8.241 -1.759 -6.481
> > >>>>
> > >>>> wool:tension
> > >>>>   tension
> > >>>> wool L      M      H
> > >>>>  A  5.278 -5.278  0.000
> > >>>>  B -5.278  5.278  0.000
> > >>>>
> > >>>>
> > >>>>> model.tables(model,"m")
> > >>>> Tables of means
> > >>>> Grand mean
> > >>>>
> > >>>> 28.14815
> > >>>>
> > >>>> wool
> > >>>> wool
> > >>>>    A      B
> > >>>> 31.037 25.259
> > >>>>
> > >>>> tension
> > >>>> tension
> > >>>>   L     M     H
> > >>>> 36.39 26.39 21.67
> > >>>>
> > >>>> wool:tension
> > >>>>   tension
> > >>>> wool L     M     H
> > >>>>  A 44.56 24.00 24.56
> > >>>>  B 28.22 28.78 18.78
> > >>>>>
> > >>>>
> > >>>> I don't follow the output of summary.lm. I understand the output of
> > >>>> model.tables for effects and means. For instance what does 44.556
> > >>>> represent ? Is it the grand average ? The grand mean is 28.14815. Can
> > >>>> someone help me understand the output of summary.lm ?
> > >>>>
> > >>>> Best Regards,
> > >>>> Ashim
> > >>>>
> > >>>>     [[alternative HTML version deleted]]
> > >>>>
> > >>>> ______________________________________________
> > >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>> guide.html
> > >>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> > >>>
> > >>
> > >
> > > David Winsemius
> > > Alameda, CA, USA
> > >
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help