[R] Trouble about the interpretation of intercept in lm models

Tue Jan 13 18:25:57 CET 2009

Marc Schwartz wrote:

> 
>> DF.fitted
>           Y A B     F.lm
> 1  21.86773 0 a 23.52957
> 2  25.91822 0 a 23.52957
> 3  20.82186 0 a 23.52957
> 4  42.97640 1 a 36.18023
> 5  36.64754 1 a 36.18023
> 6  30.89766 1 a 36.18023
> 7  47.43715 0 b 46.50615
> 8  48.69162 0 b 46.50615
> 9  47.87891 0 b 46.50615
> 10 53.47306 1 b 59.15681
> 11 62.55891 1 b 59.15681
> 12 56.94922 1 b 59.15681
> 13 61.89380 0 c 62.98442
> 14 53.92650 0 c 62.98442
> 15 70.62465 0 c 62.98442
> 16 74.77533 1 c 75.63508
> 17 74.91905 1 c 75.63508
> 18 79.71918 1 c 75.63508
> 
> 
> # Now get the means of the fitted values across
> # the combinations of A and B
> M <- with(DF.fitted, tapply(F.lm, list(A = A, B = B), mean))
> 
>> M
>    B
> A          a        b        c
>   0 23.52957 46.50615 62.98442
>   1 36.18023 59.15681 75.63508
> 
> 
> Thus:
> 
> # Intercept = *fitted* mean at A = 0; B = "a"
>> M["0", "a"]
> [1] 23.52957

Actually, notice that within each cell you are averaging identical
fitted values, so the "mean" in the tapply is slightly misleading: it
merely picks out the single fitted value for each (A, B) combination.
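
A minimal sketch of that point, rebuilt from the Y, A and B columns
quoted above. The model formula Y ~ A + B is an assumption on my part
(it is consistent with the F.lm values shown), and the names DF, fit
and n.distinct are only illustrative:

```r
# Data copied from the quoted DF.fitted listing
DF <- data.frame(
  Y = c(21.86773, 25.91822, 20.82186, 42.97640, 36.64754, 30.89766,
        47.43715, 48.69162, 47.87891, 53.47306, 62.55891, 56.94922,
        61.89380, 53.92650, 70.62465, 74.77533, 74.91905, 79.71918),
  A = rep(rep(0:1, each = 3), times = 3),
  B = factor(rep(c("a", "b", "c"), each = 6))
)

# Assumed model: additive in A and B, default treatment contrasts
fit <- lm(Y ~ A + B, data = DF)
DF$F.lm <- fitted(fit)

# Number of distinct fitted values in each (A, B) cell; rounding
# guards against floating-point noise
n.distinct <- with(DF, tapply(round(F.lm, 8), list(A = A, B = B),
                              function(v) length(unique(v))))
print(n.distinct)

# The intercept is the fitted value for the cell A = 0, B = "a"
print(coef(fit)["(Intercept)"])
print(predict(fit, newdata = data.frame(A = 0, B = "a")))
```

Every cell should report a single distinct fitted value, so indexing
the table of fitted values would serve as well as taking a mean.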

Notice also that the intercept is defined even when _no_ observation
has all of its non-intercept entries in the design matrix equal to
zero. This is the usual case in linear regression (no observation at
x = 0), for instance, but it can also happen in factorial designs
(unbalanced, or using contrasts other than treatment contrasts).
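
To illustrate both situations, here is a small sketch on made-up data
(x, y, g, z and the fit names are all hypothetical, not taken from the
thread). In the regression, no x is zero and the intercept is the line
extrapolated to x = 0; in the one-way layout with sum-to-zero
contrasts, no row of the design matrix is zero outside the intercept
column, and the intercept is the unweighted mean of the group means:

```r
# Regression: x runs from 10 to 20, so no observation sits at x = 0
set.seed(1)
x <- 10:20
y <- 3 + 2 * x + rnorm(length(x))
fit.reg <- lm(y ~ x)

# The intercept equals the prediction extrapolated to x = 0
p0 <- predict(fit.reg, newdata = data.frame(x = 0))
print(c(intercept = coef(fit.reg)[["(Intercept)"]], at.zero = unname(p0)))

# Unbalanced one-way layout, sum-to-zero contrasts instead of treatment
g <- factor(rep(c("a", "b", "c"), times = c(2, 3, 4)))
z <- c(1, 2, 4, 5, 6, 8, 9, 10, 11)
fit.sum <- lm(z ~ g, contrasts = list(g = "contr.sum"))

# Does any row have all non-intercept design-matrix entries equal to 0?
X <- model.matrix(fit.sum)
any.zero.row <- any(rowSums(X[, -1] != 0) == 0)
print(any.zero.row)

# Here the intercept is the unweighted mean of the three group means
group.means <- tapply(z, g, mean)
print(c(intercept = coef(fit.sum)[["(Intercept)"]],
        mean.of.means = mean(group.means)))
```

In neither fit does the intercept correspond to an observed cell or
data point; it is simply a parameter of the chosen coding.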

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907