[R] why NA coefficients

Tue Nov 8 20:32:29 CET 2011

It might make the discussion easier to follow if you used
a smaller dataset that anyone can make and did some experiments
with contrasts. E.g.,

> D <- data.frame(expand.grid(X1=LETTERS[1:3], X2=letters[24:26])[-1,], Y=2^(1:8))
> D
  X1 X2   Y
2  B  x   2
3  C  x   4
4  A  y   8
5  B  y  16
6  C  y  32
7  A  z  64
8  B  z 128
9  C  z 256
> lm(data=D, Y ~ X1 * X2)

Call:
lm(formula = Y ~ X1 * X2, data = D)

Coefficients:
(Intercept)          X1B          X1C  
       -188          190          192  
        X2y          X2z      X1B:X2y  
        196          252         -182  
    X1C:X2y      X1B:X2z      X1C:X2z  
       -168         -126           NA  

> lm(data=D, Y ~ X1 * X2, contrasts=list(X2="contr.SAS"))

Call:
lm(formula = Y ~ X1 * X2, data = D, contrasts = list(X2 = "contr.SAS"))

Coefficients:
(Intercept)          X1B          X1C  
         64           64          192  
        X2x          X2y      X1B:X2x  
       -252          -56          126  
    X1C:X2x      X1B:X2y      X1C:X2y  
         NA          -56         -168  

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of array chip
> Sent: Tuesday, November 08, 2011 10:57 AM
> To: Dennis Murphy
> Cc: r-help at r-project.org
> Subject: Re: [R] why NA coefficients
> 
> Hi Dennis, Thanks very much for the details. All those explanations about non-estimable mu_{12} when
> it has no data make sense to me!
> 
> Regarding my specific example data where mu_{12} should NOT be estimable in a linear model with
> interaction because it has no data, yet the linear model I created by using lm() in R still CAN
> estimate the mean mu_{12}, while on the other hand, mu_{72} is instead NOT estimable from lm() even
> this category does have data. Does this contradiction to the theory imply that the linear model by
> lm() in R on my specific example data is NOT reliable/trustable and should not be used?
> 
> Thanks
> 
> John
> 
> 
> 
> ________________________________
> From: Dennis Murphy <djmuser at gmail.com>
> 
> Cc: "r-help at r-project.org" <r-help at r-project.org>
> Sent: Tuesday, November 8, 2011 10:22 AM
> Subject: Re: [R] why NA coefficients
> 
> The cell mean mu_{12} is non-estimable because it has no data in the
> cell. How can you estimate something that's not there (at least
> without imputation :)? Every parametric function that involves mu_{12}
> will also be non-estimable - in particular,  the interaction term and
> the population marginal means . That's why you get the NA estimates
> and the warning. All this follows from the linear model theory
> described in, for example, Milliken and Johnson (1992), Analysis of
> Messy Data, vol. 1, ch. 13.
> 
> Here's an example from Milliken and Johnson (1992) to illustrate:
>           B1         B2       B3
> T1      2, 6                   8, 6
> T2        3          14      12, 9
> T3        6           9
> 
> Assume a cell means model E(Y_{ijk}) = \mu_{ij}, where the cell means
> are estimated by the cell averages.
> 
> From M & J (p. 173, whence this example is taken):
> "Whenever treatment combinations are missing, certain
> hypotheses cannot be tested without making some
> additional assumptions about the parameters in the model.
> Hypotheses involving parameters corresponding to the
> missing cells generally cannot be tested. For example,
> for the data [above] it is not possible to estimate any
> linear combinations (or to test any hypotheses) that
> involve parameters \mu_{12} and \mu_{33} unless one
> is willing to make some assumptions about them."
> 
> They continue:
> "One common assumption is that there is no
> interactions between the levels of T and the levels of B.
> In our opinion, this assumption should not be made
> without some supporting experimental evidence."
> 
> In other words, removing the interaction term makes the
> non-estimability problem disappear, but it's a copout unless there is
> some tangible scientific justification for an additive rather than an
> interaction model.
> 
> For the above data, M & J note that it is not possible to estimate all
> of the expected marginal means - in particular, one cannot estimate
> the population marginal means $\bar{\mu}_{1.}$, $\bar{\mu}_{3.}$,
> $\bar{\mu}_{.2}$ or $\bar{\mu}_{.3}$. OTOH, $\bar{\mu}_{2.}$ and
> $\bar{\mu}_{.1}$ since these functions of the parameters involve terms
> associated with the means of the missing cells. Moreover, any
> hypotheses involving parametric functions that contain non-estimable
> cell means are not testable. In this example, the test of equal row
> population marginal means is not testable because $\bar{\mu}_{1.}$ and
> $\bar{\mu}_{3.}$ are not estimable.
> 
> [Aside: if the term parametric function is not familiar, in this
> context it refers to linear combinations of model parameters.  In the
> M & J example, $\bar{\mu}_{1.} = \mu_{11} + \mu_{12} + \mu_{13}$ is a
> parametric function.]
> 
> Hopefully this sheds some light on the situation.
> 
> Dennis
> 
> 
> > Hi Dennis,
> > The cell mean mu_12 from the model involves the intercept and factor 2:
> > Coefficients:
> >                   (Intercept)                 factor(treat)2
> > factor(treat)3
> >                      0.429244
> > 0.499982                       0.352971
> >                factor(treat)4                 factor(treat)5
> > factor(treat)6
> >                     -0.204752
> > 0.142042                       0.044155
> >                factor(treat)7                 factor(group)2
> > factor(treat)2:factor(group)2
> >                     -0.007775
> > -0.337907                      -0.208734
> > factor(treat)3:factor(group)2  factor(treat)4:factor(group)2
> > factor(treat)5:factor(group)2
> >                     -0.195138
> > 0.800029                       0.227514
> > factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
> >                      0.331548                             NA
> > So mu_12 = 0.429244-0.337907 = 0.091337. This can be verified by:
> >> predict(fit,data.frame(list(treat=1,group=2)))
> >          1
> > 0.09133691
> > Warning message:
> > In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
> >   prediction from a rank-deficient fit may be misleading
> >
> > But as you can see, it gave a warning about rank-deficient fit... why this
> > is a rank-deficient fit?
> > Because "treat 1_group 2" has no cases, so why it is still estimable while
> > on the contrary, "treat 7_group 2" which has 2 cases is not?
> > Thanks
> > John
> >
> >
> >
> >
> > ________________________________
> > From: Dennis Murphy <djmuser at gmail.com>
> 
> > Sent: Monday, November 7, 2011 9:29 PM
> > Subject: Re: [R] why NA coefficients
> >
> > Hi John:
> >
> > What is the estimate of the cell mean \mu_{12}? Which model effects
> > involve that cell mean? With this data arrangement, the expected
> > population marginal means of treatment 1 and group 2 are not estimable
> > either, unless you're willing to assume a no-interaction model.
> >
> > Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data
> > (vol. 1) cover this topic in some detail, but it assumes you're
> > familiar with the matrix form of a linear statistical model. Both
> > chapters cover the two-way model with interaction - Ch.13 from the
> > cell means model approach and Ch. 14 from the model effects approach.
> > Because this was written in the mid 80s and republished in the early
> > 90s, all the code used is in SAS.
> >
> > HTH,
> > Dennis
> >
> 
> >> Thanks David. The only category that has no cases is "treat 1-group 2":
> >>
> >>> with(test,table(treat,group))
> >>      group
> >> treat 1 2
> >>     1 8 0
> >>     2 1 5
> >>     3 5 5
> >>     4 7 3
> >>     5 7 4
> >>     6 3 3
> >>     7 8 2
> >>
> >> But why the coefficient for "treat 7-group 2" is not estimable?
> >>
> >> Thanks
> >>
> >> John
> >>
> >>
> >>
> >>
> >> ________________________________
> >> From: David Winsemius <dwinsemius at comcast.net>
> >>
> >> Cc: "r-help at r-project.org" <r-help at r-project.org>
> >> Sent: Monday, November 7, 2011 5:13 PM
> >> Subject: Re: [R] why NA coefficients
> >>
> >>
> >> On Nov 7, 2011, at 7:33 PM, array chip wrote:
> >>
> >>> Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat
> >>> has 7 levels, group has 2 levels). I found the coefficient for the last
> >>> interaction term is always 0, see attached dataset and the code below:
> >>>
> >>>> test<-read.table("test.txt",sep='\t',header=T,row.names=NULL)
> >>>> lm(y~factor(treat)*factor(group),test)
> >>>
> >>> Call:
> >>> lm(formula = y ~ factor(treat) * factor(group), data = test)
> >>>
> >>> Coefficients:
> >>>                   (Intercept)                 factor(treat)2
> >>>    factor(treat)3
> >>>                      0.429244                       0.499982
> >>>          0.352971
> >>>                factor(treat)4                 factor(treat)5
> >>>    factor(treat)6
> >>>                     -0.204752                       0.142042
> >>>          0.044155
> >>>                factor(treat)7                 factor(group)2
> >>> factor(treat)2:factor(group)2
> >>>                     -0.007775                      -0.337907
> >>>         -0.208734
> >>> factor(treat)3:factor(group)2  factor(treat)4:factor(group)2
> >>> factor(treat)5:factor(group)2
> >>>                     -0.195138                       0.800029
> >>>          0.227514
> >>> factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
> >>>                      0.331548                             NA
> >>>
> >>>
> >>> I guess this is due to model matrix being singular or collinearity among
> >>> the matrix columns? But I can't figure out how the matrix is singular in
> >>> this case? Can someone show me why this is the case?
> >>
> >> Because you have no cases in one of the crossed categories.
> >>
> >> --David Winsemius, MD
> >> West Hartford, CT
> >>        [[alternative HTML version deleted]]
> >>
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
> >
> 	[[alternative HTML version deleted]]