[R] missing level of a nested factor results in an NA in lm output

Timothy Clough tclough at purdue.edu
Sun Sep 20 20:06:05 CEST 2009


Hello All,

I have posted to this list before regarding the same issue so I  
apologize for the multiple e-mails.  I am still struggling with this  
issue so I thought I'd give it another try.  This time I have included  
reproducible code and a subset of the data I am analyzing.

I am running an ANOVA with three factors:  GROUP (5 levels), FEATURE  
(2 levels), and PATIENT (2 levels), where PATIENT is nested within  
GROUP.  I am interested in estimating various linear functions of the  
model coefficients (which I sometimes refer to as 'contrasts' below).   
An example of the data can be set up using the following code:

example <- data.frame(
		ABUNDANCE = rnorm(30, 12),
		FEATURE = factor(rep(c(3218, 4227, 6374), 10)),
		GROUP = factor(rep(c(0, 1, 2, 3, 4), 6)),
		PATIENT = factor(rep(c(1, 2), 15))
)

I am using the lm function to run the model as shown.

fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT,  
example)
summary(fit)

The output of this code is below.

 > fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/ 
PATIENT, example)
 > summary(fit)

Call:
lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/ 
PATIENT,
     data = example)

Residuals:
        Min         1Q     Median         3Q        Max
-5.510e-01 -1.961e-01  3.469e-17  1.961e-01  5.510e-01

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         12.6487     0.4037  31.331 2.58e-11 ***
FEATURE4227         -0.3271     0.4944  -0.661  0.52325
FEATURE6374         -1.0743     0.4944  -2.173  0.05492 .
GROUP1              -1.2641     0.5709  -2.214  0.05120 .
GROUP2              -0.2359     0.5709  -0.413  0.68825
GROUP3              -1.3081     0.5709  -2.291  0.04492 *
GROUP4              -0.5867     0.5709  -1.028  0.32836
FEATURE4227:GROUP1   1.5651     0.6993   2.238  0.04915 *
FEATURE6374:GROUP1   1.1435     0.6993   1.635  0.13302
FEATURE4227:GROUP2   0.4372     0.6993   0.625  0.54577
FEATURE6374:GROUP2   0.6728     0.6993   0.962  0.35867
FEATURE4227:GROUP3   1.0135     0.6993   1.449  0.17786
FEATURE6374:GROUP3   2.2665     0.6993   3.241  0.00885 **
FEATURE4227:GROUP4   1.3278     0.6993   1.899  0.08679 .
FEATURE6374:GROUP4   0.5610     0.6993   0.802  0.44103
GROUP0:PATIENT2     -0.5569     0.4037  -1.379  0.19785
GROUP1:PATIENT2     -0.1104     0.4037  -0.273  0.79014
GROUP2:PATIENT2     -0.9702     0.4037  -2.403  0.03712 *
GROUP3:PATIENT2     -0.1400     0.4037  -0.347  0.73586
GROUP4:PATIENT2     -0.5947     0.4037  -1.473  0.17147
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4944 on 10 degrees of freedom
Multiple R-squared: 0.8004,     Adjusted R-squared: 0.4211
F-statistic:  2.11 on 19 and 10 DF,  p-value: 0.1133


I then use the estimable function to estimate a linear combination of  
the parameter estimates.

myEstimate <- cbind(
			'(Intercept)' = 1,
			'GROUP1' = 1,
			'FEATURE4227:GROUP1' = 0.5,
			'FEATURE6374:GROUP1' = 0.5,
			'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"

 > estimable(fit, myEstimate)
      Estimate Std. Error  t value DF     Pr(>|t|)
test 12.18198  0.6694812 18.19615 10 5.395944e-09


I am able to get the t-statistic and associated p-value for the  
contrast as desired.  However, I sometimes have a case where there is  
a missing patient within one of the groups.  To give an example of  
this situation, I remove patient 2 from group 4 and perform the same  
analysis.

example2 <- example[!(example$GROUP == 4 & example$PATIENT == 2),]

 > fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/ 
PATIENT, example2)
 > summary(fit)

Call:
lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/ 
PATIENT,
     data = example2)

Residuals:
        Min         1Q     Median         3Q        Max
-5.510e-01 -2.084e-01  6.099e-20  2.084e-01  5.510e-01

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         12.6487     0.4442  28.475  2.5e-09 ***
FEATURE4227         -0.3271     0.5440  -0.601   0.5644
FEATURE6374         -1.0743     0.5440  -1.975   0.0837 .
GROUP1              -1.2641     0.6282  -2.012   0.0790 .
GROUP2              -0.2359     0.6282  -0.375   0.7171
GROUP3              -1.3081     0.6282  -2.082   0.0709 .
GROUP4              -0.4570     0.7023  -0.651   0.5335
FEATURE4227:GROUP1   1.5651     0.7694   2.034   0.0764 .
FEATURE6374:GROUP1   1.1435     0.7694   1.486   0.1755
FEATURE4227:GROUP2   0.4372     0.7694   0.568   0.5854
FEATURE6374:GROUP2   0.6728     0.7694   0.874   0.4074
FEATURE4227:GROUP3   1.0135     0.7694   1.317   0.2242
FEATURE6374:GROUP3   2.2665     0.7694   2.946   0.0185 *
FEATURE4227:GROUP4   1.2146     0.9423   1.289   0.2334
FEATURE6374:GROUP4   0.2850     0.9423   0.302   0.7700
GROUP0:PATIENT2     -0.5569     0.4442  -1.254   0.2454
GROUP1:PATIENT2     -0.1104     0.4442  -0.248   0.8101
GROUP2:PATIENT2     -0.9702     0.4442  -2.184   0.0605 .
GROUP3:PATIENT2     -0.1400     0.4442  -0.315   0.7606
GROUP4:PATIENT2          NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.544 on 8 degrees of freedom
Multiple R-squared: 0.7852,     Adjusted R-squared: 0.3019
F-statistic: 1.625 on 18 and 8 DF,  p-value: 0.2462


As you can see there are now NAs in the row corresponding to the  
removed patient.  I still want to estimate my contrast so I run the  
same code as before, but I receive the following error messages:

myEstimate <- cbind(
			'(Intercept)' = 1,
			'GROUP1' = 1,
			'FEATURE4227:GROUP1' = 0.5,
			'FEATURE6374:GROUP1' = 0.5,
			'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"

 > estimable(fit, myEstimate)
Error in estimable.default(fit, myEstimate) :
  Dimension of structure(c(1, 0, 0, 1, 0, 0, 0, 0.5, 0.5, 0, 0, 0, 0,  
0, 0, : 1x20, not compatible with no of parameters in fit: 19
  Dimension of 1, 0, 0, 0, 0), .Dim = c(1L, 20L), .Dimnames =  
list("test", c("(Intercept)", : 1x20, not compatible with no of  
parameters in fit: 19
  Dimension of "FEATURE4227", "FEATURE6374", "GROUP1", "GROUP2",  
"GROUP3", "GROUP4", : 1x20, not compatible with no of parameters in  
fit: 19
  Dimension of "FEATURE4227:GROUP1", "FEATURE6374:GROUP1",  
"FEATURE4227:GROUP2", : 1x20, not compatible with no of parameters in  
fit: 19
  Dimension of "FEATURE6374:GROUP2", "FEATURE4227:GROUP3",  
"FEATURE6374:GROUP3", : 1x20, not compatible with no of parameters in  
fit: 19
  Dimension of "FEATURE4227:GROUP4", "FEATURE6374:GROUP4",  
"GROUP0:PATIENT2", : 1x20, not compatible with no of parameters in  
fit: 19
  Dimension of "GROUP1:PATIENT2", "GROUP2:PATIENT2",  
"GROUP3:PATIENT2", "GROUP4:PATIENT2": 1x20, not compatible with no of  
parameters in fit: 19
  Dimension of ))): 1x2

Apparently the NA in the parameter estimate table is preventing the  
estimation of the contrast.  I have tried multiple alternatives in the  
coding of the estimable statement, but all give me errors similar to  
the one above.

I think the problem is that R expects to see each patient within each  
group and when it doesn't see patient 2 in group 4, it yields an NA  
parameter estimate for this combination.  That's fine - the results of  
the other parameter estimates are still correct.   However it's  
preventing me from estimating a linear function of the model  
coefficients, and I know that there must be a way to do this even in  
the presence of the missing combination.

Has anyone run into this problem before?  Any input would be  
appreciated.  Sorry for the lengthy message.

Tim




More information about the R-help mailing list