[R] missing level of a nested factor results in an NA in lm output
Timothy Clough
tclough at purdue.edu
Sun Sep 20 20:06:05 CEST 2009
Hello All,
I have posted to this list before regarding the same issue so I
apologize for the multiple e-mails. I am still struggling with this
issue so I thought I'd give it another try. This time I have included
reproducible code and a subset of the data I am analyzing.
I am running an ANOVA with three factors: GROUP (5 levels), FEATURE
(2 levels), and PATIENT (2 levels), where PATIENT is nested within
GROUP. I am interested in estimating various linear functions of the
model coefficients (which I sometimes refer to as 'contrasts' below).
An example of the data can be set up using the following code:
example <- data.frame(
ABUNDANCE = rnorm(30, 12),
FEATURE = factor(rep(c(3218, 4227, 6374), 10)),
GROUP = factor(rep(c(0, 1, 2, 3, 4), 6)),
PATIENT = factor(rep(c(1, 2), 15))
)
I am using the lm function to run the model as shown.
fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT,
example)
summary(fit)
The output of this code is below.
> fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT, example)
> summary(fit)
Call:
lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT,
data = example)
Residuals:
Min 1Q Median 3Q Max
-5.510e-01 -1.961e-01 3.469e-17 1.961e-01 5.510e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.6487 0.4037 31.331 2.58e-11 ***
FEATURE4227 -0.3271 0.4944 -0.661 0.52325
FEATURE6374 -1.0743 0.4944 -2.173 0.05492 .
GROUP1 -1.2641 0.5709 -2.214 0.05120 .
GROUP2 -0.2359 0.5709 -0.413 0.68825
GROUP3 -1.3081 0.5709 -2.291 0.04492 *
GROUP4 -0.5867 0.5709 -1.028 0.32836
FEATURE4227:GROUP1 1.5651 0.6993 2.238 0.04915 *
FEATURE6374:GROUP1 1.1435 0.6993 1.635 0.13302
FEATURE4227:GROUP2 0.4372 0.6993 0.625 0.54577
FEATURE6374:GROUP2 0.6728 0.6993 0.962 0.35867
FEATURE4227:GROUP3 1.0135 0.6993 1.449 0.17786
FEATURE6374:GROUP3 2.2665 0.6993 3.241 0.00885 **
FEATURE4227:GROUP4 1.3278 0.6993 1.899 0.08679 .
FEATURE6374:GROUP4 0.5610 0.6993 0.802 0.44103
GROUP0:PATIENT2 -0.5569 0.4037 -1.379 0.19785
GROUP1:PATIENT2 -0.1104 0.4037 -0.273 0.79014
GROUP2:PATIENT2 -0.9702 0.4037 -2.403 0.03712 *
GROUP3:PATIENT2 -0.1400 0.4037 -0.347 0.73586
GROUP4:PATIENT2 -0.5947 0.4037 -1.473 0.17147
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4944 on 10 degrees of freedom
Multiple R-squared: 0.8004, Adjusted R-squared: 0.4211
F-statistic: 2.11 on 19 and 10 DF, p-value: 0.1133
I then use the estimable function to estimate a linear combination of
the parameter estimates.
myEstimate <- cbind(
'(Intercept)' = 1,
'GROUP1' = 1,
'FEATURE4227:GROUP1' = 0.5,
'FEATURE6374:GROUP1' = 0.5,
'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"
> estimable(fit, myEstimate)
Estimate Std. Error t value DF Pr(>|t|)
test 12.18198 0.6694812 18.19615 10 5.395944e-09
I am able to get the t-statistic and associated p-value for the
contrast as desired. However, I sometimes have a case where there is
a missing patient within one of the groups. To give an example of
this situation, I remove patient 2 from group 4 and perform the same
analysis.
example2 <- example[!(example$GROUP == 4 & example$PATIENT == 2),]
> fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT, example2)
> summary(fit)
Call:
lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT,
data = example2)
Residuals:
Min 1Q Median 3Q Max
-5.510e-01 -2.084e-01 6.099e-20 2.084e-01 5.510e-01
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.6487 0.4442 28.475 2.5e-09 ***
FEATURE4227 -0.3271 0.5440 -0.601 0.5644
FEATURE6374 -1.0743 0.5440 -1.975 0.0837 .
GROUP1 -1.2641 0.6282 -2.012 0.0790 .
GROUP2 -0.2359 0.6282 -0.375 0.7171
GROUP3 -1.3081 0.6282 -2.082 0.0709 .
GROUP4 -0.4570 0.7023 -0.651 0.5335
FEATURE4227:GROUP1 1.5651 0.7694 2.034 0.0764 .
FEATURE6374:GROUP1 1.1435 0.7694 1.486 0.1755
FEATURE4227:GROUP2 0.4372 0.7694 0.568 0.5854
FEATURE6374:GROUP2 0.6728 0.7694 0.874 0.4074
FEATURE4227:GROUP3 1.0135 0.7694 1.317 0.2242
FEATURE6374:GROUP3 2.2665 0.7694 2.946 0.0185 *
FEATURE4227:GROUP4 1.2146 0.9423 1.289 0.2334
FEATURE6374:GROUP4 0.2850 0.9423 0.302 0.7700
GROUP0:PATIENT2 -0.5569 0.4442 -1.254 0.2454
GROUP1:PATIENT2 -0.1104 0.4442 -0.248 0.8101
GROUP2:PATIENT2 -0.9702 0.4442 -2.184 0.0605 .
GROUP3:PATIENT2 -0.1400 0.4442 -0.315 0.7606
GROUP4:PATIENT2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.544 on 8 degrees of freedom
Multiple R-squared: 0.7852, Adjusted R-squared: 0.3019
F-statistic: 1.625 on 18 and 8 DF, p-value: 0.2462
As you can see there are now NAs in the row corresponding to the
removed patient. I still want to estimate my contrast so I run the
same code as before, but I receive the following error messages:
myEstimate <- cbind(
'(Intercept)' = 1,
'GROUP1' = 1,
'FEATURE4227:GROUP1' = 0.5,
'FEATURE6374:GROUP1' = 0.5,
'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"
> estimable(fit, myEstimate)
Error in estimable.default(fit, myEstimate) :
Dimension of structure(c(1, 0, 0, 1, 0, 0, 0, 0.5, 0.5, 0, 0, 0, 0,
0, 0, : 1x20, not compatible with no of parameters in fit: 19
Dimension of 1, 0, 0, 0, 0), .Dim = c(1L, 20L), .Dimnames =
list("test", c("(Intercept)", : 1x20, not compatible with no of
parameters in fit: 19
Dimension of "FEATURE4227", "FEATURE6374", "GROUP1", "GROUP2",
"GROUP3", "GROUP4", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "FEATURE4227:GROUP1", "FEATURE6374:GROUP1",
"FEATURE4227:GROUP2", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "FEATURE6374:GROUP2", "FEATURE4227:GROUP3",
"FEATURE6374:GROUP3", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "FEATURE4227:GROUP4", "FEATURE6374:GROUP4",
"GROUP0:PATIENT2", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "GROUP1:PATIENT2", "GROUP2:PATIENT2",
"GROUP3:PATIENT2", "GROUP4:PATIENT2": 1x20, not compatible with no of
parameters in fit: 19
Dimension of ))): 1x2
Apparently the NA in the parameter estimate table is preventing the
estimation of the contrast. I have tried multiple alternatives in the
coding of the estimable statement, but all give me errors similar to
the one above.
I think the problem is that R expects to see each patient within each
group and when it doesn't see patient 2 in group 4, it yields an NA
parameter estimate for this combination. That's fine - the results of
the other parameter estimates are still correct. However it's
preventing me from estimating a linear function of the model
coefficients, and I know that there must be a way to do this even in
the presence of the missing combination.
Has anyone run into this problem before? Any input would be
appreciated. Sorry for the lengthy message.
Tim
More information about the R-help
mailing list