[R] missing level of a nested factor results in an NA in lm output
Timothy Clough
tclough at purdue.edu
Mon Sep 21 20:19:35 CEST 2009
Dear All,
I have posted to this list before regarding the same issue so I
apologize for the multiple e-mails. I am still struggling with this
issue so I thought I'd give it another try. This time I have included
reproducible code and a subset of the data I am analyzing.
I am running an ANOVA with three factors: GROUP (5 levels), FEATURE
(2 levels), and PATIENT (2 levels), where PATIENT is nested within
GROUP. I am interested in estimating various linear functions of the
model coefficients (which I sometimes refer to as 'contrasts' below).
An example of the data can be set up using the following code:
example <- data.frame(
ABUNDANCE = rnorm(30, 12),
FEATURE = factor(rep(c(3218, 4227, 6374), 10)),
GROUP = factor(rep(c(0, 1, 2, 3, 4), 6)),
PATIENT = factor(rep(c(1, 2), 15))
)
I am using the lm function to run the model as shown.
fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/PATIENT,
example)
summary(fit)
The output of this code is below.
> fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT, example)
> summary(fit)
Call:
lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT,
data = example)
Residuals:
Min 1Q Median 3Q Max
-5.510e-01 -1.961e-01 3.469e-17 1.961e-01 5.510e-01
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.6487 0.4037 31.331 2.58e-11 ***
FEATURE4227 -0.3271 0.4944 -0.661 0.52325
FEATURE6374 -1.0743 0.4944 -2.173 0.05492 .
GROUP1 -1.2641 0.5709 -2.214 0.05120 .
GROUP2 -0.2359 0.5709 -0.413 0.68825
GROUP3 -1.3081 0.5709 -2.291 0.04492 *
GROUP4 -0.5867 0.5709 -1.028 0.32836
FEATURE4227:GROUP1 1.5651 0.6993 2.238 0.04915 *
FEATURE6374:GROUP1 1.1435 0.6993 1.635 0.13302
FEATURE4227:GROUP2 0.4372 0.6993 0.625 0.54577
FEATURE6374:GROUP2 0.6728 0.6993 0.962 0.35867
FEATURE4227:GROUP3 1.0135 0.6993 1.449 0.17786
FEATURE6374:GROUP3 2.2665 0.6993 3.241 0.00885 **
FEATURE4227:GROUP4 1.3278 0.6993 1.899 0.08679 .
FEATURE6374:GROUP4 0.5610 0.6993 0.802 0.44103
GROUP0:PATIENT2 -0.5569 0.4037 -1.379 0.19785
GROUP1:PATIENT2 -0.1104 0.4037 -0.273 0.79014
GROUP2:PATIENT2 -0.9702 0.4037 -2.403 0.03712 *
GROUP3:PATIENT2 -0.1400 0.4037 -0.347 0.73586
GROUP4:PATIENT2 -0.5947 0.4037 -1.473 0.17147
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4944 on 10 degrees of freedom
Multiple R-squared: 0.8004, Adjusted R-squared: 0.4211
F-statistic: 2.11 on 19 and 10 DF, p-value: 0.1133
I then use the estimable function to estimate a linear combination of
the parameter estimates.
myEstimate <- cbind(
'(Intercept)' = 1,
'GROUP1' = 1,
'FEATURE4227:GROUP1' = 0.5,
'FEATURE6374:GROUP1' = 0.5,
'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"
> estimable(fit, myEstimate)
Estimate Std. Error t value DF Pr(>|t|)
test 12.18198 0.6694812 18.19615 10 5.395944e-09
I am able to get the t-statistic and associated p-value for the
contrast as desired. However, I sometimes have a case where there is
a missing patient within one of the groups. To give an example of
this situation, I remove patient 2 from group 4 and perform the same
analysis.
example2 <- example[!(example$GROUP == 4 & example$PATIENT == 2),]
> fit <- lm(ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT, example2)
> summary(fit)
Call:
lm(formula = ABUNDANCE ~ FEATURE + GROUP + FEATURE:GROUP + GROUP/
PATIENT,
data = example2)
Residuals:
Min 1Q Median 3Q Max
-5.510e-01 -2.084e-01 6.099e-20 2.084e-01 5.510e-01
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.6487 0.4442 28.475 2.5e-09 ***
FEATURE4227 -0.3271 0.5440 -0.601 0.5644
FEATURE6374 -1.0743 0.5440 -1.975 0.0837 .
GROUP1 -1.2641 0.6282 -2.012 0.0790 .
GROUP2 -0.2359 0.6282 -0.375 0.7171
GROUP3 -1.3081 0.6282 -2.082 0.0709 .
GROUP4 -0.4570 0.7023 -0.651 0.5335
FEATURE4227:GROUP1 1.5651 0.7694 2.034 0.0764 .
FEATURE6374:GROUP1 1.1435 0.7694 1.486 0.1755
FEATURE4227:GROUP2 0.4372 0.7694 0.568 0.5854
FEATURE6374:GROUP2 0.6728 0.7694 0.874 0.4074
FEATURE4227:GROUP3 1.0135 0.7694 1.317 0.2242
FEATURE6374:GROUP3 2.2665 0.7694 2.946 0.0185 *
FEATURE4227:GROUP4 1.2146 0.9423 1.289 0.2334
FEATURE6374:GROUP4 0.2850 0.9423 0.302 0.7700
GROUP0:PATIENT2 -0.5569 0.4442 -1.254 0.2454
GROUP1:PATIENT2 -0.1104 0.4442 -0.248 0.8101
GROUP2:PATIENT2 -0.9702 0.4442 -2.184 0.0605 .
GROUP3:PATIENT2 -0.1400 0.4442 -0.315 0.7606
GROUP4:PATIENT2 NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.544 on 8 degrees of freedom
Multiple R-squared: 0.7852, Adjusted R-squared: 0.3019
F-statistic: 1.625 on 18 and 8 DF, p-value: 0.2462
As you can see there are now NAs in the row corresponding to the
removed patient. I still want to estimate my contrast so I run the
same code as before, but I receive the following error messages:
myEstimate <- cbind(
'(Intercept)' = 1,
'GROUP1' = 1,
'FEATURE4227:GROUP1' = 0.5,
'FEATURE6374:GROUP1' = 0.5,
'GROUP0:PATIENT2' = 1
)
rownames(myEstimate) <- "test"
> estimable(fit, myEstimate)
Error in estimable.default(fit, myEstimate) :
Dimension of structure(c(1, 0, 0, 1, 0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0,
0, : 1x20, not compatible with no of parameters in fit: 19
Dimension of 1, 0, 0, 0, 0), .Dim = c(1L, 20L), .Dimnames =
list("test", c("(Intercept)", : 1x20, not compatible with no of
parameters in fit: 19
Dimension of "FEATURE4227", "FEATURE6374", "GROUP1", "GROUP2",
"GROUP3", "GROUP4", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "FEATURE4227:GROUP1", "FEATURE6374:GROUP1",
"FEATURE4227:GROUP2", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "FEATURE6374:GROUP2", "FEATURE4227:GROUP3",
"FEATURE6374:GROUP3", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "FEATURE4227:GROUP4", "FEATURE6374:GROUP4",
"GROUP0:PATIENT2", : 1x20, not compatible with no of parameters in
fit: 19
Dimension of "GROUP1:PATIENT2", "GROUP2:PATIENT2", "GROUP3:PATIENT2",
"GROUP4:PATIENT2": 1x20, not compatible with no of parameters in fit: 19
Dimension of ))): 1x2
Apparently the NA in the parameter estimate table is preventing the
estimation of the contrast. I have tried multiple alternatives in the
coding of the estimable statement, but all give me errors similar to
the one above.
I think the problem is that R expects to see each patient within each
group and when it doesn't see patient 2 in group 4, it yields an NA
parameter estimate for this combination. That's fine - the results of
the other parameter estimates are still correct. However it's
preventing me from estimating a linear function of the model
coefficients, and I know that there must be a way to do this even in
the presence of the missing combination.
Has anyone run into this problem before?
Tim
--
Timothy Clough
Ph.D. Student
Department of Statistics
Purdue University
765-496-7263
More information about the R-help
mailing list