[R] car::deltaMethod() fails when a particular combination of categorical variables is not present
John Fox
j|ox @end|ng |rom mcm@@ter@c@
Tue Sep 26 17:39:10 CEST 2023
Dear Michael,
My previous response was inaccurate: First, linearHypothesis() *is* able
to accommodate aliased coefficients by setting the argument singular.ok
= TRUE:
> linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0",
+ singular.ok=TRUE)
Linear hypothesis test:
bt2 + csent + bt2:csent = 0
Model 1: restricted model
Model 2: a ~ b * c
Res.Df RSS Df Sum of Sq F Pr(>F)
1 16 9392.1
2 15 9266.4 1 125.67 0.2034 0.6584
Moreover, when there is an empty cell, this F-test is (for a reason that
I haven't worked out, but is almost surely due to how the rank-deficient
model is parametrized) *not* equivalent to the t-test for the
corresponding coefficient in the raveled version of the two factors:
> df$bc <- factor(with(df, paste(b, c, sep=":")))
> m <- lm(a ~ bc, data=df)
> summary(m)
Call:
lm(formula = a ~ bc, data = df)
Residuals:
Min 1Q Median 3Q Max
-57.455 -11.750 0.439 14.011 37.545
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.50 17.57 1.166 0.2617
bct1:unsent 37.50 24.85 1.509 0.1521
bct2:other 32.00 24.85 1.287 0.2174
bct2:sent 17.17 22.69 0.757 0.4610 <<< cf. F = 0.2034, p
= 0.6584
bct2:unsent 38.95 19.11 2.039 0.0595
Residual standard error: 24.85 on 15 degrees of freedom
Multiple R-squared: 0.2613, Adjusted R-squared: 0.06437
F-statistic: 1.327 on 4 and 15 DF, p-value: 0.3052
In the full-rank case, however, what I said is correct -- that is, the
F-test for the 1 df hypothesis on the three coefficients is equivalent
to the t-test for the corresponding coefficient when the two factors are
raveled:
> linearHypothesis(minimal_model_fixed, "bt2 + csent + bt2:csent = 0")
Linear hypothesis test:
bt2 + csent + bt2:csent = 0
Model 1: restricted model
Model 2: a ~ b * c
Res.Df RSS Df Sum of Sq F Pr(>F)
1 15 9714.5
2 14 9194.4 1 520.08 0.7919 0.3886
> df_fixed$bc <- factor(with(df_fixed, paste(b, c, sep=":")))
> m <- lm(a ~ bc, data=df_fixed)
> summary(m)
Call:
lm(formula = a ~ bc, data = df_fixed)
Residuals:
Min 1Q Median 3Q Max
-57.455 -11.750 0.167 14.011 37.545
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 64.000 25.627 2.497 0.0256
bct1:sent -43.500 31.387 -1.386 0.1874
bct1:unsent -12.000 36.242 -0.331 0.7455
bct2:other -11.500 31.387 -0.366 0.7195
bct2:sent -26.333 29.591 -0.890 0.3886 << cf.
bct2:unsent -4.545 26.767 -0.170 0.8676
Residual standard error: 25.63 on 14 degrees of freedom
Multiple R-squared: 0.2671, Adjusted R-squared: 0.005328
F-statistic: 1.02 on 5 and 14 DF, p-value: 0.4425
So, to summarize:
(1) You can use linearHypothesis() with singular.ok=TRUE to test the
hypothesis that you specified, though I suspect that this hypothesis
probably isn't testing what you think in the rank-deficient case. I
suspect that the hypothesis that you want to test is obtained by
raveling the two factors.
(2) There is no reason to use deltaMethod() for a linear hypothesis, but
there is also no intrinsic reason that deltaMethod() shouldn't be able
to handle a rank-deficient model. We'll probably fix that.
My apologies for the confusion,
John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/
On 2023-09-26 9:49 a.m., John Fox wrote:
> Caution: External email.
>
>
> Dear Michael,
>
> You're testing a linear hypothesis, so there's no need to use the delta
> method, but the linearHypothesis() function in the car package also
> fails in your case:
>
> > linearHypothesis(minimal_model, "bt2 + csent + bt2:csent = 0")
> Error in linearHypothesis.lm(minimal_model, "bt2 + csent + bt2:csent =
> 0") :
> there are aliased coefficients in the model.
>
> One work-around is to ravel the two factors into a single factor with 5
> levels:
>
> > df$bc <- factor(with(df, paste(b, c, sep=":")))
> > df$bc
> [1] t2:unsent t2:unsent t2:unsent t2:unsent t2:sent t2:unsent
> [7] t2:unsent t1:sent t2:unsent t2:unsent t2:other t2:unsent
> [13] t1:unsent t1:sent t2:unsent t2:other t1:unsent t2:sent
> [19] t2:sent t2:unsent
> Levels: t1:sent t1:unsent t2:other t2:sent t2:unsent
>
> > m <- lm(a ~ bc, data=df)
> > summary(m)
>
> Call:
> lm(formula = a ~ bc, data = df)
>
> Residuals:
> Min 1Q Median 3Q Max
> -57.455 -11.750 0.439 14.011 37.545
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 20.50 17.57 1.166 0.2617
> bct1:unsent 37.50 24.85 1.509 0.1521
> bct2:other 32.00 24.85 1.287 0.2174
> bct2:sent 17.17 22.69 0.757 0.4610
> bct2:unsent 38.95 19.11 2.039 0.0595
>
> Residual standard error: 24.85 on 15 degrees of freedom
> Multiple R-squared: 0.2613, Adjusted R-squared: 0.06437
> F-statistic: 1.327 on 4 and 15 DF, p-value: 0.3052
>
> Then the hypothesis is tested directly by the t-value for the
> coefficient bct2:sent.
>
> I hope that this helps,
> John
>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://www.john-fox.ca/
>
> On 2023-09-26 1:12 a.m., Michael Cohn wrote:
>> Caution: External email.
>>
>>
>> I'm running a linear regression with two categorical predictors and their
>> interaction. One combination of levels does not occur in the data, and as
>> expected, no parameter is estimated for it. I now want to significance
>> test
>> a particular combination of levels that does occur in the data (ie, I
>> want
>> to get a confidence interval for the total prediction at given levels of
>> each variable).
>>
>> In the past I've done this using car::deltaMethod() but in this dataset
>> that does not work, as shown in the example below: The regression model
>> gives the expected output, but deltaMethod() gives this error:
>>
>> error in t(gd) %*% vcov. : non-conformable arguments
>>
>> I believe this is because there is no parameter estimate for when the
>> predictors have the values 't1' and 'other'. In the df_fixed dataframe,
>> putting one person into that combination of categories causes
>> deltaMethod()
>> to work as expected.
>>
>> I don't know of any theoretical reason that missing one interaction
>> parameter estimate should prevent getting a confidence interval for a
>> different combination of predictors. Is there a way to use
>> deltaMethod() or
>> some other function to do this without changing my data?
>>
>> Thank you,
>>
>> - Michael Cohn
>> Vote Rev (http://voterev.org)
>>
>>
>> Demonstration:
>> ------
>>
>> library(car)
>> # create dataset with outcome and two categorical predictors
>> outcomes <- c(91,2,60,53,38,78,48,33,97,41,64,84,64,8,66,41,52,18,57,34)
>> persontype <-
>> c("t2","t2","t2","t2","t2","t2","t2","t1","t2","t2","t2","t2","t1","t1","t2","t2","t1","t2","t2","t2")
>> arm_letter <-
>> c("unsent","unsent","unsent","unsent","sent","unsent","unsent","sent","unsent","unsent","other","unsent","unsent","sent","unsent","other","unsent","sent","sent","unsent")
>> df <- data.frame(a = outcomes, b=persontype, c=arm_letter)
>>
>> # note: there are no records with the combination 't1' + 'other'
>> table(df$b,df$c)
>>
>>
>> #regression works as expected
>> minimal_formula <- formula("a ~ b*c")
>> minimal_model <- lm(minimal_formula, data=df)
>> summary(minimal_model)
>>
>> #use deltaMethod() to get a prediction for individuals with the
>> combination
>> 'b2' and 'sent'
>> # deltaMethod() fails with "error in t(gd) %*% vcov. : non-conformable
>> arguments."
>> deltaMethod(minimal_model, "bt2 + csent + `bt2:csent`", rhs=0)
>>
>> # duplicate the dataset and change one record to be in the previously
>> empty
>> cell
>> df_fixed <- df
>> df_fixed[c(13),"c"] <- 'other'
>> table(df_fixed$b,df_fixed$c)
>>
>> #deltaMethod() now works
>> minimal_model_fixed <- lm(minimal_formula, data=df_fixed)
>> deltaMethod(minimal_model_fixed, "bt2 + csent + `bt2:csent`", rhs=0)
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list