[R] Profile confidence intervals and LR chi-square test

Tue Nov 14 18:46:05 CET 2006

I did mention dropterm (and drop1 can also be used).  This is more 
efficient, simpler and less-error prone than setting up your own anova()'s 
(especially if you want to do LRTs for dropping several terms).

On Tue, 14 Nov 2006, Inman, Brant A.   M.D. wrote:

>
> Thank you to Prof Ripley and Henric Nilsson for their observation that I
> was using "anova" inappropriately for the question that I was trying to
> answer. Note that the Wald statistics and confidence interval were
> calculable in the previous email but that I prefered using the more
> accurate deviance statistics.  I will demonstrate my error for the
> benefit of those new users of R (like me) that may not have appreciated
> how the "anova" function is working SEQUENTIALLY, and what SEQUENTIALLY
> actually means in this context.
>
> Since the "anova" function is a sequential test of the current model,
> only the statistic for the last term in the model formula is a true
> deviance chi-square statistic for (full model) .vs. (full model - term).
> For instance, using the data upon which this question was based,
> consider the following 2 models:
>
> ----------------------
>
>> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit),
> na.action=na.omit)
>> fit1 <- update(fit0, . ~ . + x1 + x2 + x3)
>
> ----------------------
>
> Here, fit0 is the null (intercept-only) model and fit1 is the full model
> (without interactions because interactions are not biologically
> plausible in this medical dataset). Now notice the result of the "anova"
> function for the full model:
>
> ----------------------
>
>> anova(fit1, test='Chisq')
>
> ...
>
>     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL                    99    137.989
> x1    1    8.267        98    129.721     0.004
> x2    1    5.639        97    124.083     0.018
> x3    1    3.433        96    120.650     0.064
>
> -----------------------
>
> It is incorrect to interpret the deviance chi-square test presented
> above for x1 (P=0.004) as the deviance chi-square statistic comparing
> (y~x1+x2+x3) .vs. (y~x2+x3) as the statistic calculated is for (y~1)
> .vs. (y~x1). Similarly, the deviance chi-square statistic calculated for
> x2 (P=0.018) is NOT for (y~x1+x2+x3) .vs. (y~x1+x3) but for (y~x1) .vs.
> (y~x1+x2).  Lastly, the deviance chi-square statistic for x3(P=0.064) is
> probably the most intuitive because it is for the comparison of
> (y~x1+x2+x3) .vs. (y~x1+x2), the result we would typically want to
> present for x3 in the full model.  So how do you get the correct
> deviance chi-square statistics for x1 and x2 in the full model?  There
> are a couple of ways of arriving at the same answer as I will
> demonstrate for the case of x1.
>
> Option#1: Reorder the full model so that x1 is the last term in the
> model formula
>
> -----------------------
>
>> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit),
> na.action=na.omit)
>> anova(fit2, test='Chisq')
> ...
>
>     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL                    99    137.989
> x2    1    7.305        98    130.683     0.007
> x3    1    3.821        97    126.862     0.051
> x1    1    6.212        96    120.650     0.013
>
> -----------------------
>
> Notice that the deviance chi-square statistics for all of the variables
> have changed, despite fit2 being identical in content to fit1. Just the
> order of the terms in the model formula have changed from fit1 to fit2.
> The deviance statistic for x1 is now the correct one for the full model,
> that is for the comparison (y~x1+x2+x3) .vs. (y~x2+x3).
>
> Option#2: Compare the full model to a reduced model that does not
> include x1.
>
> -----------------------
>
>> fit3 <- update(fit1, . ~ . - x1)
>> anova(fit1, fit3, test='Chisq')
> ...
>
> Model 1: y ~ x1 + x2 + x3
> Model 2: y ~ x2 + x3
>  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
> 1        96    120.650
> 2        97    126.862 -1   -6.212     0.013
>
> -----------------------
>
> Fit3 is the same model as fit1 except that it is missing the x1 term.
> Therefore, any change in the deviance chi-square statistic is due to the
> deletion of x1 from full model. Notice that the difference in residual
> deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is the same
> the difference b/t x1 and x3 in option#1.
>
>
> Brant
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595