[R] fixed effect significance_NB mixed models_further pursuit
Jessica S Veysey
jss4 at cisunix.unh.edu
Wed Jan 7 22:22:42 CET 2009
Hello,
I am using R version 2.7.0 in a Windows XP context.
I am also using the glmm.admb package (created by Dave Fournier, Hans
Skaug, and Anders Nielsen) to run mixed-effects negative binomial
models.
To the best of my knowledge and ability, I have searched the R-help,
R-sig-mixed-models, and ADMB NBMM for R (through Otter Research Ltd)
mailing lists; the R help documentation; and the web in general for
information on the use and interpretation of the glmm.admb package.
My models run without problems. I would, however, like to be able to
state whether each fixed-effect predictor used in my regression models
is "significant," in the hypothesis-testing sense of "significance."
I am seeking advice on how best to assess the significance /
importance of individual predictors in mixed-effects negative binomial
regressions.
From what I gather, there is no consensus on how best to go about
this, although there are two main lines of thought (at least as would
apply to someone like me, with limited statistical and programming
experience):
1) Log-likelihood ratio tests (LLRT), comparing the saturated model to
reduced models (wherein each predictor is dropped in turn); and
2) Wald tests, using parameter estimates and parameter standard errors
to calculate the Wald statistic.
I understand there are problems inherent to each approach. Following
the logic of D. Bates (R-sig-mixed-models listserv, 13 Oct 08), I
decided to pursue the LLRT approach.
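For concreteness, this is the kind of hand calculation I have in mind
for the LLRT, using the log-likelihood values from my output below (in
practice I would extract them from the fitted objects; the component
name, e.g. m$loglik, is my guess at where glmm.admb stores the
log-likelihood):

## Hand-rolled log-likelihood ratio test between the saturated model
## and a reduced model (here, the one without I(b2^2)).
ll.full    <- -190.66            # saturated model
ll.reduced <- -191.22            # reduced model

lr.stat <- -2 * (ll.reduced - ll.full)   # the -2logQ statistic (~1.12)
df      <- 1                             # one parameter dropped
p.value <- pchisq(lr.stat, df = df, lower.tail = FALSE)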
Still, I have several questions.
If this is my model:
m <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2, random = ~ 1,
group = "subject", family = "nbinom", data = dataset)
and
b1 = a categorical variable with 3 levels
b2 = a continuous variable
and I've set my contrasts as follows:
options(contrasts =c("contr.treatment", "contr.poly"))
dataset$b1 <- factor(as.character(dataset$b1), levels = c("L1", "L2", "L3"))
My questions are:
1) Is it possible to test the significance of the individual
predictors, for those predictors that are also used in the interaction
term (i.e., for b1 and b2, individually, not just the b1:b2
interaction)?
Using the LLRT approach, when I drop one of these individual
predictors (e.g., b2) without also dropping its interaction term
(i.e., b1:b2), I obtain a reduced model with a log-likelihood estimate
that is equal to the log-likelihood estimate of the saturated model. A
LLRT between this reduced model and the saturated model has 0 degrees
of freedom (because the same number of parameters is estimated for
both models), as in the output below:
Model 1: glmm.admb(fixed = y ~ b1 + I(b2^2) + b1:b2 ...
Model 2: glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2 ...
NoPar LogLik Df -2logQ P.value
1 10.00 -190.66
2 10.00 -190.66 0 0.00 0.00
I thought this outcome might arise because the program might
automatically implement a "heredity" rule (see R-sig-mixed-models
listserv: A. Robinson, 1 Mar 07), whereby if an interaction is
included in a model, all lower-order components of the interaction are
also automatically included. However, a summary of the fixed effects
of the reduced model shows that no such heredity rule is in action
(i.e., there is no parameter estimate for b2 alone):
Formula: y ~ b1 + I(b2^2) + b1:b2
(Intercept)   8.9272e-01
b1L2         -1.5081e+00
b1L3         -6.3097e-01
I(b2^2)      -7.2719e-05
b1L1:b2       2.1573e-02
b1L2:b2       2.5767e-02
b1L3:b2       2.3130e-02
Ultimately, is it possible to test and discuss the significance of
these individual predictors? (If so, how?) Or is it really only
practical to discuss the significance of the interaction (and if the
interaction is not significant to remove the interaction from the
model, and then retest the significance of the individual predictors)?
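In case it helps, if only the second route is practical, here is the
sequence I would plan to follow (again assuming the fitted objects
store their log-likelihood in a component such as $loglik; please
correct me if I have misunderstood):

## Step 1: test the interaction by comparing the saturated model to
## one with only the interaction removed (main effects retained).
m.full  <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2,
                     random = ~ 1, group = "subject",
                     family = "nbinom", data = dataset)
m.noint <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2),
                     random = ~ 1, group = "subject",
                     family = "nbinom", data = dataset)
## With a 3-level b1 and treatment contrasts, the interaction adds
## 2 parameters, so the LLRT is on 2 df.
lr <- -2 * (m.noint$loglik - m.full$loglik)
pchisq(lr, df = 2, lower.tail = FALSE)

## Step 2: only if the interaction is not significant, drop it and
## test each main effect against m.noint in the same way, e.g.
m.nob2 <- glmm.admb(fixed = y ~ b1 + I(b2^2),
                    random = ~ 1, group = "subject",
                    family = "nbinom", data = dataset)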
2) In a similar vein, if I use a LLRT to compare a reduced model with
the squared term (i.e., I(b2^2)) removed, to the saturated model, I
obtain different LL values for the two models, but 0 degrees of
freedom for the test (because the same number of parameters are being
estimated in the two models), as shown below.
Model 1: glmm.admb(fixed = y ~ b1 + b2 + b1:b2 ...
Model 2: glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2 ...
NoPar LogLik Df -2logQ P.value
1 10.00 -191.22
2 10.00 -190.66 0 1.13 0.00
Is there an alternative / better way to test for the significance of
the squared term (i.e., other than an LLRT as I have set it up)?
3) How can I assess the significance of contrasts between different
levels of the categorical predictor (i.e., b1)?
I am used to the summary.glme command in S-PLUS 8.0, which generates
parameter estimates, parameter standard errors, t values, and p-values
(describing the significance of the t values) for each level of a
categorical predictor, compared to the reference level for that
predictor.
In a manner similar to summary.glme, would it be appropriate to
conduct a Wald test of parameter-estimate significance for each level
of the categorical predictor, compared to the reference level?
If not, is there a different approach I could use?
If it would be appropriate, how could I generate the parameter
standard errors (which are necessary in order to calculate the Wald
statistics)?
This question of how to calculate fixed-effect parameter standard
errors for glmm.admb-generated models has been posed, but only
partially answered, several times on the aforementioned listservs
(e.g., R-help listserv: H. Skaug, 25 Feb 06).
I know that, using the glmm.admb package, I could call:
m$stdbeta
to obtain the standard deviations of the fixed effect estimates.
However, this is insufficient for calculating Wald statistics or even
for generating confidence intervals, as both require standard errors,
not standard deviations.
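To show exactly what I mean, this is the Wald calculation I would
attempt *if* the stdbeta values can be treated as standard errors of
the fixed-effect estimates (the very point I am unsure about; I am
also guessing that the estimates themselves live in a component such
as m$b):

## Wald z statistics and two-sided p-values for each fixed effect,
## assuming m$b = estimates and m$stdbeta = their standard errors.
z <- m$b / m$stdbeta
p <- 2 * pnorm(-abs(z))
cbind(estimate = m$b, se = m$stdbeta, z = z, p = p)

## Corresponding 95% Wald confidence intervals:
lower <- m$b - qnorm(0.975) * m$stdbeta
upper <- m$b + qnorm(0.975) * m$stdbeta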
Cameron and Trivedi (Regression Analysis of Count Data; 1998, p
62-69), provide a formula for easily calculating negative binomial
parameter standard errors from maximum likelihood Hessian standard
errors (i.e., the standard errors produced via a Poisson regression,
assuming equidispersion), but since these are not automatically
produced in the glmm.admb output, this is of little help to me.
Any help that could be provided with respect to any of my questions
would be greatly appreciated.
I apologize in advance for the length of this post, but wanted to
include all of my questions, since they are interrelated.
I can easily provide additional information related to my examples, if needed.
Thank you for your help,
J.S. Veysey
Doctoral Student
University of New Hampshire
Durham, NH USA