[R] fixed effect significance_NB mixed models_further pursuit
Jessica S Veysey
jss4 at cisunix.unh.edu
Wed Jan 7 22:22:42 CET 2009
Hello,
I am using R version 2.7.0 in a Windows XP context.
I am also using the glmm.admb package (created by Dave Fournier, Hans
Skaug, and Anders Nielsen) to run mixed-effects negative binomial
models.
To the best of my knowledge and ability, I have searched the R-help,
R-sig-mixed-models, and ADMB NBMM for R (through Otter Research Ltd)
mailing lists; the R help documentation; and the web in general for
information on the use and interpretation of the glmm.admb package.
My models run without problems. I would, however, like to be able to
state whether each fixed-effect predictor used in my regression models
is "significant," in the hypothesis-testing sense of "significance."
I am seeking advice on how best to assess the significance /
importance of individual predictors in mixed-effects negative binomial
regressions.
From what I gather, there is no consensus on how best to go about
this, although there are two main lines of thought (at least as would
apply to someone like me, with limited statistical and programming
experience):
1) Log-likelihood ratio tests (LLRT), comparing the saturated model to
reduced models (wherein each predictor is dropped in turn); and
2) Wald tests, using parameter estimates and parameter standard errors
to calculate the Wald statistic.
I understand there are problems inherent to each approach. Following
the logic of D. Bates (R-sig-mixed-models listserv, 13 Oct 08), I
decided to pursue the LLRT approach.
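For concreteness, this is the kind of hand calculation I have in mind
for the LLRT, using the log-likelihood values from my output below (in
practice I would extract them from the fitted objects; the component
name, e.g. m$loglik, is my guess at where glmm.admb stores the
log-likelihood):

## Hand-rolled log-likelihood ratio test between the saturated model
## and a reduced model (here, the one without I(b2^2)).
ll.full    <- -190.66            # saturated model
ll.reduced <- -191.22            # reduced model

lr.stat <- -2 * (ll.reduced - ll.full)   # the -2logQ statistic (~1.12)
df      <- 1                             # one parameter dropped
p.value <- pchisq(lr.stat, df = df, lower.tail = FALSE)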
Still, I have several questions.
If this is my model:
m <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2, random = ~ 1,
group = "subject", family = "nbinom", data = dataset)
and
b1 = a categorical variable with 3 levels
b2 = a continuous variable
and I've set my contrasts as follows:
options(contrasts =c("contr.treatment", "contr.poly"))
dataset$b1 <- factor(as.character(dataset$b1), levels = c("L1", "L2", "L3"))
My questions are:
1) Is it possible to test the significance of the individual
predictors, for those predictors that are also used in the interaction
term (i.e., for b1 and b2, individually, not just the b1:b2
interaction)?
Using the LLRT approach, when I drop one of these individual
predictors (e.g., b2) without also dropping its interaction term
(i.e., b1:b2), I obtain a reduced model with a log-likelihood estimate
that is equal to the log-likelihood estimate of the saturated model. A
LLRT between this reduced model and the saturated model has 0 degrees
of freedom (because the same number of parameters is estimated for
both models), as in the output below:
Model 1: glmm.admb(fixed = y ~ b1 + I(b2^2) + b1:b2 ...
Model 2: glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2 ...
NoPar LogLik Df -2logQ P.value
1 10.00 -190.66
2 10.00 -190.66 0 0.00 0.00
I thought this outcome might arise because the program might
automatically implement a "heredity" rule (see R-sig-mixed-models
listserv: A. Robinson, 1 Mar 07), whereby if an interaction is
included in a model, all lower-order components of the interaction are
also automatically included. However, a summary of the fixed effects
of the reduced model shows that no such heredity rule is in action
(i.e., there is no parameter estimate for b2 alone):
Formula: y ~ b1 + I(b2^2) + b1:b2
(Intercept)   8.9272e-01
b1L2         -1.5081e+00
b1L3         -6.3097e-01
I(b2^2)      -7.2719e-05
b1L1:b2       2.1573e-02
b1L2:b2       2.5767e-02
b1L3:b2       2.3130e-02
Ultimately, is it possible to test and discuss the significance of
these individual predictors? (If so, how?) Or is it really only
practical to discuss the significance of the interaction (and if the
interaction is not significant to remove the interaction from the
model, and then retest the significance of the individual predictors)?
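In case it helps, if only the second route is practical, here is the
sequence I would plan to follow (again assuming the fitted objects
store their log-likelihood in a component such as $loglik; please
correct me if I have misunderstood):

## Step 1: test the interaction by comparing the saturated model to
## one with only the interaction removed (main effects retained).
m.full  <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2,
                     random = ~ 1, group = "subject",
                     family = "nbinom", data = dataset)
m.noint <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2),
                     random = ~ 1, group = "subject",
                     family = "nbinom", data = dataset)
## With a 3-level b1 and treatment contrasts, the interaction adds
## 2 parameters, so the LLRT is on 2 df.
lr <- -2 * (m.noint$loglik - m.full$loglik)
pchisq(lr, df = 2, lower.tail = FALSE)

## Step 2: only if the interaction is not significant, drop it and
## test each main effect against m.noint in the same way, e.g.
m.nob2 <- glmm.admb(fixed = y ~ b1 + I(b2^2),
                    random = ~ 1, group = "subject",
                    family = "nbinom", data = dataset)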
2) In a similar vein, if I use a LLRT to compare a reduced model with
the squared term (i.e., I(b2^2)) removed, to the saturated model, I
obtain different LL values for the two models, but 0 degrees of
freedom for the test (because the same number of parameters are being
estimated in the two models), as shown below.
Model 1: glmm.admb(fixed = y ~ b1 + b2 + b1:b2 ...
Model 2: glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2 ...
NoPar LogLik Df -2logQ P.value
1 10.00 -191.22
2 10.00 -190.66 0 1.13 0.00
Is there an alternative / better way to test for the significance of
the squared term (i.e., other than an LLRT as I have set it up)?
3) How can I assess the significance of contrasts between different
levels of the categorical predictor (i.e., b1)?
I am used to the summary.glme command in S-PLUS 8.0, which generates
parameter estimates, parameter standard errors, t values, and p-values
(describing the significance of the t values) for each level of a
categorical predictor, compared to the reference level for that
predictor.
In a manner similar to summary.glme, would it be appropriate to
conduct a Wald test of parameter-estimate significance for each level
of the categorical predictor, compared to the reference level?
If not, is there a different approach I could use?
If it would be appropriate, how could I generate the parameter
standard errors (which are necessary in order to calculate the Wald
statistics)?
This question of how to calculate fixed-effect parameter standard
errors for glmm.admb-generated models has been posed, but only
partially answered, several times on the aforementioned listservs
(e.g., R-help listserv: H. Skaug, 25 Feb 06).
I know that, using the glmm.admb package, I could call:
m$stdbeta
to obtain the standard deviations of the fixed effect estimates.
However, this is insufficient for calculating Wald statistics or even
for generating confidence intervals, as both require standard errors,
not standard deviations.
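To show exactly what I mean, this is the Wald calculation I would
attempt *if* the stdbeta values can be treated as standard errors of
the fixed-effect estimates (the very point I am unsure about; I am
also guessing that the estimates themselves live in a component such
as m$b):

## Wald z statistics and two-sided p-values for each fixed effect,
## assuming m$b = estimates and m$stdbeta = their standard errors.
z <- m$b / m$stdbeta
p <- 2 * pnorm(-abs(z))
cbind(estimate = m$b, se = m$stdbeta, z = z, p = p)

## Corresponding 95% Wald confidence intervals:
lower <- m$b - qnorm(0.975) * m$stdbeta
upper <- m$b + qnorm(0.975) * m$stdbeta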
Cameron and Trivedi (Regression Analysis of Count Data; 1998, p
62-69), provide a formula for easily calculating negative binomial
parameter standard errors from maximum likelihood Hessian standard
errors (i.e., the standard errors produced via a Poisson regression,
assuming equidispersion), but since these are not automatically
produced in the glmm.admb output, this is of little help to me.
Any help that could be provided with respect to any of my questions
would be greatly appreciated.
I apologize in advance for the length of this post, but wanted to
include all of my questions, since they are interrelated.
I can easily provide additional information related to my examples, if needed.
Thank you for your help,
J.S. Veysey
Doctoral Student
University of New Hampshire
Durham, NH USA