[R-sig-ME] Variable selection: AIC versus Z-statistic

Ben Bolker bolker at ufl.edu
Thu Dec 24 22:51:41 CET 2009


Kardynal,Kevin [Yel] wrote:
>> Hello,
>>
>> I am analyzing time-series data for multiple songbird species where
>> data were collected at 3 point count stations within a stand (~150
>> stations), visited twice a year by multiple observers. Point count
>> stations along seismic lines (10-12 stations/seismic line) were added
>> three years after the start of the surveys, which created an
>> unbalanced design. I am using a linear mixed effects model in lme4
>> that includes year and visit (first or second visit to a site) as
>> fixed effects, and observer and station nested within stand (or
>> seismic line) (to account for spatial auto-correlation) as random
>> effects. The full model includes a quadratic effect of year plus its
>> interactions with Seismic (several species have shown a quadratic
>> response to budworm outbreak):
>>
>> lme1a<-lmer(Abundance ~ Yr + Yr2 + Seismic + Yr*Seismic + Yr2*Seismic
>> + Visit + (1|Site/Station) + (1|Observer), data=CAWA, family =
>> poisson(link=log))
>> summary(lme1a)
>>
>> Reduced models:
>> lme2a<-lmer(Abundance ~ Yr + Seismic + Yr*Seismic + Visit +
>> (1|Observer) + (1|Site/Station), data=CAWA, family =
>> poisson(link=log))
>> summary(lme2a)
>>
>> lme3a<-lmer(Abundance ~ Yr + Yr2 + Visit + (1|Site/Station)+
>> (1|Observer), data=CAWA, family = poisson(link=log))
>> summary(lme3a)
>>
>> lme4a <- lmer(Abundance ~ Yr + Visit + (1|Site/Station)+ (1|Observer),
>> data=CAWA, family = poisson(link = log))
>> summary(lme4a)
>>
>> I initially included visit as a random effect, but since there are
>> only 2 levels for this factor, I made it a fixed effect. Doing this
>> changes the parameter estimates for the fixed effects only slightly,
>> changes the AIC values little (delta < 0.5 in most cases), and does
>> not change their 'significance' (according to the Z-statistic).
>>
>> Based on AIC, the full model fits the data the best (although delta
>> AIC for all models is <3). 
>>
>>   AIC  BIC logLik deviance
>>  1398 1458 -689.1     1378
>> Random effects:
>>  Groups       Name        Variance Std.Dev.
>>  Station:Site (Intercept) 0.272577 0.52209 
>>  Site         (Intercept) 2.922394 1.70950 
>>  Observer     (Intercept) 0.045124 0.21242 
>> Number of obs: 2807, groups: Station:Site, 242; Site, 65; Observer, 19
>>
>> Fixed effects:
>>              Estimate Std. Error z value Pr(>|z|)    
>> (Intercept) -2.847531   0.316017  -9.011   <2e-16 ***
>> Yr           0.127510   0.067730   1.883   0.0598 .  
>> Yr2         -0.014141   0.006561  -2.155   0.0311 *  
>> Seismic     -0.437059   1.263792  -0.346   0.7295    
>> Visit       -0.201875   0.092130  -2.191   0.0284 *  
>> Yr:Seismic  -0.159138   0.351301  -0.453   0.6506    
>> Yr2:Seismic  0.024156   0.026194   0.922   0.3564    
>>
>>       Df     AIC     BIC  logLik  Chisq Chi Df Pr(>Chisq)  
>> lme4a  6 1400.97 1436.61 -694.49                           
>> lme3a  7 1401.86 1443.44 -693.93 1.1086      1    0.29238  
>> lme2a  8 1398.46 1445.98 -691.23 5.4011      1    0.02012 *
>> lme1a 10 1398.15 1457.54 -689.07 4.3172      2    0.11548 
>>
>> Question #1: Because the parameter estimates change only slightly with
>> the inclusion of visit as a fixed instead of a random effect, is this
>> evidence that we could model visit as a random effect since visit is
>> really a nuisance variable?

  Opinions differ.  Since we know that random-effects estimates are
likely to be quite inaccurate with only 2 levels (consider estimating a
variance from 2 values), I would say that fixed is probably better.  In
principle one could evaluate predictive performance (or some other loss
function) for completely pooled (no effect), random/mixed, and
completely separate models, and work out which summary statistics would
tell you which case you were in, but in this case I would simply go
with the fixed effect.  If you were really committed to keeping visit
as a random effect I would probably use a Bayesian hierarchical model
with an informative prior, or a half-Cauchy weakly informative prior,
but that's more work.
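
  For concreteness, a minimal sketch of the two specifications, using
the data frame and variable names from your post (untested, since I
don't have your data; note that in current versions of lme4 the GLMM
interface is glmer() rather than lmer() with a family argument):

library(lme4)

## Visit as a fixed effect (the safer choice with only 2 levels):
fit.fixed <- glmer(Abundance ~ Yr + Visit + (1|Site/Station) +
                   (1|Observer), data = CAWA, family = poisson)

## Visit as a random effect (its variance is estimated from 2 levels):
fit.random <- glmer(Abundance ~ Yr + (1|Visit) + (1|Site/Station) +
                    (1|Observer), data = CAWA, family = poisson)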

>>
>> Question #2: I am interested only in knowing if the trend of a species
>> is significant or if changes (slopes) in trends between seismic and
>> non-seismic are different. To select models, I initially used AIC to
>> select the most appropriate model and then reduced the fixed
>> parameters in the model based on their Z-statistic (a la Zuur et al.
>> 2009), but have since read that step-wise model selection is not
>> appropriate for GLMMs.

  Opinions differ, again. Some (including me) would say that stepwise
selection, and in fact ANY selection followed by interpretation of the
remaining non-zero effects, is wrong -- Harrell's _Regression Modeling
Strategies_ is the best place to read about this.

>> I've also heard (without actually seeing any
>> published reference) that model averaging can't be used with quadratic
>> terms(?)

  In general model averaging is tricky/often wrong/hard to interpret
when you are considering a 'main effect' in the presence of 'interaction
effects' (the _principle of marginality_) -- quadratic models are a
special case of this.  It's not *necessarily* wrong, but my rule is
'don't try it unless you know why it is wrong in general and you know
why it's *not* wrong in your case' -- I suggest Venables' "Exegeses on
Linear Models" for this.
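
  To see the marginality problem concretely, here is a toy example
(simulated data, nothing to do with your data set): a model containing
x^2 but not x is not invariant to a shift in the origin of x, whereas
the full quadratic model is, so the reduced model's fit depends on an
arbitrary coding choice:

set.seed(1)
x <- 1:20
y <- 1 + 0.5*x - 0.05*x^2 + rnorm(20, sd = 0.3)
## quadratic term only: the fit changes if we (arbitrarily) shift x
m1  <- lm(y ~ I(x^2))
m1s <- lm(y ~ I((x - 10)^2))
c(logLik(m1), logLik(m1s))    # different log-likelihoods
## full quadratic model: invariant to shifting x
m2  <- lm(y ~ x + I(x^2))
m2s <- lm(y ~ I(x - 10) + I((x - 10)^2))
c(logLik(m2), logLik(m2s))    # identical log-likelihoods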

>> and log likelihoods are anti-conservative.

  Likelihood ratio tests are only valid asymptotically, and in some
cases the "N" that must be large for the LRT to be accurate is the
number of random-effects units.  I wouldn't guarantee that the results
are necessarily anticonservative, but Pinheiro and Bates 2000 find that
they are anticonservative for one particular case that they analyze.

  Of course, AIC is based on the same kind of asymptotic arguments, and
thus may be wrong as well.
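
  If this worries you, one alternative is a parametric bootstrap of the
LRT: simulate response vectors from the reduced model, refit both
models to each simulated data set, and compare the observed statistic
to the simulated null distribution.  A rough sketch, assuming your fits
lme3a and lme4a from above (the simulate() and refit() methods for
fitted mixed models are available in recent versions of lme4, and this
will be slow):

## observed LRT statistic for Yr2 (lme4a is nested in lme3a)
obs <- as.numeric(2 * (logLik(lme3a) - logLik(lme4a)))
nulldist <- replicate(200, {
    ysim <- simulate(lme4a)[[1]]    # data generated under the null model
    as.numeric(2 * (logLik(refit(lme3a, ysim)) -
                    logLik(refit(lme4a, ysim))))
})
mean(nulldist >= obs)               # bootstrap p-value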


>> Leaving variables
>> in a model to predict trend that apparently have little effect on the
>> results seems illogical in this case, particularly when no one model
>> fits the data much 'better' than any other model.

  Harrell would argue (I think -- he may speak up for himself since he
sometimes reads this list) that the correct thing to do **if you want to
test significance of effects** is to use the full model.

  If you want to make predictions (rather than test significance) then
you should use model averaging.
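
  For example, averaging your candidate set by Akaike weights could be
done with the MuMIn package (one option among several; I haven't run
this on your models):

library(MuMIn)
## combine the four fits, weighted by their Akaike weights
avg <- model.avg(lme1a, lme2a, lme3a, lme4a)
summary(avg)    # model-averaged coefficients and the Akaike weights

  Keep the marginality caveat above in mind, though: averaging across
models that do and do not contain Yr2 is exactly the questionable case.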

>>
>> Can anyone help me with understanding the most appropriate approach to
>> selecting a model/parameters with my data set/goals?

   That's my two cents.  On the other hand, I don't think you will go
terribly far wrong using Zuur's approach either, especially because the
AIC says to keep the full model.  My guess is that the parameter values,
predictions, etc., are all pretty close together in all of your
different models, so model averaging OR just taking the top (full)
model or just _a priori_ using the full model will all give similar
answers.  If your different models do *not* give similar answers (which
is not impossible, but slightly pathological) then you should look
carefully at how & why the answers differ and proceed with great caution.

  By the way, "Yr*Seismic" in your formula expands to "Yr + Seismic +
Yr:Seismic" -- you probably mean "Yr:Seismic" for the interaction
alone.  This gets adjusted automatically in this case, but you might as
well be precise ...
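
You can check the expansion directly:

attr(terms(~ Yr * Seismic), "term.labels")
## [1] "Yr"         "Seismic"    "Yr:Seismic"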


Venables, W. N. 1998. Exegeses on Linear Models. Paper presented at the
S-PLUS User's Conference, Washington, DC. Retrieved from
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf.

Harrell, F. E., Jr. 2001. Regression Modeling Strategies. Springer, New
York.

Pinheiro, J. C., and D. M. Bates. 2000. Mixed-effects models in S and
S-PLUS. Springer, New York.


> 
>> (I've read and re-read multiple papers, books and threads on this
>> mailing list but they have only left me more confused!)
>>
>> Thanks a lot for your help!
>>
>> Kevin



