[R-sig-eco] AIC / BIC vs P-Values / MAM

Wed Aug 4 19:01:44 CEST 2010

On 10-08-04 10:55 AM, Chris Mcowen wrote:
> Dear List,
>
> I was after some advice on model selection,
>    

   OK, you asked for it ...

> I am using AIC model selection rather than P-value based stepwise regression as I feel it is more robust (Burnham & Anderson, 2002). However, there seems to be a huge difference in my results.
>    

     In my opinion, model selection via AIC shares most of the
disadvantages of p-value-based model selection. "All model" selection
is slightly better than stepwise approaches because it is less
susceptible to getting stuck in some weird local branch. But whether
you select models via p-value or AIC *should* be based on whether you
are trying to test hypotheses or make predictions, and you should
seriously question whether you should be doing model selection in the
first place. You should *not* select a model and then make inferences
about the 'significance' of what remains in the model ...

   AIC is great but it's not a panacea.

    Now, on to the "p vs AIC" question.

> The factors with the highest p-values, and therefore retained in the MAM, when I did an explanatory stepwise regression, do not appear in the model with the lowest AIC value - do the two approaches generally not match?
>
> The factors retained by the MAM are theoretically what I would expect, so I am a bit surprised as to why the model with the lowest AIC doesn't contain them? I have ranked the AIC models with Akaike weights, but still the top ranked models don't incorporate the traits I would expect / retained in the MAM.
>
> LOWEST AIC MODEL
>
> model43<- lmer(threatornot~1+(1|order/family) + geophyte + seasonality + pollendispersal + woodyness, family=binomial)
>    
>> model43
>>      
> Generalized linear mixed model fit by the Laplace approximation
> Formula: threatornot ~ 1 + (1 | order/family) + geophyte + seasonality +      pollendispersal + woodyness
>    AIC  BIC logLik deviance
>   1395 1430 -690.6     1381
> Random effects:
>   Groups       Name        Variance Std.Dev.
>   family:order (Intercept) 0.37447  0.61194
>   order        (Intercept) 0.00000  0.00000
> Number of obs: 1116, groups: family:order, 43; order, 9
>
> Fixed effects:
>                   Estimate Std. Error z value Pr(>|z|)
> (Intercept)       0.40234    0.43237   0.931  0.35208
> geophyte2         0.06453    0.19616   0.329  0.74218
> seasonality2     -1.06900    0.34241  -3.122  0.00180 **
> pollendispersal2  0.64474    0.31089   2.074  0.03809 *
> woodyness2        0.47599    0.25646   1.856  0.06346 .
>
> BEST STEPWISE MAM
>
> Generalized linear mixed model fit by the Laplace approximation
> Formula: threatornot ~ 1 + (1 | order/family) + breedingsystem * fruit +      woodyness
>    AIC  BIC logLik deviance
>   1409 1454 -695.3     1391
> Random effects:
>   Groups       Name        Variance Std.Dev.
>   family:order (Intercept) 0.52475  0.7244
>   order        (Intercept) 0.00000  0.0000
> Number of obs: 1116, groups: family:order, 43; order, 9
>
> Fixed effects:
>                         Estimate Std. Error z value Pr(>|z|)
> (Intercept)             -1.1290     0.4909  -2.300   0.0215 *
> breedingsystem2          0.8123     0.4756   1.708   0.0876 .
> breedingsystem3          0.9449     0.5246   1.801   0.0717 .
> fruit2                   1.3885     0.6221   2.232   0.0256 *
> woodyness2               0.5484     0.2627   2.088   0.0368 *
> breedingsystem2:fruit2  -1.6218     0.6577  -2.466   0.0137 *
> breedingsystem3:fruit2  -1.6645     0.7449  -2.235   0.0255 *
>
>
> The breedingsystem * fruit interaction should, based on theory, be important, so why is it not in the model with the lowest AIC but is in the MAM?
>
> I am not sure if it is because I did not set out my candidate models correctly. I did a different model for every combination of traits ((2 to the power of 7) - 1) as I was unsure of which models would be important. I was given the data, I didn't collect it, therefore I have to work with what I have.
>    

    My best guess as to what's going on here is that you have a good 
deal of correlation among your factors (in this case, with
discrete factors, that means that some combinations of factors are 
under/overrepresented in the data set), which means that quite
different combinations of factors can fit/explain the data approximately 
equally well.
    It's really hard to say without going through the data in detail.
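
    One quick way to look for this kind of imbalance is to cross-tabulate
pairs of factors. The following is only a rough sketch on simulated
stand-in data: the data frame name `dat` and the cell probabilities are
made up for illustration, and only the factor names come from the model
output quoted above.

```r
# Simulated stand-in data; the real data set is not shown in this thread.
set.seed(1)
n <- 1116
breedingsystem <- factor(sample(1:3, n, replace = TRUE))

# Make fruit depend on breedingsystem so the two factors are associated
# (the probabilities are arbitrary, chosen only for illustration).
p_fruit2 <- c(0.2, 0.5, 0.8)[as.integer(breedingsystem)]
fruit <- factor(ifelse(runif(n) < p_fruit2, 2, 1))
dat <- data.frame(breedingsystem, fruit)

# Cell counts: sparse or lopsided cells mean some combinations of
# factor levels are under- or overrepresented.
tab <- table(dat$breedingsystem, dat$fruit)
print(tab)

# Chi-squared test of independence between the two factors.
assoc <- chisq.test(tab)
print(assoc)
```

    With the real data you would repeat the `table()` call for each pair
of traits; strongly unbalanced tables suggest that the corresponding
factors can partly stand in for one another in a model.
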
    My advice would be to (a) read [or skim] Frank Harrell's book on 
Regression Modeling Strategies, particularly about the
dangers of model reduction; (b) if you're interested in **testing 
hypotheses about which factors are important**, simply fit
the full model and base your inference on the estimates and confidence 
intervals from the full model.
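
    As a rough sketch of option (b), again on simulated stand-in data:
the data frame `dat`, the simulated effect sizes, and the choice of Wald
intervals are all assumptions made for illustration. Only six of the
seven traits are named in this thread, so only those six appear here.
The sketch also assumes a current version of lme4, where binomial mixed
models are fit with `glmer()` rather than `lmer(..., family=binomial)`.

```r
library(lme4)

# Simulated stand-in data; the real data set is not shown in this thread.
set.seed(2)
n <- 1116
n_fam <- 43
fam_order <- factor(sample(1:9, n_fam, replace = TRUE))  # order of each family
idx <- sample(n_fam, n, replace = TRUE)                  # family of each species

dat <- data.frame(
  family          = factor(idx),
  order           = fam_order[idx],
  breedingsystem  = factor(sample(1:3, n, replace = TRUE)),
  fruit           = factor(sample(1:2, n, replace = TRUE)),
  woodyness       = factor(sample(1:2, n, replace = TRUE)),
  geophyte        = factor(sample(1:2, n, replace = TRUE)),
  seasonality     = factor(sample(1:2, n, replace = TRUE)),
  pollendispersal = factor(sample(1:2, n, replace = TRUE))
)

# Arbitrary "true" model: a woodyness effect plus family-level variation.
u <- rnorm(n_fam, sd = 0.6)
eta <- -0.5 + 0.5 * (dat$woodyness == "2") + u[idx]
dat$threatornot <- rbinom(n, 1, plogis(eta))

# Fit the full model once, with no term dropping.
full <- glmer(threatornot ~ breedingsystem * fruit + woodyness + geophyte +
                seasonality + pollendispersal + (1 | order/family),
              family = binomial, data = dat)

# Wald confidence intervals for the fixed effects only.
ci <- confint(full, parm = "beta_", method = "Wald")
print(cbind(estimate = fixef(full), ci))
```

    Wald intervals are used here only because they are quick;
`method = "profile"` or `method = "boot"` in `confint()` are slower
alternatives.
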

   good luck,
     Ben Bolker