[R-sig-eco] AIC / BIC vs P-Values / MAM

Wed Aug 4 22:05:16 CEST 2010

On Wed, Aug 4, 2010 at 3:31 PM, Chris Mcowen <chrismcowen at gmail.com> wrote:
>> If you are *really* trying to predict (rather than test hypotheses), and you really use model averaging, then I would be fine with this approach -- but then you wouldn't be spending any time worrying about which models were weighted how strongly
>
> My approach was to rank the models by their relative AIC difference, delta-AIC = AIC (model of interest) - AICmin (the lowest AIC in the candidate set), and then to use model averaging only on the set of models with delta-AIC between 0 and 2 (Burnham & Anderson, 2002).
>

   Do they really recommend dropping all models with delta-AIC greater
than 2?  If you're going to drop anything, I would say a cut-off of 10 or
so would be more practical.  Just as a (slightly extreme) example, suppose
you had three models with delta-AIC=0 (the best), 3, and 3.  Then the AIC
weight of the top model would only be 1/(1+2*exp(-1.5)), approx 0.7, so by
dropping the other models you'd be throwing out about 30% of the model
weight ...
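
   The arithmetic above can be checked with a minimal sketch.  It is
written in Python purely for illustration and is not taken from any
package; the function names akaike_weights and retained_weight are
hypothetical, and the weights use the standard definition
w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2).

```python
import math


def akaike_weights(delta_aic):
    """Return Akaike weights for a list of delta-AIC values."""
    # Relative likelihood of each model: exp(-delta/2).
    rel_lik = [math.exp(-0.5 * d) for d in delta_aic]
    total = sum(rel_lik)
    # Normalise so the weights sum to 1 over the candidate set.
    return [r / total for r in rel_lik]


def retained_weight(delta_aic, cutoff):
    """Total Akaike weight of the models with delta-AIC <= cutoff."""
    weights = akaike_weights(delta_aic)
    return sum(w for w, d in zip(weights, delta_aic) if d <= cutoff)


# The three-model example from the text: delta-AIC = 0, 3, 3.
deltas = [0.0, 3.0, 3.0]
weights = akaike_weights(deltas)

print([round(w, 3) for w in weights])           # top model is about 0.69
print(round(retained_weight(deltas, 2.0), 3))   # weight kept with a cut-off of 2
print(round(retained_weight(deltas, 10.0), 3))  # weight kept with a cut-off of 10
```

   With a cut-off of 2 only the top model survives, so the retained
weight equals its own weight; with a cut-off of 10 all three models are
kept.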
>
>>  I don't quite understand.
>
> Sorry, I was trying to say that I then need to think of a way of validating the goodness of fit, as I want to use my training data to predict my test data, and I have never used a model to predict unknown values. But I am sure I will come to it if I read around!
>
> Thanks for all your help; it is greatly appreciated.
>
>
>
> On 4 Aug 2010, at 20:09, Ben Bolker wrote:
>
> On 10-08-04 01:13 PM, Chris Mcowen wrote:
>> Hi Ben,
>>
>> That is great thanks.
>>
>>
>>> whether you select models via p-value or AIC *should* be based on whether you are trying to test hypotheses or make predictions
>>>
>> I have 7 factors, of which 5 have been shown, theoretically and empirically, to have an impact on my response variable. The other two are somewhat wild shots, but I have a hunch they are important too.
>>
>> The problem is that the variables show no clear analytical patterns; they don't fit into neat boxed themes (size, shape etc.), if you will, which makes forming hypotheses about how they interact hard. Forming a subset of models to test is therefore very difficult, so my approach has been to use all combinations of factors to generate the candidate models. I am worried that this approach is taking me down the data dredging / model simplification route I am trying to avoid. Is it bad practice to use all combinations? As long as I rank them by Akaike weight and use model averaging techniques, isn't this OK?
>>
>
>  If you are *really* trying to predict (rather than test hypotheses), and you really use model averaging, then I would be fine with this approach -- but then you wouldn't be spending any time worrying about which models were weighted how strongly (although I do admit that wondering why p-values and AIC gave different rankings is worth thinking about -- I'm just not sure there's a short answer without looking through all of the data).
>
>  You should take a look at the AICcmodavg and MuMIn packages on CRAN -- one or the other may (?) be able to handle lmer fits.
>>
>>> My best guess as to what's going on here is that you have a good deal of correlation among your factors
>>>
>> I tested this with Pearson's r and only one combination showed up as having a strong correlation. Is this not sufficient?
>>
>
>   Often but not necessarily.  Zuur et al. have a recent paper in Methods in Ecology and Evolution you might want to look at.
>>
>>> some combinations of factors are under/overrepresented in the data set)
>>>
>> That is certainly the case, but I can't do much about that. Is it not sufficient just to rely on the Pearson's values mentioned above?
>>
>>
>>> simply fit
>>> the full model and base your inference on the estimates and confidence intervals from the full model
>>>
>> I want to be able to predict the threat status (the response variable) for species for which I only have traits (factors). This approach would not really let me do that, would it?
>>
>
>  I don't quite understand.
>
>  Ben
>
>