[R-sig-eco] AIC / BIC vs P-Values / MAM
Ben Bolker
bbolker at gmail.com
Wed Aug 4 21:09:12 CEST 2010
On 10-08-04 01:13 PM, Chris Mcowen wrote:
> Hi Ben,
>
> That is great thanks.
>
>
>> whether you select models via p-value or AIC *should* be based on whether you are trying to test hypotheses or make predictions
>>
> I have 7 factors, of which 5 have been shown, theoretically and empirically, to have an impact on my response variable. The other two are somewhat wild shots, but I have a hunch they are important too.
>
> The problem is that the variables show no clear analytical patterns; they don't fall into neat themes (size, shape, etc.), so forming hypotheses about how they interact is hard, and choosing a subset of models to test is very difficult. My approach has been to use all combinations of factors to generate the candidate models. I am worried that this is taking me down the data-dredging / model-simplification route I am trying to avoid. Is it bad practice to use all combinations? As long as I rank them by Akaike weight and use model-averaging techniques, isn't this OK?
>
If you are *really* trying to predict (rather than test hypotheses),
and you really use model averaging, then I would be fine with this
approach -- but then you wouldn't be spending any time worrying about
which models were weighted how strongly (although I do admit that
wondering why p-values and AIC gave different rankings is worth thinking
about -- I'm just not sure there's a short answer without looking
through all of the data).
You should take a look at the AICcmodavg and MuMIn packages on CRAN
-- one or the other may (?) be able to handle lmer fits.
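
For what the weighting and averaging amount to numerically, here is a minimal sketch in Python (standard library only). It assumes the usual definition of Akaike weights, w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), where delta_i is the difference between model i's AIC and the lowest AIC in the set. It also assumes that a model-averaged prediction is the weighted mean of the per-model predictions. The AIC scores and predictions are made up for illustration, and the function names are hypothetical; in R, the packages named above are the place to look.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC (or AICc) scores into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Delta-AIC: how far each model is from the best-scoring model
    deltas = [a - best for a in aic_values]
    # Relative likelihood of each model given the data
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]

def model_averaged_prediction(predictions, weights):
    """Weighted mean of per-model predictions for one new observation."""
    return sum(p * w for p, w in zip(predictions, weights))

# Hypothetical AIC scores and predictions from three candidate models
aic = [100.0, 102.0, 110.0]
preds = [0.80, 0.60, 0.30]

weights = akaike_weights(aic)
averaged = model_averaged_prediction(preds, weights)
print(weights, averaged)
```

Because every model contributes in proportion to its weight, the averaged prediction does not depend on picking a single "best" model, which is why the exact ranking matters less under this approach.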
>
>> My best guess as to what's going on here is that you have a good deal of correlation among your factors
>>
> I tested this with Pearson's r and only one pair of factors showed a strong correlation. Is this not sufficient?
>
Often, but not necessarily: pairwise correlations can miss collinearity
that involves three or more variables at once. Zuur et al. have a recent
paper in Methods in Ecology and Evolution you might want to look at.
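
One simple check that goes beyond pairwise correlation coefficients is to cross-tabulate the factors and look for empty or sparse cells. A minimal Python sketch, with made-up trait names and data and hypothetical function names (in R, `table()` gives the same counts):

```python
from collections import Counter
from itertools import product

def cell_counts(rows, factor_a, factor_b):
    """Count observations for each observed combination of two factors."""
    return Counter((row[factor_a], row[factor_b]) for row in rows)

def missing_cells(rows, factor_a, factor_b):
    """List level combinations that never occur in the data."""
    levels_a = sorted({row[factor_a] for row in rows})
    levels_b = sorted({row[factor_b] for row in rows})
    counts = cell_counts(rows, factor_a, factor_b)
    # A Counter returns 0 for combinations it has never seen
    return [cell for cell in product(levels_a, levels_b) if counts[cell] == 0]

# Hypothetical data: one row per species, two categorical traits
rows = [
    {"trait_a": "low", "trait_b": "x"},
    {"trait_a": "low", "trait_b": "x"},
    {"trait_a": "low", "trait_b": "y"},
    {"trait_a": "high", "trait_b": "x"},
]

counts = cell_counts(rows, "trait_a", "trait_b")
missing = missing_cells(rows, "trait_a", "trait_b")
print(counts)
print(missing)
```

Combinations with zero or very few observations are the ones for which the effects of the two factors cannot be separated well, whatever the correlation coefficient says.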
>
>> some combinations of factors are under/overrepresented in the data set)
>>
> That is certainly the case, but I can't do much about that. Is it not sufficient to rely on the Pearson's values mentioned above?
>
>
>> simply fit
>> the full model and base your inference on the estimates and confidence intervals from the full model
>>
> I want to be able to predict the threat status (the response variable) for species for which I only have traits (factors). This approach would not really let me do that, would it?
>
I don't quite understand.
Ben