[R-sig-eco] AIC / BIC vs P-Values / MAM
Chris Howden
chris at trickysolutions.com.au
Thu Aug 5 02:13:29 CEST 2010
Hi Chris,
If you want good predictive ability, which is exactly what you do want when
using a model for prediction, then why not use predictive ability as the
model selection criterion?
This can be done by calculating the prediction error of each candidate
model on your test data set and using that as the selection criterion.
Maybe use AIC to decide which models to bother testing, but use predictive
ability as the final test. I usually also look at the min and max errors,
and the error distribution in general.
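A minimal sketch of that comparison, assuming the predictions for the test
set have already been produced by each candidate model. The model names,
observations and predictions are hypothetical, and RMSE is only one
possible choice of error measure:

```python
import math

def error_summary(observed, predicted):
    """Summarise prediction errors (predicted - observed) on held-out data."""
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"rmse": rmse, "min_error": min(errors), "max_error": max(errors)}

# Hypothetical test-set observations and predictions from two candidate models.
test_observed = [2.0, 3.5, 5.0, 4.0]
candidate_predictions = {
    "model_a": [2.5, 3.0, 5.5, 4.5],
    "model_b": [2.0, 3.5, 7.0, 4.0],
}

summaries = {
    name: error_summary(test_observed, preds)
    for name, preds in candidate_predictions.items()
}

# Pick the candidate with the smallest RMSE on the test set.
best_model = min(summaries, key=lambda name: summaries[name]["rmse"])

for name, summary in summaries.items():
    print(name, summary)
print("selected:", best_model)
```

Here model_b is exact on three of four points but has one large miss, which
is why the min and max errors are worth reading alongside the overall
figure.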
When it comes to hypothesis testing I sometimes fit a series of simple
models, one for each predictor. This allows me to test each one's "sole"
correlation/association with the response. It works very well when there
is a lot of correlation amongst the predictors, which is when a full model
will not work as well and can give very misleading results. If there are
any known covariates then I might fit them as well, so I can test the
effect of the hypothesis predictor in conjunction with the covariates.
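As a rough illustration of the one-model-per-predictor idea, the sketch
below fits a separate least-squares line for each predictor. Ordinary
linear regression is only a stand-in here (the thread concerns lmer fits),
and the predictor names and data are hypothetical:

```python
def simple_fit(x, y):
    """Least-squares slope, intercept and Pearson r for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r": sxy / (sxx * syy) ** 0.5,
    }

# Hypothetical data: two perfectly correlated predictors and one response.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [2.0, 4.0, 6.0, 8.0],
}
response = [3.0, 5.0, 7.0, 9.0]

# One simple model per predictor, each fitted on its own.
fits = {name: simple_fit(values, response) for name, values in predictors.items()}

for name, fit in fits.items():
    print(name, fit)
```

Because x1 and x2 are collinear in this toy data, a full model containing
both could not separate their effects, whereas each single-predictor fit is
well defined.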
Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis,
Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
chris at trickysolutions.com.au
-----Original Message-----
From: r-sig-ecology-bounces at r-project.org
[mailto:r-sig-ecology-bounces at r-project.org] On Behalf Of Chris Mcowen
Sent: Thursday, 5 August 2010 5:01 AM
To: Ben Bolker
Cc: r-sig-ecology at r-project.org
Subject: Re: [R-sig-eco] AIC / BIC vs P-Values / MAM
> If you are *really* trying to predict (rather than test hypotheses), and
you really use model averaging, then I would be fine with this approach --
but then you wouldn't be spending any time worrying about which models
were weighted how strongly
My approach was to rank the models by their relative AIC difference, i.e.
AIC (model of interest) - AICmin (AIC value of the lowest-AIC model), and
then only use model averaging on the set of models where that difference
was 0-2 (Burnham & Anderson, 2002).
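The ranking described above can be sketched as follows. The AIC values and
model names are hypothetical, and renormalising the Akaike weights over
only the retained models is an assumption about how the averaging set is
handled:

```python
import math

def akaike_table(aic_by_model, max_delta=2.0):
    """Delta-AIC and Akaike weights for models within max_delta of the best."""
    aic_min = min(aic_by_model.values())
    deltas = {name: aic - aic_min for name, aic in aic_by_model.items()}
    # Keep only the models whose AIC difference is at most max_delta.
    kept = {name: d for name, d in deltas.items() if d <= max_delta}
    # Relative likelihood of each retained model: exp(-delta / 2).
    rel_lik = {name: math.exp(-0.5 * d) for name, d in kept.items()}
    total = sum(rel_lik.values())
    return {
        name: {"delta": kept[name], "weight": rel_lik[name] / total}
        for name in kept
    }

# Hypothetical AIC values for three candidate models.
aic_values = {"m1": 100.0, "m2": 101.0, "m3": 105.0}
table = akaike_table(aic_values)

for name, row in table.items():
    print(name, row)
```

With these numbers m3 is dropped (its difference is 5), and the weights of
m1 and m2 sum to one, so they can be used directly as averaging weights for
predictions from the two retained models.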
> I don't quite understand.
Sorry, I was trying to say that I then need to think of a way of validating
the goodness of fit, as I want to use my training data to predict my test
data, and I have never used a model to predict unknown values. But I am
sure I will come to it if I read around!
Thanks for all your help, it is greatly appreciated.
On 4 Aug 2010, at 20:09, Ben Bolker wrote:
On 10-08-04 01:13 PM, Chris Mcowen wrote:
> Hi Ben,
>
> That is great thanks.
>
>
>> whether you select models via p-value or AIC *should* be based on
whether you are trying to test hypotheses or make predictions
>>
> I have 7 factors, of which 5 have been shown, theoretically and
empirically, to have an impact on my response variable. The other two are
somewhat wild shots, but I have a hunch they are important too.
>
> The problem is that there are no clear analytical patterns in the
variables; they don't fit into neat boxed themes (size, shape etc.), if you
will, which makes forming hypotheses about how they interact hard.
Forming a subset of models to test is therefore very difficult, so my
approach has been to use all combinations of factors to generate the
candidate models. I am worried that this approach is taking me down the
data dredging / model simplification route I am trying to avoid. Is it bad
practice to use all combinations? As long as I rank them by Akaike weight
and use model averaging techniques, isn't this OK?
>
If you are *really* trying to predict (rather than test hypotheses), and
you really use model averaging, then I would be fine with this approach --
but then you wouldn't be spending any time worrying about which models
were weighted how strongly (although I do admit that wondering why
p-values and AIC gave different rankings is worth thinking about -- I'm
just not sure there's a short answer without looking through all of the
data).
You should take a look at the AICcmodavg and MuMIn packages on CRAN --
one or the other may (?) be able to handle lmer fits.
>
>> My best guess as to what's going on here is that you have a good deal
of correlation among your factors
>>
> I tested this with Pearson's r and only one combination showed up as
having a strong correlation. Is this not sufficient?
>
Often, but not necessarily. Zuur et al. have a recent paper in Methods
in Ecology and Evolution you might want to look at.
>
>> some combinations of factors are under/overrepresented in the data set)
>>
> That is certainly the case, but I can't do much about that. Is it not
sufficient just to rely on the Pearson's values as mentioned above?
>
>
>> simply fit
>> the full model and base your inference on the estimates and confidence
intervals from the full model
>>
> I want to be able to predict the threat status (the response variable)
for species I only have traits (factors) for. This approach would not
really let me do this, would it?
>
I don't quite understand.
Ben