[R-sig-eco] AIC / BIC vs P-Values / MAM

Maarten de Groot Maarten.deGroot at nib.si
Thu Aug 5 08:44:12 CEST 2010


Hi Chris,

There are many methods (Boyce index, maxKappa, etc.) to evaluate the 
predictions of a model when it is applied to a test dataset. More 
information is given in Hirzel et al. (2006, Ecological Modelling). 
Furthermore, have a look at the package PresenceAbsence 
(http://rss.acs.unt.edu/Rdoc/library/PresenceAbsence/html/PresenceAbsence.package.html). 
It contains many of these evaluation measures, though not the Boyce index.
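
A minimal sketch of the input format PresenceAbsence expects (the objects 
obs and pred here are hypothetical: the observed 0/1 values in your test 
set and the probabilities your model predicted for them):

    library(PresenceAbsence)

    eval_df <- data.frame(id = seq_along(obs),
                          observed = obs,      # 0/1 presence/absence in test set
                          predicted = pred)    # predicted probabilities

    presence.absence.accuracy(eval_df, threshold = 0.5)    # PCC, sensitivity, Kappa, AUC
    optimal.thresholds(eval_df, opt.methods = "MaxKappa")  # threshold maximising Kappa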

Kind regards,

Maarten

Chris Mcowen wrote:
> Hi Chris and Ben,
>
> This is exactly what I intended to do: I took 20 percent of my dataset and left it out of the data used to build the model, so I could test on it later.
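>
> A minimal sketch of that kind of 80/20 split in R (assuming a hypothetical data frame dat holding the full dataset):
>
>     set.seed(42)                                     # reproducible split
>     test_rows <- sample(nrow(dat), round(0.2 * nrow(dat)))
>     train <- dat[-test_rows, ]                       # 80% for building the model
>     test  <- dat[test_rows, ]                        # 20% held back for evaluation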
>
> I am relatively new to models in general, and my PhD supervisors are both ecology/conservation based. I was therefore wondering if you could offer some advice on the best way to evaluate the predictive ability of a model: both the method for actually predicting the result and then how to check the confidence in those predictions. If a full workflow is too much to ask, then a few steps I can build upon would be gratefully received.
>
> Thanks again for your help,
>
> Chris 
>   
>> On 5 Aug 2010, at 02:12, Chris Howden <chris at trickysolutions.com.au> wrote:
>>
>>     
>>> Hi Ben,
>>>
>>>  
>>>
>>> You're absolutely right.
>>>
>>>  
>>>
>>> Which was why I said you should test the model's predictive ability on the “test data set”. I probably should have made it clearer that the “test data set” isn't used when building the model. And I agree that cross-validation is best, if you have the time and code to do it.
>>>
>>>  
>>>
>>> It’s also why I said that using AIC to decide which models to actually bother testing would be a good idea.
>>>
>>>  
>>>
>>> At least that's the approach I usually use, i.e.:
>>>
>>>  
>>>
>>> 1. Create the candidate models and initially evaluate which are best using AIC, comparing each model's log-likelihood to the null model and other applicable models, plus some common sense.
>>>
>>>  
>>>
>>> 2. Then I evaluate the predictive ability of the best few models on a "test data set" which wasn't used to create them (a minimal sketch of both steps follows below).
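>>>
>>> A minimal sketch of those two steps (assuming a hypothetical binary response y and predictors x1, x2 in the train/test data frames):
>>>
>>>     ## Step 1: fit candidate models and rank them by AIC against the null model
>>>     m0 <- glm(y ~ 1,       data = train, family = binomial)
>>>     m1 <- glm(y ~ x1,      data = train, family = binomial)
>>>     m2 <- glm(y ~ x1 + x2, data = train, family = binomial)
>>>     AIC(m0, m1, m2)
>>>
>>>     ## Step 2: check predictive ability of the best few on the test set
>>>     p2 <- predict(m2, newdata = test, type = "response")
>>>     mean((test$y - p2)^2)   # Brier score: lower means better out-of-sample prediction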
>>>
>>>  
>>>
>>> Chris Howden
>>>
>>> Founding Partner
>>>
>>> Tricky Solutions
>>>
>>> Tricky Solutions 4 Tricky Problems
>>>
>>> Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
>>>
>>> (mobile) 0410 689 945
>>>
>>> (fax / office) (+618) 8952 7878
>>>
>>> chris at trickysolutions.com.au
>>>
>>>  
>>>
>>> From: bbolker at gmail.com [mailto:bbolker at gmail.com] 
>>> Sent: Thursday, 5 August 2010 10:17 AM
>>> To: Chris Howden
>>> Cc: Chris Mcowen; r-sig-ecology at r-project.org
>>> Subject: Re: Re: [R-sig-eco] AIC / BIC vs P-Values / MAM
>>>
>>>  
>>>
>>> On Aug 4, 2010 8:13pm, Chris Howden <chris at trickysolutions.com.au> wrote:
>>>       
>>>> Hi Chris,
>>>>
>>>> If you want good predictive ability, which is exactly what you do want when
>>>> using a model for prediction, then why not use its predictive ability as the
>>>> model selection criterion?
>>>>         
>>> Because this will typically lead to overfitting the data, i.e. getting a great
>>> fit to the 'training' set but then doing miserably on future data? Unless you do
>>> something like split the data set into a training and a validation set, or
>>> use cross-validation (which is a more sophisticated version of the same idea),
>>> just finding the model with the best predictive capability on a specified
>>> data set will *not* give you a good model in general. That's why approaches
>>> such as AIC, adjusted R^2, and so forth, include a penalty for model
>>> complexity.
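>>>
>>> A minimal k-fold cross-validation sketch along those lines (assuming a hypothetical data frame dat with binary response y and predictors x1, x2):
>>>
>>>     set.seed(1)
>>>     k <- 5
>>>     fold <- sample(rep(1:k, length.out = nrow(dat)))   # assign rows to folds
>>>     cv_err <- sapply(1:k, function(i) {
>>>         fit <- glm(y ~ x1 + x2, data = dat[fold != i, ], family = binomial)
>>>         p   <- predict(fit, newdata = dat[fold == i, ], type = "response")
>>>         mean((dat$y[fold == i] - p)^2)                 # Brier score on held-out fold
>>>     })
>>>     mean(cv_err)   # average out-of-sample error; compare across candidate models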
>>>
>>> Unless I'm missing something really obvious, in which case I apologize. 
>>>
>>> Ben Bolker
>>>
>>>       
>
>
>
>   


