[R] model simplification using Crawley as a guide

Marc Schwartz marc_schwartz at comcast.net
Thu Jun 12 03:10:10 CEST 2008


on 06/11/2008 05:53 PM Frank E Harrell Jr wrote:
> Ben Bolker wrote:
>> Lucke, Joseph F <Joseph.F.Lucke <at> uth.tmc.edu> writes:
>>
>>> And to follow FH and HW
>>>
>>> What level of significance are you using? .05 is excessively liberal.
>>> Are you adjusting your p-values for the number of possible models? Do
>>> you realize the p-values for dropping a term, being selected as the
>>> maximum of a set of p-values, do not follow their usual distributions?
>>> How are you compensating for sample size, as a p-value's being
>>> significant is a function of sample size?  How are you compensating for
>>> the fact that the current model choice is dependent on the previous
>>> model choices? How do you know your tree of model choices is the optimal
>>> one?  Have you considered cross-validation?  Are you looking for a model
>>> that truly describes a phenomenon or a predictive model that can be used
>>> for practical purposes?
>>>
>>
>>    Ouch.  While Frank Harrell and Joseph Lucke are raising
>> serious issues about model selection, maybe we could keep in mind that
>> we don't want to scare off all the students who ever try to use R
>> to figure out basic statistics.  I would follow Peter Dalgaard's advice
>> (about "drop1") and Hadley Wickham's (about graphical diagnostics), 
>> and if possible bring up the other issues about
>> model selection with others around you -- if you're a student, ask
>> your prof. or someone in the stats department.  It can be tough
>> to try to do things right if those around you are still
>> doing them wrong ...  If you tell us what field you're in we
>> may be able to point you to more subject-specific references
>> (e.g. Whittingham, Mark J., Philip A. Stephens, Richard B. Bradbury,
>> and Robert P. Freckleton. 2006. Why do we still use stepwise modelling
>> in ecology and behaviour? Journal of Animal Ecology 75, no. 5: 1182-1189)
>>
>>    Ben Bolker
> 
> Good points Ben.  For now I'd recommend simply that the allergic 
> reaction to insignificant statistical tests be treated with an 
> antihistamine :-)


A vote for Frank's comment to be added to the 'fortunes' package.

:-)

Regards,

Marc


