[R-sig-eco] Question About Syntax For Complex ANOVA Design

hadley wickham h.wickham at gmail.com
Mon Nov 10 23:57:41 CET 2008


On Mon, Nov 10, 2008 at 2:02 PM, Ben Bolker <bolker at ufl.edu> wrote:
> hadley wickham wrote:
>> On Mon, Nov 10, 2008 at 9:22 AM, Mike Dunbar <mdu at ceh.ac.uk> wrote:
>>> (apologies - I should have written coast * MBL not ML)
>>>
>>> I'm not sure of my ground here, but surely you do lose something -
>>> you wouldn't retain coast:MBL if it's not significant, as you lose
>>> degrees of freedom, and this gets worse the more terms and the more
>>> interactions you consider.
>>
>> But if you drop the term you are effectively spending your degrees of
>> freedom twice - once to estimate the effect that you drop, and then
>> again in the new model.  Another way to see the problem is to think
>> about the null distribution of the p-values - if you only include
>> significant p values in your model, the standard null hypothesis is
>> clearly not appropriate.
>>
>> I think there's a good discussion of this in Frank Harrell's
>> Regression Modeling Strategies, but unfortunately I don't have a copy
>> on hand to point you to the exact location.
>>
>> Hadley
>
>  See e.g. sections 4.2 through 4.4 (pp. 56-60).  The discussion
> above does not mean that overfitted models are good, or that there
> isn't a penalty to overspecifying models (or otherwise one would
> always throw everything into the models), but that data-driven
> model selection has some very fundamental problems ...

But of course, not using data when selecting models has some pretty
fundamental problems too! ;)
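
To make the quoted point about the null distribution of the p-values
concrete, here is a rough simulation sketch. It is not the ANOVA from
this thread: as a simplifying assumption, each candidate "term" is just
a sample of pure noise tested with a two-sided z-test (known sd of 1),
and "selection" means keeping the most significant term. All names
(z_test_p, simulate, frac_below) are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def z_test_p(sample):
    """Two-sided p-value for H0: mean == 0, assuming a known sd of 1."""
    z = mean(sample) * len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n_terms=10, n_obs=20, seed=1):
    """Return (all p-values, p-value of the selected term per data set)."""
    rng = random.Random(seed)
    all_p, selected_p = [], []
    for _ in range(n_sims):
        # Every candidate term is pure noise, so the null is true for all.
        p_values = [
            z_test_p([rng.gauss(0, 1) for _ in range(n_obs)])
            for _ in range(n_terms)
        ]
        all_p.extend(p_values)
        # Data-driven selection (assumed rule): keep the "best" term only.
        selected_p.append(min(p_values))
    return all_p, selected_p


def frac_below(p_values, alpha=0.05):
    """Fraction of p-values that fall below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


all_p, selected_p = simulate()
print("unselected terms, fraction p < 0.05:", frac_below(all_p))
print("selected term,    fraction p < 0.05:", frac_below(selected_p))
```

Without selection the fraction should sit near the nominal 0.05. For
the selected term it should be close to 1 - 0.95^10, i.e. about 0.40,
even though every null hypothesis is true, which is why the usual null
distribution no longer applies after picking terms by significance.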

Hadley

-- 
http://had.co.nz/


