[R-sig-eco] Question About Syntax For Complex ANOVA Design

Ben Bolker bolker at ufl.edu
Mon Nov 10 21:02:27 CET 2008


hadley wickham wrote:
> On Mon, Nov 10, 2008 at 9:22 AM, Mike Dunbar <mdu at ceh.ac.uk> wrote:
>> (apologies - I should have written coast * MBL not ML)
>>
>> I'm not sure of my ground here, but surely you do lose something -
>> you wouldn't retain coast:MBL if it's not significant, as you lose
>> degrees of freedom, and this gets worse the more terms and the more
>> interactions you consider.
> 
> But if you drop the term you are effectively spending your degrees of
> freedom twice - once to estimate the effect that you drop, and then
> again in the new model.  Another way to see the problem is to think
> about the null distribution of the p-values - if you only keep terms
> with significant p-values in your model, the standard null hypothesis
> is clearly not appropriate.
> 
> I think there's a good discussion of this in Frank Harrell's
> Regression Modeling Strategies, but unfortunately I don't have a copy
> on hand to point you to the exact location.
> 
> Hadley

  See e.g. sections 4.2 through 4.4 (pp. 56-60) of Harrell's book.  The
discussion above does not mean that overfitted models are good, or that
there is no penalty for overspecifying a model (otherwise one would
always throw everything into the model), only that data-driven model
selection has some very fundamental problems ...
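
  To make the point about the null distribution concrete, here is a
rough simulation sketch (in Python rather than R, standard library
only).  It is an idealisation, not anything taken from Harrell: it
assumes k independent candidate terms, each with a standard normal
test statistic under the null, and the names two_sided_p and simulate
are made up for the illustration.  Every term is pure noise, yet
keeping only the "significant" ones leaves a model whose retained
p-values are all below alpha by construction.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate(n_terms=10, n_sims=20000, alpha=0.05, seed=1):
    """Screen n_terms pure-noise terms in each of n_sims null data sets.

    Assumption (hypothetical setup): the terms are independent and each
    z-statistic is exactly N(0, 1), i.e. no term has any real effect.
    """
    rng = random.Random(seed)
    any_kept = 0
    kept_pvalues = []
    for _ in range(n_sims):
        pvals = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_terms)]
        # Data-driven selection: retain only the "significant" terms.
        kept = [p for p in pvals if p < alpha]
        if kept:
            any_kept += 1
        kept_pvalues.extend(kept)
    return any_kept / n_sims, kept_pvalues


rate, kept = simulate()
print(f"null data sets with at least one retained term: {rate:.3f}")
print(f"theoretical 1 - (1 - alpha)^k:                  {1 - 0.95 ** 10:.3f}")
print(f"largest p-value among retained terms:           {max(kept):.4f}")
```

  With ten noise terms, a large share of purely null data sets ends up
with at least one retained term, and the p-values reported for those
terms can no longer be read against the usual uniform null
distribution.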

  cheers
   Ben Bolker
