[R] sample size > 20K? Was: fitness of regression tree: how to measure???

Thu Apr 1 19:08:37 CEST 2010

> Incidentally, there is nothing new or radical in this; indeed, John Tukey,
> Leo Breiman, George Box, and others wrote eloquently about this decades ago.
> And Breiman's random forest modeling procedure explicitly abandoned efforts
> to build simply interpretable models (from which one might infer causality)
> in favor of building better interpolators, although assessment of "variable
> importance" does try to recover some of that interpretability (however, no
> guarantees are given).

I've found making the distinction between models for explanation and
models for prediction to be particularly helpful. I was first made
aware of this split by Brian Ripley's talk "Selecting amongst large
classes of models", presented at a symposium in honour of John
Nelder's 80th birthday -
http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/