[R] FW: logistic regression
Gavin Simpson
gavin.simpson at ucl.ac.uk
Mon Sep 29 11:02:08 CEST 2008
On Sun, 2008-09-28 at 21:23 -0500, Frank E Harrell Jr wrote:
> Darin Brooks wrote:
> > I certainly appreciate your comments, Bert. It is abundantly clear that I
<snip />
> >
> > Darin Brooks
>
> Darin,
>
> I think the point is that the confidence you can assign to the "best
> available variables" is zero. That is the probability that stepwise
> variable selection will select the correct variables.
>
> It is probably better to build a model based on the knowledge in the
> field you alluded to, rather than to use P-values to decide.
>
> Frank Harrell
Hi Frank, et al
I don't have Darin's original email to hand just now, but IIRC he turned
on the testing by p-values, something that add1 and drop1 do not do by
default.
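A minimal sketch of that default, assuming the built-in infert data and an arbitrary set of predictors rather than Darin's model: drop1() called without a test argument reports only the deviance and AIC for each single-term deletion, and p-values appear only when a test is requested explicitly.

```r
# Illustrative logistic regression on a built-in data set (not the thread's data).
fit <- glm(case ~ age + parity + spontaneous + induced,
           data = infert, family = binomial())

# Default is test = "none": the table holds Df, Deviance and AIC only.
d_default <- drop1(fit)

# Likelihood ratio tests, and hence p-values, must be switched on explicitly.
d_tested <- drop1(fit, test = "Chisq")

print(names(d_default))
print(names(d_tested))
```

The same test argument applies to add1().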
Venables and Ripley's MASS provides stepAIC, and in the regression
chapters of the book they make use of drop1. (Apologies if I have made
sweeping statements that are just plain wrong here; I'm at home this
morning and don't seem to have either of my two copies of MASS with me.)
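For reference, a hedged sketch of backward selection with stepAIC from the MASS package, again on the built-in infert data with an arbitrary starting model (an assumption for illustration, not an example taken from the book). Terms are removed according to AIC alone, with no significance tests involved.

```r
library(MASS)  # recommended package, provides stepAIC()

# Arbitrary starting model, chosen only for illustration.
full <- glm(case ~ age + parity + education + spontaneous + induced,
            data = infert, family = binomial())

# Backward elimination by AIC (default penalty k = 2 per parameter).
sel <- stepAIC(full, direction = "backward", trace = FALSE)

print(formula(sel))  # the model retained
print(sel$anova)     # the sequence of steps taken
```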
Would the same criticisms made by yourself and Bert, amongst others, in
this thread be levelled at simplifying models using AIC rather than via
p-values? Part of the issue with stepwise procedures is that they don't
correct the overall Type I error rate (even if you use 0.05 as your
cut-off for each test, overall your error rate can be much larger). Does
AIC allow one to get out of this bit of the problem with stepwise
methods?
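The inflation described above can be put in rough numbers. The sketch below assumes k independent tests, each at level alpha. That is an idealisation, since the tests made during a stepwise search are neither independent nor fixed in number, so the figures are indicative only. It also computes the per-term significance level implied by an AIC comparison for a single one-degree-of-freedom term in a model with fixed dispersion, such as a logistic regression: with the default penalty of 2 per parameter, the term is kept only if dropping it raises the deviance by more than 2, which is a chi-squared cut-off of 2 on 1 df.

```r
alpha <- 0.05
k <- c(1, 5, 10, 44)  # 44 is the number of candidate predictors mentioned in the thread

# Chance of at least one false positive among k independent tests at level alpha.
fwer <- 1 - (1 - alpha)^k
print(round(setNames(fwer, k), 3))

# Per-term significance level implied by AIC (penalty 2) for a 1-df term.
aic_alpha <- pchisq(2, df = 1, lower.tail = FALSE)
print(round(aic_alpha, 3))
```

Under these assumptions an AIC comparison acts like a per-term test at a level more liberal than 0.05, which suggests it would not by itself remove the multiplicity concern.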
I'd appreciate any thoughts you or others on the list may have on this.

All the best, and thanks for an interesting discussion thus far.

G
>
> >
> > -----Original Message-----
> > From: Bert Gunter [mailto:gunter.berton at gene.com]
> > Sent: Sunday, September 28, 2008 6:26 PM
> > To: 'David Winsemius'; 'Darin Brooks'
> > Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
> > Subject: RE: [R] FW: logistic regression
> >
> >
> > The Inferno awaits me -- but I cannot resist a comment (but DO look at
> > Frank's website).
> >
> > There is a deep and disconcerting dissonance here. Scientists are
> > (naturally) interested in getting at mechanisms, and so want to know which
> > of the variables "count" and which do not. But statistical analysis --
> > **any** statistical analysis -- cannot tell you that. All statistical
> > analysis can do is build models that give good predictions (and only over
> > the range of the data). The models you get depend **both** on the way Nature
> > works **and** the peculiarities of your data (which is what Frank referred
> > to in his comment on data reduction). In fact, it is highly likely that with
> > your data there are many alternative prediction equations built from
> > different collections of covariates that perform essentially equally well.
> > Sometimes it is otherwise, typically when prospective, carefully designed
> > studies are performed -- there is a reason that the FDA insists on clinical
> > trials, after all (and reasons why such studies are difficult and expensive
> > to do!).
> >
> > The belief that "data mining" (as it is known in the polite circles that
> > Frank obviously eschews) is an effective (and even automated!) tool for
> > discovering how Nature works is a misconception, but one that for many
> > reasons is enthusiastically promoted. If you are looking only to predict,
> > it may do; but you are deceived if you hope for Truth. Can you get hints? --
> > well maybe, maybe not. Chaos beckons.
> >
> > I think many -- maybe even most -- statisticians rue the day that stepwise
> > regression was invented and certainly that it has been marketed as a tool
> > for winnowing out the "important" few variables from the blizzard of
> > "irrelevant" background noise. Pogo was right: "We have seen the enemy --
> > and it is us."
> >
> > (As I said, the Inferno awaits...)
> >
> > Cheers to all,
> > Bert Gunter
> >
> > DEFINITELY MY OWN OPINIONS HERE!
> >
> >
> >
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> > Behalf Of David Winsemius
> > Sent: Saturday, September 27, 2008 5:34 PM
> > To: Darin Brooks
> > Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
> > Subject: Re: [R] FW: logistic regression
> >
> > It's more a statement that it expresses a statistical perspective very
> > succinctly, somewhat like a Zen koan. Frank's book, "Regression Modeling
> > Strategies", has entire chapters on reasoned approaches to your question.
> > His website also has quite a bit of material free for the taking.
> >
> > --
> > David Winsemius
> > Heritage Laboratories
> >
> > On Sep 27, 2008, at 7:24 PM, Darin Brooks wrote:
> >
> >> Glad you were amused.
> >>
> >> I assume that "booking this as a fortune" means that this was an
> >> idiotic way to model the data?
> >>
> >> MARS? Boosted Regression Trees? Any of these a better choice to
> >> extract significant predictors (from a list of about 44) for a
> >> measured dependent variable?
> >>
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> >> Behalf Of Ted Harding
> >> Sent: Saturday, September 27, 2008 4:30 PM
> >> To: r-help at stat.math.ethz.ch
> >> Subject: Re: [R] FW: logistic regression
> >>
> >>
> >>
> >> On 27-Sep-08 21:45:23, Dieter Menne wrote:
> >>> Frank E Harrell Jr <f.harrell <at> vanderbilt.edu> writes:
> >>>
> >>>> Estimates from this model (and especially standard errors and
> >>>> P-values)
> >>>> will be invalid because they do not take into account the stepwise
> >>>> procedure above that was used to torture the data until they
> >>>> confessed.
> >>>>
> >>>> Frank
> >>> Please book this as a fortune.
> >>>
> >>> Dieter
> >> Seconded!
> >> Ted.
> >>
>
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%