[R] FW: logistic regression

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon Sep 29 10:53:10 CEST 2008


On Sun, 2008-09-28 at 19:26 -0600, Darin Brooks wrote:
> I certainly appreciate your comments, Bert.  It is abundantly clear that I
> won't be invited to any of the cocktail parties hosted by the "polite
> circles".  I am not a statistician.  I am merely a geographer (in the field
> of ecology) trying to develop a predictor to assist in a forestry-based
> decision making process.  My work in the natural world has taught me that
> NOTHING is predictable ... and the very idea of a bullet-proof ecological
> predictive model is doomed to fail.  
> That said, there ARE some basic predictors that assist foresters in their
> salvage decisions.  They use these on a daily basis.  The problem is that
> most of the evidence and modeling is anecdotal.  There really are no models
> in the field that I am working in.  And for good reason ... The natural
> world isn't interested in being modeled.  I think we can all agree on this -
> guru or not.

Hi Darin,

As an ecologist myself, I think you overstate things a bit here. Clearly
there are features of the "ecological" world out there that follow
"rules" --- otherwise we might as well consign the whole branch of
theoretical ecology to the bin. These things can be modelled, but we are
often looking for a relatively small signal in a whole load of noise.

You really do need to "model" your system in order to make predictions
about it. How you go about the "modelling" is another matter.

I think you may be better off with some of the more algorithm-centric
data mining methods that are currently the rage in some quarters of
ecology (predicting climate change effects on species +/-, change in
range etc.): things like regression/classification trees,
randomForest, boosting etc. Names to look out for in this literature are
JR Leathwick, Antoine Guisan, Miguel B Araujo and J Elith. You'll find a
lot of work on these modern methods in these authors' papers, and in
those of others. These methods have fewer statistical theoretical
underpinnings, but they can be evaluated on how well they make
predictions, which is often the whole point of doing the analysis.
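
To make the idea of judging a method by its predictions concrete, here
is a minimal sketch of hold-out evaluation, written in Python rather
than R and using only the standard library. Everything in it is an
assumption made for illustration: the simulated data, the function names
(simulate, accuracy, fit_stump), and the one-split "stump" that stands
in for a real classification tree. It is not the method of any package
named above.

```python
import random


def simulate(n, rng):
    """Hypothetical data: one informative predictor and one pure-noise predictor."""
    rows = []
    for _ in range(n):
        signal = rng.random()
        noise = rng.random()
        # The response is mostly 1 when the informative predictor is large.
        p = 0.9 if signal > 0.5 else 0.1
        y = 1 if rng.random() < p else 0
        rows.append(((signal, noise), y))
    return rows


def accuracy(rows, feature, threshold):
    """Share of rows where the rule 'predict 1 if x[feature] > threshold' is right."""
    hits = sum(1 for x, y in rows if (1 if x[feature] > threshold else 0) == y)
    return hits / len(rows)


def fit_stump(rows):
    """Return (training accuracy, feature, threshold) for the best single split."""
    best = None
    for feature in range(len(rows[0][0])):
        for x, _ in rows:
            candidate = x[feature]
            acc = accuracy(rows, feature, candidate)
            if best is None or acc > best[0]:
                best = (acc, feature, candidate)
    return best


rng = random.Random(2008)
data = simulate(400, rng)
train, test = data[:300], data[300:]  # fit on 300 rows, hold out 100

train_acc, feature, threshold = fit_stump(train)
test_acc = accuracy(test, feature, threshold)  # rows the fit never saw

print(f"chosen feature: {feature}, threshold: {threshold:.2f}")
print(f"training accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

The held-out accuracy is the number to look at, because the split was
chosen to maximise the training accuracy and so that figure tends to be
optimistic. The same pattern (fit on one part of the data, score on the
rest) applies whatever fitting method replaces the stump.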

> But even the most basic predictive model (using only the GIS/mappable data
> that is readily available to most users) is a starting point.  The resultant
> dataset(s) of this potential model will be followed up and field verified.
> Providing this simple starting point (or catalyst if you will) could
> potentially save A LOT of time and money.
> What I need to do is to isolate the best available variables into a model
> and assign a confidence to it.  It doesn't have to change everyone's world
> ... it just has to change the way of thinking in my small little world.
> These past few days have been an education for me in the subject of stepwise
> regression.  I approach it with much more apprehension now.  So if nothing
> else good comes of this discussion/exercise/experience ... I've learned
> something.

I too would like to thank the contributors to this thread --- very
informative!

All the best,

G

> 
> Darin Brooks           
> 
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton at gene.com] 
> Sent: Sunday, September 28, 2008 6:26 PM
> To: 'David Winsemius'; 'Darin Brooks'
> Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
> Subject: RE: [R] FW: logistic regression
> 
> 
> The Inferno awaits me -- but I cannot resist a comment (but DO look at
> Frank's website).
> 
> There is a deep and disconcerting dissonance here. Scientists are
> (naturally) interested in getting at mechanisms, and so want to know which
> of the variables "count" and which do not. But statistical analysis --
> **any** statistical analysis -- cannot tell you that. All statistical
> analysis can do is build models that give good predictions (and only over
> the range of the data). The models you get depend **both** on the way Nature
> works **and** the peculiarities of your data (which is what Frank referred
> to in his comment on data reduction). In fact, it is highly likely that with
> your data there are many alternative prediction equations built from
> different collections of covariates that perform essentially equally well.
> Sometimes it is otherwise, typically when prospective, carefully designed
> studies are performed -- there is a reason that the FDA insists on clinical
> trials, after all (and reasons why such studies are difficult and expensive
> to do!).
> 
> The belief that "data mining" (as it is known in the polite circles that
> Frank obviously eschews) is an effective (and even automated!) tool for
> discovering how Nature works is a misconception, but one that for many
> reasons is enthusiastically promoted.  If you are looking only to predict,
> it may do; but you are deceived if you hope for Truth. Can you get hints? --
> well maybe, maybe not. Chaos beckons.
> 
> I think many -- maybe even most -- statisticians rue the day that stepwise
> regression was invented and certainly that it has been marketed as a tool
> for winnowing out the "important" few variables from the blizzard of
> "irrelevant" background noise. Pogo was right: "We have met the enemy
> and he is us."
> 
> (As I said, the Inferno awaits...)
> 
> Cheers to all,
> Bert Gunter
> 
> DEFINITELY MY OWN OPINIONS HERE!
> 
> 
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of David Winsemius
> Sent: Saturday, September 27, 2008 5:34 PM
> To: Darin Brooks
> Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
> Subject: Re: [R] FW: logistic regression
> 
> It's more a statement that it expresses a statistical perspective very
> succinctly, somewhat like a Zen koan.  Frank's book, "Regression Modeling
> Strategies", has entire chapters on reasoned approaches to your question.
> His website also has quite a bit of material free for the taking.
> 
> --
> David Winsemius
> Heritage Laboratories
> 
> On Sep 27, 2008, at 7:24 PM, Darin Brooks wrote:
> 
> > Glad you were amused.
> >
> > I assume that "booking this as a fortune" means that this was an 
> > idiotic way to model the data?
> >
> > MARS?  Boosted Regression Trees?  Any of these a better choice to 
> > extract significant predictors (from a list of about 44) for a 
> > measured dependent variable?
> >
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> > Behalf Of Ted Harding
> > Sent: Saturday, September 27, 2008 4:30 PM
> > To: r-help at stat.math.ethz.ch
> > Subject: Re: [R] FW: logistic regression
> >
> >
> >
> > On 27-Sep-08 21:45:23, Dieter Menne wrote:
> >> Frank E Harrell Jr <f.harrell <at> vanderbilt.edu> writes:
> >>
> >>> Estimates from this model (and especially standard errors and P-values)
> >>> will be invalid because they do not take into account the stepwise
> >>> procedure above that was used to torture the data until they
> >>> confessed.
> >>>
> >>> Frank
> >>
> >> Please book this as a fortune.
> >>
> >> Dieter
> >
> > Seconded!
> > Ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 27-Sep-08                                       Time: 23:30:19
> > ------------------------------ XFMail ------------------------------
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


