[R] Logistic Regression

John Sorkin JSorkin at grecc.umaryland.edu
Fri Jun 10 13:35:53 CEST 2011


First, a word of caution: forward, backward, and stepwise regression analyses are not well received among statisticians. There are many reasons for this, including:
(1) The p value computed at each step ignores all the previous steps, which can lead to incorrect inferences. The p value should be conditioned on (i.e. computed taking into account) all the previous steps, i.e. the "path" used to get to the current model.
(2) When entering interaction terms, not every implementation of the three methods makes sure that the main effects included in the interaction are in the model before the interaction is added.
(3) The analysis strategy substitutes a thoughtless mechanical procedure for the modeler's knowledge of the problem at hand.
(4) When your data are collinear (i.e. there is substantial correlation among your independent variables), the three techniques you used may choose different models not because one model is better than the other, but simply because of the collinearity.
(5) None of the three techniques gives "absolute truth"; each can give a glimpse of truth but they can also lead to a false sense of truth, a trip down a rabbit hole if you will.
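
Point (1) can be illustrated with a small simulation. This is a minimal sketch and not part of the original exchange: the data, the names (V1 to V20, outcome, null_fit, full_fit) and the seed are invented for illustration. The outcome is generated independently of every predictor, so any term the search keeps is a false positive by construction.

```r
# Simulated data: the outcome is pure noise, unrelated to any predictor.
set.seed(1)                                     # arbitrary seed, for reproducibility only
n <- 200
p <- 20
x <- as.data.frame(matrix(rnorm(n * p), n, p))  # columns are named V1..V20
x$outcome <- rbinom(n, 1, 0.5)

null_fit <- glm(outcome ~ 1, family = binomial, data = x)
full_fit <- glm(outcome ~ ., family = binomial, data = x)

# Forward selection by AIC, starting from the intercept-only model.
# formula(full_fit) expands the "." into the explicit list of predictors.
sel <- step(null_fit, scope = formula(full_fit),
            direction = "forward", trace = 0)

# Terms retained by the search, and their naive Wald tests. These p values
# take no account of the search that selected the terms.
kept <- attr(terms(sel), "term.labels")
print(kept)
print(coef(summary(sel)))
```

If any predictors are retained, the p values printed for them tend to look more convincing than they should, because they are computed as if the final model had been specified in advance.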

Given these caveats, it remains to explain why your three analyses gave the same results. Fortunately, the explanation is simple. The three analyses (forward, which starts with no terms in the model and adds one at a time; backward, which starts with all terms in the model and removes one at a time; and stepwise, which starts with no terms in the model and both adds and removes terms one at a time) all arrived at the same model. This is not a problem; it simply reflects the relations in your data. Indeed, the fact that the three methods all give the same result makes me, at least, feel a bit better about the results you obtained with a method that is far from optimal.
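
For reference, here is a sketch of how the three searches described above are commonly set up with step(). The data and all names (d, x1 to x4, outcome, null_fit, full_fit, scope) are hypothetical and chosen only for illustration. Note that, per the step() documentation, when scope is missing the initial model is used as the upper model, so a forward search started from the full model has nothing to add; the last call shows this.

```r
# Hypothetical data: outcome depends on x1 and x2 only.
set.seed(42)                                   # arbitrary seed
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$outcome <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$x1 - 0.8 * d$x2))

null_fit <- glm(outcome ~ 1, family = binomial, data = d)  # no terms
full_fit <- glm(outcome ~ ., family = binomial, data = d)  # all terms

# Smallest and largest models the search may visit.
scope <- list(lower = formula(null_fit), upper = formula(full_fit))

forward  <- step(null_fit, scope = scope, direction = "forward",  trace = 0)
backward <- step(full_fit, scope = scope, direction = "backward", trace = 0)
both     <- step(null_fit, scope = scope, direction = "both",     trace = 0)

# Compare the terms each search ended up with.
selected <- lapply(list(forward = forward, backward = backward, both = both),
                   function(m) sort(attr(terms(m), "term.labels")))
print(selected)

# Forward search from the full model with no scope: nothing can be added,
# so the starting model is returned as it is.
unchanged <- step(full_fit, direction = "forward", trace = 0)
print(attr(terms(unchanged), "term.labels"))
```

Agreement among the three searches on data like these says only that the AIC path is unambiguous; it does not remove the caveats listed above.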

In general, it is considered better to model your data using "standard" modeling techniques that make use of your knowledge of the field rather than using one of the three techniques you used. That said, when one has many independent variables, the three techniques can sometimes help one understand what is happening, but any inference drawn from the resulting models must be taken with many a grain of salt.
John


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Frank Harrell <f.harrell at vanderbilt.edu> 6/10/2011 7:06 AM >>>
Which statistical principles are you invoking on which to base such analyses?
Frank

Sergio Della Franca wrote:
> 
> Dear R-Helpers,
> 
> I want to perform a logistic regression on my dataset (y).
> 
> I used the following code:
> 
> logistic <- glm(interest_variable ~ ., family = binomial(link = "logit"),
>                 data = y)
> 
> 
> This runs correctly.
> Then I want to fit the logistic regression with three different
> methods:
> -forward
> -backward
> -stepwise
> 
> I used this procedure:
> forward<-step(logistic,direction="forward")
> backward<-step(logistic,direction="backward")
> stepwise<-step(logistic,direction="both")
> 
> These also run correctly, but I obtained the same results with the three
> different procedures.
> 
> So I thought I had made some mistake.
> 
> My questions are:
> 
> Is what I did correct?
> Is it correct that three different methods return the same results?
> 
> If I made a mistake, what is the correct way to perform the three
> different logistic regressions?
> 
> 
> Thank you in advance.
> 
> 
> Sergio Della Franca.
> 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help 
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html 
> and provide commented, minimal, self-contained, reproducible code.
> 


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
