[R] Why the order of parameters in a logistic regression affects results significantly?
David Winsemius
dwinsemius at comcast.net
Fri Jul 22 19:24:48 CEST 2016
> On Jul 21, 2016, at 3:04 PM, Qinghua He via R-help <r-help at r-project.org> wrote:
>
> Using the same data, if I ran
> fit2 <- glm(formula = AR ~ Age + LumA + LumB + HER2 + Basal + Normal, family = binomial, data = RacComp1)
> summary(fit2)
> exp(coef(fit2))
> I obtained:
> exp(coef(fit2))
> (Intercept)         Age        LumA        LumB        HER2       Basal      Normal
>  0.24866935  1.00433781  0.10639937  0.31614001  0.08220685 20.25180956          NA
> while if I ran
>
> fit2 <- glm(formula = AR ~ Age + LumA + LumB + Basal + Normal + HER2, family = binomial, data = RacComp1)
> summary(fit2)
> exp(coef(fit2))
> I obtained:
> exp(coef(fit2))
>  (Intercept)          Age         LumA         LumB        Basal       Normal         HER2
>   0.02044232   1.00433781   1.29428846   3.84566516 246.35185956  12.16443690           NA
>
> Essentially they're the same model - I just moved HER2 to the end. But the ORs changed significantly. Can someone explain?
You have collinearity and one of your variables will be dropped as redundant. Which one is dropped is determined by the order of the variable names in the model formula.
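A small simulation may make this concrete. This is only a sketch on made-up data: the data frame `d`, the `subtype` vector and the effect sizes are assumptions for illustration, not your RacComp1 data. It assumes the five indicators are mutually exclusive and exhaustive, so they sum to 1 in every row and together duplicate the intercept column.

```r
# Simulated data; all names and effect sizes here are hypothetical.
set.seed(1)
n <- 500
groups <- c("LumA", "LumB", "HER2", "Basal", "Normal")
subtype <- sample(groups, n, replace = TRUE)

d <- data.frame(Age = round(runif(n, 30, 80)))
for (g in groups) d[[g]] <- as.integer(subtype == g)  # one 0/1 column per group
d$AR <- rbinom(n, 1, plogis(-1 + 0.8 * d$Basal - 0.5 * d$LumA))

# Every row has exactly one indicator set, so the indicators add up to the
# intercept column: that is the collinearity.
stopifnot(all(rowSums(d[groups]) == 1))

fit_a <- glm(AR ~ Age + LumA + LumB + HER2 + Basal + Normal,
             family = binomial, data = d)
fit_b <- glm(AR ~ Age + LumA + LumB + Basal + Normal + HER2,
             family = binomial, data = d)

# Which coefficient was dropped (reported as NA) in each fit?
names(which(is.na(coef(fit_a))))
names(which(is.na(coef(fit_b))))

# The two fits describe the same model: compare fitted values and deviance.
all.equal(fitted(fit_a), fitted(fit_b))
all.equal(deviance(fit_a), deviance(fit_b))
```

The indicator that ends up as NA becomes the reference group, so the remaining odds ratios are relative to a different baseline in each fit even though the fitted probabilities agree.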
> For the latter result, I don't even know how to interpret it, as all factors have OR>1 (except Intercept). How could that be possible? Can I eliminate the effect of the intercept?
In the first model (with the default treatment contrasts) the Intercept is actually an estimate for cases with LumA, LumB, Basal, and HER2 all at their lowest level, and this, not coincidentally, also precisely defines your Normal variable. They all (excepting Normal) have adverse impact in your study of AR, whatever it might be. If these various categories (which I suspect are breast cancer risk predictors) are all distinct with no overlaps, then use this:
fit2 <- glm(formula = AR ~ Age + Normal + LumA + LumB + HER2 + Basal + 0, family = binomial, data = RacComp1)
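The fitted model will be the same as your first one, except that the Intercept's parameter will now be the parameter for Normal. Note that without an intercept the other category coefficients are no longer contrasts against Normal: each one is that group's own log-odds at Age = 0, so exp(coef(fit2)) gives odds for those terms rather than odds ratios.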
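Here is a sketch of how the no-intercept fit relates to the first model, again on simulated data (the data frame `d` and its effect sizes are hypothetical, and the five indicators are assumed to be mutually exclusive and exhaustive). It checks that the Normal coefficient matches the earlier Intercept and shows one way to recover odds ratios against Normal by differencing coefficients.

```r
# Simulated data; all names and effect sizes here are hypothetical.
set.seed(2)
n <- 500
groups <- c("LumA", "LumB", "HER2", "Basal", "Normal")
subtype <- sample(groups, n, replace = TRUE)

d <- data.frame(Age = round(runif(n, 30, 80)))
for (g in groups) d[[g]] <- as.integer(subtype == g)  # one 0/1 column per group
d$AR <- rbinom(n, 1, plogis(-1 + 0.8 * d$Basal - 0.5 * d$LumA))

# Model with an intercept: Normal is dropped and acts as the reference group.
with_int <- glm(AR ~ Age + LumA + LumB + HER2 + Basal + Normal,
                family = binomial, data = d)

# Model without an intercept: all five indicators get a coefficient.
no_int <- glm(AR ~ Age + Normal + LumA + LumB + HER2 + Basal + 0,
              family = binomial, data = d)

anyNA(coef(no_int))  # check whether any coefficient is still NA

# The Normal coefficient should match the Intercept of the first model.
all.equal(unname(coef(no_int)["Normal"]),
          unname(coef(with_int)["(Intercept)"]))

# Each indicator coefficient is now that group's log-odds at Age = 0.
# Odds ratios against Normal come from differences of coefficients.
b <- coef(no_int)
others <- c("LumA", "LumB", "HER2", "Basal")
or_vs_normal <- exp(b[others] - b["Normal"])
all.equal(or_vs_normal, exp(coef(with_int)[others]))
```

Subtracting a different group's coefficient instead of `b["Normal"]` gives odds ratios against that group, which is one way to get a comparison for every pair without refitting.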
> Also, I cannot obtain OR for the last factor due to collinearity. However, I know others obtained OR for all factors for the same dataset. Can someone tell me how to obtain OR for all factors? All factors are categorical variables (i.e., 0 or 1).
> Thanks!
> Peter
David Winsemius
Alameda, CA, USA