[R] binomial glm for relevant feature selection?
Ben Liblit
liblit at eecs.berkeley.edu
Mon Nov 11 00:50:01 CET 2002
As suggested in my earlier message, I have a large population of
independent variables and a binary dependent outcome. It is expected
that only a few of the independent variables actually contribute to the
outcome, and I'd like to find those.
If it weren't already obvious, I am *not* a statistician. Not even
close. :-) Statistician colleagues have suggested that I use logistic
regression for this problem. My understanding is that logistic
regression is available in R as glm(..., family=binomial).
When I use this solver on fictitious data, though, the answers I expect
are not the answers I see. Consider the following fictitious data,
where "z" is the dependent binary outcome, "y" is irrelevant noise, and
"x" is actually relevant to predicting the outcome:
  x y z
1 8 7 1
2 8 3 1
3 0 5 0
4 0 9 0
5 8 1 1
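
For anyone who wants to reproduce this, the table above could be entered
roughly as follows. This is only a sketch: the data frame name "d" is a
placeholder of my own choosing, and the leftmost column of the table is
taken to be R's row labels rather than data.

```r
# Rebuild the fictitious data as a data frame.
# `d` is an arbitrary name chosen for this sketch.
d <- data.frame(
  x = c(8, 8, 0, 0, 8),  # the variable that is actually relevant
  y = c(7, 3, 5, 9, 1),  # irrelevant noise
  z = c(1, 1, 0, 0, 1)   # binary dependent outcome
)
print(d)
```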
If I feed this data to glm(z ~ x + y) using the default gaussian family,
the results make some sense to me. The estimated coefficient for x is
positive and the corresponding "Pr(>|t|)" value is tiny (<2e-16), which
I take to imply a high degree of confidence that larger values of x
correlate with an increased likelihood that z is 1. Conversely, the estimated
coefficient for y has a "Pr(>|t|)" value of 0.552, which I take to imply
that there is no strong correlation between y and z. Good.
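
The gaussian fit described above can be reproduced with something like the
following sketch. The names "d" and "fit_gauss" are placeholders of mine,
and the exact p-values printed may differ between R versions, so none are
quoted here.

```r
# Same fictitious data as in the table; `d` is an arbitrary name.
d <- data.frame(
  x = c(8, 8, 0, 0, 8),
  y = c(7, 3, 5, 9, 1),
  z = c(1, 1, 0, 0, 1)
)

# glm() uses family = gaussian when no family is given,
# which makes this an ordinary least-squares fit.
fit_gauss <- glm(z ~ x + y, data = d)

# Coefficient table: estimate, standard error, test statistic, p-value.
print(coef(summary(fit_gauss)))
```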
However, I've been told that I want to use family=binomial for a
logistic regression problem with a binary dependent outcome like this.
If I give this data to glm(z ~ x + y, family=binomial), the results
become quite mysterious. I receive a warning that "Algorithm did not
converge". The "Pr(>|t|)" values for x and y are 0.916 and 1.000
respectively, which would seem to indicate that neither one correlates
with the outcome.
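
A sketch of the binomial fit, for comparison. The names "d", "fit_binom",
and "msgs" are placeholders of mine. The warnings are collected rather than
assumed, since their exact wording and number may depend on the R version.

```r
# Same fictitious data as in the table; `d` is an arbitrary name.
d <- data.frame(
  x = c(8, 8, 0, 0, 8),
  y = c(7, 3, 5, 9, 1),
  z = c(1, 1, 0, 0, 1)
)

# Collect any warnings raised during fitting instead of losing them.
msgs <- character(0)
fit_binom <- withCallingHandlers(
  glm(z ~ x + y, family = binomial, data = d),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                      # warnings emitted by the fit, if any
print(fit_binom$converged)       # did the fitting algorithm report convergence?
print(coef(summary(fit_binom)))  # coefficient table for the logistic fit
```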
I realize that this is not a problem with R. It is a problem with my
understanding of what R is doing. But you all have been so helpful thus
far, perhaps I can impose on you to give me one more clue? What am I
doing wrong here? What should I be looking at that I'm not?
Thank you, once again!