[R] method of rpart when response variable is binary?
ronggui
ronggui.huang at gmail.com
Fri Jun 15 15:27:35 CEST 2007
Dear all,
I would like to model the relationship between y and x. y is a binary
variable, and x is a count variable which may be Poisson-distributed.
I think it is better to divide x into intervals and convert it to a
factor before calling glm(y~x,data=dat,family=binomial).
I tried using rpart to find the intervals. As y is binary, I used the
"class" method and got the following result, which has no splits.
> rpart(y~x,data=dat,method="class")
n=778 (22 observations deleted due to missingness)
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 778 67 0 (0.91388175 0.08611825) *
With the default method, I get the following result.
> rpart(y~x,data=dat)
n=778 (22 observations deleted due to missingness)
node), split, n, deviance, yval
* denotes terminal node
1) root 778 61.230080 0.08611825
  2) x< 19.5 750 53.514670 0.07733333
    4) x< 1.25 390 17.169230 0.04615385 *
    5) x>=1.25 360 35.555560 0.11111110 *
  3) x>=19.5 28 6.107143 0.32142860 *
If I use 1.25 and 19.5 as the cut points, I can convert x into a factor with
> x2 <- cut(q34b,breaks=c(0,1.25,19.5,200),right=FALSE)
The coefficients in y~x2 are significant and make sense.
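As a minimal sketch of the steps above, here is the same workflow on
simulated data. The data are made up for illustration (the Poisson rate and
the logistic coefficients are assumptions, not values from my data), and
taking the cut points from fit$splits is just one possible way to extract
them.

```r
# Sketch only: simulated data, not the data from this post.
library(rpart)

set.seed(1)
n <- 800
x <- rpois(n, lambda = 3)            # hypothetical count predictor
p <- plogis(-3 + 0.5 * x)            # hypothetical true relationship
dat <- data.frame(y = rbinom(n, size = 1, prob = p), x = x)

# y is numeric 0/1 and no 'method' is given, so rpart fits an "anova" tree.
# Competitor and surrogate splits are switched off so that fit$splits
# holds only the primary splits.
fit <- rpart(y ~ x, data = dat,
             control = rpart.control(maxcompete = 0, maxsurrogate = 0))
if (is.null(fit$splits)) stop("the tree has no splits")

# Split points used by the tree (the "index" column of fit$splits).
cuts <- sort(unique(fit$splits[, "index"]))

# Convert x into a factor at those cut points and fit the logistic model.
# right = FALSE gives intervals [a, b), matching the tree's "x < cut" rule.
dat$x2 <- cut(dat$x, breaks = c(-Inf, cuts, Inf), right = FALSE)
summary(glm(y ~ x2, data = dat, family = binomial))
```

Here fit$method reports which method rpart actually chose.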
My question is: is it OK to use the default method in rpart when the
response variable is binary? Thanks.
--
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China