[R] survival analysis using rpart
Walter345
walter345 at yahoo.com
Mon Feb 26 18:37:56 CET 2007
Hello,
I use rpart to predict survival time and have trouble interpreting the
“estimated rate” in its output. Here is an example of what I do:
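library(rpart)     # recursive partitioning trees
library(survival)  # provides Surv()
stagec <- read.table("http://www.stanford.edu/class/stats202/DATA/stagec.data",
                     col.names = c("pgtime", "pgstat", "age", "eet", "g2",
                                   "grade", "gleason", "ploidy"))
# With a Surv() response, rpart uses method = "exp" automatically
fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy,
             data = stagec)
print(fit)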
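Result of print(fit) (each line shows node number, split, n, deviance and yval, the “estimated rate”; * marks a terminal node):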
1) root 146 195.411600 1.0000000
  2) grade< 2.5 61 45.021520 0.3624701
    4) g2< 11.36 33 9.120116 0.1225562 *
    5) g2>=11.36 28 27.804100 0.7335298
      10) gleason< 5.5 20 14.376900 0.5292190 *
      11) gleason>=5.5 8 11.201470 1.3083680 *
  3) grade>=2.5 85 125.327400 1.6190620
    6) age>=56.5 75 104.154700 1.4287310
      12) gleason< 7.5 50 66.701410 1.1431320 *
      13) gleason>=7.5 25 33.993130 2.0355220
        26) g2>=15.29 13 16.555970 1.3494740 *
        27) g2< 15.29 12 14.220260 2.9210480 *
    7) age< 56.5 10 15.522810 3.1977430 *
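To see where these numbers come from, it can help to tabulate the raw event counts and total follow-up time in each terminal node next to rpart's yval. This is a minimal sketch, assuming rpart's bundled copy of stagec stands in for the downloaded file (its ploidy is coded as a factor, so the tree may differ slightly from the output above); summary_by_leaf and node4 are illustrative names, not part of rpart:

```r
library(rpart)
library(survival)

# Assumption: rpart's bundled stagec replaces the file read from the URL above
data(stagec, package = "rpart")
fit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy,
             data = stagec)

# fit$where gives, for each observation, the row of fit$frame of its terminal node
leaf <- fit$where
rows <- sort(unique(leaf))

# Raw counts per terminal node, alongside rpart's fitted rate (yval)
summary_by_leaf <- data.frame(
  node   = rownames(fit$frame)[rows],
  n      = as.vector(table(leaf)),
  events = as.vector(tapply(stagec$pgstat, leaf, sum)),
  time   = as.vector(tapply(stagec$pgtime, leaf, sum)),
  yval   = fit$frame$yval[rows]
)
print(summary_by_leaf)

# Observations falling in node "4" (empty if this tree has no such node)
node4 <- stagec[rownames(fit$frame)[leaf] == "4", ]
```

Comparing events/time with yval shows that yval is not simply the crude rate; it is on a relative scale, with the root fixed at 1.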
Let’s look at terminal node 4:
# PGTIME PGSTAT AGE EET G2 GRADE GLEASON PLOIDY
1 8.657084 0 70 1 4.43 1 3 1
2 16.70088 0 56 2 5.29 1 3 1
3 3.162217 1 62 2 3.57 2 4 1
4 10.20123 0 63 2 5.14 2 5 1
5 4.479124 0 63 2 5.75 2 5 1
6 6.516084 0 66 2 5.92 2 5 1
7 4.936345 0 67 2 6.41 2 5 1
8 10.79808 0 72 1 6.68 2 NA 1
9 9.174537 0 62 1 6.74 2 5 1
10 10.87474 0 72 2 6.8 2 5 1
11 7.028062 0 52 2 7.15 2 7 1
12 11.36481 0 59 2 7.61 2 5 1
13 10.17659 0 64 1 7.61 2 NA 1
14 6.96783 0 67 2 7.78 2 6 1
15 10.61738 0 55 2 7.81 2 5 1
16 6.510609 0 70 1 7.88 2 6 1
17 10.36276 0 55 2 8.1 2 5 1
18 6.694045 0 54 2 8.11 2 4 1
19 11.718 0 61 2 8.4 2 5 1
20 7.301847 0 69 2 8.46 2 5 1
21 6.067077 0 69 2 8.58 2 6 1
22 8.353182 0 59 2 8.76 2 6 1
23 5.541409 0 59 1 9.01 2 5 1
24 5.492128 0 61 2 9.42 2 5 1
25 7.208761 0 63 1 9.76 2 5 1
26 6.004106 0 52 2 9.9 2 4 1
27 5.664613 0 71 1 10.16 2 6 1
28 6.130047 0 64 2 10.26 2 4 1
29 9.812457 0 64 1 10.51 2 5 1
30 6.275154 0 62 2 10.82 2 6 1
31 9.253935 0 61 2 11.23 2 5 1
32 5.201916 0 54 2 11.35 2 6 1
33 6.22861 0 65 2 11.35 2 5 1
Here we have 33 observations and 1 event. The “estimated rate” is 0.1225562.
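As I read the rpart documentation, for method = "exp" rpart first rescales follow-up time so that the overall (root) event rate is 1, which matches the root's yval of 1.0000000 above, and then estimates each node's rate with a gamma-prior shrinkage estimator, (alpha + events) / (beta + exposure), where alpha = 1/k^2, beta = alpha / (root rate), and k is the prior coefficient of variation (parms = list(shrink = k), default 1). The helper below is a hypothetical sketch of that formula, not rpart's own code:

```r
# Hypothetical helper (not part of rpart): shrunken event rate under a gamma prior.
# events:    number of events in the node
# exposure:  total follow-up in the node, on rpart's rescaled time axis (assumption)
# k:         prior coefficient of variation (rpart's 'shrink' parameter)
# root_rate: overall rate; after rescaling this is 1
shrunken_rate <- function(events, exposure, k = 1, root_rate = 1) {
  alpha <- 1 / k^2
  beta  <- alpha / root_rate
  (alpha + events) / (beta + exposure)
}

shrunken_rate(events = 1, exposure = 15)  # (1 + 1) / (1 + 15) = 0.125
shrunken_rate(events = 0, exposure = 0)   # no data: falls back to root_rate = 1
```

If this reading is right, node 4's rate of 0.1225562 with 1 event would correspond to a rescaled exposure of about 2 / 0.1225562 - 1 ≈ 15.3. The rescaled times are not shown in the printed output, so this is a consistency check rather than a reproduction.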
My questions are:
(1) Is the “estimated rate” the estimated hazard rate ratio?
(2) How does rpart calculate this rate?
(3) Suppose I use xpred.rpart(fit, xval=10) to perform 10-fold
cross-validation using (a) all predictors in the stagec data set and (b) only a
subset of them, say age, eet, and g2. For the i-th patient, I am likely to
obtain a different estimated rate from each model. How can I meaningfully
compare the two rates? How can I say which one is “better”?
Thanks a lot for any comments!
Walter
--
View this message in context: http://www.nabble.com/survival-analysis-using-rpart-tf3294276.html#a9163329
Sent from the R help mailing list archive at Nabble.com.