[R] library(rpart) or library(tree)
Ingo Holz
Ingo.Holz at uni-hohenheim.de
Wed Dec 19 17:14:45 CET 2007
Hi,
I have a problem with library(rpart) (and/or library(tree)).
I use a data frame with the variables
"pnV22" (observation: 1/0, i.e. yes/no)
"JTemp" (mean temperature)
"SNied" (summer rain)
I used the function rpart() to build a model:
library(rpart)
attach(data.frame)
result <- rpart(pnV22 ~ JTemp + SNied)
I got the following tree:
n=55518 (50 observations deleted due to missingness)
node), split, n, deviance, yval
* denotes terminal node
1) root 55518 668.744500 0.0121942400
2) punkte[["JTemp"]]< 10.35 51251 18.992960 0.0003707245 *
3) punkte[["JTemp"]]>=10.35 4267 556.532000 0.1542067000
6) punkte[["SNied"]]>=450 3136 291.318600 0.1036352000 *
7) punkte[["SNied"]]< 450 1131 234.954900 0.2944297000
14) punkte[["JTemp"]]>=10.55 723 113.502100 0.1950207000 *
15) punkte[["JTemp"]]< 10.55 408 101.647100 0.4705882000
30) punkte[["JTemp"]]< 10.45 48 4.479167 0.1041667000 *
31) punkte[["JTemp"]]>=10.45 360 89.863890 0.5194444000 *
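To make the expected prediction explicit, the printed splits can be transcribed by hand into a small R function. This is only a sketch: the name `tree_yval` is hypothetical, and the thresholds and `yval`s are copied from the printed tree above (rounded as printed), not extracted from the fitted object.

```r
# Hypothetical helper: follows the printed splits of the tree above and
# returns the yval of the terminal node that (JTemp, SNied) falls into.
tree_yval <- function(JTemp, SNied) {
  if (JTemp < 10.35)  return(0.0003707245)  # node 2
  if (SNied >= 450)   return(0.1036352000)  # node 6
  if (JTemp >= 10.55) return(0.1950207000)  # node 14
  if (JTemp < 10.45)  return(0.1041667000)  # node 30
  0.5194444000                              # node 31
}

# The values later used for new.data.frame
tree_yval(10.5, 430)  # 0.5194444, i.e. terminal node 31
```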
I then constructed a simple new data frame:
new.data.frame <- data.frame
new.data.frame[,"JTemp"] <- 10.5
new.data.frame[,"SNied"] <- 430
Then I used predict() to predict values for "pnV22" in the following way:
pred <- predict(result, data.frame)
pred2 <- predict(result, new.data.frame)
The results are identical. I checked this by plotting the values of pred and pred2 and with
table(pred == pred2), which is TRUE for all values.
Looking at the tree, I would expect every element of pred2 to have the same high value: with
JTemp = 10.5 and SNied = 430, every row should end up in terminal node 31 (yval 0.519).
Did I make a mistake?
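One thing worth checking, offered as an assumption rather than a diagnosis: the split labels in the printed tree read punkte[["JTemp"]] and punkte[["SNied"]], not JTemp and SNied. That suggests the fitted formula referred to the columns through a data frame called `punkte`, and predict() then re-evaluates punkte[["JTemp"]] instead of reading the columns of the new data. The sketch uses simulated stand-in data (not the original data) to compare such a formula, (a), with the usual `data =` form, (b).

```r
library(rpart)

# Simulated stand-in data (assumption; does not reproduce the original results)
set.seed(1)
n <- 2000
punkte <- data.frame(JTemp = runif(n, 8, 12), SNied = runif(n, 300, 700))
punkte$pnV22 <- as.numeric(punkte$JTemp >= 10.35 & punkte$SNied < 450)

# New data built as in the question: every row gets JTemp = 10.5, SNied = 430
new.data.frame <- punkte
new.data.frame[, "JTemp"] <- 10.5
new.data.frame[, "SNied"] <- 430

# (a) Columns referenced through 'punkte' itself: the splits are labelled
#     punkte[["JTemp"]], and predict() evaluates that expression again,
#     so the columns of new.data.frame are never used.
fit_a <- rpart(punkte[["pnV22"]] ~ punkte[["JTemp"]] + punkte[["SNied"]])
pred_a  <- predict(fit_a, punkte)
pred2_a <- predict(fit_a, new.data.frame)

# (b) Bare column names plus data = punkte (no attach() needed):
#     predict() looks JTemp and SNied up in the new data frame.
fit_b <- rpart(pnV22 ~ JTemp + SNied, data = punkte)
pred_b  <- predict(fit_b, punkte)
pred2_b <- predict(fit_b, new.data.frame)

all(pred_a == pred2_a)    # TRUE: the symptom described above
length(unique(pred2_b))   # 1: every row lands in the same leaf
```

With (b), a one-row data frame such as data.frame(JTemp = 10.5, SNied = 430) would work just as well as newdata.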
Thanks, Ingo