[R] library(rpart) or library(tree)

Ingo Holz Ingo.Holz at uni-hohenheim.de
Wed Dec 19 17:14:45 CET 2007


Hi,

 I have a problem with library(rpart) (and/or library(tree)).

 I use a data frame with the variables
"pnV22" (observation: 1/0, i.e. yes/no)
"JTemp" (mean temperature)
"SNied" (summer rain)

 I used the function rpart() to build a model:

	library(rpart)
	attach(data.frame)
	result <- rpart(pnV22 ~ JTemp + SNied)

 I got the following tree:

  n=55518 (50 observations deleted due to missingness)

node), split, n, deviance, yval
      * denotes terminal node

 1) root 55518 668.744500 0.0121942400  
   2) punkte[["JTemp"]]< 10.35 51251  18.992960 0.0003707245 *
   3) punkte[["JTemp"]]>=10.35 4267 556.532000 0.1542067000  
     6) punkte[["SNied"]]>=450 3136 291.318600 0.1036352000 *
     7) punkte[["SNied"]]< 450 1131 234.954900 0.2944297000  
      14) punkte[["JTemp"]]>=10.55 723 113.502100 0.1950207000 *
      15) punkte[["JTemp"]]< 10.55 408 101.647100 0.4705882000  
        30) punkte[["JTemp"]]< 10.45 48   4.479167 0.1041667000 *
        31) punkte[["JTemp"]]>=10.45 360  89.863890 0.5194444000 *

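 I constructed a simple new data frame, new.data.frame:

     new.data.frame <- data.frame        # start from a copy of the original data
     new.data.frame[,"JTemp"] <- 10.5    # same temperature in every row
     new.data.frame[,"SNied"] <- 430     # same summer rain in every row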

Then I used predict() to predict values for "pnV22" as follows:

    pred <- predict(result, data.frame)
    pred2 <- predict(result, new.data.frame)

The results are identical. I checked this by plotting the values of pred and pred2 and with

   table(pred == pred2)

which is TRUE for all values.
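
For comparison, here is a minimal sketch of the same check on made-up data. It assumes the model is fitted with bare column names and the data frame passed through the data argument, which may differ from the setup above. The name punkte is borrowed from the tree output; the name neu and all simulated values are hypothetical.

```r
library(rpart)

# Illustrative stand-in for the real data (values are simulated).
set.seed(1)
punkte <- data.frame(JTemp = runif(500, 8, 12),
                     SNied = runif(500, 300, 600))
punkte$pnV22 <- as.numeric(punkte$JTemp > 10.35 & runif(500) < 0.4)

# Bare column names in the formula, data supplied via `data`.
fit <- rpart(pnV22 ~ JTemp + SNied, data = punkte)

# New data: copy of the original with constant predictors.
neu <- punkte
neu$JTemp <- 10.5
neu$SNied <- 430

pred  <- predict(fit, newdata = punkte)
pred2 <- predict(fit, newdata = neu)

# All rows of `neu` share the same predictors, so they share one leaf.
n_distinct <- length(unique(pred2))
print(n_distinct)

# Compare the two prediction vectors numerically.
print(isTRUE(all.equal(pred, pred2)))
```

In this setup every row of neu has the same predictor values, so pred2 should contain a single distinct value.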

Looking at the tree, I would expect pred2 to have the same high value (about 0.52, node 31) for every
element of the vector. Did I make a mistake?
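
The expected value can be checked by following the printed splits by hand. The sketch below simply hard-codes the splits and yval entries from the tree above; the function name trace_tree is hypothetical and not part of rpart.

```r
# Follow the printed splits for a single observation and
# return the yval of the terminal node it ends up in.
trace_tree <- function(JTemp, SNied) {
  if (JTemp < 10.35)  return(0.0003707245)  # node 2
  if (SNied >= 450)   return(0.1036352)     # node 6
  if (JTemp >= 10.55) return(0.1950207)     # node 14
  if (JTemp < 10.45)  return(0.1041667)     # node 30
  0.5194444                                 # node 31
}

expected <- trace_tree(JTemp = 10.5, SNied = 430)
print(expected)  # 0.5194444 (node 31)
```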

Thanks, Ingo


