[R] Interpretation of csplit from rpart.object

jmoreira at fe.up.pt
Wed Sep 21 12:10:59 CEST 2005


I am resending this help request because a virus was detected in my previous
message, so I don't know whether the R-list received it. The virus problem is
solved. Sorry for that.

----- Forwarded message from jmoreira at fe.up.pt -----
    Date: Tue, 20 Sep 2005 14:35:12 +0100
    From: jmoreira at fe.up.pt
Reply-To: jmoreira at fe.up.pt
 Subject: Interpretation of csplit from rpart.object
      To: r-help at stat.math.ethz.ch

Dear members of R-list,

I need to reproduce the rules of a decision tree. For that I need to use the
csplit information from the rpart.object, but I cannot understand it. From my
example I get:
> rpart.tree$csplit
      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]    1    3    3    1    3    3    3
 [2,]    2    3    3    1    2    2    2
 [3,]    1    3    3    1    3    3    3
 [4,]    2    3    3    1    2    2    2
 [5,]    2    3    3    1    2    2    2
 [6,]    2    1    3    2    3    1    1
 [7,]    2    3    3    2    3    3    1
 [8,]    2    3    3    1    2    2    2
 [9,]    2    1    3    2    3    1    1
[10,]    2    1    3    3    2    2    2
[11,]    2    1    1    2    1    1    3
[12,]    2    3    3    1    2    2    2
[13,]    2    1    1    2    3    1    1
[14,]    2    3    3    1    2    2    2
[15,]    2    1    3    2    1    1    1
[16,]    2    3    1    1    2    2    2
[17,]    2    3    3    1    2    2    2
[18,]    2    1    3    2    1    3    1
[19,]    2    3    3    1    2    2    2
[20,]    2    1    3    2    1    3    3
[21,]    2    3    1    2    2    2    2
[22,]    2    1    3    2    1    1    1

I don't understand why I have 22 rows (my tree has 21 nodes including the root
node) and 7 columns (I have four explanatory variables: two numeric and two
factors, plus the numeric target variable).

?rpart.object says:

  csplit: this will be present only if one of the split variables is a
          factor. There is one row for each such split, and column 'i =
          -1' if this level of the factor goes to the left, '+1' if it
          goes to the right, and 0 if that level is not present at this
          node of the tree. For an ordered categorical variable all
          levels are marked as 'R/L',  including levels that are not
          present.

The values I got (1, 2 and 3) are quite different from the -1, 0 and +1
described there.
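
One possible reading, offered as an assumption and not something the help text
quoted above states: the stored values 1, 2 and 3 may simply be the documented
-1, 0 and +1 shifted by 2, with one column per level of the factor that has the
most levels. The row count could exceed the node count if csplit has one row
for every categorical split listed in the splits matrix, not only the primary
split of each node. The base-R sketch below decodes three rows copied from the
matrix above under that assumption. The level names and the helper
describe_split are hypothetical and do not come from rpart.

```r
# Three rows copied from the csplit matrix printed above (rows 1, 2 and 6).
csplit <- rbind(c(1, 3, 3, 1, 3, 3, 3),
                c(2, 3, 3, 1, 2, 2, 2),
                c(2, 1, 3, 2, 3, 1, 1))

# Hypothetical level names for a 7-level factor (not from the original post).
lev <- paste0("level", 1:7)

# ASSUMPTION: stored codes are the documented -1 / 0 / +1 shifted by 2,
# so 1 = goes left, 2 = level not present, 3 = goes right.
doc_coding <- csplit - 2

# Hypothetical helper: list the levels sent left, sent right, or absent
# for one row of the recoded matrix.
describe_split <- function(row, levels) {
  row <- row[seq_along(levels)]
  list(left   = levels[row == -1],
       right  = levels[row ==  1],
       absent = levels[row ==  0])
}

rules <- lapply(seq_len(nrow(doc_coding)),
                function(i) describe_split(doc_coding[i, ], lev))

# Show the decoded rule for the second row.
print(rules[[2]])
```

If this reading holds, a factor with fewer levels than the widest one would use
only its first columns, and the remaining columns of that row would carry the
"not present" code.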

Can someone give me information on how to deal with that?

Thanks in advance.

Joao Moreira


----- End forwarded message -----



