[R] Interpretation of csplit from rpart.object

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Sep 21 12:36:41 CEST 2005


Your message *was* received, and you can check the archives to see it at

https://stat.ethz.ch/pipermail/r-help/2005-September/077889.html

You need to read the code to answer the question for yourself.  There is 
plenty of code in the rpart package that interprets csplit.  These lines, 
for example, might be a clue:

rpart.s:    if (ncat>0) ans$csplit <- catmat +2
pred.rpart.s:                        as.integer(fit$csplit -2),
summary.rpart.s:  paste(c("L", "-", "R")[x$csplit[x$splits[i,4], 1:temp[i]]],
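
Read together, those three lines suggest that the stored matrix is the 
documented -1/0/+1 coding shifted by 2, so that 1 means left, 2 means 
level not present and 3 means right.  A minimal sketch of that decoding, 
using only the first three rows of the matrix quoted below (the names 
codes, catmat, decoded and directions are illustrative and are not taken 
from the rpart sources):

```r
# First three rows of the csplit matrix quoted in the question.
csplit <- matrix(c(1, 3, 3, 1, 3, 3, 3,
                   2, 3, 3, 1, 2, 2, 2,
                   1, 3, 3, 1, 3, 3, 3),
                 nrow = 3, byrow = TRUE)

# Undo the "+ 2" shift to recover the documented -1 / 0 / +1 coding.
catmat <- csplit - 2

# Index a lookup vector with the stored codes, as summary.rpart does:
# 1 = left, 2 = level not present, 3 = right.
codes <- c("L", "-", "R")
decoded <- matrix(codes[csplit], nrow = nrow(csplit))

# One string per categorical split, one character per factor level.
directions <- apply(decoded, 1, paste, collapse = "")
print(directions)
```

Each string has seven characters, one per column of csplit.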

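The documentation is from the authors and may well be out of date, but you 
need to read much more carefully what it says (e.g. `this level'): the 
columns refer to the levels of a factor, not to the variables, and the 
rows refer to splits on factors, not to nodes.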
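
To turn one row into a rule, each column has to be paired with a level of 
the factor used in that split.  A rough sketch, assuming a 7-level factor 
whose level names "a" to "g" are invented here (the thread does not give 
the real ones) and taking row 6 of the matrix quoted below:

```r
# Hypothetical level names; the real factor levels are not given in the thread.
lev <- c("a", "b", "c", "d", "e", "f", "g")

# Row 6 of the quoted csplit matrix: 1 = left, 2 = not present, 3 = right.
row6 <- c(2, 1, 3, 2, 3, 1, 1)

left   <- lev[row6 == 1]   # levels sent to the left child
right  <- lev[row6 == 3]   # levels sent to the right child
absent <- lev[row6 == 2]   # levels not present at this node

rule <- sprintf("{%s} -> left; {%s} -> right",
                paste(left, collapse = ", "),
                paste(right, collapse = ", "))
print(rule)
```

For a fitted tree the real level names would have to be looked up for the 
factor that the split uses.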


On Wed, 21 Sep 2005 jmoreira at fe.up.pt wrote:

>
> I am sending this help request again because a virus was detected in the
> previous one, so I do not know whether the R list received it. The virus
> problem is solved. Sorry for that.
>
> ----- Forwarded message from jmoreira at fe.up.pt -----
>    Date: Tue, 20 Sep 2005 14:35:12 +0100
>    From: jmoreira at fe.up.pt
> Reply-To: jmoreira at fe.up.pt
> Subject: Interpretation of csplit from rpart.object
>      To: r-help at stat.math.ethz.ch
>
> Dear members of R-list,
>
> I need to reproduce the rules of a decision tree. For that I need to use the
> csplit information from the rpart.object. But I cannot understand the
> information, because from my example I get:
>> rpart.tree$csplit
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
>  [1,]    1    3    3    1    3    3    3
>  [2,]    2    3    3    1    2    2    2
>  [3,]    1    3    3    1    3    3    3
>  [4,]    2    3    3    1    2    2    2
>  [5,]    2    3    3    1    2    2    2
>  [6,]    2    1    3    2    3    1    1
>  [7,]    2    3    3    2    3    3    1
>  [8,]    2    3    3    1    2    2    2
>  [9,]    2    1    3    2    3    1    1
> [10,]    2    1    3    3    2    2    2
> [11,]    2    1    1    2    1    1    3
> [12,]    2    3    3    1    2    2    2
> [13,]    2    1    1    2    3    1    1
> [14,]    2    3    3    1    2    2    2
> [15,]    2    1    3    2    1    1    1
> [16,]    2    3    1    1    2    2    2
> [17,]    2    3    3    1    2    2    2
> [18,]    2    1    3    2    1    3    1
> [19,]    2    3    3    1    2    2    2
> [20,]    2    1    3    2    1    3    3
> [21,]    2    3    1    2    2    2    2
> [22,]    2    1    3    2    1    1    1
>
> I don't understand why I have 22 rows (my tree has 21 nodes including the root
> node) and 7 columns (I have four explanatory variables, two numeric and two
> factors, plus the numeric target variable).
>
> ?rpart.object says:
>
>  csplit: this will be present only if one of the split variables is a
>          factor. There is one row for each such split, and column 'i =
>          -1' if this level of the factor goes to the left, '+1' if it
>          goes to the right, and 0 if that level is not present at this
>          node of the tree. For an ordered categorical variable all
>          levels are marked as 'R/L',  including levels that are not
>          present.
>
> The values I got are quite different.
>
> Can someone give me information on how to deal with that?
>
> Thanks in advance.
>
> Joao Moreira
>
>
> ----- End forwarded message -----
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



