[R] partykit ctree: minbucket and case weights

Henric Winell nilsson.henric at gmail.com
Fri May 30 09:39:45 CEST 2014


Amber Dawn Nolder wrote 2014-05-28 23:16:
>
>     Hello,
>     I am an R novice, and I am using the "partykit" package to create
>     regression trees. I used the following to generate the trees:
>     ctree(y~x1+x2+x3+x4,data=my_data,control=ctree_control(testtype =
>     "Bonferroni", mincriterion = 0.90, minsplit = 12, minbucket = 4,
>     majority = TRUE))
>     I thought that "minbucket" set the minimum value for the sum of weights
>     in each terminal node, and that each case weight is 1 unless otherwise
>     specified. If so, the sum of case weights in a node should equal the
>     number of cases (n) in that node. However, I sometimes obtain a tree with
>     a terminal node that contains fewer than 4 cases.

I agree that the tree below looks suspicious: node [6] has n = 3 even 
though minbucket = 4.  You may have found a bug.

But you didn't provide "commented, minimal, self-contained, reproducible 
code", i.e., we're missing your 'my_data' object, and therefore we 
cannot reproduce this easily.  Can you please provide us with the output 
from 'dput(my_data)'?
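
For illustration, here is a minimal sketch of what sharing data via 
'dput()' looks like.  The 'toy' data frame and its values are made up 
(a hypothetical stand-in for 'my_data'); only 'dput()', 
'capture.output()', 'parse()' and 'eval()' are real base R functions.

```r
# Hypothetical stand-in for 'my_data'; the values are made up for illustration.
toy <- data.frame(
  y  = c(0.9, 0.5, 0.3, 0.1),
  x1 = c(12, NA, 40, 55),   # NA mimics the missing values mentioned in the post
  x4 = c(25, 31, 44, 60)
)

# dput() prints R code that recreates the object; that text goes into the email.
txt <- capture.output(dput(toy))
cat(txt, sep = "\n")

# A reader can rebuild the object by evaluating the pasted text.
restored <- eval(parse(text = txt))
```

Anyone on the list can then paste the printed text into their own 
session and work with the same object.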

>     My data set has a total of 36 cases. The dependent and all independent
>     variables are continuous data. Variables x1 and x2 contain missing (NA)
>     values.

I tried a few other data sets and there the results seem to come out OK 
(even after inducing NAs).
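
A check of this kind could look roughly like the sketch below.  The 
data set ('airquality'), the seed and the number of induced NAs are 
assumptions chosen for illustration; they are not the data sets 
referred to above.  The control settings are copied from the original 
post.

```r
library(partykit)

set.seed(1)  # arbitrary seed, only for repeatability of the induced NAs
d <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]
d <- d[!is.na(d$Ozone), ]             # keep cases with an observed response
d$Wind[sample(nrow(d), 10)] <- NA     # induce some NAs in a predictor

ct <- ctree(Ozone ~ Solar.R + Wind + Temp, data = d,
            control = ctree_control(testtype = "Bonferroni",
                                    mincriterion = 0.90, minsplit = 12,
                                    minbucket = 4, majority = TRUE))

# Terminal node id for every case in the learning sample, then cases per node.
node_sizes <- table(predict(ct, type = "node"))
print(node_sizes)

# TRUE here would indicate a terminal node smaller than minbucket.
print(any(node_sizes < 4))
```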

>     Could someone please explain why I am getting these results?

Probably.  But you need to provide a reproducible example and the 
details obtained by 'sessionInfo()'.

As per the posting guide, since this is a contributed package you should 
first contact its maintainer (Torsten Hothorn, CC'd) and only post here 
if you get no reply.  Did you try contacting Torsten?

>     Am I mistaken about the value of case weights or about the use of minbucket
>     to restrict the size of a terminal node?

I don't think you're mistaken since '?ctree_control' says that 
"minbucket: the minimum sum of weights in a terminal node."
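
To make the reasoning concrete, here is a small base R sketch.  The 
node ids and sizes are copied from the tree printed below; the unit 
weights are the assumed default (no 'weights' argument was given in the 
call).  It only redoes the arithmetic and does not call partykit.

```r
# Terminal node ids and sizes as printed in the reported tree.
node_n <- c("2" = 17, "4" = 8, "6" = 3, "7" = 8)

# One entry per case: the terminal node it falls into.
node <- rep(names(node_n), times = node_n)

# Assumed default: every case has weight 1.
w <- rep(1, length(node))

# Sum of case weights per terminal node; with unit weights this equals n.
sum_w <- tapply(w, node, sum)
print(sum_w)

# Nodes whose sum of weights is below the requested minbucket.
minbucket <- 4
violations <- sum_w[sum_w < minbucket]
print(violations)
```

The node sizes add up to the 36 cases mentioned in the post, and with 
unit weights node 6 is the only one below minbucket.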


Henric



>     This is an example of the output:
>     Model formula:
>     y ~ x1 + x2 + x3 + x4
>     Fitted party:
>     [1] root
>     |   [2] x4 <= 30: 0.927 (n = 17, err = 1.1)
>     |   [3] x4 > 30
>     |   |   [4] x2 <= 43: 0.472 (n = 8, err = 0.4)
>     |   |   [5] x2 > 43
>     |   |   |   [6] x3 <= 0.4: 0.282 (n = 3, err = 0.0)
>     |   |   |   [7] x3 > 0.4: 0.020 (n = 8, err = 0.0)
>     Number of inner nodes:    3
>     Number of terminal nodes: 4
>     Many thanks!
>     Amber Nolder
>     Graduate Student
>     Indiana University of Pennsylvania
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


