[R] missing values in party::ctree
Torsten Hothorn
Torsten.Hothorn at stat.uni-muenchen.de
Fri Feb 18 09:07:45 CET 2011
On Thu, 17 Feb 2011, Andrew Ziem wrote:
> After ctree builds a tree, how would I determine the direction missing values follow by examining the BinaryTree-class object? For instance in the example below Bare.nuclei has 16 missing values and is used for the first split, but the missing values are not listed in either set of factors. (I have the same question for missing values among numeric [non-factor] values, but I assume the answer is similar.)
Hi Andrew,
ctree() doesn't treat missings in factors as a category in its own right.
Instead, it uses surrogate splits to determine which daughter node
observations with missings in the primary split variable are sent to (you
need to set `maxsurrogates' in ctree_control() for this).
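
A minimal sketch of the surrogate-split route, reusing the BreastCancer example quoted below. The value maxsurrogates = 3 is an arbitrary choice for illustration, and the code is guarded so that it does nothing when party or mlbench is not installed. It is not a statement of what the output will look like for this data. As far as I recall, the ctree_control() documentation notes that surrogate splits are only implemented for ordered covariates, so check ?ctree_control before relying on this.

```r
# Sketch only: needs the party and mlbench packages, hence the guard.
if (requireNamespace("party", quietly = TRUE) &&
    requireNamespace("mlbench", quietly = TRUE)) {
  library(party)
  data(BreastCancer, package = "mlbench")
  BreastCancer$Id <- NULL

  # Ask for up to 3 surrogate splits per node (3 is an arbitrary choice).
  ct <- ctree(Class ~ ., data = BreastCancer,
              controls = ctree_control(maxdepth = 1, maxsurrogates = 3))

  # Surrogate splits stored for the root node.
  print(nodes(ct, 1)[[1]]$ssplit)

  # Terminal node reached by each row with a missing Bare.nuclei.
  print(table(where(ct)[is.na(BreastCancer$Bare.nuclei)]))
}
```

Tabulating where(ct) for the rows with a missing primary split variable is one way to see, after the fact, which daughter node those observations ended up in.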
However, you can recode your factor and add NA to the levels. This will
lead to the intended behaviour.
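
The recoding can be sketched with base R's addNA(). The toy factor x below is made up and only stands in for BreastCancer$Bare.nuclei. The "missing" label in the second variant is a hypothetical name, shown as an alternative because a factor with NA as a level behaves unusually in some functions.

```r
# Toy factor standing in for BreastCancer$Bare.nuclei (values are made up).
x <- factor(c("1", "2", NA, "10", "1", NA))

# addNA() appends NA as a genuine level, so missings become a category.
x_na <- addNA(x)

print(levels(x_na))       # the original levels plus NA
print(table(x_na))        # the NA level now has its own count
print(sum(is.na(x)))      # 2 missing values before recoding
print(sum(is.na(x_na)))   # 0, because the codes are no longer missing

# Alternative: give the missings an explicit label instead of an NA level.
x_lab <- factor(ifelse(is.na(x), "missing", as.character(x)))
print(levels(x_lab))
```

For the example in this thread, the corresponding step would presumably be BreastCancer$Bare.nuclei <- addNA(BreastCancer$Bare.nuclei) before calling ctree(). I have not verified how the resulting split is printed.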
Best,
Torsten
>
>
>> require(party)
>> require(mlbench)
>> data(BreastCancer)
>> BreastCancer$Id <- NULL
>> ct <- ctree(Class ~ . , data=BreastCancer, controls = ctree_control(maxdepth = 1))
>> ct
>
> Conditional inference tree with 2 terminal nodes
>
> Response: Class
> Inputs: Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size, Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses
> Number of observations: 699
>
> 1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
>   2)* weights = 448
> 1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
>   3)* weights = 251
>> sum(is.na(BreastCancer$Bare.nuclei))
> [1] 16
>> nodes(ct, 1)[[1]]$psplit
> Bare.nuclei == {1, 2}
>> nodes(ct, 1)[[1]]$ssplit
> list()
>
>
>
> Based on the counts below, the missing values go to node 2, but I don't see that recorded in the object.
>
>> sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
> [1] 448
>> sum(BreastCancer$Bare.nuclei %in% c(1,2))
> [1] 432
>> sum(BreastCancer$Bare.nuclei %in% c(3:10))
> [1] 251
>
>
> Andrew
>
>