[R] Problem with rpart

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat May 26 06:52:57 CEST 2007

You only have 28 cases.  After one split, the groups (10 and 18 cases) are
too small to split again with the default settings (minsplit = 20).
See ?rpart.control.
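
A rough sketch of what relaxing those settings might look like.  The
28-row subset of the built-in iris data is only a stand-in for your file,
and the minsplit/minbucket values are arbitrary illustrations, not
recommendations:

```r
library(rpart)

# Illustration only: a 28-row subset of iris stands in for the poster's
# volatile_disperser_matrix.txt, which is not available here.
set.seed(1)
small <- iris[sample(nrow(iris), 28), ]

# Defaults: minsplit = 20, minbucket = round(minsplit / 3), cp = 0.01.
# A node holding fewer than minsplit cases is never considered for splitting.
fit_default <- rpart(Species ~ ., data = small, method = "class")

# Relaxed settings (arbitrary values chosen for this sketch).
fit_relaxed <- rpart(Species ~ ., data = small, method = "class",
                     control = rpart.control(minsplit = 5, minbucket = 2))

print(fit_default)
print(fit_relaxed)
```

With so few cases, any extra splits obtained this way rest on very small
groups, so they are worth checking against the cross-validation table
from printcp().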

On Fri, 25 May 2007, Silvia Lomascolo wrote:

> I work on Windows, R version 2.4.1.  I'm very new with R!
> I am trying to build a classification tree using rpart but, although the
> matrix has 108 variables, the program builds a tree with only one split
> using one variable!  I know it is possible that only one variable is
> informative, but I think it's unlikely.  I was wondering if someone can help
> me identify if I'm doing something wrong because I can't see it, nor could I
> find it in the help or in this forum.
> I want to see whether I can predict disperser type (5 categories) of a
> species given the volatile compounds that the fruits emit (108 volatiles)
> I am writing:
>> dispvol.x<- read.table ('C:\\Documents and Settings\\silvia\\...\\volatile_disperser_matrix.txt', header=T)
>> dispvol.df<- as.data.frame (dispvol.x)
>> attach (dispvol.df) #I think I need to do this so the variables are identified when I write the regression equation
>> dispvol.ctree <- rpart (disperser~ P3.70 +P4.29 +P5.05 +... +P30.99 +P32.25 +TotArea, data= dispvol.df, method='class')
> and I get the following output:
> n= 28
> node), split, n, loss, yval, (yprob)
>      * denotes terminal node
> 1) root 28 15 non (0.036 0.32 0.071 0.11 0.46)
>  2) P10.01>=1.185 10  4 bat (0.1 0.6 0.2 0 0.1) *
>  3) P10.01< 1.185 18  6 non (0 0.17 0 0.17 0.67) *
> There is nothing special about P10.01 that I can see in my data and I don't
> know why it chooses that variable and stops there!
> My matrix looks something like this (except, with a lot more variables)
> disperser	P3.70	P4.29	P6.45	P6.55	P10.01	P10.15	P10.18	TotArea
> ban	0.00	0.00	1.34	0.00	1.49	0.00	0.00	2.83
> non	0.00	0.00	0.00	152.80	0.00	14.31	0.00	167.11
> bat	0.00	0.00	0.00	131.56	0.65	0.00	0.00	132.21
> bat	0.00	0.00	5.05	0.00	13.01	6.85	0.00	24.90
> non	0.00	0.00	72.65	103.26	4.10	0.00	0.00	180.02
> non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> bat	1.23	0.00	0.48	0.89	0.25	0.00	0.00	2.85
> bat	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> non	0.00	0.00	0.00	0.00	1.06	0.00	0.00	1.06
> bat	0.00	0.00	0.00	0.00	28.69	0.00	21.33	50.02
> mix	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> non	0.00	0.00	0.00	0.00	1.15	0.00	0.00	1.15
> non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> non	0.00	0.82	0.00	1.65	0.00	0.00	0.00	2.47
> bat	0.00	0.00	133.24	0.00	3.13	0.00	0.00	136.37
> bir	0.00	0.00	11.08	3.16	1.79	2.09	0.48	18.61
> non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> mix	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> bat	0.00	0.00	0.00	0.00	1.31	0.00	0.00	1.31
> non	0.00	0.00	0.00	0.00	0.00	0.00	1.23	1.23
> bat	0.00	0.00	1.81	0.00	2.84	0.00	0.00	4.65
> non	0.00	0.00	1.18	0.00	0.73	0.00	0.00	1.91
> bir	0.00	0.00	0.00	0.00	1.40	0.00	0.00	1.40
> bat	0.00	0.00	8.16	1.50	1.22	0.00	0.00	10.88
> mix	0.00	0.55	0.00	0.00	0.00	0.00	0.00	0.55
> non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
> Thanks! Silvia.
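
On the quoted code: read.table() already returns a data frame, and when
data= is supplied the names in the formula are looked up there, so the
attach() call should not be needed.  A minimal sketch, again with the
built-in iris data as a stand-in for the real file and with `df` and
`used` as purely illustrative names:

```r
library(rpart)

# Stand-in data: iris replaces the poster's file, which is not available here.
df <- iris   # read.table() would likewise return a data frame directly

# No attach() needed: names in the formula are resolved inside 'data'.
# 'Species ~ .' means "use every other column as a predictor", which
# avoids typing out a long list of variable names.
fit <- rpart(Species ~ ., data = df, method = "class")

# Predictors that actually appear in a split; unused ones may still be
# informative but were not chosen.
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used)
```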

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
