[R] Problem with rpart

Silvia Lomascolo slomascolo at zoo.ufl.edu
Fri May 25 23:34:10 CEST 2007


I work on Windows, with R version 2.4.1.  I'm very new to R!

I am trying to build a classification tree using rpart but, although the
matrix has 108 variables, the program builds a tree with only one split,
using a single variable!  I know it is possible that only one variable is
informative, but I think it's unlikely.  Could someone help me identify
whether I'm doing something wrong?  I can't see the problem, and I couldn't
find an answer in the help or in this forum.

I want to see whether I can predict the disperser type (5 categories) of a
species from the volatile compounds that its fruits emit (108 volatiles).
I am writing:

> library(rpart)
> dispvol.x <- read.table('C:\\Documents and Settings\\silvia\\...\\volatile_disperser_matrix.txt', header=TRUE)
> dispvol.df <- as.data.frame(dispvol.x)
> attach(dispvol.df) # I think I need to do this so the variables are identified when I write the regression equation
> dispvol.ctree <- rpart(disperser ~ P3.70 + P4.29 + P5.05 + ... + P30.99 + P32.25 + TotArea, data=dispvol.df, method='class')

and I get the following output:

n= 28 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 28 15 non (0.036 0.32 0.071 0.11 0.46)  
  2) P10.01>=1.185 10  4 bat (0.1 0.6 0.2 0 0.1) *
  3) P10.01< 1.185 18  6 non (0 0.17 0 0.17 0.67) *

There is nothing special about P10.01 that I can see in my data, and I don't
know why it chooses that variable and stops there!
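
One thing that may matter here is rpart's stopping rules.  The two child
nodes in the output hold 10 and 18 observations, and rpart.control() defaults
to minsplit = 20, the minimum number of observations a node must contain
before a split is attempted, so neither node would qualify for a further
split under the defaults.  The sketch below inspects those defaults and fits
a tree with relaxed settings.  The data frame `toy`, its columns, and the
relaxed values (minsplit = 5, minbucket = 2, cp = 0) are made up for
illustration and are not a recommendation for the real data.

```r
library(rpart)

# Default stopping rules used when no 'control' argument is supplied
ctrl <- rpart.control()
ctrl$minsplit   # 20: nodes with fewer observations are never split
ctrl$minbucket  # 7:  round(minsplit / 3), minimum size of a terminal node
ctrl$cp         # 0.01: a split must improve the fit by at least this factor

# Hypothetical toy data (not the volatile matrix): 28 rows, 2 classes
set.seed(1)
toy <- data.frame(
  y  = factor(rep(c("a", "b"), each = 14)),
  x1 = rnorm(28),
  x2 = rnorm(28),
  x3 = rnorm(28)
)

# Fit once with the defaults and once with relaxed stopping rules
fit_default <- rpart(y ~ ., data = toy, method = "class")
fit_relaxed <- rpart(y ~ ., data = toy, method = "class",
                     control = rpart.control(minsplit = 5,
                                             minbucket = 2,
                                             cp = 0))

print(fit_default)
print(fit_relaxed)
```

With only 28 observations, a tree grown under relaxed settings can easily
overfit, so it would probably be worth looking at the cross-validated error
from printcp() before trusting any extra splits.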

My matrix looks something like this (except with a lot more variables):

disperser	P3.70	P4.29	P6.45	P6.55	P10.01	P10.15	P10.18	TotArea
ban	0.00	0.00	1.34	0.00	1.49	0.00	0.00	2.83
non	0.00	0.00	0.00	152.80	0.00	14.31	0.00	167.11
bat	0.00	0.00	0.00	131.56	0.65	0.00	0.00	132.21
bat	0.00	0.00	5.05	0.00	13.01	6.85	0.00	24.90
non	0.00	0.00	72.65	103.26	4.10	0.00	0.00	180.02
non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
bat	1.23	0.00	0.48	0.89	0.25	0.00	0.00	2.85
bat	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
non	0.00	0.00	0.00	0.00	1.06	0.00	0.00	1.06
bat	0.00	0.00	0.00	0.00	28.69	0.00	21.33	50.02
mix	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
non	0.00	0.00	0.00	0.00	1.15	0.00	0.00	1.15
non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
non	0.00	0.82	0.00	1.65	0.00	0.00	0.00	2.47
bat	0.00	0.00	133.24	0.00	3.13	0.00	0.00	136.37
bir	0.00	0.00	11.08	3.16	1.79	2.09	0.48	18.61
non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
mix	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
bat	0.00	0.00	0.00	0.00	1.31	0.00	0.00	1.31
non	0.00	0.00	0.00	0.00	0.00	0.00	1.23	1.23
bat	0.00	0.00	1.81	0.00	2.84	0.00	0.00	4.65
non	0.00	0.00	1.18	0.00	0.73	0.00	0.00	1.91
bir	0.00	0.00	0.00	0.00	1.40	0.00	0.00	1.40
bat	0.00	0.00	8.16	1.50	1.22	0.00	0.00	10.88
mix	0.00	0.55	0.00	0.00	0.00	0.00	0.00	0.55
non	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
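
For anyone who wants to try this, the excerpt above can be read in and fitted
roughly as follows.  This is only a sketch: it uses the 8 predictor columns
shown here rather than all 108, the object names `excerpt` and `fit` are made
up, and the formula `disperser ~ .` (every other column as a predictor) is a
shorthand assumed to be equivalent to listing the columns one by one.

```r
library(rpart)

# The 28 rows shown above, with 8 of the 108 predictor columns
excerpt <- read.table(header = TRUE, text = "
disperser P3.70 P4.29 P6.45 P6.55 P10.01 P10.15 P10.18 TotArea
ban 0.00 0.00 1.34 0.00 1.49 0.00 0.00 2.83
non 0.00 0.00 0.00 152.80 0.00 14.31 0.00 167.11
bat 0.00 0.00 0.00 131.56 0.65 0.00 0.00 132.21
bat 0.00 0.00 5.05 0.00 13.01 6.85 0.00 24.90
non 0.00 0.00 72.65 103.26 4.10 0.00 0.00 180.02
non 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
bat 1.23 0.00 0.48 0.89 0.25 0.00 0.00 2.85
bat 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
non 0.00 0.00 0.00 0.00 1.06 0.00 0.00 1.06
bat 0.00 0.00 0.00 0.00 28.69 0.00 21.33 50.02
mix 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
non 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
non 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
non 0.00 0.00 0.00 0.00 1.15 0.00 0.00 1.15
non 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
non 0.00 0.82 0.00 1.65 0.00 0.00 0.00 2.47
bat 0.00 0.00 133.24 0.00 3.13 0.00 0.00 136.37
bir 0.00 0.00 11.08 3.16 1.79 2.09 0.48 18.61
non 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
mix 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
bat 0.00 0.00 0.00 0.00 1.31 0.00 0.00 1.31
non 0.00 0.00 0.00 0.00 0.00 0.00 1.23 1.23
bat 0.00 0.00 1.81 0.00 2.84 0.00 0.00 4.65
non 0.00 0.00 1.18 0.00 0.73 0.00 0.00 1.91
bir 0.00 0.00 0.00 0.00 1.40 0.00 0.00 1.40
bat 0.00 0.00 8.16 1.50 1.22 0.00 0.00 10.88
mix 0.00 0.55 0.00 0.00 0.00 0.00 0.00 0.55
non 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
")

# Make the response an explicit factor so method = "class" sees 5 categories
excerpt$disperser <- factor(excerpt$disperser)

# 'disperser ~ .' uses every remaining column as a predictor
fit <- rpart(disperser ~ ., data = excerpt, method = "class")
print(fit)
```

Because data= is supplied, the formula finds its variables in the data frame,
so attach() should not be needed for this call.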

Thanks! Silvia.
