[R] Couple of Questions about Classification trees

Frank E Harrell Jr f.harrell at vanderbilt.edu
Wed Mar 11 21:12:01 CET 2009


Jen_mp3 wrote:
> So I have 2 sets of data - a training data set and a test data set. I've been
> doing the analysis on the training data set and then using predict and
> feeding the test data through that. There are 114 rows in the training data
> and 117 in the test data and 1024 columns in both. It's actually the same
> set of data split into two. The rows are made of 5 different numbers. They
> do represent something but it would take too long to explain.

Your sample size is too small by a factor of perhaps 100 for simple data 
splitting to provide stable results.  Then you have the problem of an 
improper scoring rule, i.e., one that when optimized gives the wrong answer.
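
A rough way to see both points on one's own data is sketched below. It is
only an illustration under stated assumptions: the built-in iris data stands
in for the real data (Species playing the role of row), the tree package is
assumed to be installed, and the 50 splits and the Brier score (one example
of a proper scoring rule) are arbitrary choices made for illustration. The
sketch refits the tree on many random half splits and records, on each
held-out half, the misclassification rate and the multi-class Brier score.

```r
library(tree)   # assumed installed; same package as in the question

set.seed(1)     # arbitrary seed, for reproducibility only
dat <- iris     # stand-in data; Species plays the role of `row`
n   <- nrow(dat)

one_split <- function() {
  idx  <- sample(n, n %/% 2)                  # random half used for training
  fit  <- tree(Species ~ ., data = dat[idx, ])
  test <- dat[-idx, ]
  prob <- predict(fit, newdata = test)        # matrix of class probabilities

  # Misclassification rate: ignores how confident each prediction was
  pred <- colnames(prob)[max.col(prob, ties.method = "first")]
  mis  <- mean(pred != test$Species)

  # Multi-class Brier score: squared distance between the predicted
  # probabilities and the 0/1 indicator of the observed class
  truth <- model.matrix(~ Species - 1, data = test)
  brier <- mean(rowSums((prob - truth)^2))

  c(misclass = mis, brier = brier)
}

res <- t(replicate(50, one_split()))   # one row per random split
apply(res, 2, range)                   # spread across splits of the same data
```

If the ranges are wide, a single training/test split says little about the
tree. Resampling the whole model-building process (bootstrap or repeated
cross-validation) is the usual alternative to one split.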

Frank Harrell

> 
> I want to try and find a classification rule for the 5 numbers in the rows
> based on the columns so I created a classification tree and plotted that and
> then pruned it. My question is how do you print the misclassification rate
> at each node on the actual diagram of the classification tree. I can't seem
> to get it up there. In my notes it uses gmistext() but I have a feeling that
> it's for Splus rather than R as gmistext() doesn't work for me either. 
> 
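
For the first question, one possible approach in R is sketched here. It is
a sketch under assumptions, not a replacement for gmistext(): iris stands in
for the real data, the tree package is assumed, and the column name misrate
is made up for the example. It relies on the label argument of text() for
tree objects, which is documented as the name of a column of the tree's
frame component, and on the frame's yprob matrix of class proportions per
node.

```r
library(tree)   # assumed installed

fit <- tree(Species ~ ., data = iris)   # iris stands in for the poster's data

# Work on a copy so the fitted object itself is left untouched
lab <- fit

# Training misclassification rate at each node:
# 1 minus the proportion of the majority class in that node
lab$frame$misrate <- 1 - apply(fit$frame$yprob, 1, max)   # hypothetical column name

plot(lab)
# all = TRUE labels interior nodes as well as leaves
text(lab, label = "misrate", all = TRUE, digits = 2)
```

These are rates on the training data, so they will tend to look better than
what the tree achieves on new data.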
> Second question is when I try using the predict.tree to put the test data
> into the tree and then plot it it comes up with a really weird and wrong
> looking plot. Here is the code I'm using:
> tree1 <- tree(row~.,data=train)
> pruned.tree <- prune.tree(tree1, best = 5, method = "misclass")
> predict.tree1 <- predict.tree(prune.tree, data = main)
> plot(predict.tree);text(predict.tree)
> I sort of don't get a classification tree, I get the x axis labelled 1, the
> y axis labelled 2 and then about 4 small black rectangles scattered across
> the plot. 
> 
> Thanks in Advance. 
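
On the second question, the quoted code passes prune.tree (the function)
rather than pruned.tree (the fitted object), uses data = where predict()
expects newdata =, and plots something other than a tree. By default
predict() on a classification tree returns a matrix of class probabilities
with one column per class, and plot() on a matrix draws the first column
against the second, which is probably where the axes labelled 1 and 2 and
the few scattered points come from. A hedged sketch of what was likely
intended, with iris standing in for the real data and a 75/75 split chosen
arbitrarily:

```r
library(tree)   # assumed installed

set.seed(1)                              # arbitrary seed
idx   <- sample(nrow(iris), 75)          # iris stands in for the real data
train <- iris[idx, ]
test  <- iris[-idx, ]

tree1       <- tree(Species ~ ., data = train)
pruned.tree <- prune.tree(tree1, best = 3, method = "misclass")

# Pass the pruned tree OBJECT and supply the test rows via `newdata`
pred.class <- predict(pruned.tree, newdata = test, type = "class")
table(predicted = pred.class, observed = test$Species)

# type = "tree" returns a tree object with the test cases passed down it,
# which plot() and text() know how to draw
pred.tree <- predict(pruned.tree, newdata = test, type = "tree")
plot(pred.tree, type = "uniform"); text(pred.tree)
```

The table gives the test-set confusion matrix. The plot shows the same
splits as the pruned tree, since prediction does not change the tree's
structure.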


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
