[R-sig-eco] no splits possible - in mvpart
Sarah Goslee
sarah.goslee at gmail.com
Sat Jan 29 17:20:19 CET 2011
Hi Mike,
You need to carefully read the help for mvpart, rpart, and rpart.control -
this is a complex procedure and there are a lot of possible options
and ways to screw up.
cp is the complexity parameter - a proposed split must improve the fit
by at least cp to even be considered. If you aren't getting any splits,
then none of the possible splits in your data improve the fit by that
much.
More formally, from the help for rpart.control:
cp: complexity parameter. Any split that does not decrease the overall
lack of fit by a factor of cp is not attempted. For instance, with anova
splitting, this means that the overall Rsquare must increase by cp at
each step. The main role of this parameter is to save computing time by
pruning off splits that are obviously not worthwhile. Essentially, the
user informs the program that any split which does not improve the fit
by cp will likely be pruned off by cross-validation, and that hence the
program need not pursue it.
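As a rough sketch of what changing cp looks like in practice: the example below uses rpart and the built-in cars data rather than your data, and the helper n_splits is just something made up for illustration. mvpart is built on rpart, so I'm assuming it accepts cp in the same way, but check ?mvpart to confirm.

```r
library(rpart)  # recommended package, distributed with R

# Illustrative data only: the built-in 'cars' dataset,
# not the vegetation data discussed in this thread.

# A very high cp: no single split can improve the fit this much,
# so the tree stays at the root node.
strict <- rpart(dist ~ speed, data = cars, method = "anova",
                control = rpart.control(cp = 0.99))

# A much lower cp: weaker splits are now allowed to be attempted.
loose <- rpart(dist ~ speed, data = cars, method = "anova",
               control = rpart.control(cp = 0.001))

# Hypothetical helper: number of splits in the largest tree that was grown.
n_splits <- function(fit) max(fit$cptable[, "nsplit"])

n_splits(strict)
n_splits(loose)
printcp(loose)
```

Lowering cp only lets more splits be tried; whether they are worth keeping is still a question for the cross-validated error in the printcp table.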
I believe that classification trees (not clustering) as implemented in R are
covered in some detail in MASS (Venables and Ripley, Modern Applied
Statistics with S); you should probably also find and read that.
Sarah
On Fri, Jan 28, 2011 at 8:31 PM, Mike Marsh <swamp at blarg.net> wrote:
> I am clustering vegetation richness (0 or 1) data that is segregated by
> growth form, i.e. Shrub, Annual Grass, Perennial Grass, etc., using mvpart
> for comparison with clustering by hclust.
> The environmental file has four variables, Slope, Elevation, heatload, and
> Ecological Site (a measure of soil and land form type).
> For four of the six data files, a split is found when the raw data are
> analyzed, but the message
>
> "No splits possible -- try decreasing cp"
>
> appears when data standardized by "scaler" are submitted.
> My question: What does the message mean? How would I decrease cp?
>
> I have re-read De'Ath, 2002 (Ecology 83:1105) regarding cross-validation,
> and I assume that xerror in the table produced by printcp is that quantity.
> In the present instance, there are only two leaves to the tree, and further
> reduction of cp would seem impossible.
>
> A further puzzle is that when the smallest dataset (not included in this
> analysis), with only 6 columns, is analyzed, a result is obtained for
> standardized data. The Shrub data presented here as an example have 27
> columns; the Annual.Forb data have 35 columns.
>
> Here is my script, with output:
>
>> set.seed(1)
>> Shrub.mrt<-mvpart(Shrub~.,Qenv)
>> printcp(Shrub.mrt)
> mvpart(form = Shrub ~ ., data = Qenv)
>
> Variables actually used in tree construction:
> [1] Alt.E
>
> Root node error: 69.727/22 = 3.1694
>
> n= 22
>
>        CP nsplit rel error xerror    xstd
> 1 0.23477      0   1.00000 1.1064 0.09480
> 2 0.12882      1   0.76523 1.0372 0.10470
>> Shrub.std<- scaler(Shrub, col="mean1", row="mean1")
>> Shrub.std.mrt<-mvpart(Shrub.std~.,Qenv)
> No splits possible -- try decreasing cp
>> printcp(Shrub.std.mrt)
> rpart(formula = form, data = data)
>
> Variables actually used in tree construction:
> character(0)
>
> Root node error: 0/0 = NaN
>
> n=0 (22 observations deleted due to missingness)
>
>    CP nsplit rel error
> 1 NaN      0       NaN
>>
>> set.seed(1)
>> Annual.Forb.mrt<-mvpart(Annual.Forb~.,Qenv)
>> printcp(Annual.Forb.mrt)
> mvpart(form = Annual.Forb ~ ., data = Qenv)
>
> Variables actually used in tree construction:
> [1] Slope
>
> Root node error: 105.27/22 = 4.7851
>
> n= 22
>
>         CP nsplit rel error xerror     xstd
> 1 0.135579      0   1.00000 1.1085 0.081214
> 2 0.096179      1   0.86442 1.0827 0.079488
>> Annual.Forb.std<- scaler(Annual.Forb, col="mean1", row="mean1")
>> Annual.Forb.std.mrt<-mvpart(Annual.Forb.std~.,Qenv)
>> printcp(Annual.Forb.std.mrt)
> mvpart(form = Annual.Forb.std ~ ., data = Qenv)
>
> Variables actually used in tree construction:
> [1] Elev
>
> Root node error: 4282.1/22 = 194.64
>
> n= 22
>
>        CP nsplit rel error xerror    xstd
> 1 0.15587      0   1.00000 1.1015 0.12860
> 2 0.10174      1   0.84413 1.0949 0.12898
>> printcp(Annual.Grass.std.mrt)
> mvpart(form = Annual.Grass.std ~ ., data = Qenv)
>
> Variables actually used in tree construction:
> [1] heatld
>
> Root node error: 219.76/22 = 9.989
>
> n= 22
>
>        CP nsplit rel error xerror    xstd
> 1 0.12602      0   1.00000 1.1179 0.43984
> 2 0.11866      1   0.87398 1.4865 0.51020
>>
> While output for the standardized data for annual forb is the same as with
> raw data, this is often not the case in my larger dataset.
>
> Data files are appended, and will be provided separately on request.
>
> Thanks very much for looking at this.
>
> Mike Marsh
> Washington Native Plant Society
>
--
Sarah Goslee
http://www.functionaldiversity.org