[R-sig-eco] gbm.simplify Error (nTrain*bag.fraction)

Eva Amorim eva.amorim at gmail.com
Mon Jan 5 17:03:06 CET 2015


Dear all,


I am trying to evaluate the influence of several oceanographic
environmental parameters on the presence/absence of a fish species in an
estuary using boosted regression trees. For this I used the gbm.step
function from the dismo package.

Since I have many predictors, I used gbm.simplify to drop the
non-informative predictors and improve the predictive performance of the
models.

But one of my datasets has a very small number of observations, n=44.
Although the function gbm.step appears to run fine on this dataset, when I
apply gbm.simplify to the model, I get the following error:



Error in gbm.fit(x, y, offset = offset, distribution = distribution, w = w,  :
  The dataset size is too small or subsampling rate is too large: nTrain*bag.fraction <= n.minobsinnode



Here is a reproducible example using the Anguilla_train dataset:



library(dismo)  # provides gbm.step, gbm.simplify and Anguilla_train
data(Anguilla_train)

# reduce data set to 44 obs.
Anguilla_train <- Anguilla_train[245:288, ]

# apply gbm.step with bag.fraction = 0.75
model <- gbm.step(data = Anguilla_train,
                  gbm.x = c("SegSumT", "SegTSeas", "SegLowFlow", "DSDist",
                            "DSMaxSlope", "USAvgT", "USRainDays", "USSlope",
                            "USNative", "DSDam", "Method", "LocSed"),
                  gbm.y = "Angaus", family = "bernoulli",
                  tree.complexity = 1, learning.rate = 0.001,
                  bag.fraction = 0.75, n.folds = 5)

# apply gbm.simplify to the model
model.simp <- gbm.simplify(model, n.drops = 3)





When I check the components of my model object:

model$nTrain
# [1] 44
model$bag.fraction
# [1] 0.75
model$n.minobsinnode
# [1] 10



So, if I understand correctly, 44 * 0.75 = 33 > 10, which is why gbm.step
could build the model. I assumed gbm.simplify would run with the settings
already established for the model... So why does the error message appear
only for gbm.simplify and not for both functions?
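
To make the arithmetic explicit, here is a minimal sketch in base R of the condition quoted in the error message. The helper too_small() is hypothetical (it is not part of gbm or dismo) and simply restates the inequality from the message; it assumes the check is applied exactly as the message words it.

```r
# Condition quoted in the gbm.fit error message:
#   nTrain * bag.fraction <= n.minobsinnode  -> error
# too_small() is a hypothetical helper, not part of gbm or dismo.
too_small <- function(n_train, bag_fraction, n_minobsinnode) {
  n_train * bag_fraction <= n_minobsinnode
}

# Values taken from the fitted model object
too_small(44, 0.75, 10)   # FALSE: 33 > 10
too_small(44, 1.00, 10)   # FALSE: 44 > 10

# Largest nTrain that would trip the check at these settings
max(which(too_small(1:44, 0.75, 10)))   # 13, since 13 * 0.75 = 9.75
```

If bag.fraction = 0.75 and n.minobsinnode = 10 really are carried over, the check would only trip for a fit that sees 13 or fewer training rows, far fewer than the 44 in the data set.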



If I change bag.fraction to 1, the same error occurs.

I also tried passing n.minobsinnode=5 to the gbm.step function, but the
value stays at the default (=10). I guess that happens because the
gbm.step function is an extension of the gbm functions in the gbm
package...



Any thoughts will be highly appreciated.


Thanks in advance.


Eva Amorim
