[R-sig-eco] gbm.simplify Error (nTrain*bag.fraction)
Eva Amorim
eva.amorim at gmail.com
Mon Jan 5 17:03:06 CET 2015
Dear all,
I am trying to evaluate the influence of several oceanographic
environmental parameters on the presence/absence of a fish species in an
estuary using boosted regression trees. For that I tried the gbm.step
function provided in the package dismo.
Since I have many predictors, I used gbm.simplify to drop the
non-informative predictors and improve the predictive performance of the
models.
But one of my datasets has a very small number of observations, n=44.
Although the function gbm.step appears to run fine on this dataset, when I
apply gbm.simplify to the model, I get the following error:
Error in gbm.fit(x, y, offset = offset, distribution = distribution, w = w, :
  The dataset size is too small or subsampling rate is too large: nTrain*bag.fraction <= n.minobsinnode
Here is a reproducible example using the Anguilla_train dataset:
library(dismo)  # provides gbm.step, gbm.simplify and Anguilla_train
data(Anguilla_train)
# reduce data set to 44 obs.
Anguilla_train <- Anguilla_train[245:288, ]
# apply gbm.step with a bag.fraction=0.75
model <- gbm.step(data = Anguilla_train,
                  gbm.x = c("SegSumT", "SegTSeas", "SegLowFlow", "DSDist",
                            "DSMaxSlope", "USAvgT", "USRainDays", "USSlope",
                            "USNative", "DSDam", "Method", "LocSed"),
                  gbm.y = "Angaus", family = "bernoulli",
                  tree.complexity = 1, learning.rate = 0.001,
                  bag.fraction = 0.75, n.folds = 5)
# apply gbm.simplify to the model
model.simp <- gbm.simplify(model, n.drops = 3)
When I check the components of my model object:
model$nTrain
#[1] 44
model$bag.fraction
#[1] 0.75
model$n.minobsinnode
# [1] 10
So if I understand correctly, 44*0.75>10, which allowed the model to be
built with the function gbm.step. I assume gbm.simplify would run based on
the settings established previously for the model... So why does the error
message only appear for this function and not for both?
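To make that arithmetic explicit, here is a minimal base R sketch that uses only the three values printed above. The variable names are just for illustration, and the comparison is the one stated in the error message text, which I am assuming is the check actually applied:

# values reported by the fitted model object
n_train        <- 44    # model$nTrain
bag_fraction   <- 0.75  # model$bag.fraction
n_minobsinnode <- 10    # model$n.minobsinnode

# observations drawn per tree, compared with the limit named in the error
bagged_obs <- n_train * bag_fraction
bagged_obs                    # 33
bagged_obs > n_minobsinnode   # TRUE, so by this reading the check should pass
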
If I change the bag.fraction to 1, the same happens.
I also tried to include a setting of n.minobsinnode=5 in the gbm.step
function, but the default remains the same (=10). I guess that happens
because the gbm.step function is an extension of the gbm functions in the
gbm package...
Any thoughts will be highly appreciated.
Thanks in advance.
Eva Amorim