[R-sig-eco] cross-validation fold.vector during gbm.simplify
Michel Bechtold
michel.bechtold at ti.bund.de
Wed Feb 5 16:18:23 CET 2014
Dear list,
I am using the 'dismo' package for the regionalization of water table
depth in German peatlands. BRT has proven to be a powerful tool for this
data set. I have the following question about the model simplification
process:
I start gbm.step with a predefined fold.vector to leave out whole
peatland areas in the cross-validation scheme. This gives me a much more
robust model, compared to a simple 10-fold cross-validation.
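For illustration, a fold vector of this kind could be built roughly as
follows. This is only a minimal sketch on made-up data: the column names
(peatland, wtd, x1, x2), the 12 areas and the choice of 4 folds are
assumptions for the example, not the actual data set. The gbm.step call
is left commented out so that the snippet runs without 'dismo' installed.

```r
# Hypothetical data: 12 peatland areas with 5 observations each.
set.seed(1)
dat <- data.frame(
  peatland = rep(paste0("area", 1:12), each = 5),  # grouping variable
  wtd      = rnorm(60),                            # response (made up)
  x1       = runif(60),                            # predictors (made up)
  x2       = runif(60)
)

# Assign each peatland area (not each observation) to one of n.folds
# folds, so that all observations of an area share one fold number.
n.folds   <- 4
areas     <- unique(as.character(dat$peatland))
area.fold <- sample(rep(seq_len(n.folds), length.out = length(areas)))
names(area.fold) <- areas
fold.vector <- unname(area.fold[as.character(dat$peatland)])

# Sanity check: no area is split across folds.
stopifnot(all(tapply(fold.vector, dat$peatland,
                     function(f) length(unique(f))) == 1))

# The vector would then be handed to gbm.step, e.g.:
# library(dismo)
# m <- gbm.step(data = dat, gbm.x = 3:4, gbm.y = 2, family = "gaussian",
#               fold.vector = fold.vector, n.folds = n.folds)
```

With this construction every cross-validation fold holds out complete
peatland areas, which is the scheme described above.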
When I run gbm.simplify afterwards, it appears that the simplification
and the reported change in predictive deviance are based on a random
10-fold cross-validation and not on the predefined fold.vector, which is
still saved in the output object of gbm.step. As a consequence,
gbm.simplify seems to suggest an overfitted model with many predictors
(37). Manual runs with, e.g., only the 6 most influential predictors
give better cross-validation results under the predefined fold.vector.
Does gbm.simplify currently work only with a random cross-validation and
not with a predefined fold.vector? Do you have recommendations for a
workaround, or for how to handle this problem otherwise?
Thanks a lot in advance for any answer and advice.
best,
Michel