[R-sig-eco] cross-validation fold.vector during gbm.simplify

Michel Bechtold michel.bechtold at ti.bund.de
Wed Feb 5 16:18:23 CET 2014


Dear list,

I am using the 'dismo' package for the regionalization of water table
depth in German peatlands. Boosted regression trees (BRT) have proved to
be a powerful tool for this data set. I have the following question
about the model simplification process:
I run gbm.step with a predefined fold.vector so that whole peatland
areas are left out in each cross-validation fold. This gives me a much
more robust model than a simple random 10-fold cross-validation.
When I run gbm.simplify afterwards, it appears that the simplification
and the change in predictive deviance are computed with a random 10-fold
cross-validation rather than with the predefined fold.vector, which is
still stored in the output object of gbm.step. As a consequence,
gbm.simplify seems to suggest an overfitted model with many predictors
(37). Manual runs with, e.g., only the 6 most influential predictors
give better cross-validation results with the predefined fold.vector.
Does gbm.simplify currently support only random cross-validation, and
not a predefined fold.vector? Do you have recommendations for a
workaround, or for how to handle this problem?
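
To make the setup concrete, here is a minimal sketch of how such a
fold.vector can be built so that each fold withholds one whole area. The
data frame, the column names (area, wtd) and the predictor indices in
the commented gbm.step call are made up for illustration and are not my
real data; only the gbm.step arguments fold.vector and n.folds are taken
from the package.

```r
# Hypothetical toy data: 12 observations from 4 peatland areas.
dat <- data.frame(
  area = rep(c("A", "B", "C", "D"), times = c(4, 3, 3, 2)),
  wtd  = c(-0.2, -0.3, -0.1, -0.4, -0.6, -0.5, -0.7,
           -0.2, -0.3, -0.1, -0.8, -0.9)
)

# One fold per area: every observation from an area gets the same fold
# id, so each cross-validation round leaves out a whole area.
fold.vector <- as.integer(factor(dat$area))
n.folds <- length(unique(fold.vector))

# Sanity check: no area is split across folds.
stopifnot(all(tapply(fold.vector, dat$area,
                     function(f) length(unique(f))) == 1))

# The model call would then look roughly like this (not run here; the
# predictor indices 3:10 are placeholders for real predictor columns):
# library(dismo)
# m <- gbm.step(data = dat, gbm.x = 3:10, gbm.y = 2,
#               family = "gaussian",
#               fold.vector = fold.vector, n.folds = n.folds)
```

With many areas, several areas could be grouped into one fold instead,
as long as all observations from one area share the same fold id.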

Thanks a lot in advance for any answer and advice.

best,
Michel


