[R] package 'gradientForest' and 'extendedForest'

Tue Aug 26 15:03:20 CEST 2014

Dear experts,

I have 5 environmental predictors and abundance data (300 samples, 60 
species; transformation: log(x + min(x, x > 0))) and use the function 
'gradientForest' to estimate the R²-weighted predictor importance 
(regression trees). The resulting predictor importance, in decreasing 
order, is: pred1, pred2, pred3, pred4, pred5. The three species with the 
highest R² (goodness of fit; output value 'result' of the function 
'gradientForest') are species 1 (R² = 0.76), species 2 (R² = 0.74), and 
species 3 (R² = 0.72). To my understanding, this means that the model 
(i.e. the predictor importance ranking) fits species 1, 2, and 3 best, 
in decreasing order.

In a further step, I want to know which predictors are the most 
important for selected species. I therefore ran separate forests using 
the 'extendedForest' function, with the same parameter settings (and 
the same set.seed()) as in the 'gradientForest' call, for species 1, 2, 
and 3 (and others). The resulting predictor importance, in decreasing 
order, is now:

species1: pred1, pred2, pred4, pred3, pred5
species2: pred1, pred4, pred2, pred5, pred3
species3: pred2, pred4, pred5, pred1, pred3

This seems strange to me, because I expected the 'extendedForest' 
function to give predictor importance rankings similar to the 
'gradientForest' ranking for the species with the highest R² values 
obtained by 'gradientForest'. I'd be grateful for any help. Thanks a 
lot in advance.
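
To put a number on the disagreement, here is a minimal sketch in base R that compares each per-species ranking with the overall 'gradientForest' ranking using Spearman's rank correlation. It uses only the orderings reported in this message. The object names (overall, per_species, rank_of, rho) are illustrative and do not come from either package.

```r
# Predictor orderings as reported in this message (most to least important).
overall <- c("pred1", "pred2", "pred3", "pred4", "pred5")
per_species <- list(
  species1 = c("pred1", "pred2", "pred4", "pred3", "pred5"),
  species2 = c("pred1", "pred4", "pred2", "pred5", "pred3"),
  species3 = c("pred2", "pred4", "pred5", "pred1", "pred3")
)

# Position (rank) of each predictor within an ordering,
# returned in the fixed predictor order given by 'overall'.
rank_of <- function(ordering) match(overall, ordering)

# Spearman correlation between the overall ranks and each species' ranks.
rho <- sapply(per_species, function(ord) {
  cor(rank_of(overall), rank_of(ord), method = "spearman")
})
print(round(rho, 2))
```

For these orderings the correlations are 0.9 (species 1), 0.5 (species 2), and -0.1 (species 3), so agreement with the overall ranking drops quickly even among the three best-fitted species.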

Best regards

Thomas
