[R-sig-Geo] Prediction variance (map) for predictions derived using RandomForest package

Tomislav Hengl hengl at spatial-analyst.net
Sun Jun 23 11:51:18 CEST 2013


Dear list,

I have a question about the randomForest models. I'm trying to figure 
out a way to estimate the prediction variance (spatially) for the 
randomForest function 
(http://cran.r-project.org/web/packages/randomForest/).

If I fit a GLM, I can also obtain the prediction standard error (and hence the variance) using:

 > demo(meuse, echo=FALSE)
 > meuse.ov <- over(meuse, meuse.grid)
 > meuse.ov <- cbind(meuse.ov, meuse@data)
 > omm0 <- glm(log1p(om)~dist+ffreq, meuse.ov, family=gaussian())
 > om.glm <- predict.glm(omm0, meuse.grid, se.fit=TRUE)
 > str(om.glm)
List of 3
  $ fit           : Named num [1:3103] 2.34 2.34 2.32 2.29 2.34 ...
   ..- attr(*, "names")= chr [1:3103] "1" "2" "3" "4" ...
  $ se.fit        : Named num [1:3103] 0.0491 0.0491 0.0481 0.046 0.0491 ...
   ..- attr(*, "names")= chr [1:3103] "1" "2" "3" "4" ...
  $ residual.scale: num 0.357
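
Note that se.fit is the standard error of the fitted mean, so for a Gaussian model the variance for a new observation would be se.fit^2 + residual.scale^2. A minimal sketch on synthetic data (the data frame `d`, the prediction locations `new` and all values are made up for illustration and are not taken from meuse):

```r
## Synthetic stand-in for the meuse overlay (hypothetical values)
set.seed(1)
d <- data.frame(dist  = runif(100),
                ffreq = factor(sample(1:3, 100, replace = TRUE)))
d$y <- 2.5 - 1.2 * d$dist + rnorm(100, sd = 0.3)

m <- glm(y ~ dist + ffreq, data = d, family = gaussian())

## Hypothetical prediction locations
new <- data.frame(dist  = c(0.1, 0.5, 0.9),
                  ffreq = factor(c(1, 2, 3), levels = 1:3))
p <- predict(m, new, se.fit = TRUE)

var.mean <- p$se.fit^2                       # variance of the fitted mean
var.pred <- p$se.fit^2 + p$residual.scale^2  # variance for a new observation
```

With the meuse objects, the same two lines could be applied to om.glm$se.fit and om.glm$residual.scale.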

When I fit a randomForest model, however, I get only the predictions, with no 
estimate of the model uncertainty for each pixel:

 > meuse.ov <- meuse.ov[-omm0$na.action,]
 > x <- randomForest(log1p(om)~dist+ffreq, meuse.ov)
 > om.rf <- predict(x, meuse.grid)
 > str(om.rf)
  Named num [1:3103] 2.49 2.49 2.51 2.44 2.49 ...
  - attr(*, "names")= chr [1:3103] "1" "2" "3" "4" ...
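
For reference, predict.randomForest has a predict.all argument that returns the prediction of every individual tree, so the spread among trees can at least be computed per location. A minimal sketch on synthetic data (the objects `d`, `new` and `rf` are hypothetical, and the between-tree variance is only a heuristic measure of spread, not necessarily comparable to the GLM prediction variance):

```r
library(randomForest)

## Synthetic stand-in for the meuse overlay (hypothetical values)
set.seed(1)
d <- data.frame(dist  = runif(200),
                ffreq = factor(sample(1:3, 200, replace = TRUE)))
d$y <- 2.5 - 1.2 * d$dist + rnorm(200, sd = 0.3)

rf <- randomForest(y ~ dist + ffreq, data = d, ntree = 500)

## Hypothetical prediction locations; factor levels must match the training data
new <- data.frame(dist  = c(0.1, 0.5, 0.9),
                  ffreq = factor(c(1, 2, 3), levels = 1:3))

## predict.all = TRUE returns a list:
##   $aggregate  - the usual forest prediction (mean over trees)
##   $individual - matrix with one row per location and one column per tree
p <- predict(rf, new, predict.all = TRUE)

tree.var <- apply(p$individual, 1, var)  # between-tree variance per location
```

Here tree.var has one value per row of `new`, so for a grid it could be written back as a new column and mapped.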

Does anyone have an idea how to map the prediction variance (i.e. 
estimated or propagated error) for the randomForest models spatially?

I've tried deriving a propagated error by refitting the randomForest model 
50 times (every fit gives a slightly different model because of the random 
component) and taking the per-pixel variance of the predictions:

 > l.rfk <- data.frame(om_1 = rep(NA, nrow(meuse.grid)))
 > for(i in 1:50){
+   suppressWarnings(suppressMessages(x <- randomForest(log1p(om)~dist+ffreq, meuse.ov)))
+   l.rfk[,paste("om",i,sep="_")] <- predict(x, meuse.grid)
+ } ## takes ca 1 minute
 > meuse.grid$om.rfkvar <- om.rfk@predicted$var1.var + apply(l.rfk, 1, var)

but the prediction variance I get is rather small (much smaller than 
e.g. the GLM variance). Here is the complete code with some plots:

R code:
https://code.google.com/p/gsif/source/browse/trunk/meuse/RK_vs_RandomForestK.R

Predictions UK vs randomForest-kriging:
https://gsif.googlecode.com/svn/trunk/meuse/Fig_meuse_RK_vs_RFK.png

Thanks,

T. (Tom) Hengl
Url: http://www.wageningenur.nl/en/Persons/dr.-T-Tom-Hengl.htm
Network: http://profiles.google.com/tom.hengl
Publications: http://scholar.google.com/citations?user=2oYU7S8AAAAJ


