[R-sig-Geo] (no subject)

Giuseppe Amatulli giuseppe.amatulli at gmail.com
Mon Jan 10 18:48:48 CET 2011


Hi,
first of all, happy new year!

I'm trying to model forest species distribution at the European level
(1 km resolution) with randomForest, running it on a cluster
computer. I'm using several predictors from different data sources, all
of them rasters in GRASS format. So far I have been using spgrass6
to import the data into R and apply the randomForest prediction to the
layers.

At the same time, after carefully reading the help pages of the raster
package, it seems to me that its "row by row" processing copes better
with memory limitations than spgrass6. Is this the case?
If the raster package is more efficient, how can I use it to import
GRASS data? I suppose by reading the rasters under the cellhd folder:
> maps <- stack(c('LOCATION/PERMANENT/cellhd/grid1', 'LOCATION/PERMANENT/cellhd/grid2'))
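
To make the "row by row" idea concrete, here is a minimal sketch of block-wise access with the raster package. It uses two small in-memory layers as stand-ins for the GRASS maps (hypothetical data, with the layer names grid1 and grid2 borrowed from the paths above). Whether the cellhd paths can be opened directly depends on the local GDAL build having GRASS support, which is an assumption here and not something this sketch checks.

```r
library(raster)

# Two small in-memory layers stand in for the GRASS maps (hypothetical data).
set.seed(1)
grid1 <- raster(nrows = 20, ncols = 20)
grid1[] <- runif(ncell(grid1))
grid2 <- raster(nrows = 20, ncols = 20)
grid2[] <- runif(ncell(grid2))
maps <- stack(grid1, grid2)
names(maps) <- c("grid1", "grid2")

# blockSize() suggests how to split the rows into chunks; only one chunk
# of cell values is held in memory at a time inside the loop.
bs <- blockSize(maps)
block_means <- numeric(bs$n)
for (i in seq_len(bs$n)) {
  # Matrix with one row per cell and one column per layer.
  v <- getValues(maps, row = bs$row[i], nrows = bs$nrows[i])
  block_means[i] <- mean(v[, "grid1"])
}
```

For model prediction, raster's own predict() method takes a Raster object, a fitted model, and an optional filename, and processes large rasters in chunks, so a loop like the one above may not need to be written by hand.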

One more question.
The data for training randomForest are stored in an R table. Each
observation represents the presence/absence (0 or 1) of a plant
species. I also have a presence/absence reliability item, which gives
me information about the quality of the data. With this item I
would like to assign a "weight" in randomForest in order to give more
importance to the "good" data. Any ideas?
As a rough idea, I was thinking of replicating the data according to
their quality, but this would increase the amount of data too much. On
the other hand, stratified bootstrapping would result in data
squeezing and long computation.
In other words, I'm looking for a weights option like the one in lm. Any ideas?
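
A minimal sketch of the replication idea in base R, assuming the reliability item can be expressed as a small integer score. The table, its column names (presence, temp, reliability) and the 1 to 3 scale are hypothetical. The lm() comparison is included only to show the kind of weights behaviour meant above: with integer weights, lm() gives the same coefficients as fitting on the replicated rows.

```r
# Hypothetical training table: presence/absence, one predictor, and an
# integer reliability score (1 = poor, 3 = good). All names are illustrative.
set.seed(42)
train <- data.frame(
  presence    = rbinom(50, 1, 0.5),
  temp        = rnorm(50),
  reliability = sample(1:3, 50, replace = TRUE)
)

# Replicate each row as many times as its reliability score.
idx <- rep(seq_len(nrow(train)), times = train$reliability)
train_rep <- train[idx, ]

# In lm(), integer case weights give the same coefficients as replication.
fit_w   <- lm(presence ~ temp, data = train, weights = reliability)
fit_rep <- lm(presence ~ temp, data = train_rep)
```

The replicated table has sum(train$reliability) rows, which is exactly the growth in data size that makes this approach expensive for a large training set.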

Thanks in advance
Regards
Giuseppe Amatulli