[R-sig-Geo] (no subject)
Giuseppe Amatulli
giuseppe.amatulli at gmail.com
Mon Jan 10 18:48:48 CET 2011
Hi,
First of all, happy new year!
I'm trying to model forest species distribution at the European level (1 km
resolution) with randomForest, running it on a cluster
computer. I'm using several predictors from different data sources. All
of them are rasters in GRASS format, so I have been using spgrass6
to import the data into R and apply the randomForest prediction to the
layers.
At the same time, after carefully reading the help pages of the raster
package, it seems to me that its "row by row" processing copes better
with memory limitations than spgrass6. Is this
the case?
If the raster package is more efficient, how can I use it to import GRASS
data? I suppose by reading the rasters under the cellhd folder:
> maps <- stack(c('LOCATION/PERMANENT/cellhd/grid1', 'LOCATION/PERMANENT/cellhd/grid2'))
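To make the intended workflow concrete, here is a minimal sketch of predicting from a raster stack with a randomForest model and writing the result straight to disk. It assumes the raster and randomForest packages are installed. The two layers are small synthetic rasters standing in for grid1 and grid2 (it does not read GRASS data), and the training table, the output file name and the object names are made up for illustration.

```r
library(raster)
library(randomForest)

set.seed(42)

# Two small synthetic predictor layers standing in for grid1 and grid2
# (assumption: the real layers would be read from files instead)
grid1 <- raster(nrows = 20, ncols = 20)
grid2 <- raster(nrows = 20, ncols = 20)
values(grid1) <- runif(ncell(grid1))
values(grid2) <- runif(ncell(grid2))
maps <- stack(grid1, grid2)
names(maps) <- c("grid1", "grid2")  # must match the predictor names in the model

# Made-up training table: presence (1) or absence (0) plus the two predictors
train <- data.frame(grid1 = runif(100), grid2 = runif(100))
train$presence <- factor(as.integer(train$grid1 + train$grid2 > 1))

rf <- randomForest(presence ~ grid1 + grid2, data = train, ntree = 100)

# Predict the probability of the second class ("1" = presence) and write it
# to a file; raster decides whether to work in chunks based on available memory
out_file <- file.path(tempdir(), "presence_prob.grd")
prob <- predict(maps, rf, type = "prob", index = 2,
                filename = out_file, overwrite = TRUE)
```

With real data, the synthetic layers would be replaced by a stack built from the predictor files, and only the layer names need to agree with the column names used for training.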
One more question.
The training data for randomForest are stored in a table in R. Each
observation represents the presence/absence (0 or 1) of a plant
species. I also have a presence/absence reliability field, which gives
me information about the quality of the data. With this field I
would like to assign a "weight" in randomForest in order to give more
importance to the "good" data. Any ideas?
As a rough idea, I was thinking of replicating the data according to
their quality, but this would increase the amount of data too much. On
the other hand, stratified bootstrapping would squeeze the data
and lead to long computation times.
In other words, I'm looking for a weights option like the one available in lm. Any ideas?
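One way to picture the trade-off is a weighted resample: instead of replicating rows, draw a sample of the same size as the original table, with selection probability proportional to reliability. The sketch below uses only base R. The table, the column names and the 1 to 3 reliability scale are hypothetical, and this is just an illustration of the idea, not a built-in randomForest option.

```r
set.seed(1)

# Hypothetical training table; reliability 3 = most trusted observation
train <- data.frame(
  presence    = rbinom(200, 1, 0.4),
  reliability = sample(1:3, 200, replace = TRUE),
  temp        = rnorm(200),
  precip      = rnorm(200)
)

# Selection probability proportional to reliability
p <- train$reliability / sum(train$reliability)

# Weighted bootstrap: same number of rows as the original table,
# but reliable observations are drawn more often
idx  <- sample(nrow(train), size = nrow(train), replace = TRUE, prob = p)
boot <- train[idx, ]

# Compare how often each reliability class appears before and after
table(train$reliability)
table(boot$reliability)
```

The resampled table `boot` could then be used for training in place of `train`, which keeps the data volume constant while still favouring the "good" observations.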
Thanks in advance.
Regards
Giuseppe Amatulli