[R-sig-Geo] (no subject)

Mon Jan 10 20:25:08 CET 2011

On Mon, 10 Jan 2011, Giuseppe Amatulli wrote:

> Hi,
> first of all, happy new year!
>
> I'm trying to model forest species distribution at the European level
> (1 km resolution) with randomForest, running it on a cluster
> computer. I'm using several predictors from different data sources. All
> of them are rasters in GRASS format, so I have been using spgrass6
> to import the data into R and apply the randomForest prediction to the
> layers.
>
> At the same time, after reading the help pages of the raster package
> carefully, it seems to me that its "row by row" processing copes
> better with memory limitations than spgrass6. Is this the case?
> If the raster package is more efficient, how can I use it to import GRASS
> data? I suppose by reading the rasters under the cellhd folder:
> > maps <- stack(c('LOCATION/PERMANENT/cellhd/grid1', 'LOCATION/PERMANENT/cellhd/grid2'))

In principle, the GRASS GDAL plugin should work in this way, but you can 
also use g.region in GRASS to set the region for readRAST6() to read, 
which could be in tiles or rows at your convenience. This might be easier 
if the predicted tiles are to be written back to GRASS as part of the 
process. It would be overkill to think of this kind of iterated region 
support in raster, which uses the region.dim= and offset= features of the 
GDAL interface, I think.
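
A minimal sketch of the banded g.region approach, assuming a running GRASS
session and the spgrass6 package. The helper names row_bands() and
read_band() are hypothetical, and the region bounds are made-up numbers
chosen only for illustration.

```r
# Hypothetical helper: split a region into horizontal bands of whole rows.
# Returns one row per band with its northern (n) and southern (s) edge.
row_bands <- function(north, south, nsres, rows_per_band) {
  n_rows <- round((north - south) / nsres)
  starts <- seq(0, n_rows - 1, by = rows_per_band)
  ends <- pmin(starts + rows_per_band, n_rows)
  data.frame(n = north - starts * nsres, s = north - ends * nsres)
}

# Made-up region: 1000 rows at 1 km resolution, in bands of 300 rows.
bands <- row_bands(north = 7000000, south = 6000000,
                   nsres = 1000, rows_per_band = 300)

# Hypothetical helper: shrink the GRASS region to one band and read it.
# Only works inside a GRASS session with spgrass6 installed, so it is
# defined here but not called.
read_band <- function(band, layers) {
  # Avoid scientific notation such as "7e+06" in the bounds.
  fmt <- function(x) format(x, digits = 15, scientific = FALSE, trim = TRUE)
  spgrass6::execGRASS("g.region",
                      parameters = list(n = fmt(band$n), s = fmt(band$s)))
  spgrass6::readRAST6(layers)
}
```

Each band returned by read_band() could then be passed to predict() and the
result written back with writeRAST6() before moving on to the next band, so
that only one band is held in memory at a time.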

It would be fun to see whether SAGA could be used in the same way with 
raster, as there is a SAGA GDAL driver.

Roger

>
> One more question.
> The data for training randomForest are stored in an R table. Each
> observation represents the presence/absence (0 or 1) of a plant
> species. I also have a presence/absence reliability item, which gives
> me information about the quality of the data. With this item I
> would like to apply a "weight" in randomForest in order to give more
> importance to the "good" data. Any ideas?
> As a rough idea, I was thinking of replicating the data according to
> their quality, but this would increase the amount of data too much. On
> the other hand, stratified bootstrapping would result in data
> squeezing and long computation.
> In other words, I'm looking for a weights option like the one in lm. Any ideas?
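
One possible workaround, offered only as a sketch: as far as I know, the
classwt, strata and sampsize arguments of randomForest() act on classes or
strata rather than on single observations. Resampling the training table
once, with probabilities proportional to reliability, keeps the table at its
original size instead of replicating rows. The function weighted_resample(),
the toy data and the column names are all hypothetical.

```r
# Hypothetical helper: draw rows with replacement, with probability
# proportional to a per-observation weight; size defaults to nrow(data).
weighted_resample <- function(data, weights, size = nrow(data)) {
  stopifnot(length(weights) == nrow(data),
            all(weights >= 0), sum(weights) > 0)
  idx <- sample(seq_len(nrow(data)), size = size,
                replace = TRUE, prob = weights)
  data[idx, , drop = FALSE]
}

# Toy training table (made-up values); reliability 0 means "never use".
toy <- data.frame(presence = factor(c(0, 1, 1, 0, 1, 0)),
                  temp = c(4.1, 9.3, 8.7, 3.2, 10.5, 5.0),
                  reliability = c(1, 3, 3, 0, 2, 1))

set.seed(1)
boot <- weighted_resample(toy, toy$reliability)

# The resampled table could then be used for fitting, for example:
# fit <- randomForest::randomForest(presence ~ temp, data = boot)
```

This is not the same as a true case weight such as the weights argument of
lm(): randomForest() still draws its own bootstrap samples from the resampled
table, so the reliability only changes how often each observation is likely
to be seen.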
>
> Thanks in advance
> Regards
> Giuseppe Amatulli

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no