[R-sig-Geo] SpatialGridDataFrame to data.frame

Roger Bivand Roger.Bivand at nhh.no
Wed Feb 11 16:57:37 CET 2009


On Wed, 11 Feb 2009, Ned Horning wrote:

> Robert and Roger,
>
> Thanks for the information and pointers. The raster package looks quite 
> interesting and I'll try to get up to speed on some of its capabilities. Are 
> the man pages the best way to do that or is that a single document available?
>
> I made some progress but still have some questions. I followed the steps laid 
> out by Robert and everything went fine except I ran into an error with 
> "predrast <- setValues(predrast, pred, r)" in the for loop when I tried 
> processing one line at a time and "r <- setValues(r, pred)" when I ran the 
> full image in one go. The error was: "values must be a vector." Any idea what 
> I'm doing wrong?
>
> I tried to read the GRASS files directly but got a message saying it is 
> not a supported file format. Can you confirm that is the case or am I 
> doing something wrong? I was able to read a tiff version of the image. I 
> am able to run gdalinfo on GRASS files just fine from a terminal window.

Could you quote verbatim what the actual fname= argument was that you used 
in readGDAL for the plugin - the incantation isn't obvious?

In readRAST6() it is:

paste(gg$GISDBASE, gg$LOCATION_NAME, mapset, "cellhd", vname[1], sep="/")

where all of GISDBASE, LOCATION_NAME, and mapset need to be discovered - 
you can use gmeta6() for the first two, and .g_findfile() for the third, 
please read the code in spgrass6 to see how they work.

Roger

>
> Thanks again for the help.
>
> Ned
>
>
> Robert Hijmans wrote:
>> Ned,
>> 
>> This is an example of running a RandomForest prediction with the
>> raster package (for the simple case that there are no NA values in the
>> raster data; if there are, you have to into account that "predict'
>> does not return any values (not even NA) for those cells).
>> 
>> Robert
>> 
>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>> require(raster)
>> require(randomForest)
>> 
>> # for single band files
>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>> # for multiple band files
>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'), c(1,2,3) 
>> )
>> 
>> # simulate random points and values to model with
>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>> response <- runif(100) * 100
>> # read values of raster layers at points, and bind to respinse
>> trainvals <- cbind(response, xyValues(spot, xy))
>> 
>> # run RandomForest
>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>> 
>> # apply the prediction, row by row
>> predrast <- setRaster(spot)
>> filename(predrast) <- 'RF_pred.grd'
>> for (r in 1:nrow(spot)) {
>> 	spot <- readRow(spot, r)
>> 	rowvals <- values(spot, names=TRUE)
>> # this next line should not be necessary, but it is
>> # I'll fix that
>> 	colnames(rowvals) <- c('b1', 'b2', 'b3')
>> 	pred <- predict(randfor, rowvals)
>> 	predrast <- setValues(predrast, pred, r)
>> 	predrast <- writeRaster(predrast, overwrite=TRUE)
>> }
>> 
>> plot(predrast)
>> 
>> 
>> 
>> 
>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> 
>>> Ned:
>>> 
>>> 
>>> The three bands are most likely treated as 4-byte integers, so the object
>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. They may
>>> get copied too. There are no single byte user-level containers for you
>>> (there is a raw data type, but you can't calculate with it). Possibly
>>> saying spot_frame <- slot(spot, "data") will save one copying operation,
>>> but its hard to tell - your choice of method first adds inn all the
>>> coordinates, which are 8-byte numbers, so more than doubles its size and
>>> makes more copies (to 233MB for each copy). Running gc() several times
>>> manually between steps often helps by making the garbage collector more
>>> aggressive.
>>> 
>>> I would watch the developments in the R-Forge package "raster", which
>>> builds on some of these things, and try to see how that works. If you have
>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read from GRASS
>>> - which would work with raster package functions now. Look at the code of
>>> recent readRAST6 to see which incantations are needed. If you are going to
>>> use randomForest for prediction, you can use smaller tiles until you find
>>> an alternative solution. Note that feeding a data frame of integers to a
>>> model fitting or prediction function will result in coercion to a
>>> matrix of doubles, so your subsequent workflow should take that into
>>> account.
>>>  Getting more memory is another option, and may be very cost and 
>>> especially
>>> time effective - at the moment your machine is swapping. Buying memory may
>>> save you time programming around too little memory.
>>> 
>>> Hope this helps,
>>> 
>>> Roger
>>> 
>>> 
>>> ---
>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>> Roger.Bivand at nhh.no
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>>> Sent: Wed 11.02.2009 07:40
>>> To: r-sig-geo at stat.math.ethz.ch
>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>> 
>>> Greetings,
>>> 
>>> I am trying to read an image from GRASS using the spgrass6 command
>>> readRAST6 and then convert it into a data.frame object so I can use it
>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>> columns x 3 bands. It's a small subset of a larger image I would like to
>>> use eventually. I have no problem reading the image using readRAST6 but
>>> when I try to convert it to a data.frame object my linux system
>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs for a
>>> couple hours before I kill the process. The image is less than 25MB so
>>> I'm surprised it requires this level of memory. Can someone let me know
>>> why this is. Should I use something other than the GRASS interface for
>>> this? These are the commands I'm using:
>>> 
>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>> spot_frame <- as(spot, "data.frame")
>>> 
>>> Any help would be appreciated.
>>> 
>>> All the best,
>>> 
>>> Ned
>>> 
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>> 
>>> 
>>> 
>>>
>>>        [[alternative HTML version deleted]]
>>> 
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>> 
>> 
>> 
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list