[R-sig-Geo] SpatialGridDataFrame to data.frame
Roger Bivand
Roger.Bivand at nhh.no
Wed Feb 11 16:57:37 CET 2009
On Wed, 11 Feb 2009, Ned Horning wrote:
> Robert and Roger,
>
> Thanks for the information and pointers. The raster package looks quite
> interesting and I'll try to get up to speed on some of its capabilities. Are
> the man pages the best way to do that or is there a single document available?
>
> I made some progress but still have some questions. I followed the steps laid
> out by Robert and everything went fine except I ran into an error with
> "predrast <- setValues(predrast, pred, r)" in the for loop when I tried
> processing one line at a time and "r <- setValues(r, pred)" when I ran the
> full image in one go. The error was: "values must be a vector." Any idea what
> I'm doing wrong?
>
> I tried to read the GRASS files directly but got a message saying it is
> not a supported file format. Can you confirm that is the case or am I
> doing something wrong? I was able to read a tiff version of the image. I
> am able to run gdalinfo on GRASS files just fine from a terminal window.
Could you quote verbatim what the actual fname= argument was that you used
in readGDAL for the plugin - the incantation isn't obvious?
In readRAST6() it is:
paste(gg$GISDBASE, gg$LOCATION_NAME, mapset, "cellhd", vname[1], sep="/")
where all of GISDBASE, LOCATION_NAME, and mapset need to be discovered.
You can use gmeta6() for the first two and .g_findfile() for the third;
please read the code in spgrass6 to see how they work.
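
As a rough illustration of the path construction above (a sketch only, not the spgrass6 code itself): the helper name grass_cellhd_path and all example values are hypothetical. In a real session the first two values would come from gmeta6() and the mapset from the lookup described above.

```r
# Hypothetical helper: assemble the "cellhd" path for a GRASS raster,
# mirroring the paste() call quoted above.
grass_cellhd_path <- function(gisdbase, location_name, mapset, vname) {
  # Only the first raster name is used, as in the quoted call (vname[1])
  paste(gisdbase, location_name, mapset, "cellhd", vname[1], sep = "/")
}

# Made-up example values, for illustration only
fname <- grass_cellhd_path("/home/user/grassdata", "mylocation",
                           "PERMANENT", "subset.red")
print(fname)
```

A string built this way is the kind of value that would be given as fname= to readGDAL(), assuming the GDAL-GRASS plugin is installed.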
Roger
>
> Thanks again for the help.
>
> Ned
>
>
> Robert Hijmans wrote:
>> Ned,
>>
>> This is an example of running a RandomForest prediction with the
>> raster package (for the simple case that there are no NA values in the
>> raster data; if there are, you have to take into account that "predict"
>> does not return any values (not even NA) for those cells).
>>
>> Robert
>>
>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>> require(raster)
>> require(randomForest)
>>
>> # for single band files
>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>> # for multiple band files
>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'), c(1,2,3))
>>
>> # simulate random points and values to model with
>> # sample() avoids cell number 0, which round(runif()) could produce
>> xy <- xyFromCell(spot, sample(ncell(spot), 100))
>> response <- runif(100) * 100
>> # read values of raster layers at points, and bind to response
>> # (as a data frame, which the formula interface expects)
>> trainvals <- as.data.frame(cbind(response, xyValues(spot, xy)))
>>
>> # run RandomForest
>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>
>> # apply the prediction, row by row
>> predrast <- setRaster(spot)
>> filename(predrast) <- 'RF_pred.grd'
>> for (r in 1:nrow(spot)) {
>>   spot <- readRow(spot, r)
>>   rowvals <- values(spot, names=TRUE)
>>   # this next line should not be necessary, but it is
>>   # I'll fix that
>>   colnames(rowvals) <- c('b1', 'b2', 'b3')
>>   pred <- predict(randfor, rowvals)
>>   predrast <- setValues(predrast, pred, r)
>>   predrast <- writeRaster(predrast, overwrite=TRUE)
>> }
>>
>> plot(predrast)
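
For the case with NA values mentioned at the top of this example, one possible approach is to predict only on the complete rows and put the results back in place. This is a minimal sketch under stated assumptions: predict_with_na is a hypothetical helper, not part of raster or randomForest, and lm() stands in for randomForest() so that the sketch runs with base R alone. It applies to numeric (regression) predictions only.

```r
# Hypothetical helper: predict on rows without NA and return a plain
# numeric vector with one element per input row (NA where inputs had NA).
predict_with_na <- function(model, newdata) {
  newdata <- as.data.frame(newdata)
  ok <- complete.cases(newdata)          # TRUE for rows with no NA
  out <- rep(NA_real_, nrow(newdata))    # start with all NA
  if (any(ok)) {
    # as.vector() drops names so the result is a bare vector
    out[ok] <- as.vector(predict(model, newdata[ok, , drop = FALSE]))
  }
  out
}

# Stand-in model: lm() is used here instead of randomForest()
set.seed(1)
train <- data.frame(b1 = runif(50), b2 = runif(50), b3 = runif(50))
train$response <- runif(50) * 100
fit <- lm(response ~ b1 + b2 + b3, data = train)

# One row of "raster" values; the second and third cells contain NA
rowvals <- cbind(b1 = c(0.1, NA, 0.3),
                 b2 = c(0.2, 0.5, 0.6),
                 b3 = c(0.9, 0.8, NA))
pred <- predict_with_na(fit, rowvals)
print(pred)
```

Because the result always has one value per cell, it has the length that a row-wise setValues() call would expect.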
>>
>>
>>
>>
>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>>
>>> Ned:
>>>
>>>
>>> The three bands are most likely treated as 4-byte integers, so the object
>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. They may
>>> get copied too. There are no single byte user-level containers for you
>>> (there is a raw data type, but you can't calculate with it). Possibly
>>> saying spot_frame <- slot(spot, "data") will save one copying operation,
>>> but it's hard to tell - your choice of method first adds in all the
>>> coordinates, which are 8-byte numbers, so more than doubles its size and
>>> makes more copies (to 233MB for each copy). Running gc() several times
>>> manually between steps often helps by making the garbage collector more
>>> aggressive.
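
The sizes quoted above can be reproduced with some quick arithmetic. This is only a back-of-the-envelope sketch: it assumes two coordinate columns (x and y) stored as 8-byte doubles, ignores object overhead, and uses variable names chosen here for illustration.

```r
# Image dimensions from the original question
rows  <- 2732
cols  <- 3058
bands <- 3
cells <- rows * cols

bytes_on_disk    <- cells * bands * 1   # byte image: 1 byte per value
bytes_as_integer <- cells * bands * 4   # 4-byte integers once read into R
bytes_coords     <- cells * 2 * 8       # assumed: x and y as 8-byte doubles
bytes_total      <- bytes_as_integer + bytes_coords

# Sizes in MB (10^6 bytes), rounded to one decimal
sizes_mb <- round(c(on_disk = bytes_on_disk,
                    as_integer = bytes_as_integer,
                    with_coords = bytes_total) / 1e6, 1)
print(sizes_mb)
```

Under these assumptions the three figures come out at roughly 25MB, 100MB and 233MB, and each extra copy made during conversion costs about that last amount again.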
>>>
>>> I would watch the developments in the R-Forge package "raster", which
>>> builds on some of these things, and try to see how that works. If you have
>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read from GRASS
>>> - which would work with raster package functions now. Look at the code of
>>> recent readRAST6 to see which incantations are needed. If you are going to
>>> use randomForest for prediction, you can use smaller tiles until you find
>>> an alternative solution. Note that feeding a data frame of integers to a
>>> model fitting or prediction function will result in coercion to a
>>> matrix of doubles, so your subsequent workflow should take that into
>>> account.
>>> Getting more memory is another option, and may be very cost-effective
>>> and especially time-effective - at the moment your machine is swapping.
>>> Buying memory may save you time programming around too little memory.
>>>
>>> Hope this helps,
>>>
>>> Roger
>>>
>>>
>>> ---
>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>> Roger.Bivand at nhh.no
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>>> Sent: Wed 11.02.2009 07:40
>>> To: r-sig-geo at stat.math.ethz.ch
>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>
>>> Greetings,
>>>
>>> I am trying to read an image from GRASS using the spgrass6 command
>>> readRAST6 and then convert it into a data.frame object so I can use it
>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>> columns x 3 bands. It's a small subset of a larger image I would like to
>>> use eventually. I have no problem reading the image using readRAST6 but
>>> when I try to convert it to a data.frame object my linux system
>>> resources (1GB RAM, 3GB swap) nearly get maxed out and it runs for a
>>> couple hours before I kill the process. The image is less than 25MB so
>>> I'm surprised it requires this level of memory. Can someone let me know
>>> why this is? Should I use something other than the GRASS interface for
>>> this? These are the commands I'm using:
>>>
>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>> spot_frame <- as(spot, "data.frame")
>>>
>>> Any help would be appreciated.
>>>
>>> All the best,
>>>
>>> Ned
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>
>>
>
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no