[R-sig-Geo] SpatialGridDataFrame to data.frame

Roger Bivand Roger.Bivand at nhh.no
Thu Feb 12 08:14:19 CET 2009


On Thu, 12 Feb 2009, Robert Hijmans wrote:

> Hi Ned,
>
> Good to hear that. For your other question, have a look at:
>
> require(maptools)
> ?readShapePoly

Or more generally readOGR() in rgdal for another use of the underlying 
shapelib code.

>
> require(sp)
> ?sample.Polygons
>

Most likely the spsample() method will be enough, without having to pick a 
specific method - but if necessary the Polygons objects can be dissolved 
using unionSpatialPolygons() in maptools.

Roger

> Robert
>
>
> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>> Robert,
>>
>> This worked - thanks. It's always uplifting to see an actual image after
>> working on something for a while. Now I can start playing with parameters
>> and playing with different approaches. I'm just (re)starting my R education
>> and I'm pretty slow getting the hang of it but your examples help a lot.
>> They also give me more avenues to discover other functionality and different
>> ways of doing things. I hope I can keep working with R and GRASS until I
>> know it this time. My goal is to get to the point where I can be productive
>> with these packages and start training other folks.
>>
>> The raster package looks very nice and I'll keep an eye on its development.
>> One step that I am currently doing in GRASS is to read a Shapefile with
>> training data (polygons with an integer attribute representing a land cover
>> type) and then randomly create 100 points within each set of polygons
>> representing a specific land cover type. I do this for each land cover type
>> and then concatenate the results into a text file. This file has the
>> coordinates that I use in xyValues to get the pixels values from the SPOT
>> image. Is there a way to do this sampling using the raster package or
>> another R package that you are familiar with? In GRASS I convert the
>> Shapefile to a raster image before doing the random sampling and it would be
>> nice if I could skip this step.
>>
>> All the best,
>>
>> Ned
>>
>> Robert Hijmans wrote:
>>>
>>> I see, in my example, I had a single quantitative variable but you
>>> probably have land cover classes or something like that. If the
>>> classes are in fact numbers stored as text then use
>>>
>>> pred <- as.numeric(pred)
>>>
>>> but if you have words such as 'forest', 'crops', 'water' you could do
>>> something like
>>>
>>> ...
>>>   pred <- predict(randfor, rowvals)
>>>   pred[pred=='forest'] <- 1
>>>   pred[pred=='crops'] <- 2
>>>   pred[pred=='water'] <- 3
>>>   pred <- as.numeric(pred)
>>>   predrast <- setValues(predrast, pred, r)
>>> ...
>>>
>>> not pretty, you could also fit RF with classes that can be interpreted
>>> as numbers..
>>> Make sure you do not get:
>>>
>>> Warning message:
>>> NAs introduced by coercion
>>>
>>>
>>> which would suggest that some character values could not be
>>> transformed to numbers.
>>>
>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>>>
>>>>
>>>> Robert,
>>>>
>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got another
>>>> error: values must be numeric, integer or logical.
>>>>
>>>> class(pred) = "factor"
>>>> dim(pred) = NULL
>>>> class(v) = "character"
>>>> length(v) == ncol(spot) = TRUE
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Ned
>>>>
>>>> Robert Hijmans wrote:
>>>>
>>>>>
>>>>> Strange. You could try
>>>>>     predrast <- setValues(predrast, as.vector(pred), r)
>>>>>
>>>>> But it would be good to know what pred is.
>>>>>
>>>>> Can you do this:
>>>>>
>>>>> class(pred)
>>>>> dim(pred)
>>>>> v <- as.vector(pred)
>>>>> class(v)
>>>>> length(v) == ncol(spot)
>>>>>
>>>>>
>>>>> Robert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Robert and Roger,
>>>>>>
>>>>>> Thanks for the information and pointers. The raster package looks quite
>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>> capabilities.
>>>>>> Are
>>>>>> the man pages the best way to do that or is that a single document
>>>>>> available?
>>>>>>
>>>>>> I made some progress but still have some questions. I followed the
>>>>>> steps
>>>>>> laid out by Robert and everything went fine except I ran into an error
>>>>>> with
>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when I tried
>>>>>> processing one line at a time and "r <- setValues(r, pred)" when I ran
>>>>>> the
>>>>>> full image in one go. The error was: "values must be a vector." Any
>>>>>> idea
>>>>>> what I'm doing wrong?
>>>>>>
>>>>>> I tried to read the GRASS files directly but got a message saying it is
>>>>>> not
>>>>>> a supported file format. Can you confirm that is the case or am I doing
>>>>>> something wrong? I was able to read a tiff version of the image. I am
>>>>>> able
>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>
>>>>>> Thanks again for the help.
>>>>>>
>>>>>> Ned
>>>>>>
>>>>>>
>>>>>> Robert Hijmans wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Ned,
>>>>>>>
>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>> raster package (for the simple case that there are no NA values in the
>>>>>>> raster data; if there are, you have to into account that "predict'
>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>
>>>>>>> Robert
>>>>>>>
>>>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>>>> require(raster)
>>>>>>> require(randomForest)
>>>>>>>
>>>>>>> # for single band files
>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>> # for multiple band files
>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>>>> c(1,2,3) )
>>>>>>>
>>>>>>> # simulate random points and values to model with
>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>> response <- runif(100) * 100
>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>
>>>>>>> # run RandomForest
>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>
>>>>>>> # apply the prediction, row by row
>>>>>>> predrast <- setRaster(spot)
>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>      spot <- readRow(spot, r)
>>>>>>>      rowvals <- values(spot, names=TRUE)
>>>>>>> # this next line should not be necessary, but it is
>>>>>>> # I'll fix that
>>>>>>>      colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>      pred <- predict(randfor, rowvals)
>>>>>>>      predrast <- setValues(predrast, pred, r)
>>>>>>>      predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>> }
>>>>>>>
>>>>>>> plot(predrast)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Ned:
>>>>>>>>
>>>>>>>>
>>>>>>>> The three bands are most likely treated as 4-byte integers, so the
>>>>>>>> object
>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. They
>>>>>>>> may
>>>>>>>> get copied too. There are no single byte user-level containers for
>>>>>>>> you
>>>>>>>> (there is a raw data type, but you can't calculate with it). Possibly
>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>> operation,
>>>>>>>> but its hard to tell - your choice of method first adds inn all the
>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles its size
>>>>>>>> and
>>>>>>>> makes more copies (to 233MB for each copy). Running gc() several
>>>>>>>> times
>>>>>>>> manually between steps often helps by making the garbage collector
>>>>>>>> more
>>>>>>>> aggressive.
>>>>>>>>
>>>>>>>> I would watch the developments in the R-Forge package "raster", which
>>>>>>>> builds on some of these things, and try to see how that works. If you
>>>>>>>> have
>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read from
>>>>>>>> GRASS
>>>>>>>> - which would work with raster package functions now. Look at the
>>>>>>>> code
>>>>>>>> of
>>>>>>>> recent readRAST6 to see which incantations are needed. If you are
>>>>>>>> going
>>>>>>>> to
>>>>>>>> use randomForest for prediction, you can use smaller tiles until you
>>>>>>>> find
>>>>>>>> an alternative solution. Note that feeding a data frame of integers
>>>>>>>> to
>>>>>>>> a
>>>>>>>> model fitting or prediction function will result in coercion to a
>>>>>>>> matrix of doubles, so your subsequent workflow should take that into
>>>>>>>> account.
>>>>>>>>  Getting more memory is another option, and may be very cost and
>>>>>>>> especially
>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>> memory
>>>>>>>> may
>>>>>>>> save you time programming around too little memory.
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Roger
>>>>>>>>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I am trying to read an image from GRASS using the spgrass6 command
>>>>>>>> readRAST6 and then convert it into a data.frame object so I can use
>>>>>>>> it
>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>>>> columns x 3 bands. It's a small subset of a larger image I would like
>>>>>>>> to
>>>>>>>> use eventually. I have no problem reading the image using readRAST6
>>>>>>>> but
>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs for a
>>>>>>>> couple hours before I kill the process. The image is less than 25MB
>>>>>>>> so
>>>>>>>> I'm surprised it requires this level of memory. Can someone let me
>>>>>>>> know
>>>>>>>> why this is. Should I use something other than the GRASS interface
>>>>>>>> for
>>>>>>>> this? These are the commands I'm using:
>>>>>>>>
>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>
>>>>>>>> Any help would be appreciated.
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>>
>>>>>>>> Ned
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-Geo mailing list
>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>     [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-Geo mailing list
>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list