[R-sig-Geo] SpatialGridDataFrame to data.frame

Ned Horning horning at amnh.org
Thu Feb 12 10:09:54 CET 2009


Roger and Robert,

Thanks for the help. Once again I am close but I can't figure out how to 
use the attribute data to control the sampling. I'd like to stratify the 
sampling by attribute value. In the Shapefile I have an integer 
attribute "covertype". There can be several polygons with the same 
covertype ID. What I want to do is get 100 sample points from polygons 
of covertype=1, 100 points from polygons of covertype=2... Is that 
possible? I looked at readShapePoly, readORG, and the suite of tools in 
spsample and but didn't see what I was looking for.

It looks like overlay() is a good way to assign the value of covertype 
to coordinate pairs.

One solution to fix all of this is to have a separate Shapefile for each 
covertype but I'd like to avoid that if possible.

Ned

Roger Bivand wrote:
> On Thu, 12 Feb 2009, Robert Hijmans wrote:
>
>> Hi Ned,
>>
>> Good to hear that. For your other question, have a look at:
>>
>> require(maptools)
>> ?readShapePoly
>
> Or more generally readOGR() in rgdal for another use of the underlying 
> shapelib code.
>
>>
>> require(sp)
>> ?sample.Polygons
>>
>
> Most likely the spsample() method will be enough, without having to 
> pick a specific method - but if necessary the Polygons objects can be 
> dissolved using unionSpatialPolygons() in maptools.
>
> Roger
>
>> Robert
>>
>>
>> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>>> Robert,
>>>
>>> This worked - thanks. It's always uplifting to see an actual image 
>>> after
>>> working on something for a while. Now I can start playing with 
>>> parameters
>>> and playing with different approaches. I'm just (re)starting my R 
>>> education
>>> and I'm pretty slow getting the hang of it but your examples help a 
>>> lot.
>>> They also give me more avenues to discover other functionality and 
>>> different
>>> ways of doing things. I hope I can keep working with R and GRASS 
>>> until I
>>> know it this time. My goal is to get to the point where I can be 
>>> productive
>>> with these packages and start training other folks.
>>>
>>> The raster package looks very nice and I'll keep an eye on its 
>>> development.
>>> One step that I am currently doing in GRASS is to read a Shapefile with
>>> training data (polygons with an integer attribute representing a 
>>> land cover
>>> type) and then randomly create 100 points within each set of polygons
>>> representing a specific land cover type. I do this for each land 
>>> cover type
>>> and then concatenate the results into a text file. This file has the
>>> coordinates that I use in xyValues to get the pixels values from the 
>>> SPOT
>>> image. Is there a way to do this sampling using the raster package or
>>> another R package that you are familiar with? In GRASS I convert the
>>> Shapefile to a raster image before doing the random sampling and it 
>>> would be
>>> nice if I could skip this step.
>>>
>>> All the best,
>>>
>>> Ned
>>>
>>> Robert Hijmans wrote:
>>>>
>>>> I see, in my example, I had a single quantitative variable but you
>>>> probably have land cover classes or something like that. If the
>>>> classes are in fact numbers stored as text then use
>>>>
>>>> pred <- as.numeric(pred)
>>>>
>>>> but if you have words such as 'forest', 'crops', 'water' you could do
>>>> something like
>>>>
>>>> ...
>>>>   pred <- predict(randfor, rowvals)
>>>>   pred[pred=='forest'] <- 1
>>>>   pred[pred=='crops'] <- 2
>>>>   pred[pred=='water'] <- 3
>>>>   pred <- as.numeric(pred)
>>>>   predrast <- setValues(predrast, pred, r)
>>>> ...
>>>>
>>>> not pretty, you could also fit RF with classes that can be interpreted
>>>> as numbers..
>>>> Make sure you do not get:
>>>>
>>>> Warning message:
>>>> NAs introduced by coercion
>>>>
>>>>
>>>> which would suggest that some character values could not be
>>>> transformed to numbers.
>>>>
>>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>>>>
>>>>>
>>>>> Robert,
>>>>>
>>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got 
>>>>> another
>>>>> error: values must be numeric, integer or logical.
>>>>>
>>>>> class(pred) = "factor"
>>>>> dim(pred) = NULL
>>>>> class(v) = "character"
>>>>> length(v) == ncol(spot) = TRUE
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ned
>>>>>
>>>>> Robert Hijmans wrote:
>>>>>
>>>>>>
>>>>>> Strange. You could try
>>>>>>     predrast <- setValues(predrast, as.vector(pred), r)
>>>>>>
>>>>>> But it would be good to know what pred is.
>>>>>>
>>>>>> Can you do this:
>>>>>>
>>>>>> class(pred)
>>>>>> dim(pred)
>>>>>> v <- as.vector(pred)
>>>>>> class(v)
>>>>>> length(v) == ncol(spot)
>>>>>>
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org> 
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Robert and Roger,
>>>>>>>
>>>>>>> Thanks for the information and pointers. The raster package 
>>>>>>> looks quite
>>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>>> capabilities.
>>>>>>> Are
>>>>>>> the man pages the best way to do that or is that a single document
>>>>>>> available?
>>>>>>>
>>>>>>> I made some progress but still have some questions. I followed the
>>>>>>> steps
>>>>>>> laid out by Robert and everything went fine except I ran into an 
>>>>>>> error
>>>>>>> with
>>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when 
>>>>>>> I tried
>>>>>>> processing one line at a time and "r <- setValues(r, pred)" when 
>>>>>>> I ran
>>>>>>> the
>>>>>>> full image in one go. The error was: "values must be a vector." Any
>>>>>>> idea
>>>>>>> what I'm doing wrong?
>>>>>>>
>>>>>>> I tried to read the GRASS files directly but got a message 
>>>>>>> saying it is
>>>>>>> not
>>>>>>> a supported file format. Can you confirm that is the case or am 
>>>>>>> I doing
>>>>>>> something wrong? I was able to read a tiff version of the image. 
>>>>>>> I am
>>>>>>> able
>>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>>
>>>>>>> Thanks again for the help.
>>>>>>>
>>>>>>> Ned
>>>>>>>
>>>>>>>
>>>>>>> Robert Hijmans wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Ned,
>>>>>>>>
>>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>>> raster package (for the simple case that there are no NA values 
>>>>>>>> in the
>>>>>>>> raster data; if there are, you have to into account that "predict'
>>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>>
>>>>>>>> Robert
>>>>>>>>
>>>>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>>>>> require(raster)
>>>>>>>> require(randomForest)
>>>>>>>>
>>>>>>>> # for single band files
>>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>>> # for multiple band files
>>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>>>>> c(1,2,3) )
>>>>>>>>
>>>>>>>> # simulate random points and values to model with
>>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>>> response <- runif(100) * 100
>>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>>
>>>>>>>> # run RandomForest
>>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>>
>>>>>>>> # apply the prediction, row by row
>>>>>>>> predrast <- setRaster(spot)
>>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>>      spot <- readRow(spot, r)
>>>>>>>>      rowvals <- values(spot, names=TRUE)
>>>>>>>> # this next line should not be necessary, but it is
>>>>>>>> # I'll fix that
>>>>>>>>      colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>>      pred <- predict(randfor, rowvals)
>>>>>>>>      predrast <- setValues(predrast, pred, r)
>>>>>>>>      predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>>> }
>>>>>>>>
>>>>>>>> plot(predrast)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand 
>>>>>>>> <Roger.Bivand at nhh.no>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ned:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The three bands are most likely treated as 4-byte integers, so 
>>>>>>>>> the
>>>>>>>>> object
>>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 
>>>>>>>>> 100MB. They
>>>>>>>>> may
>>>>>>>>> get copied too. There are no single byte user-level containers 
>>>>>>>>> for
>>>>>>>>> you
>>>>>>>>> (there is a raw data type, but you can't calculate with it). 
>>>>>>>>> Possibly
>>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>>> operation,
>>>>>>>>> but its hard to tell - your choice of method first adds inn 
>>>>>>>>> all the
>>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles 
>>>>>>>>> its size
>>>>>>>>> and
>>>>>>>>> makes more copies (to 233MB for each copy). Running gc() several
>>>>>>>>> times
>>>>>>>>> manually between steps often helps by making the garbage 
>>>>>>>>> collector
>>>>>>>>> more
>>>>>>>>> aggressive.
>>>>>>>>>
>>>>>>>>> I would watch the developments in the R-Forge package 
>>>>>>>>> "raster", which
>>>>>>>>> builds on some of these things, and try to see how that works. 
>>>>>>>>> If you
>>>>>>>>> have
>>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to 
>>>>>>>>> read from
>>>>>>>>> GRASS
>>>>>>>>> - which would work with raster package functions now. Look at the
>>>>>>>>> code
>>>>>>>>> of
>>>>>>>>> recent readRAST6 to see which incantations are needed. If you are
>>>>>>>>> going
>>>>>>>>> to
>>>>>>>>> use randomForest for prediction, you can use smaller tiles 
>>>>>>>>> until you
>>>>>>>>> find
>>>>>>>>> an alternative solution. Note that feeding a data frame of 
>>>>>>>>> integers
>>>>>>>>> to
>>>>>>>>> a
>>>>>>>>> model fitting or prediction function will result in coercion to a
>>>>>>>>> matrix of doubles, so your subsequent workflow should take 
>>>>>>>>> that into
>>>>>>>>> account.
>>>>>>>>>  Getting more memory is another option, and may be very cost and
>>>>>>>>> especially
>>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>>> memory
>>>>>>>>> may
>>>>>>>>> save you time programming around too little memory.
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Roger
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned 
>>>>>>>>> Horning
>>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>>
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I am trying to read an image from GRASS using the spgrass6 
>>>>>>>>> command
>>>>>>>>> readRAST6 and then convert it into a data.frame object so I 
>>>>>>>>> can use
>>>>>>>>> it
>>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>>>>> columns x 3 bands. It's a small subset of a larger image I 
>>>>>>>>> would like
>>>>>>>>> to
>>>>>>>>> use eventually. I have no problem reading the image using 
>>>>>>>>> readRAST6
>>>>>>>>> but
>>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs 
>>>>>>>>> for a
>>>>>>>>> couple hours before I kill the process. The image is less than 
>>>>>>>>> 25MB
>>>>>>>>> so
>>>>>>>>> I'm surprised it requires this level of memory. Can someone 
>>>>>>>>> let me
>>>>>>>>> know
>>>>>>>>> why this is. Should I use something other than the GRASS 
>>>>>>>>> interface
>>>>>>>>> for
>>>>>>>>> this? These are the commands I'm using:
>>>>>>>>>
>>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>>
>>>>>>>>> Any help would be appreciated.
>>>>>>>>>
>>>>>>>>> All the best,
>>>>>>>>>
>>>>>>>>> Ned
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>



More information about the R-sig-Geo mailing list