[R-sig-Geo] SpatialGridDataFrame to data.frame

Thu Feb 12 10:39:16 CET 2009

On Thu, 12 Feb 2009, Ned Horning wrote:

> Roger and Robert,
>
> Thanks for the help. Once again I am close but I can't figure out how to use 
> the attribute data to control the sampling. I'd like to stratify the sampling 
> by attribute value. In the Shapefile I have an integer attribute "covertype". 
> There can be several polygons with the same covertype ID. What I want to do 
> is get 100 sample points from polygons of covertype=1, 100 points from 
> polygons of covertype=2... Is that possible? I looked at readShapePoly, 
> readORG, and the suite of tools in spsample and but didn't see what I was 
> looking for.
>
> It looks like overlay() is a good way to assign the value of covertype to 
> coordinate pairs.
>
> One solution to fix all of this is to have a separate Shapefile for each 
> covertype but I'd like to avoid that if possible.

If you subset the SpatialPolygonsDataFrame object by covertype, and then 
spsample() for the subset, you might get a bit nearer. Subset using the [ 
operator as for data.frames:

X_forest <- X[X$covertype="forest",]
forest_pts <- spsample(X_forest, n=100)

(untried)

Using lapply, you could probably step along the factor (categorical) 
levels of covertype if there are many and (say) n is fixed.

Roger

>
> Ned
>
> Roger Bivand wrote:
>> On Thu, 12 Feb 2009, Robert Hijmans wrote:
>> 
>>> Hi Ned,
>>> 
>>> Good to hear that. For your other question, have a look at:
>>> 
>>> require(maptools)
>>> ?readShapePoly
>> 
>> Or more generally readOGR() in rgdal for another use of the underlying 
>> shapelib code.
>> 
>>> 
>>> require(sp)
>>> ?sample.Polygons
>>> 
>> 
>> Most likely the spsample() method will be enough, without having to pick a 
>> specific method - but if necessary the Polygons objects can be dissolved 
>> using unionSpatialPolygons() in maptools.
>> 
>> Roger
>> 
>>> Robert
>>> 
>>> 
>>> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>>>> Robert,
>>>> 
>>>> This worked - thanks. It's always uplifting to see an actual image after
>>>> working on something for a while. Now I can start playing with parameters
>>>> and playing with different approaches. I'm just (re)starting my R 
>>>> education
>>>> and I'm pretty slow getting the hang of it but your examples help a lot.
>>>> They also give me more avenues to discover other functionality and 
>>>> different
>>>> ways of doing things. I hope I can keep working with R and GRASS until I
>>>> know it this time. My goal is to get to the point where I can be 
>>>> productive
>>>> with these packages and start training other folks.
>>>> 
>>>> The raster package looks very nice and I'll keep an eye on its 
>>>> development.
>>>> One step that I am currently doing in GRASS is to read a Shapefile with
>>>> training data (polygons with an integer attribute representing a land 
>>>> cover
>>>> type) and then randomly create 100 points within each set of polygons
>>>> representing a specific land cover type. I do this for each land cover 
>>>> type
>>>> and then concatenate the results into a text file. This file has the
>>>> coordinates that I use in xyValues to get the pixels values from the SPOT
>>>> image. Is there a way to do this sampling using the raster package or
>>>> another R package that you are familiar with? In GRASS I convert the
>>>> Shapefile to a raster image before doing the random sampling and it would 
>>>> be
>>>> nice if I could skip this step.
>>>> 
>>>> All the best,
>>>> 
>>>> Ned
>>>> 
>>>> Robert Hijmans wrote:
>>>>> 
>>>>> I see, in my example, I had a single quantitative variable but you
>>>>> probably have land cover classes or something like that. If the
>>>>> classes are in fact numbers stored as text then use
>>>>> 
>>>>> pred <- as.numeric(pred)
>>>>> 
>>>>> but if you have words such as 'forest', 'crops', 'water' you could do
>>>>> something like
>>>>> 
>>>>> ...
>>>>>   pred <- predict(randfor, rowvals)
>>>>>   pred[pred=='forest'] <- 1
>>>>>   pred[pred=='crops'] <- 2
>>>>>   pred[pred=='water'] <- 3
>>>>>   pred <- as.numeric(pred)
>>>>>   predrast <- setValues(predrast, pred, r)
>>>>> ...
>>>>> 
>>>>> not pretty, you could also fit RF with classes that can be interpreted
>>>>> as numbers..
>>>>> Make sure you do not get:
>>>>> 
>>>>> Warning message:
>>>>> NAs introduced by coercion
>>>>> 
>>>>> 
>>>>> which would suggest that some character values could not be
>>>>> transformed to numbers.
>>>>> 
>>>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>>>>> 
>>>>>> 
>>>>>> Robert,
>>>>>> 
>>>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got another
>>>>>> error: values must be numeric, integer or logical.
>>>>>> 
>>>>>> class(pred) = "factor"
>>>>>> dim(pred) = NULL
>>>>>> class(v) = "character"
>>>>>> length(v) == ncol(spot) = TRUE
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Ned
>>>>>> 
>>>>>> Robert Hijmans wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Strange. You could try
>>>>>>>     predrast <- setValues(predrast, as.vector(pred), r)
>>>>>>> 
>>>>>>> But it would be good to know what pred is.
>>>>>>> 
>>>>>>> Can you do this:
>>>>>>> 
>>>>>>> class(pred)
>>>>>>> dim(pred)
>>>>>>> v <- as.vector(pred)
>>>>>>> class(v)
>>>>>>> length(v) == ncol(spot)
>>>>>>> 
>>>>>>> 
>>>>>>> Robert
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Robert and Roger,
>>>>>>>> 
>>>>>>>> Thanks for the information and pointers. The raster package looks 
>>>>>>>> quite
>>>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>>>> capabilities.
>>>>>>>> Are
>>>>>>>> the man pages the best way to do that or is that a single document
>>>>>>>> available?
>>>>>>>> 
>>>>>>>> I made some progress but still have some questions. I followed the
>>>>>>>> steps
>>>>>>>> laid out by Robert and everything went fine except I ran into an 
>>>>>>>> error
>>>>>>>> with
>>>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when I 
>>>>>>>> tried
>>>>>>>> processing one line at a time and "r <- setValues(r, pred)" when I 
>>>>>>>> ran
>>>>>>>> the
>>>>>>>> full image in one go. The error was: "values must be a vector." Any
>>>>>>>> idea
>>>>>>>> what I'm doing wrong?
>>>>>>>> 
>>>>>>>> I tried to read the GRASS files directly but got a message saying it 
>>>>>>>> is
>>>>>>>> not
>>>>>>>> a supported file format. Can you confirm that is the case or am I 
>>>>>>>> doing
>>>>>>>> something wrong? I was able to read a tiff version of the image. I am
>>>>>>>> able
>>>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>>> 
>>>>>>>> Thanks again for the help.
>>>>>>>> 
>>>>>>>> Ned
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Robert Hijmans wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Ned,
>>>>>>>>> 
>>>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>>>> raster package (for the simple case that there are no NA values in 
>>>>>>>>> the
>>>>>>>>> raster data; if there are, you have to into account that "predict'
>>>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>>> 
>>>>>>>>> Robert
>>>>>>>>> 
>>>>>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>>>>>> require(raster)
>>>>>>>>> require(randomForest)
>>>>>>>>> 
>>>>>>>>> # for single band files
>>>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>>>> # for multiple band files
>>>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>>>>>> c(1,2,3) )
>>>>>>>>> 
>>>>>>>>> # simulate random points and values to model with
>>>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>>>> response <- runif(100) * 100
>>>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>>> 
>>>>>>>>> # run RandomForest
>>>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>>> 
>>>>>>>>> # apply the prediction, row by row
>>>>>>>>> predrast <- setRaster(spot)
>>>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>>>      spot <- readRow(spot, r)
>>>>>>>>>      rowvals <- values(spot, names=TRUE)
>>>>>>>>> # this next line should not be necessary, but it is
>>>>>>>>> # I'll fix that
>>>>>>>>>      colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>>>      pred <- predict(randfor, rowvals)
>>>>>>>>>      predrast <- setValues(predrast, pred, r)
>>>>>>>>>      predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> plot(predrast)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Ned:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The three bands are most likely treated as 4-byte integers, so the
>>>>>>>>>> object
>>>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. 
>>>>>>>>>> They
>>>>>>>>>> may
>>>>>>>>>> get copied too. There are no single byte user-level containers for
>>>>>>>>>> you
>>>>>>>>>> (there is a raw data type, but you can't calculate with it). 
>>>>>>>>>> Possibly
>>>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>>>> operation,
>>>>>>>>>> but its hard to tell - your choice of method first adds inn all the
>>>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles its 
>>>>>>>>>> size
>>>>>>>>>> and
>>>>>>>>>> makes more copies (to 233MB for each copy). Running gc() several
>>>>>>>>>> times
>>>>>>>>>> manually between steps often helps by making the garbage collector
>>>>>>>>>> more
>>>>>>>>>> aggressive.
>>>>>>>>>> 
>>>>>>>>>> I would watch the developments in the R-Forge package "raster", 
>>>>>>>>>> which
>>>>>>>>>> builds on some of these things, and try to see how that works. If 
>>>>>>>>>> you
>>>>>>>>>> have
>>>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read 
>>>>>>>>>> from
>>>>>>>>>> GRASS
>>>>>>>>>> - which would work with raster package functions now. Look at the
>>>>>>>>>> code
>>>>>>>>>> of
>>>>>>>>>> recent readRAST6 to see which incantations are needed. If you are
>>>>>>>>>> going
>>>>>>>>>> to
>>>>>>>>>> use randomForest for prediction, you can use smaller tiles until 
>>>>>>>>>> you
>>>>>>>>>> find
>>>>>>>>>> an alternative solution. Note that feeding a data frame of integers
>>>>>>>>>> to
>>>>>>>>>> a
>>>>>>>>>> model fitting or prediction function will result in coercion to a
>>>>>>>>>> matrix of doubles, so your subsequent workflow should take that 
>>>>>>>>>> into
>>>>>>>>>> account.
>>>>>>>>>>  Getting more memory is another option, and may be very cost and
>>>>>>>>>> especially
>>>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>>>> memory
>>>>>>>>>> may
>>>>>>>>>> save you time programming around too little memory.
>>>>>>>>>> 
>>>>>>>>>> Hope this helps,
>>>>>>>>>> 
>>>>>>>>>> Roger
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> ---
>>>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>>>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>>> 
>>>>>>>>>> Greetings,
>>>>>>>>>> 
>>>>>>>>>> I am trying to read an image from GRASS using the spgrass6 command
>>>>>>>>>> readRAST6 and then convert it into a data.frame object so I can use
>>>>>>>>>> it
>>>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>>>>>> columns x 3 bands. It's a small subset of a larger image I would 
>>>>>>>>>> like
>>>>>>>>>> to
>>>>>>>>>> use eventually. I have no problem reading the image using readRAST6
>>>>>>>>>> but
>>>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs for 
>>>>>>>>>> a
>>>>>>>>>> couple hours before I kill the process. The image is less than 25MB
>>>>>>>>>> so
>>>>>>>>>> I'm surprised it requires this level of memory. Can someone let me
>>>>>>>>>> know
>>>>>>>>>> why this is. Should I use something other than the GRASS interface
>>>>>>>>>> for
>>>>>>>>>> this? These are the commands I'm using:
>>>>>>>>>> 
>>>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>>> 
>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>> 
>>>>>>>>>> All the best,
>>>>>>>>>> 
>>>>>>>>>> Ned
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>
>>>>>>>>>>     [[alternative HTML version deleted]]
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>> 
>> 
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no