[R-sig-Geo] SpatialGridDataFrame to data.frame

Thu Feb 12 11:28:53 CET 2009

On Thu, 12 Feb 2009, Ned Horning wrote:

> Roger,
>
> Thanks - It looks like this will do the trick. I had to change 
> [X$covertype="forest",] to [X$covertype=="forest",] but it appears to work 
> just fine.

Yes, of course, my mistake. Nice to know it worked after correction.

Roger

>
> Ned
>
> Roger Bivand wrote:
>> On Thu, 12 Feb 2009, Ned Horning wrote:
>> 
>>> Roger and Robert,
>>> 
>>> Thanks for the help. Once again I am close but I can't figure out how to 
>>> use the attribute data to control the sampling. I'd like to stratify the 
>>> sampling by attribute value. In the Shapefile I have an integer attribute 
>>> "covertype". There can be several polygons with the same covertype ID. 
>>> What I want to do is get 100 sample points from polygons of covertype=1, 
>>> 100 points from polygons of covertype=2... Is that possible? I looked at 
>>> readShapePoly, readORG, and the suite of tools in spsample and but didn't 
>>> see what I was looking for.
>>> 
>>> It looks like overlay() is a good way to assign the value of covertype to 
>>> coordinate pairs.
>>> 
>>> One solution to fix all of this is to have a separate Shapefile for each 
>>> covertype but I'd like to avoid that if possible.
>> 
>> If you subset the SpatialPolygonsDataFrame object by covertype, and then 
>> spsample() for the subset, you might get a bit nearer. Subset using the [ 
>> operator as for data.frames:
>> 
>> X_forest <- X[X$covertype="forest",]
>> forest_pts <- spsample(X_forest, n=100)
>> 
>> (untried)
>> 
>> Using lapply, you could probably step along the factor (categorical) levels 
>> of covertype if there are many and (say) n is fixed.
>> 
>> Roger
>> 
>> 
>>> 
>>> Ned
>>> 
>>> Roger Bivand wrote:
>>>> On Thu, 12 Feb 2009, Robert Hijmans wrote:
>>>> 
>>>>> Hi Ned,
>>>>> 
>>>>> Good to hear that. For your other question, have a look at:
>>>>> 
>>>>> require(maptools)
>>>>> ?readShapePoly
>>>> 
>>>> Or more generally readOGR() in rgdal for another use of the underlying 
>>>> shapelib code.
>>>> 
>>>>> 
>>>>> require(sp)
>>>>> ?sample.Polygons
>>>>> 
>>>> 
>>>> Most likely the spsample() method will be enough, without having to pick 
>>>> a specific method - but if necessary the Polygons objects can be 
>>>> dissolved using unionSpatialPolygons() in maptools.
>>>> 
>>>> Roger
>>>> 
>>>>> Robert
>>>>> 
>>>>> 
>>>>> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>>>>>> Robert,
>>>>>> 
>>>>>> This worked - thanks. It's always uplifting to see an actual image 
>>>>>> after
>>>>>> working on something for a while. Now I can start playing with 
>>>>>> parameters
>>>>>> and playing with different approaches. I'm just (re)starting my R 
>>>>>> education
>>>>>> and I'm pretty slow getting the hang of it but your examples help a 
>>>>>> lot.
>>>>>> They also give me more avenues to discover other functionality and 
>>>>>> different
>>>>>> ways of doing things. I hope I can keep working with R and GRASS until 
>>>>>> I
>>>>>> know it this time. My goal is to get to the point where I can be 
>>>>>> productive
>>>>>> with these packages and start training other folks.
>>>>>> 
>>>>>> The raster package looks very nice and I'll keep an eye on its 
>>>>>> development.
>>>>>> One step that I am currently doing in GRASS is to read a Shapefile with
>>>>>> training data (polygons with an integer attribute representing a land 
>>>>>> cover
>>>>>> type) and then randomly create 100 points within each set of polygons
>>>>>> representing a specific land cover type. I do this for each land cover 
>>>>>> type
>>>>>> and then concatenate the results into a text file. This file has the
>>>>>> coordinates that I use in xyValues to get the pixels values from the 
>>>>>> SPOT
>>>>>> image. Is there a way to do this sampling using the raster package or
>>>>>> another R package that you are familiar with? In GRASS I convert the
>>>>>> Shapefile to a raster image before doing the random sampling and it 
>>>>>> would be
>>>>>> nice if I could skip this step.
>>>>>> 
>>>>>> All the best,
>>>>>> 
>>>>>> Ned
>>>>>> 
>>>>>> Robert Hijmans wrote:
>>>>>>> 
>>>>>>> I see, in my example, I had a single quantitative variable but you
>>>>>>> probably have land cover classes or something like that. If the
>>>>>>> classes are in fact numbers stored as text then use
>>>>>>> 
>>>>>>> pred <- as.numeric(pred)
>>>>>>> 
>>>>>>> but if you have words such as 'forest', 'crops', 'water' you could do
>>>>>>> something like
>>>>>>> 
>>>>>>> ...
>>>>>>>   pred <- predict(randfor, rowvals)
>>>>>>>   pred[pred=='forest'] <- 1
>>>>>>>   pred[pred=='crops'] <- 2
>>>>>>>   pred[pred=='water'] <- 3
>>>>>>>   pred <- as.numeric(pred)
>>>>>>>   predrast <- setValues(predrast, pred, r)
>>>>>>> ...
>>>>>>> 
>>>>>>> not pretty, you could also fit RF with classes that can be interpreted
>>>>>>> as numbers..
>>>>>>> Make sure you do not get:
>>>>>>> 
>>>>>>> Warning message:
>>>>>>> NAs introduced by coercion
>>>>>>> 
>>>>>>> 
>>>>>>> which would suggest that some character values could not be
>>>>>>> transformed to numbers.
>>>>>>> 
>>>>>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> Robert,
>>>>>>>> 
>>>>>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got 
>>>>>>>> another
>>>>>>>> error: values must be numeric, integer or logical.
>>>>>>>> 
>>>>>>>> class(pred) = "factor"
>>>>>>>> dim(pred) = NULL
>>>>>>>> class(v) = "character"
>>>>>>>> length(v) == ncol(spot) = TRUE
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Ned
>>>>>>>> 
>>>>>>>> Robert Hijmans wrote:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Strange. You could try
>>>>>>>>>     predrast <- setValues(predrast, as.vector(pred), r)
>>>>>>>>> 
>>>>>>>>> But it would be good to know what pred is.
>>>>>>>>> 
>>>>>>>>> Can you do this:
>>>>>>>>> 
>>>>>>>>> class(pred)
>>>>>>>>> dim(pred)
>>>>>>>>> v <- as.vector(pred)
>>>>>>>>> class(v)
>>>>>>>>> length(v) == ncol(spot)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Robert
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Robert and Roger,
>>>>>>>>>> 
>>>>>>>>>> Thanks for the information and pointers. The raster package looks 
>>>>>>>>>> quite
>>>>>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>>>>>> capabilities.
>>>>>>>>>> Are
>>>>>>>>>> the man pages the best way to do that or is that a single document
>>>>>>>>>> available?
>>>>>>>>>> 
>>>>>>>>>> I made some progress but still have some questions. I followed the
>>>>>>>>>> steps
>>>>>>>>>> laid out by Robert and everything went fine except I ran into an 
>>>>>>>>>> error
>>>>>>>>>> with
>>>>>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when I 
>>>>>>>>>> tried
>>>>>>>>>> processing one line at a time and "r <- setValues(r, pred)" when I 
>>>>>>>>>> ran
>>>>>>>>>> the
>>>>>>>>>> full image in one go. The error was: "values must be a vector." Any
>>>>>>>>>> idea
>>>>>>>>>> what I'm doing wrong?
>>>>>>>>>> 
>>>>>>>>>> I tried to read the GRASS files directly but got a message saying 
>>>>>>>>>> it is
>>>>>>>>>> not
>>>>>>>>>> a supported file format. Can you confirm that is the case or am I 
>>>>>>>>>> doing
>>>>>>>>>> something wrong? I was able to read a tiff version of the image. I 
>>>>>>>>>> am
>>>>>>>>>> able
>>>>>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>>>>> 
>>>>>>>>>> Thanks again for the help.
>>>>>>>>>> 
>>>>>>>>>> Ned
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Robert Hijmans wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Ned,
>>>>>>>>>>> 
>>>>>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>>>>>> raster package (for the simple case that there are no NA values in 
>>>>>>>>>>> the
>>>>>>>>>>> raster data; if there are, you have to into account that "predict'
>>>>>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>>>>> 
>>>>>>>>>>> Robert
>>>>>>>>>>> 
>>>>>>>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>>>>>>>> require(raster)
>>>>>>>>>>> require(randomForest)
>>>>>>>>>>> 
>>>>>>>>>>> # for single band files
>>>>>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>>>>>> # for multiple band files
>>>>>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>>>>>>>> c(1,2,3) )
>>>>>>>>>>> 
>>>>>>>>>>> # simulate random points and values to model with
>>>>>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>>>>>> response <- runif(100) * 100
>>>>>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>>>>> 
>>>>>>>>>>> # run RandomForest
>>>>>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>>>>> 
>>>>>>>>>>> # apply the prediction, row by row
>>>>>>>>>>> predrast <- setRaster(spot)
>>>>>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>>>>>      spot <- readRow(spot, r)
>>>>>>>>>>>      rowvals <- values(spot, names=TRUE)
>>>>>>>>>>> # this next line should not be necessary, but it is
>>>>>>>>>>> # I'll fix that
>>>>>>>>>>>      colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>>>>>      pred <- predict(randfor, rowvals)
>>>>>>>>>>>      predrast <- setValues(predrast, pred, r)
>>>>>>>>>>>      predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> plot(predrast)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand 
>>>>>>>>>>> <Roger.Bivand at nhh.no>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Ned:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> The three bands are most likely treated as 4-byte integers, so 
>>>>>>>>>>>> the
>>>>>>>>>>>> object
>>>>>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. 
>>>>>>>>>>>> They
>>>>>>>>>>>> may
>>>>>>>>>>>> get copied too. There are no single byte user-level containers 
>>>>>>>>>>>> for
>>>>>>>>>>>> you
>>>>>>>>>>>> (there is a raw data type, but you can't calculate with it). 
>>>>>>>>>>>> Possibly
>>>>>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>>>>>> operation,
>>>>>>>>>>>> but its hard to tell - your choice of method first adds inn all 
>>>>>>>>>>>> the
>>>>>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles its 
>>>>>>>>>>>> size
>>>>>>>>>>>> and
>>>>>>>>>>>> makes more copies (to 233MB for each copy). Running gc() several
>>>>>>>>>>>> times
>>>>>>>>>>>> manually between steps often helps by making the garbage 
>>>>>>>>>>>> collector
>>>>>>>>>>>> more
>>>>>>>>>>>> aggressive.
>>>>>>>>>>>> 
>>>>>>>>>>>> I would watch the developments in the R-Forge package "raster", 
>>>>>>>>>>>> which
>>>>>>>>>>>> builds on some of these things, and try to see how that works. If 
>>>>>>>>>>>> you
>>>>>>>>>>>> have
>>>>>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read 
>>>>>>>>>>>> from
>>>>>>>>>>>> GRASS
>>>>>>>>>>>> - which would work with raster package functions now. Look at the
>>>>>>>>>>>> code
>>>>>>>>>>>> of
>>>>>>>>>>>> recent readRAST6 to see which incantations are needed. If you are
>>>>>>>>>>>> going
>>>>>>>>>>>> to
>>>>>>>>>>>> use randomForest for prediction, you can use smaller tiles until 
>>>>>>>>>>>> you
>>>>>>>>>>>> find
>>>>>>>>>>>> an alternative solution. Note that feeding a data frame of 
>>>>>>>>>>>> integers
>>>>>>>>>>>> to
>>>>>>>>>>>> a
>>>>>>>>>>>> model fitting or prediction function will result in coercion to a
>>>>>>>>>>>> matrix of doubles, so your subsequent workflow should take that 
>>>>>>>>>>>> into
>>>>>>>>>>>> account.
>>>>>>>>>>>>  Getting more memory is another option, and may be very cost and
>>>>>>>>>>>> especially
>>>>>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>>>>>> memory
>>>>>>>>>>>> may
>>>>>>>>>>>> save you time programming around too little memory.
>>>>>>>>>>>> 
>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>> 
>>>>>>>>>>>> Roger
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> ---
>>>>>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned 
>>>>>>>>>>>> Horning
>>>>>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>>>>> 
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>> 
>>>>>>>>>>>> I am trying to read an image from GRASS using the spgrass6 
>>>>>>>>>>>> command
>>>>>>>>>>>> readRAST6 and then convert it into a data.frame object so I can 
>>>>>>>>>>>> use
>>>>>>>>>>>> it
>>>>>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>>>>>>>> columns x 3 bands. It's a small subset of a larger image I would 
>>>>>>>>>>>> like
>>>>>>>>>>>> to
>>>>>>>>>>>> use eventually. I have no problem reading the image using 
>>>>>>>>>>>> readRAST6
>>>>>>>>>>>> but
>>>>>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs 
>>>>>>>>>>>> for a
>>>>>>>>>>>> couple hours before I kill the process. The image is less than 
>>>>>>>>>>>> 25MB
>>>>>>>>>>>> so
>>>>>>>>>>>> I'm surprised it requires this level of memory. Can someone let 
>>>>>>>>>>>> me
>>>>>>>>>>>> know
>>>>>>>>>>>> why this is. Should I use something other than the GRASS 
>>>>>>>>>>>> interface
>>>>>>>>>>>> for
>>>>>>>>>>>> this? These are the commands I'm using:
>>>>>>>>>>>> 
>>>>>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>>>>> 
>>>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>>> 
>>>>>>>>>>>> All the best,
>>>>>>>>>>>> 
>>>>>>>>>>>> Ned
>>>>>>>>>>>> 
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>
>>>>>>>>>>>>     [[alternative HTML version deleted]]
>>>>>>>>>>>> 
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>> 
>>>> 
>>> 
>>> 
>> 
>
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no