[R-sig-Geo] SpatialGridDataFrame to data.frame
Roger Bivand
Roger.Bivand at nhh.no
Thu Feb 12 11:28:53 CET 2009
On Thu, 12 Feb 2009, Ned Horning wrote:
> Roger,
>
> Thanks - It looks like this will do the trick. I had to change
> [X$covertype="forest",] to [X$covertype=="forest",] but it appears to work
> just fine.
Yes, of course, my mistake. Nice to know it worked after correction.
Roger
>
> Ned
>
> Roger Bivand wrote:
>> On Thu, 12 Feb 2009, Ned Horning wrote:
>>
>>> Roger and Robert,
>>>
>>> Thanks for the help. Once again I am close but I can't figure out how to
>>> use the attribute data to control the sampling. I'd like to stratify the
>>> sampling by attribute value. In the Shapefile I have an integer attribute
>>> "covertype". There can be several polygons with the same covertype ID.
>>> What I want to do is get 100 sample points from polygons of covertype=1,
>>> 100 points from polygons of covertype=2... Is that possible? I looked at
>>> readShapePoly, readORG, and the suite of tools in spsample and but didn't
>>> see what I was looking for.
>>>
>>> It looks like overlay() is a good way to assign the value of covertype to
>>> coordinate pairs.
>>>
>>> One solution to fix all of this is to have a separate Shapefile for each
>>> covertype but I'd like to avoid that if possible.
>>
>> If you subset the SpatialPolygonsDataFrame object by covertype, and then
>> spsample() for the subset, you might get a bit nearer. Subset using the [
>> operator as for data.frames:
>>
>> X_forest <- X[X$covertype="forest",]
>> forest_pts <- spsample(X_forest, n=100)
>>
>> (untried)
>>
>> Using lapply, you could probably step along the factor (categorical) levels
>> of covertype if there are many and (say) n is fixed.
>>
>> Roger
>>
>>
>>>
>>> Ned
>>>
>>> Roger Bivand wrote:
>>>> On Thu, 12 Feb 2009, Robert Hijmans wrote:
>>>>
>>>>> Hi Ned,
>>>>>
>>>>> Good to hear that. For your other question, have a look at:
>>>>>
>>>>> require(maptools)
>>>>> ?readShapePoly
>>>>
>>>> Or more generally readOGR() in rgdal for another use of the underlying
>>>> shapelib code.
>>>>
>>>>>
>>>>> require(sp)
>>>>> ?sample.Polygons
>>>>>
>>>>
>>>> Most likely the spsample() method will be enough, without having to pick
>>>> a specific method - but if necessary the Polygons objects can be
>>>> dissolved using unionSpatialPolygons() in maptools.
>>>>
>>>> Roger
>>>>
>>>>> Robert
>>>>>
>>>>>
>>>>> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>>>>>> Robert,
>>>>>>
>>>>>> This worked - thanks. It's always uplifting to see an actual image
>>>>>> after
>>>>>> working on something for a while. Now I can start playing with
>>>>>> parameters
>>>>>> and playing with different approaches. I'm just (re)starting my R
>>>>>> education
>>>>>> and I'm pretty slow getting the hang of it but your examples help a
>>>>>> lot.
>>>>>> They also give me more avenues to discover other functionality and
>>>>>> different
>>>>>> ways of doing things. I hope I can keep working with R and GRASS until
>>>>>> I
>>>>>> know it this time. My goal is to get to the point where I can be
>>>>>> productive
>>>>>> with these packages and start training other folks.
>>>>>>
>>>>>> The raster package looks very nice and I'll keep an eye on its
>>>>>> development.
>>>>>> One step that I am currently doing in GRASS is to read a Shapefile with
>>>>>> training data (polygons with an integer attribute representing a land
>>>>>> cover
>>>>>> type) and then randomly create 100 points within each set of polygons
>>>>>> representing a specific land cover type. I do this for each land cover
>>>>>> type
>>>>>> and then concatenate the results into a text file. This file has the
>>>>>> coordinates that I use in xyValues to get the pixels values from the
>>>>>> SPOT
>>>>>> image. Is there a way to do this sampling using the raster package or
>>>>>> another R package that you are familiar with? In GRASS I convert the
>>>>>> Shapefile to a raster image before doing the random sampling and it
>>>>>> would be
>>>>>> nice if I could skip this step.
>>>>>>
>>>>>> All the best,
>>>>>>
>>>>>> Ned
>>>>>>
>>>>>> Robert Hijmans wrote:
>>>>>>>
>>>>>>> I see, in my example, I had a single quantitative variable but you
>>>>>>> probably have land cover classes or something like that. If the
>>>>>>> classes are in fact numbers stored as text then use
>>>>>>>
>>>>>>> pred <- as.numeric(pred)
>>>>>>>
>>>>>>> but if you have words such as 'forest', 'crops', 'water' you could do
>>>>>>> something like
>>>>>>>
>>>>>>> ...
>>>>>>> pred <- predict(randfor, rowvals)
>>>>>>> pred[pred=='forest'] <- 1
>>>>>>> pred[pred=='crops'] <- 2
>>>>>>> pred[pred=='water'] <- 3
>>>>>>> pred <- as.numeric(pred)
>>>>>>> predrast <- setValues(predrast, pred, r)
>>>>>>> ...
>>>>>>>
>>>>>>> not pretty, you could also fit RF with classes that can be interpreted
>>>>>>> as numbers..
>>>>>>> Make sure you do not get:
>>>>>>>
>>>>>>> Warning message:
>>>>>>> NAs introduced by coercion
>>>>>>>
>>>>>>>
>>>>>>> which would suggest that some character values could not be
>>>>>>> transformed to numbers.
>>>>>>>
>>>>>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Robert,
>>>>>>>>
>>>>>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got
>>>>>>>> another
>>>>>>>> error: values must be numeric, integer or logical.
>>>>>>>>
>>>>>>>> class(pred) = "factor"
>>>>>>>> dim(pred) = NULL
>>>>>>>> class(v) = "character"
>>>>>>>> length(v) == ncol(spot) = TRUE
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ned
>>>>>>>>
>>>>>>>> Robert Hijmans wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Strange. You could try
>>>>>>>>> predrast <- setValues(predrast, as.vector(pred), r)
>>>>>>>>>
>>>>>>>>> But it would be good to know what pred is.
>>>>>>>>>
>>>>>>>>> Can you do this:
>>>>>>>>>
>>>>>>>>> class(pred)
>>>>>>>>> dim(pred)
>>>>>>>>> v <- as.vector(pred)
>>>>>>>>> class(v)
>>>>>>>>> length(v) == ncol(spot)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Robert and Roger,
>>>>>>>>>>
>>>>>>>>>> Thanks for the information and pointers. The raster package looks
>>>>>>>>>> quite
>>>>>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>>>>>> capabilities.
>>>>>>>>>> Are
>>>>>>>>>> the man pages the best way to do that or is that a single document
>>>>>>>>>> available?
>>>>>>>>>>
>>>>>>>>>> I made some progress but still have some questions. I followed the
>>>>>>>>>> steps
>>>>>>>>>> laid out by Robert and everything went fine except I ran into an
>>>>>>>>>> error
>>>>>>>>>> with
>>>>>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when I
>>>>>>>>>> tried
>>>>>>>>>> processing one line at a time and "r <- setValues(r, pred)" when I
>>>>>>>>>> ran
>>>>>>>>>> the
>>>>>>>>>> full image in one go. The error was: "values must be a vector." Any
>>>>>>>>>> idea
>>>>>>>>>> what I'm doing wrong?
>>>>>>>>>>
>>>>>>>>>> I tried to read the GRASS files directly but got a message saying
>>>>>>>>>> it is
>>>>>>>>>> not
>>>>>>>>>> a supported file format. Can you confirm that is the case or am I
>>>>>>>>>> doing
>>>>>>>>>> something wrong? I was able to read a tiff version of the image. I
>>>>>>>>>> am
>>>>>>>>>> able
>>>>>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>>>>>
>>>>>>>>>> Thanks again for the help.
>>>>>>>>>>
>>>>>>>>>> Ned
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Robert Hijmans wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ned,
>>>>>>>>>>>
>>>>>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>>>>>> raster package (for the simple case that there are no NA values in
>>>>>>>>>>> the
>>>>>>>>>>> raster data; if there are, you have to into account that "predict'
>>>>>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>>>>>
>>>>>>>>>>> Robert
>>>>>>>>>>>
>>>>>>>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>>>>>>>> require(raster)
>>>>>>>>>>> require(randomForest)
>>>>>>>>>>>
>>>>>>>>>>> # for single band files
>>>>>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>>>>>> # for multiple band files
>>>>>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>>>>>>>> c(1,2,3) )
>>>>>>>>>>>
>>>>>>>>>>> # simulate random points and values to model with
>>>>>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>>>>>> response <- runif(100) * 100
>>>>>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>>>>>
>>>>>>>>>>> # run RandomForest
>>>>>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>>>>>
>>>>>>>>>>> # apply the prediction, row by row
>>>>>>>>>>> predrast <- setRaster(spot)
>>>>>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>>>>> spot <- readRow(spot, r)
>>>>>>>>>>> rowvals <- values(spot, names=TRUE)
>>>>>>>>>>> # this next line should not be necessary, but it is
>>>>>>>>>>> # I'll fix that
>>>>>>>>>>> colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>>>>> pred <- predict(randfor, rowvals)
>>>>>>>>>>> predrast <- setValues(predrast, pred, r)
>>>>>>>>>>> predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> plot(predrast)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand
>>>>>>>>>>> <Roger.Bivand at nhh.no>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ned:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The three bands are most likely treated as 4-byte integers, so
>>>>>>>>>>>> the
>>>>>>>>>>>> object
>>>>>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB.
>>>>>>>>>>>> They
>>>>>>>>>>>> may
>>>>>>>>>>>> get copied too. There are no single byte user-level containers
>>>>>>>>>>>> for
>>>>>>>>>>>> you
>>>>>>>>>>>> (there is a raw data type, but you can't calculate with it).
>>>>>>>>>>>> Possibly
>>>>>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>>>>>> operation,
>>>>>>>>>>>> but its hard to tell - your choice of method first adds inn all
>>>>>>>>>>>> the
>>>>>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles its
>>>>>>>>>>>> size
>>>>>>>>>>>> and
>>>>>>>>>>>> makes more copies (to 233MB for each copy). Running gc() several
>>>>>>>>>>>> times
>>>>>>>>>>>> manually between steps often helps by making the garbage
>>>>>>>>>>>> collector
>>>>>>>>>>>> more
>>>>>>>>>>>> aggressive.
>>>>>>>>>>>>
>>>>>>>>>>>> I would watch the developments in the R-Forge package "raster",
>>>>>>>>>>>> which
>>>>>>>>>>>> builds on some of these things, and try to see how that works. If
>>>>>>>>>>>> you
>>>>>>>>>>>> have
>>>>>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read
>>>>>>>>>>>> from
>>>>>>>>>>>> GRASS
>>>>>>>>>>>> - which would work with raster package functions now. Look at the
>>>>>>>>>>>> code
>>>>>>>>>>>> of
>>>>>>>>>>>> recent readRAST6 to see which incantations are needed. If you are
>>>>>>>>>>>> going
>>>>>>>>>>>> to
>>>>>>>>>>>> use randomForest for prediction, you can use smaller tiles until
>>>>>>>>>>>> you
>>>>>>>>>>>> find
>>>>>>>>>>>> an alternative solution. Note that feeding a data frame of
>>>>>>>>>>>> integers
>>>>>>>>>>>> to
>>>>>>>>>>>> a
>>>>>>>>>>>> model fitting or prediction function will result in coercion to a
>>>>>>>>>>>> matrix of doubles, so your subsequent workflow should take that
>>>>>>>>>>>> into
>>>>>>>>>>>> account.
>>>>>>>>>>>> Getting more memory is another option, and may be very cost and
>>>>>>>>>>>> especially
>>>>>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>>>>>> memory
>>>>>>>>>>>> may
>>>>>>>>>>>> save you time programming around too little memory.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>
>>>>>>>>>>>> Roger
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned
>>>>>>>>>>>> Horning
>>>>>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>>>>>
>>>>>>>>>>>> Greetings,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to read an image from GRASS using the spgrass6
>>>>>>>>>>>> command
>>>>>>>>>>>> readRAST6 and then convert it into a data.frame object so I can
>>>>>>>>>>>> use
>>>>>>>>>>>> it
>>>>>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>>>>>>>> columns x 3 bands. It's a small subset of a larger image I would
>>>>>>>>>>>> like
>>>>>>>>>>>> to
>>>>>>>>>>>> use eventually. I have no problem reading the image using
>>>>>>>>>>>> readRAST6
>>>>>>>>>>>> but
>>>>>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs
>>>>>>>>>>>> for a
>>>>>>>>>>>> couple hours before I kill the process. The image is less than
>>>>>>>>>>>> 25MB
>>>>>>>>>>>> so
>>>>>>>>>>>> I'm surprised it requires this level of memory. Can someone let
>>>>>>>>>>>> me
>>>>>>>>>>>> know
>>>>>>>>>>>> why this is. Should I use something other than the GRASS
>>>>>>>>>>>> interface
>>>>>>>>>>>> for
>>>>>>>>>>>> this? These are the commands I'm using:
>>>>>>>>>>>>
>>>>>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>>>>>
>>>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>>>
>>>>>>>>>>>> All the best,
>>>>>>>>>>>>
>>>>>>>>>>>> Ned
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>
>>>
>>>
>>
>
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list