[R-sig-Geo] SpatialGridDataFrame to data.frame
Roger Bivand
Roger.Bivand at nhh.no
Thu Feb 12 10:39:16 CET 2009
On Thu, 12 Feb 2009, Ned Horning wrote:
> Roger and Robert,
>
> Thanks for the help. Once again I am close but I can't figure out how to use
> the attribute data to control the sampling. I'd like to stratify the sampling
> by attribute value. In the Shapefile I have an integer attribute "covertype".
> There can be several polygons with the same covertype ID. What I want to do
> is get 100 sample points from polygons of covertype=1, 100 points from
> polygons of covertype=2... Is that possible? I looked at readShapePoly,
> readORG, and the suite of tools in spsample and but didn't see what I was
> looking for.
>
> It looks like overlay() is a good way to assign the value of covertype to
> coordinate pairs.
>
> One solution to fix all of this is to have a separate Shapefile for each
> covertype but I'd like to avoid that if possible.
If you subset the SpatialPolygonsDataFrame object by covertype, and then
spsample() for the subset, you might get a bit nearer. Subset using the [
operator as for data.frames:
X_forest <- X[X$covertype="forest",]
forest_pts <- spsample(X_forest, n=100)
(untried)
Using lapply, you could probably step along the factor (categorical)
levels of covertype if there are many and (say) n is fixed.
Roger
>
> Ned
>
> Roger Bivand wrote:
>> On Thu, 12 Feb 2009, Robert Hijmans wrote:
>>
>>> Hi Ned,
>>>
>>> Good to hear that. For your other question, have a look at:
>>>
>>> require(maptools)
>>> ?readShapePoly
>>
>> Or more generally readOGR() in rgdal for another use of the underlying
>> shapelib code.
>>
>>>
>>> require(sp)
>>> ?sample.Polygons
>>>
>>
>> Most likely the spsample() method will be enough, without having to pick a
>> specific method - but if necessary the Polygons objects can be dissolved
>> using unionSpatialPolygons() in maptools.
>>
>> Roger
>>
>>> Robert
>>>
>>>
>>> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>>>> Robert,
>>>>
>>>> This worked - thanks. It's always uplifting to see an actual image after
>>>> working on something for a while. Now I can start playing with parameters
>>>> and playing with different approaches. I'm just (re)starting my R
>>>> education
>>>> and I'm pretty slow getting the hang of it but your examples help a lot.
>>>> They also give me more avenues to discover other functionality and
>>>> different
>>>> ways of doing things. I hope I can keep working with R and GRASS until I
>>>> know it this time. My goal is to get to the point where I can be
>>>> productive
>>>> with these packages and start training other folks.
>>>>
>>>> The raster package looks very nice and I'll keep an eye on its
>>>> development.
>>>> One step that I am currently doing in GRASS is to read a Shapefile with
>>>> training data (polygons with an integer attribute representing a land
>>>> cover
>>>> type) and then randomly create 100 points within each set of polygons
>>>> representing a specific land cover type. I do this for each land cover
>>>> type
>>>> and then concatenate the results into a text file. This file has the
>>>> coordinates that I use in xyValues to get the pixels values from the SPOT
>>>> image. Is there a way to do this sampling using the raster package or
>>>> another R package that you are familiar with? In GRASS I convert the
>>>> Shapefile to a raster image before doing the random sampling and it would
>>>> be
>>>> nice if I could skip this step.
>>>>
>>>> All the best,
>>>>
>>>> Ned
>>>>
>>>> Robert Hijmans wrote:
>>>>>
>>>>> I see, in my example, I had a single quantitative variable but you
>>>>> probably have land cover classes or something like that. If the
>>>>> classes are in fact numbers stored as text then use
>>>>>
>>>>> pred <- as.numeric(pred)
>>>>>
>>>>> but if you have words such as 'forest', 'crops', 'water' you could do
>>>>> something like
>>>>>
>>>>> ...
>>>>> pred <- predict(randfor, rowvals)
>>>>> pred[pred=='forest'] <- 1
>>>>> pred[pred=='crops'] <- 2
>>>>> pred[pred=='water'] <- 3
>>>>> pred <- as.numeric(pred)
>>>>> predrast <- setValues(predrast, pred, r)
>>>>> ...
>>>>>
>>>>> not pretty, you could also fit RF with classes that can be interpreted
>>>>> as numbers..
>>>>> Make sure you do not get:
>>>>>
>>>>> Warning message:
>>>>> NAs introduced by coercion
>>>>>
>>>>>
>>>>> which would suggest that some character values could not be
>>>>> transformed to numbers.
>>>>>
>>>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>>>>>
>>>>>>
>>>>>> Robert,
>>>>>>
>>>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got another
>>>>>> error: values must be numeric, integer or logical.
>>>>>>
>>>>>> class(pred) = "factor"
>>>>>> dim(pred) = NULL
>>>>>> class(v) = "character"
>>>>>> length(v) == ncol(spot) = TRUE
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ned
>>>>>>
>>>>>> Robert Hijmans wrote:
>>>>>>
>>>>>>>
>>>>>>> Strange. You could try
>>>>>>> predrast <- setValues(predrast, as.vector(pred), r)
>>>>>>>
>>>>>>> But it would be good to know what pred is.
>>>>>>>
>>>>>>> Can you do this:
>>>>>>>
>>>>>>> class(pred)
>>>>>>> dim(pred)
>>>>>>> v <- as.vector(pred)
>>>>>>> class(v)
>>>>>>> length(v) == ncol(spot)
>>>>>>>
>>>>>>>
>>>>>>> Robert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Robert and Roger,
>>>>>>>>
>>>>>>>> Thanks for the information and pointers. The raster package looks
>>>>>>>> quite
>>>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>>>> capabilities.
>>>>>>>> Are
>>>>>>>> the man pages the best way to do that or is that a single document
>>>>>>>> available?
>>>>>>>>
>>>>>>>> I made some progress but still have some questions. I followed the
>>>>>>>> steps
>>>>>>>> laid out by Robert and everything went fine except I ran into an
>>>>>>>> error
>>>>>>>> with
>>>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when I
>>>>>>>> tried
>>>>>>>> processing one line at a time and "r <- setValues(r, pred)" when I
>>>>>>>> ran
>>>>>>>> the
>>>>>>>> full image in one go. The error was: "values must be a vector." Any
>>>>>>>> idea
>>>>>>>> what I'm doing wrong?
>>>>>>>>
>>>>>>>> I tried to read the GRASS files directly but got a message saying it
>>>>>>>> is
>>>>>>>> not
>>>>>>>> a supported file format. Can you confirm that is the case or am I
>>>>>>>> doing
>>>>>>>> something wrong? I was able to read a tiff version of the image. I am
>>>>>>>> able
>>>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>>>
>>>>>>>> Thanks again for the help.
>>>>>>>>
>>>>>>>> Ned
>>>>>>>>
>>>>>>>>
>>>>>>>> Robert Hijmans wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ned,
>>>>>>>>>
>>>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>>>> raster package (for the simple case that there are no NA values in
>>>>>>>>> the
>>>>>>>>> raster data; if there are, you have to into account that "predict'
>>>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>>>
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>>>>>> require(raster)
>>>>>>>>> require(randomForest)
>>>>>>>>>
>>>>>>>>> # for single band files
>>>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>>>> # for multiple band files
>>>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>>>>>> c(1,2,3) )
>>>>>>>>>
>>>>>>>>> # simulate random points and values to model with
>>>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>>>> response <- runif(100) * 100
>>>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>>>
>>>>>>>>> # run RandomForest
>>>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>>>
>>>>>>>>> # apply the prediction, row by row
>>>>>>>>> predrast <- setRaster(spot)
>>>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>>> spot <- readRow(spot, r)
>>>>>>>>> rowvals <- values(spot, names=TRUE)
>>>>>>>>> # this next line should not be necessary, but it is
>>>>>>>>> # I'll fix that
>>>>>>>>> colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>>> pred <- predict(randfor, rowvals)
>>>>>>>>> predrast <- setValues(predrast, pred, r)
>>>>>>>>> predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> plot(predrast)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ned:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The three bands are most likely treated as 4-byte integers, so the
>>>>>>>>>> object
>>>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB.
>>>>>>>>>> They
>>>>>>>>>> may
>>>>>>>>>> get copied too. There are no single byte user-level containers for
>>>>>>>>>> you
>>>>>>>>>> (there is a raw data type, but you can't calculate with it).
>>>>>>>>>> Possibly
>>>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>>>> operation,
>>>>>>>>>> but its hard to tell - your choice of method first adds inn all the
>>>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles its
>>>>>>>>>> size
>>>>>>>>>> and
>>>>>>>>>> makes more copies (to 233MB for each copy). Running gc() several
>>>>>>>>>> times
>>>>>>>>>> manually between steps often helps by making the garbage collector
>>>>>>>>>> more
>>>>>>>>>> aggressive.
>>>>>>>>>>
>>>>>>>>>> I would watch the developments in the R-Forge package "raster",
>>>>>>>>>> which
>>>>>>>>>> builds on some of these things, and try to see how that works. If
>>>>>>>>>> you
>>>>>>>>>> have
>>>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read
>>>>>>>>>> from
>>>>>>>>>> GRASS
>>>>>>>>>> - which would work with raster package functions now. Look at the
>>>>>>>>>> code
>>>>>>>>>> of
>>>>>>>>>> recent readRAST6 to see which incantations are needed. If you are
>>>>>>>>>> going
>>>>>>>>>> to
>>>>>>>>>> use randomForest for prediction, you can use smaller tiles until
>>>>>>>>>> you
>>>>>>>>>> find
>>>>>>>>>> an alternative solution. Note that feeding a data frame of integers
>>>>>>>>>> to
>>>>>>>>>> a
>>>>>>>>>> model fitting or prediction function will result in coercion to a
>>>>>>>>>> matrix of doubles, so your subsequent workflow should take that
>>>>>>>>>> into
>>>>>>>>>> account.
>>>>>>>>>> Getting more memory is another option, and may be very cost and
>>>>>>>>>> especially
>>>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>>>> memory
>>>>>>>>>> may
>>>>>>>>>> save you time programming around too little memory.
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Roger
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>>>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>>>
>>>>>>>>>> Greetings,
>>>>>>>>>>
>>>>>>>>>> I am trying to read an image from GRASS using the spgrass6 command
>>>>>>>>>> readRAST6 and then convert it into a data.frame object so I can use
>>>>>>>>>> it
>>>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>>>>>> columns x 3 bands. It's a small subset of a larger image I would
>>>>>>>>>> like
>>>>>>>>>> to
>>>>>>>>>> use eventually. I have no problem reading the image using readRAST6
>>>>>>>>>> but
>>>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs for
>>>>>>>>>> a
>>>>>>>>>> couple hours before I kill the process. The image is less than 25MB
>>>>>>>>>> so
>>>>>>>>>> I'm surprised it requires this level of memory. Can someone let me
>>>>>>>>>> know
>>>>>>>>>> why this is. Should I use something other than the GRASS interface
>>>>>>>>>> for
>>>>>>>>>> this? These are the commands I'm using:
>>>>>>>>>>
>>>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>>>
>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>
>>>>>>>>>> All the best,
>>>>>>>>>>
>>>>>>>>>> Ned
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
More information about the R-sig-Geo
mailing list