[R-sig-Geo] SpatialGridDataFrame to data.frame
Ned Horning
horning at amnh.org
Thu Feb 12 11:22:39 CET 2009
Roger,
Thanks - It looks like this will do the trick. I had to change
[X$covertype="forest",] to [X$covertype=="forest",] but it appears to
work just fine.
Ned
Roger Bivand wrote:
> On Thu, 12 Feb 2009, Ned Horning wrote:
>
>> Roger and Robert,
>>
>> Thanks for the help. Once again I am close but I can't figure out how
>> to use the attribute data to control the sampling. I'd like to
>> stratify the sampling by attribute value. In the Shapefile I have an
>> integer attribute "covertype". There can be several polygons with the
>> same covertype ID. What I want to do is get 100 sample points from
>> polygons of covertype=1, 100 points from polygons of covertype=2...
>> Is that possible? I looked at readShapePoly, readORG, and the suite
>> of tools in spsample and but didn't see what I was looking for.
>>
>> It looks like overlay() is a good way to assign the value of
>> covertype to coordinate pairs.
>>
>> One solution to fix all of this is to have a separate Shapefile for
>> each covertype but I'd like to avoid that if possible.
>
> If you subset the SpatialPolygonsDataFrame object by covertype, and
> then spsample() for the subset, you might get a bit nearer. Subset
> using the [ operator as for data.frames:
>
> X_forest <- X[X$covertype="forest",]
> forest_pts <- spsample(X_forest, n=100)
>
> (untried)
>
> Using lapply, you could probably step along the factor (categorical)
> levels of covertype if there are many and (say) n is fixed.
>
> Roger
>
>
>>
>> Ned
>>
>> Roger Bivand wrote:
>>> On Thu, 12 Feb 2009, Robert Hijmans wrote:
>>>
>>>> Hi Ned,
>>>>
>>>> Good to hear that. For your other question, have a look at:
>>>>
>>>> require(maptools)
>>>> ?readShapePoly
>>>
>>> Or more generally readOGR() in rgdal for another use of the
>>> underlying shapelib code.
>>>
>>>>
>>>> require(sp)
>>>> ?sample.Polygons
>>>>
>>>
>>> Most likely the spsample() method will be enough, without having to
>>> pick a specific method - but if necessary the Polygons objects can
>>> be dissolved using unionSpatialPolygons() in maptools.
>>>
>>> Roger
>>>
>>>> Robert
>>>>
>>>>
>>>> On Thu, Feb 12, 2009 at 2:30 PM, Ned Horning <horning at amnh.org> wrote:
>>>>> Robert,
>>>>>
>>>>> This worked - thanks. It's always uplifting to see an actual image
>>>>> after
>>>>> working on something for a while. Now I can start playing with
>>>>> parameters
>>>>> and playing with different approaches. I'm just (re)starting my R
>>>>> education
>>>>> and I'm pretty slow getting the hang of it but your examples help
>>>>> a lot.
>>>>> They also give me more avenues to discover other functionality and
>>>>> different
>>>>> ways of doing things. I hope I can keep working with R and GRASS
>>>>> until I
>>>>> know it this time. My goal is to get to the point where I can be
>>>>> productive
>>>>> with these packages and start training other folks.
>>>>>
>>>>> The raster package looks very nice and I'll keep an eye on its
>>>>> development.
>>>>> One step that I am currently doing in GRASS is to read a Shapefile
>>>>> with
>>>>> training data (polygons with an integer attribute representing a
>>>>> land cover
>>>>> type) and then randomly create 100 points within each set of polygons
>>>>> representing a specific land cover type. I do this for each land
>>>>> cover type
>>>>> and then concatenate the results into a text file. This file has the
>>>>> coordinates that I use in xyValues to get the pixels values from
>>>>> the SPOT
>>>>> image. Is there a way to do this sampling using the raster package or
>>>>> another R package that you are familiar with? In GRASS I convert the
>>>>> Shapefile to a raster image before doing the random sampling and
>>>>> it would be
>>>>> nice if I could skip this step.
>>>>>
>>>>> All the best,
>>>>>
>>>>> Ned
>>>>>
>>>>> Robert Hijmans wrote:
>>>>>>
>>>>>> I see, in my example, I had a single quantitative variable but you
>>>>>> probably have land cover classes or something like that. If the
>>>>>> classes are in fact numbers stored as text then use
>>>>>>
>>>>>> pred <- as.numeric(pred)
>>>>>>
>>>>>> but if you have words such as 'forest', 'crops', 'water' you
>>>>>> could do
>>>>>> something like
>>>>>>
>>>>>> ...
>>>>>> pred <- predict(randfor, rowvals)
>>>>>> pred[pred=='forest'] <- 1
>>>>>> pred[pred=='crops'] <- 2
>>>>>> pred[pred=='water'] <- 3
>>>>>> pred <- as.numeric(pred)
>>>>>> predrast <- setValues(predrast, pred, r)
>>>>>> ...
>>>>>>
>>>>>> not pretty, you could also fit RF with classes that can be
>>>>>> interpreted
>>>>>> as numbers..
>>>>>> Make sure you do not get:
>>>>>>
>>>>>> Warning message:
>>>>>> NAs introduced by coercion
>>>>>>
>>>>>>
>>>>>> which would suggest that some character values could not be
>>>>>> transformed to numbers.
>>>>>>
>>>>>> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Robert,
>>>>>>>
>>>>>>> Using predrast <- setValues(predrast, as.vector(pred), r) I got
>>>>>>> another
>>>>>>> error: values must be numeric, integer or logical.
>>>>>>>
>>>>>>> class(pred) = "factor"
>>>>>>> dim(pred) = NULL
>>>>>>> class(v) = "character"
>>>>>>> length(v) == ncol(spot) = TRUE
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ned
>>>>>>>
>>>>>>> Robert Hijmans wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Strange. You could try
>>>>>>>> predrast <- setValues(predrast, as.vector(pred), r)
>>>>>>>>
>>>>>>>> But it would be good to know what pred is.
>>>>>>>>
>>>>>>>> Can you do this:
>>>>>>>>
>>>>>>>> class(pred)
>>>>>>>> dim(pred)
>>>>>>>> v <- as.vector(pred)
>>>>>>>> class(v)
>>>>>>>> length(v) == ncol(spot)
>>>>>>>>
>>>>>>>>
>>>>>>>> Robert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning
>>>>>>>> <horning at amnh.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Robert and Roger,
>>>>>>>>>
>>>>>>>>> Thanks for the information and pointers. The raster package
>>>>>>>>> looks quite
>>>>>>>>> interesting and I'll try to get up to speed on some of its
>>>>>>>>> capabilities.
>>>>>>>>> Are
>>>>>>>>> the man pages the best way to do that or is that a single
>>>>>>>>> document
>>>>>>>>> available?
>>>>>>>>>
>>>>>>>>> I made some progress but still have some questions. I followed
>>>>>>>>> the
>>>>>>>>> steps
>>>>>>>>> laid out by Robert and everything went fine except I ran into
>>>>>>>>> an error
>>>>>>>>> with
>>>>>>>>> "predrast <- setValues(predrast, pred, r)" in the for loop
>>>>>>>>> when I tried
>>>>>>>>> processing one line at a time and "r <- setValues(r, pred)"
>>>>>>>>> when I ran
>>>>>>>>> the
>>>>>>>>> full image in one go. The error was: "values must be a
>>>>>>>>> vector." Any
>>>>>>>>> idea
>>>>>>>>> what I'm doing wrong?
>>>>>>>>>
>>>>>>>>> I tried to read the GRASS files directly but got a message
>>>>>>>>> saying it is
>>>>>>>>> not
>>>>>>>>> a supported file format. Can you confirm that is the case or
>>>>>>>>> am I doing
>>>>>>>>> something wrong? I was able to read a tiff version of the
>>>>>>>>> image. I am
>>>>>>>>> able
>>>>>>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>>>>>>
>>>>>>>>> Thanks again for the help.
>>>>>>>>>
>>>>>>>>> Ned
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Robert Hijmans wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ned,
>>>>>>>>>>
>>>>>>>>>> This is an example of running a RandomForest prediction with the
>>>>>>>>>> raster package (for the simple case that there are no NA
>>>>>>>>>> values in the
>>>>>>>>>> raster data; if there are, you have to into account that
>>>>>>>>>> "predict'
>>>>>>>>>> does not return any values (not even NA) for those cells).
>>>>>>>>>>
>>>>>>>>>> Robert
>>>>>>>>>>
>>>>>>>>>> #install.packages("raster",
>>>>>>>>>> repos="http://R-Forge.R-project.org")
>>>>>>>>>> require(raster)
>>>>>>>>>> require(randomForest)
>>>>>>>>>>
>>>>>>>>>> # for single band files
>>>>>>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>>>>>>> # for multiple band files
>>>>>>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif',
>>>>>>>>>> 'bands.tif'),
>>>>>>>>>> c(1,2,3) )
>>>>>>>>>>
>>>>>>>>>> # simulate random points and values to model with
>>>>>>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>>>>>>> response <- runif(100) * 100
>>>>>>>>>> # read values of raster layers at points, and bind to respinse
>>>>>>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>>>>>>
>>>>>>>>>> # run RandomForest
>>>>>>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>>>>>>
>>>>>>>>>> # apply the prediction, row by row
>>>>>>>>>> predrast <- setRaster(spot)
>>>>>>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>>>>>>> for (r in 1:nrow(spot)) {
>>>>>>>>>> spot <- readRow(spot, r)
>>>>>>>>>> rowvals <- values(spot, names=TRUE)
>>>>>>>>>> # this next line should not be necessary, but it is
>>>>>>>>>> # I'll fix that
>>>>>>>>>> colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>>>>>> pred <- predict(randfor, rowvals)
>>>>>>>>>> predrast <- setValues(predrast, pred, r)
>>>>>>>>>> predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> plot(predrast)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand
>>>>>>>>>> <Roger.Bivand at nhh.no>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ned:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The three bands are most likely treated as 4-byte integers,
>>>>>>>>>>> so the
>>>>>>>>>>> object
>>>>>>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's
>>>>>>>>>>> 100MB. They
>>>>>>>>>>> may
>>>>>>>>>>> get copied too. There are no single byte user-level
>>>>>>>>>>> containers for
>>>>>>>>>>> you
>>>>>>>>>>> (there is a raw data type, but you can't calculate with it).
>>>>>>>>>>> Possibly
>>>>>>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>>>>>>> operation,
>>>>>>>>>>> but its hard to tell - your choice of method first adds inn
>>>>>>>>>>> all the
>>>>>>>>>>> coordinates, which are 8-byte numbers, so more than doubles
>>>>>>>>>>> its size
>>>>>>>>>>> and
>>>>>>>>>>> makes more copies (to 233MB for each copy). Running gc()
>>>>>>>>>>> several
>>>>>>>>>>> times
>>>>>>>>>>> manually between steps often helps by making the garbage
>>>>>>>>>>> collector
>>>>>>>>>>> more
>>>>>>>>>>> aggressive.
>>>>>>>>>>>
>>>>>>>>>>> I would watch the developments in the R-Forge package
>>>>>>>>>>> "raster", which
>>>>>>>>>>> builds on some of these things, and try to see how that
>>>>>>>>>>> works. If you
>>>>>>>>>>> have
>>>>>>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to
>>>>>>>>>>> read from
>>>>>>>>>>> GRASS
>>>>>>>>>>> - which would work with raster package functions now. Look
>>>>>>>>>>> at the
>>>>>>>>>>> code
>>>>>>>>>>> of
>>>>>>>>>>> recent readRAST6 to see which incantations are needed. If
>>>>>>>>>>> you are
>>>>>>>>>>> going
>>>>>>>>>>> to
>>>>>>>>>>> use randomForest for prediction, you can use smaller tiles
>>>>>>>>>>> until you
>>>>>>>>>>> find
>>>>>>>>>>> an alternative solution. Note that feeding a data frame of
>>>>>>>>>>> integers
>>>>>>>>>>> to
>>>>>>>>>>> a
>>>>>>>>>>> model fitting or prediction function will result in coercion
>>>>>>>>>>> to a
>>>>>>>>>>> matrix of doubles, so your subsequent workflow should take
>>>>>>>>>>> that into
>>>>>>>>>>> account.
>>>>>>>>>>> Getting more memory is another option, and may be very cost
>>>>>>>>>>> and
>>>>>>>>>>> especially
>>>>>>>>>>> time effective - at the moment your machine is swapping. Buying
>>>>>>>>>>> memory
>>>>>>>>>>> may
>>>>>>>>>>> save you time programming around too little memory.
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>
>>>>>>>>>>> Roger
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>>>>>>> Roger.Bivand at nhh.no
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned
>>>>>>>>>>> Horning
>>>>>>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>>>>>>
>>>>>>>>>>> Greetings,
>>>>>>>>>>>
>>>>>>>>>>> I am trying to read an image from GRASS using the spgrass6
>>>>>>>>>>> command
>>>>>>>>>>> readRAST6 and then convert it into a data.frame object so I
>>>>>>>>>>> can use
>>>>>>>>>>> it
>>>>>>>>>>> with randomForest. The byte image I'm reading is 2732 rows x
>>>>>>>>>>> 3058
>>>>>>>>>>> columns x 3 bands. It's a small subset of a larger image I
>>>>>>>>>>> would like
>>>>>>>>>>> to
>>>>>>>>>>> use eventually. I have no problem reading the image using
>>>>>>>>>>> readRAST6
>>>>>>>>>>> but
>>>>>>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it
>>>>>>>>>>> runs for a
>>>>>>>>>>> couple hours before I kill the process. The image is less
>>>>>>>>>>> than 25MB
>>>>>>>>>>> so
>>>>>>>>>>> I'm surprised it requires this level of memory. Can someone
>>>>>>>>>>> let me
>>>>>>>>>>> know
>>>>>>>>>>> why this is. Should I use something other than the GRASS
>>>>>>>>>>> interface
>>>>>>>>>>> for
>>>>>>>>>>> this? These are the commands I'm using:
>>>>>>>>>>>
>>>>>>>>>>> spot <- readRAST6(c("subset.red", "subset.green",
>>>>>>>>>>> "subset.blue"))
>>>>>>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>>>>>>
>>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> All the best,
>>>>>>>>>>>
>>>>>>>>>>> Ned
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> R-sig-Geo mailing list
>>>>>>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>
>>
>>
>
More information about the R-sig-Geo
mailing list