[R-sig-Geo] SpatialGridDataFrame to data.frame

Ned Horning horning at amnh.org
Thu Feb 12 07:30:51 CET 2009


Robert,

This worked - thanks. It's always uplifting to see an actual image after 
working on something for a while. Now I can start playing with 
parameters and playing with different approaches. I'm just (re)starting 
my R education and I'm pretty slow getting the hang of it but your 
examples help a lot. They also give me more avenues to discover other 
functionality and different ways of doing things. I hope I can keep 
working with R and GRASS until I know it this time. My goal is to get to 
the point where I can be productive with these packages and start 
training other folks.

The raster package looks very nice and I'll keep an eye on its 
development. One step that I am currently doing in GRASS is to read a 
Shapefile with training data (polygons with an integer attribute 
representing a land cover type) and then randomly create 100 points 
within each set of polygons representing a specific land cover type. I 
do this for each land cover type and then concatenate the results into a 
text file. This file has the coordinates that I use in xyValues to get 
the pixels values from the SPOT image. Is there a way to do this 
sampling using the raster package or another R package that you are 
familiar with? In GRASS I convert the Shapefile to a raster image before 
doing the random sampling and it would be nice if I could skip this step.

All the best,

Ned

Robert Hijmans wrote:
> I see, in my example, I had a single quantitative variable but you
> probably have land cover classes or something like that. If the
> classes are in fact numbers stored as text then use
>
> pred <- as.numeric(pred)
>
> but if you have words such as 'forest', 'crops', 'water' you could do
> something like
>
> ...
>    pred <- predict(randfor, rowvals)
>    pred[pred=='forest'] <- 1
>    pred[pred=='crops'] <- 2
>    pred[pred=='water'] <- 3
>    pred <- as.numeric(pred)
>    predrast <- setValues(predrast, pred, r)
> ...
>
> not pretty, you could also fit RF with classes that can be interpreted
> as numbers..
> Make sure you do not get:
>
> Warning message:
> NAs introduced by coercion
>
>
> which would suggest that some character values could not be
> transformed to numbers.
>
> On Thu, Feb 12, 2009 at 1:56 AM, Ned Horning <horning at amnh.org> wrote:
>   
>> Robert,
>>
>> Using predrast <- setValues(predrast, as.vector(pred), r) I got another
>> error: values must be numeric, integer or logical.
>>
>> class(pred) = "factor"
>> dim(pred) = NULL
>> class(v) = "character"
>> length(v) == ncol(spot) = TRUE
>>
>>
>>
>>
>>
>> Ned
>>
>> Robert Hijmans wrote:
>>     
>>> Strange. You could try
>>>      predrast <- setValues(predrast, as.vector(pred), r)
>>>
>>> But it would be good to know what pred is.
>>>
>>> Can you do this:
>>>
>>> class(pred)
>>> dim(pred)
>>> v <- as.vector(pred)
>>> class(v)
>>> length(v) == ncol(spot)
>>>
>>>
>>> Robert
>>>
>>>
>>>
>>>
>>> On Wed, Feb 11, 2009 at 11:16 PM, Ned Horning <horning at amnh.org> wrote:
>>>
>>>       
>>>> Robert and Roger,
>>>>
>>>> Thanks for the information and pointers. The raster package looks quite
>>>> interesting and I'll try to get up to speed on some of its capabilities.
>>>> Are
>>>> the man pages the best way to do that or is that a single document
>>>> available?
>>>>
>>>> I made some progress but still have some questions. I followed the steps
>>>> laid out by Robert and everything went fine except I ran into an error
>>>> with
>>>> "predrast <- setValues(predrast, pred, r)" in the for loop when I tried
>>>> processing one line at a time and "r <- setValues(r, pred)" when I ran
>>>> the
>>>> full image in one go. The error was: "values must be a vector." Any idea
>>>> what I'm doing wrong?
>>>>
>>>> I tried to read the GRASS files directly but got a message saying it is
>>>> not
>>>> a supported file format. Can you confirm that is the case or am I doing
>>>> something wrong? I was able to read a tiff version of the image. I am
>>>> able
>>>> to run gdalinfo on GRASS files just fine from a terminal window.
>>>>
>>>> Thanks again for the help.
>>>>
>>>> Ned
>>>>
>>>>
>>>> Robert Hijmans wrote:
>>>>
>>>>         
>>>>> Ned,
>>>>>
>>>>> This is an example of running a RandomForest prediction with the
>>>>> raster package (for the simple case that there are no NA values in the
>>>>> raster data; if there are, you have to into account that "predict'
>>>>> does not return any values (not even NA) for those cells).
>>>>>
>>>>> Robert
>>>>>
>>>>> #install.packages("raster", repos="http://R-Forge.R-project.org")
>>>>> require(raster)
>>>>> require(randomForest)
>>>>>
>>>>> # for single band files
>>>>> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
>>>>> # for multiple band files
>>>>> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'),
>>>>> c(1,2,3) )
>>>>>
>>>>> # simulate random points and values to model with
>>>>> xy <- xyFromCell(spot, round(runif(100) * ncell(spot)))
>>>>> response <- runif(100) * 100
>>>>> # read values of raster layers at points, and bind to respinse
>>>>> trainvals <- cbind(response, xyValues(spot, xy))
>>>>>
>>>>> # run RandomForest
>>>>> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>>>>>
>>>>> # apply the prediction, row by row
>>>>> predrast <- setRaster(spot)
>>>>> filename(predrast) <- 'RF_pred.grd'
>>>>> for (r in 1:nrow(spot)) {
>>>>>       spot <- readRow(spot, r)
>>>>>       rowvals <- values(spot, names=TRUE)
>>>>> # this next line should not be necessary, but it is
>>>>> # I'll fix that
>>>>>       colnames(rowvals) <- c('b1', 'b2', 'b3')
>>>>>       pred <- predict(randfor, rowvals)
>>>>>       predrast <- setValues(predrast, pred, r)
>>>>>       predrast <- writeRaster(predrast, overwrite=TRUE)
>>>>> }
>>>>>
>>>>> plot(predrast)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Ned:
>>>>>>
>>>>>>
>>>>>> The three bands are most likely treated as 4-byte integers, so the
>>>>>> object
>>>>>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. They
>>>>>> may
>>>>>> get copied too. There are no single byte user-level containers for you
>>>>>> (there is a raw data type, but you can't calculate with it). Possibly
>>>>>> saying spot_frame <- slot(spot, "data") will save one copying
>>>>>> operation,
>>>>>> but its hard to tell - your choice of method first adds inn all the
>>>>>> coordinates, which are 8-byte numbers, so more than doubles its size
>>>>>> and
>>>>>> makes more copies (to 233MB for each copy). Running gc() several times
>>>>>> manually between steps often helps by making the garbage collector more
>>>>>> aggressive.
>>>>>>
>>>>>> I would watch the developments in the R-Forge package "raster", which
>>>>>> builds on some of these things, and try to see how that works. If you
>>>>>> have
>>>>>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read from
>>>>>> GRASS
>>>>>> - which would work with raster package functions now. Look at the code
>>>>>> of
>>>>>> recent readRAST6 to see which incantations are needed. If you are going
>>>>>> to
>>>>>> use randomForest for prediction, you can use smaller tiles until you
>>>>>> find
>>>>>> an alternative solution. Note that feeding a data frame of integers to
>>>>>> a
>>>>>> model fitting or prediction function will result in coercion to a
>>>>>> matrix of doubles, so your subsequent workflow should take that into
>>>>>> account.
>>>>>>  Getting more memory is another option, and may be very cost and
>>>>>> especially
>>>>>> time effective - at the moment your machine is swapping. Buying memory
>>>>>> may
>>>>>> save you time programming around too little memory.
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>>>>>> Roger.Bivand at nhh.no
>>>>>>
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>>>>>> Sent: Wed 11.02.2009 07:40
>>>>>> To: r-sig-geo at stat.math.ethz.ch
>>>>>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I am trying to read an image from GRASS using the spgrass6 command
>>>>>> readRAST6 and then convert it into a data.frame object so I can use it
>>>>>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>>>>>> columns x 3 bands. It's a small subset of a larger image I would like
>>>>>> to
>>>>>> use eventually. I have no problem reading the image using readRAST6 but
>>>>>> when I try to convert it to a data.frame object my linux system
>>>>>> resources (1BG RAM, 3GB swap) nearly get maxed out and it runs for a
>>>>>> couple hours before I kill the process. The image is less than 25MB so
>>>>>> I'm surprised it requires this level of memory. Can someone let me know
>>>>>> why this is. Should I use something other than the GRASS interface for
>>>>>> this? These are the commands I'm using:
>>>>>>
>>>>>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>>>>>> spot_frame <- as(spot, "data.frame")
>>>>>>
>>>>>> Any help would be appreciated.
>>>>>>
>>>>>> All the best,
>>>>>>
>>>>>> Ned
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-Geo mailing list
>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>      [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-Geo mailing list
>>>>>> R-sig-Geo at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>           
>>>>         
>>>       
>>     
>
>



More information about the R-sig-Geo mailing list