[R-sig-Geo] SpatialGridDataFrame to data.frame

Ned Horning horning at amnh.org
Wed Feb 11 16:16:00 CET 2009


Robert and Roger,

Thanks for the information and pointers. The raster package looks quite 
interesting and I'll try to get up to speed on some of its capabilities. 
Are the man pages the best way to do that, or is there a single document 
available?

I made some progress but still have some questions. I followed the steps 
laid out by Robert and everything went fine except I ran into an error 
with "predrast <- setValues(predrast, pred, r)" in the for loop when I 
tried processing one line at a time and "r <- setValues(r, pred)" when I 
ran the full image in one go. The error was: "values must be a vector." 
Any idea what I'm doing wrong?
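
One possible cause, offered as a guess rather than a diagnosis: if the response used to fit the forest is a factor (classification rather than the regression in Robert's example), predict() returns a factor, and is.vector() is FALSE for factors. Assuming setValues() tests its input with is.vector() (not verified against the raster source), converting the predictions to plain integer codes first might get past the check. The sketch below uses only base R; the factor `pred` is made-up stand-in data, not output from the actual model.

```r
# Hypothetical stand-in for classification output from predict()
pred <- factor(c("forest", "water", "forest", "urban"))

is.vector(pred)              # FALSE: a factor carries class and levels attributes

# Convert to integer class codes and keep the labels as a lookup table
pred_codes <- as.integer(pred)
class_labels <- levels(pred)

is.vector(pred_codes)        # TRUE: a plain integer vector
class_labels[pred_codes]     # recovers the original labels from the codes
```

If the values are written out as integer codes, the label table has to be kept alongside the output so the classes can be named again later.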

I tried to read the GRASS files directly but got a message saying it is 
not a supported file format. Can you confirm that is the case or am I 
doing something wrong? I was able to read a tiff version of the image. I 
am able to run gdalinfo on GRASS files just fine from a terminal window.

Thanks again for the help.

Ned


Robert Hijmans wrote:
> Ned,
>
> This is an example of running a RandomForest prediction with the
> raster package (for the simple case that there are no NA values in the
> raster data; if there are, you have to take into account that "predict"
> does not return any values (not even NA) for those cells).
>
> Robert
>
> #install.packages("raster", repos="http://R-Forge.R-project.org")
> require(raster)
> require(randomForest)
>
> # for single band files
> spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
> # for multiple band files
> # spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'), c(1,2,3) )
>
> # simulate random points and values to model with
> xy <- xyFromCell(spot, ceiling(runif(100) * ncell(spot)))
> response <- runif(100) * 100
> # read values of raster layers at points, and bind to response
> trainvals <- cbind(response, xyValues(spot, xy))
>
> # run RandomForest
> randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)
>
> # apply the prediction, row by row
> predrast <- setRaster(spot)
> filename(predrast) <- 'RF_pred.grd'
> for (r in 1:nrow(spot)) {
> 	spot <- readRow(spot, r)
> 	rowvals <- values(spot, names=TRUE)
> # this next line should not be necessary, but it is
> # I'll fix that
> 	colnames(rowvals) <- c('b1', 'b2', 'b3')
> 	pred <- predict(randfor, rowvals)
> 	predrast <- setValues(predrast, pred, r)
> 	predrast <- writeRaster(predrast, overwrite=TRUE)
> }
>
> plot(predrast)
>
>
>
>
> On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>   
>> Ned:
>>
>>
>> The three bands are most likely treated as 4-byte integers, so the object
>> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. They may
>> get copied too. There are no single byte user-level containers for you
>> (there is a raw data type, but you can't calculate with it). Possibly
>> saying spot_frame <- slot(spot, "data") will save one copying operation,
>> but it's hard to tell - your choice of method first adds in all the
>> coordinates, which are 8-byte numbers, so more than doubles its size and
>> makes more copies (to 233MB for each copy). Running gc() several times
>> manually between steps often helps by making the garbage collector more
>> aggressive.
>>
>> I would watch the developments in the R-Forge package "raster", which
>> builds on some of these things, and try to see how that works. If you have
>> the GDAL-GRASS plugin for rasters, you can use readGDAL to read from GRASS
>> - which would work with raster package functions now. Look at the code of
>> recent readRAST6 to see which incantations are needed. If you are going to
>> use randomForest for prediction, you can use smaller tiles until you find
>> an alternative solution. Note that feeding a data frame of integers to a
>> model fitting or prediction function will result in coercion to a
>> matrix of doubles, so your subsequent workflow should take that into
>> account.
>> Getting more memory is another option, and may be very cost-effective and
>> especially time-effective - at the moment your machine is swapping. Buying memory may
>> save you time programming around too little memory.
>>
>> Hope this helps,
>>
>> Roger
>>
>>
>> ---
>> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
>> Roger.Bivand at nhh.no
>>
>>
>>
>> -----Original Message-----
>> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
>> Sent: Wed 11.02.2009 07:40
>> To: r-sig-geo at stat.math.ethz.ch
>> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>>
>> Greetings,
>>
>> I am trying to read an image from GRASS using the spgrass6 command
>> readRAST6 and then convert it into a data.frame object so I can use it
>> with randomForest. The byte image I'm reading is 2732 rows x 3058
>> columns x 3 bands. It's a small subset of a larger image I would like to
>> use eventually. I have no problem reading the image using readRAST6 but
>> when I try to convert it to a data.frame object my linux system
>> resources (1GB RAM, 3GB swap) nearly get maxed out and it runs for a
>> couple hours before I kill the process. The image is less than 25MB so
>> I'm surprised it requires this level of memory. Can someone let me know
>> why this is? Should I use something other than the GRASS interface for
>> this? These are the commands I'm using:
>>
>> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
>> spot_frame <- as(spot, "data.frame")
>>
>> Any help would be appreciated.
>>
>> All the best,
>>
>> Ned
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>
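
Roger's memory figures quoted above can be reproduced with a few lines of base R arithmetic. This is a rough check only: it assumes 4 bytes per integer cell and two 8-byte coordinate columns, as he describes, and ignores any per-object overhead.

```r
rows  <- 2732
cols  <- 3058
bands <- 3

cells       <- rows * cols          # cells per band
band_bytes  <- cells * bands * 4    # three bands held as 4-byte integers
coord_bytes <- cells * 2 * 8        # x and y coordinates held as 8-byte doubles

band_bytes / 1e6                    # roughly 100 MB for the band values alone
(band_bytes + coord_bytes) / 1e6    # roughly 234 MB per copy once coordinates are added
cells * bands / 1e6                 # roughly 25 MB if each cell were a single byte
```

The last line is the size of the same data stored as single bytes, which matches the on-disk image size mentioned in the original question.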


