[R-sig-Geo] SpatialGridDataFrame to data.frame

Robert Hijmans r.hijmans at gmail.com
Wed Feb 11 13:12:41 CET 2009


Ned,

This is an example of running a RandomForest prediction with the
raster package (for the simple case that there are no NA values in the
raster data; if there are, you have to take into account that "predict"
does not return any values (not even NA) for those cells).

Robert

#install.packages("raster", repos="http://R-Forge.R-project.org")
require(raster)
require(randomForest)

# for single band files
spot <- stack('b1.tif', 'b2.tif', 'b3.tif')
# for multiple band files
# spot <- stackFromFiles(c('bands.tif', 'bands.tif', 'bands.tif'), c(1,2,3) )

# simulate random points and values to model with
# sample() draws valid cell numbers (1..ncell); round(runif()) could yield cell 0
xy <- xyFromCell(spot, sample(ncell(spot), 100))
response <- runif(100) * 100
# read values of raster layers at points, and bind to response
trainvals <- cbind(response, xyValues(spot, xy))

# run RandomForest
randfor <- randomForest(response ~ b1 + b2 + b3, data=trainvals)

# apply the prediction, row by row
predrast <- setRaster(spot)
filename(predrast) <- 'RF_pred.grd'
for (r in 1:nrow(spot)) {
	spot <- readRow(spot, r)
	rowvals <- values(spot, names=TRUE)
	# this next line should not be necessary, but it is;
	# I'll fix that
	colnames(rowvals) <- c('b1', 'b2', 'b3')
	pred <- predict(randfor, rowvals)
	predrast <- setValues(predrast, pred, r)
	predrast <- writeRaster(predrast, overwrite=TRUE)
}

plot(predrast)
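
For rasters that do contain NA cells, the caveat at the top of this message
could be handled along the following lines. This is only a sketch, not part
of raster or randomForest: predict_with_na is a hypothetical helper name, it
assumes a regression model (numeric predictions), and lm() stands in for
randomForest() purely so that the example runs with base R. The idea is to
predict only for complete rows and put NA back in the skipped positions, so
the result has one value per cell.

```r
# Hypothetical helper: predict for rows without NA, return NA elsewhere
predict_with_na <- function(model, newdata) {
  newdata <- as.data.frame(newdata)
  ok <- complete.cases(newdata)            # TRUE for rows with no NA
  out <- rep(NA_real_, nrow(newdata))      # one slot per input row (cell)
  if (any(ok)) {
    out[ok] <- as.numeric(predict(model, newdata[ok, , drop = FALSE]))
  }
  out
}

# Stand-in training data and model (assumption: lm() instead of randomForest())
set.seed(1)
train <- data.frame(b1 = runif(50), b2 = runif(50), b3 = runif(50))
train$response <- 2 * train$b1 + rnorm(50, sd = 0.1)
fit <- lm(response ~ b1 + b2 + b3, data = train)

# One "row" of raster values in which two cells have a missing band
rowvals <- data.frame(b1 = c(0.1, NA, 0.5),
                      b2 = c(0.2, 0.3, 0.4),
                      b3 = c(0.9, 0.8, NA))
pred <- predict_with_na(fit, rowvals)
length(pred) == nrow(rowvals)              # same length as the input row
```

In the row-by-row loop above, such a helper would take the place of the
direct call to predict, so that the vector passed to setValues always has
one value per cell in the row.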




On Wed, Feb 11, 2009 at 5:09 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> Ned:
>
>
> The three bands are most likely treated as 4-byte integers, so the object
> will be 2732 by 3058 by 3 by 4 plus a little bit. That's 100MB. They may
> get copied too. There are no single byte user-level containers for you
> (there is a raw data type, but you can't calculate with it). Possibly
> saying spot_frame <- slot(spot, "data") will save one copying operation,
> but it's hard to tell - your choice of method first adds in all the
> coordinates, which are 8-byte numbers, so more than doubles its size and
> makes more copies (to 233MB for each copy). Running gc() several times
> manually between steps often helps by making the garbage collector more
> aggressive.
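The size estimates in the paragraph above can be checked with a few lines of
arithmetic. This is a rough sketch that counts only the raw cell values
(4 bytes per integer, 8 bytes per double) and ignores the small overhead of
the R objects themselves; the variable names are made up for illustration.

```r
# Image dimensions from the original question
rows  <- 2732
cols  <- 3058
bands <- 3
cells <- rows * cols

# Three bands stored as 4-byte integers
bytes_int <- cells * bands * 4
bytes_int / 1e6        # roughly 100 MB (decimal megabytes)

# Adding x and y coordinates as 8-byte doubles for every cell
bytes_with_xy <- bytes_int + cells * 2 * 8
bytes_with_xy / 1e6    # roughly 234 MB, in line with the figure quoted above

# Coercing integers to doubles doubles the storage per value
size_int <- as.numeric(object.size(integer(1e6)))
size_dbl <- as.numeric(object.size(numeric(1e6)))
size_dbl / size_int    # close to 2
```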
>
> I would watch the developments in the R-Forge package "raster", which
> builds on some of these things, and try to see how that works. If you have
> the GDAL-GRASS plugin for rasters, you can use readGDAL to read from GRASS
> - which would work with raster package functions now. Look at the code of
> recent readRAST6 to see which incantations are needed. If you are going to
> use randomForest for prediction, you can use smaller tiles until you find
> an alternative solution. Note that feeding a data frame of integers to a
> model fitting or prediction function will result in coercion to a
> matrix of doubles, so your subsequent workflow should take that into
> account.
>
> Getting more memory is another option, and may be very cost-effective and
> especially time-effective - at the moment your machine is swapping. Buying
> memory may save you time programming around too little memory.
>
> Hope this helps,
>
> Roger
>
>
> ---
> Roger Bivand, NHH, Helleveien 30, N-5045 Bergen,
> Roger.Bivand at nhh.no
>
>
>
> -----Original Message-----
> From: r-sig-geo-bounces at stat.math.ethz.ch on behalf of Ned Horning
> Sent: Wed 11.02.2009 07:40
> To: r-sig-geo at stat.math.ethz.ch
> Subject: [R-sig-Geo] SpatialGridDataFrame to data.frame
>
> Greetings,
>
> I am trying to read an image from GRASS using the spgrass6 command
> readRAST6 and then convert it into a data.frame object so I can use it
> with randomForest. The byte image I'm reading is 2732 rows x 3058
> columns x 3 bands. It's a small subset of a larger image I would like to
> use eventually. I have no problem reading the image using readRAST6 but
> when I try to convert it to a data.frame object my linux system
> resources (1GB RAM, 3GB swap) nearly get maxed out and it runs for a
> couple hours before I kill the process. The image is less than 25MB so
> I'm surprised it requires this level of memory. Can someone let me know
> why this is? Should I use something other than the GRASS interface for
> this? These are the commands I'm using:
>
> spot <- readRAST6(c("subset.red", "subset.green", "subset.blue"))
> spot_frame <- as(spot, "data.frame")
>
> Any help would be appreciated.
>
> All the best,
>
> Ned
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo