[R-sig-Geo] model predict in subset: compare terra and stars
Robert J. Hijmans
r@h|jm@n@ @end|ng |rom gm@||@com
Fri Jan 21 07:21:14 CET 2022
Hello everybody,
I cannot answer Tim's questions, but here is a way to do the terra bit much
faster, by not writing intermediate steps to disk.
Cheers, Robert
library(terra)
library(stars)
library(sf)
library(microbenchmark)
## setup
tif = system.file("tif/L7_ETMs.tif", package = "stars")
L7_orig = read_stars(tif, proxy = TRUE) %>% split()
# create a shape within which we will want to make the prediction
circle = st_sfc(st_buffer(st_point(c(293749.5, 9115745)), 600), crs =
st_crs(L7_orig))
# simple set of points for training the model
trainCoords <- data.frame(
X = c(290050, 290100, 291700, 293000, 294500, 294600, 298000,
289500, 290100, 292500, 294000, 296000, 293600, 294800),
Y = c(9117000, 9118500, 9117100, 9118000, 9112890, 9111700, 9120000,
9112700, 9114000, 9115700, 9118100, 9120000, 9111650, 9116000))
# model for stars
trainPts <- st_as_sf(x = trainCoords, coords = c("X","Y"), crs =
st_crs(L7_orig))
trainAttr <- st_drop_geometry(st_as_sf(st_extract(L7_orig, trainPts)))
names(trainAttr) <- c("X1","X2","X3","X4","X5","X6")
trainAttr$pa <- c(rep(1,7), rep(0,7)) #add presence-absence attribute
st_model <- glm(formula=pa~.,data = trainAttr)
# model for terra
ras = terra::rast(tif)
trainAttr2 <- extract(ras, trainCoords)[,-1]
trainAttr2$pa <- c(rep(1,7), rep(0,7))
model <- glm(formula=pa~.,data = trainAttr2)
vcircle <- vect(circle)
# time both approaches
mbm <- microbenchmark("starsMethod" = {
L7_orig_inside = read_stars(tif, proxy = TRUE) %>% split()
L7_crop <- L7_orig_inside[circle]
st_out = predict(L7_crop, st_model)
# st_out is still a proxy with processing steps, writing it out forces
those steps to be applied
write_stars(st_out, file.path(tempdir(),"stars_prediction.tif"),overwrite
= TRUE)
},
"terraMethod" = {
ras = terra::rast(tif)
cras <- terra::crop(ras, vcircle)
mras <- terra::mask(cras, vcircle)
terra_out <- predict(mras, model, filename =
file.path(tempdir(),"terra_prediction.tif"), overwrite = TRUE)
}, times = 10)
mbm
#Unit: milliseconds
# expr min lq mean median uq max neval
# starsMethod 225.4070 227.4722 230.86188 231.4848 233.9374 237.5687 10
# terraMethod 93.9507 96.0912 99.59111 100.0375 103.5670 104.1225 10
With terra you can use a window ("lazy crop")
window(ras) <- ext(vcircle)
mras <- terra::mask(ras, vcircle)
# but this seems faster
cras <- terra::crop(ras, vcircle)
mras <- terra::mask(cras, vcircle)
On Thu, Jan 20, 2022 at 8:43 AM Howard, Tim G (DEC) via R-sig-Geo <
r-sig-geo using r-project.org> wrote:
> Dear all -
> Classically, if I want to create a spatial model and make a prediction for
> an area within the extent of my raster stack my approach would be to crop
> all predictor rasters to the area of interest and run the predict on that
> cropped set. The stars package opens up another option: to apply the
> crop/mask to the proxy object, which then will extract only the relevant
> cells from the full rasters when writing (or plotting) the result. At least
> that's how I understand it.
>
> The following example has both approaches and their timings. I borrowed
> heavily from https://r-spatial.github.io/stars/articles/stars7.html ,
> other stars articles, and from the terra help pages! Questions below the
> example.
>
> ```
> library(terra)
> library(stars)
> library(sf)
> library(microbenchmark)
>
> ## setup
> tif = system.file("tif/L7_ETMs.tif", package = "stars")
> L7_orig = read_stars(tif, proxy = TRUE) %>% split()
> # create a shape within which we will want to make the prediction
> circle = st_sfc(st_buffer(st_point(c(293749.5, 9115745)), 600), crs =
> st_crs(L7_orig))
> # simple set of points for training the model
> trainCoords <- data.frame(
> X = c(290050, 290100, 291700, 293000, 294500, 294600, 298000,
> 289500, 290100, 292500, 294000, 296000, 293600, 294800),
> Y = c(9117000, 9118500, 9117100, 9118000, 9112890, 9111700, 9120000,
> 9112700, 9114000, 9115700, 9118100, 9120000, 9111650, 9116000))
>
> trainPts <- st_as_sf(x = trainCoords, coords = c("X","Y"), crs =
> st_crs(L7_orig))
> trainAttr <- st_drop_geometry(st_as_sf(st_extract(L7_orig, trainPts)))
> names(trainAttr) <- c("X1","X2","X3","X4","X5","X6")
> trainAttr$pa <- c(rep(1,7), rep(0,7)) #add presence-absence attribute
>
> # time both approaches
> mbm <- microbenchmark("starsMethod" = {
> L7_orig_inside = read_stars(tif, proxy = TRUE) %>% split()
> st_model <- glm(formula=pa~.,data = trainAttr)
> L7_crop <- L7_orig_inside[circle]
> st_out = predict(L7_crop, st_model)
> # st_out is still a proxy with processing steps, writing it out forces
> those steps to be applied
> write_stars(st_out,
> file.path(tempdir(),"stars_prediction.tif"),overwrite = TRUE)
> },
> "terraMethod" = {
> ras = terra::rast(tif)
> # crop, mask, write out to temp dir
> c_ras <- terra::crop(ras, circle)
> msk <- rasterize(vect(circle), c_ras)
> maskRas <- file.path(tempdir(),"L7_ETMs_crop.tif")
> m_ras <- terra::mask(c_ras, msk, filename = maskRas, overwrite = TRUE)
> model <- glm(formula=pa~.,data = trainAttr)
> # read the cropped raster for predict (already in RAM, but this would be
> protocol)
> mras <- rast(maskRas)
> names(mras) <- c("X1","X2","X3","X4","X5","X6")
> terra_out <- predict(mras, model, filename =
> file.path(tempdir(),"terra_prediction.tif"), overwrite = TRUE)
> }, times = 10)
>
> # > mbm
> # Unit: milliseconds
> # expr min lq mean median uq max neval
> # starsMethod 411.8035 442.4342 701.2864 505.4522 630.8252 1699.737 10
> # terraMethod 94.0480 99.4800 560.6102 106.9317 132.1853 4207.067 1
> ```
>
> As run above, using stars proxies takes a fair amount longer than
> cropping, saving, and predicting on the cropped rasters. I have found this
> to be the case for larger (more real-life) examples as well, to the level
> that for larger data sets and more predictors, I run out of RAM using stars
> proxies and that approach fails.
>
> Questions:
> 1. Am I missing something? Is there a more efficient way for running a
> model and predicting within a smaller area than the full extent of your
> predictors?
> 2. On the surface of things, I would have expected the proxy approach to
> be faster, especially if using COG rasters (which allow read by chunk, not
> line). I'm not seeing that here, nor with larger datasets. Any thoughts on
> the potential bottlenecks?
>
> Thanks in advance,
> Tim
>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
[[alternative HTML version deleted]]
More information about the R-sig-Geo
mailing list