[R-sig-Geo] model predict in subset: compare terra and stars

Fri Jan 21 07:21:14 CET 2022

Hello everybody,
I cannot answer Tim's questions, but here is a way to do the terra bit much
faster, by not writing intermediate steps to disk.
Cheers, Robert

library(terra)
library(stars)
library(sf)
library(microbenchmark)

## setup
tif = system.file("tif/L7_ETMs.tif", package = "stars")
L7_orig = read_stars(tif, proxy = TRUE) %>% split()
# create a shape within which we will want to make the prediction
circle = st_sfc(st_buffer(st_point(c(293749.5, 9115745)), 600), crs =
st_crs(L7_orig))
# simple set of points for training the model
trainCoords <- data.frame(
  X = c(290050, 290100, 291700, 293000, 294500, 294600, 298000,
        289500, 290100, 292500, 294000, 296000, 293600, 294800),
  Y = c(9117000, 9118500, 9117100, 9118000, 9112890, 9111700, 9120000,
        9112700, 9114000, 9115700, 9118100, 9120000, 9111650, 9116000))

# model for stars
trainPts <- st_as_sf(x = trainCoords, coords = c("X","Y"), crs =
st_crs(L7_orig))
trainAttr <- st_drop_geometry(st_as_sf(st_extract(L7_orig, trainPts)))
names(trainAttr) <- c("X1","X2","X3","X4","X5","X6")
trainAttr$pa <- c(rep(1,7), rep(0,7)) #add presence-absence attribute
st_model <- glm(formula=pa~.,data = trainAttr)

# model for terra
ras = terra::rast(tif)
trainAttr2 <- extract(ras, trainCoords)[,-1]
trainAttr2$pa <- c(rep(1,7), rep(0,7))
model <- glm(formula=pa~.,data = trainAttr2)
vcircle <- vect(circle)

# time both approaches
mbm <- microbenchmark("starsMethod" = {
  L7_orig_inside = read_stars(tif, proxy = TRUE) %>% split()
  L7_crop <- L7_orig_inside[circle]
  st_out = predict(L7_crop, st_model)
 # st_out is still a proxy with processing steps, writing it out forces
those steps to be applied
  write_stars(st_out, file.path(tempdir(),"stars_prediction.tif"),overwrite
= TRUE)
},
"terraMethod" = {
  ras = terra::rast(tif)
  cras <- terra::crop(ras, vcircle)
  mras <- terra::mask(cras, vcircle)
  terra_out <- predict(mras, model, filename =
file.path(tempdir(),"terra_prediction.tif"), overwrite = TRUE)
}, times = 10)

mbm
#Unit: milliseconds
#        expr      min       lq      mean   median       uq      max neval
# starsMethod 225.4070 227.4722 230.86188 231.4848 233.9374 237.5687    10
# terraMethod  93.9507  96.0912  99.59111 100.0375 103.5670 104.1225    10

With terra you can use a window ("lazy crop")
  window(ras) <- ext(vcircle)
  mras <- terra::mask(ras, vcircle)
  # but this seems faster
  cras <- terra::crop(ras, vcircle)
  mras <- terra::mask(cras, vcircle)

On Thu, Jan 20, 2022 at 8:43 AM Howard, Tim G (DEC) via R-sig-Geo <
r-sig-geo using r-project.org> wrote:

> Dear all -
> Classically, if I want to create a spatial model and make a prediction for
> an area within the extent of my raster stack my approach would be to crop
> all predictor rasters to the area of interest and run the predict on that
> cropped set. The stars package opens up another option: to apply the
> crop/mask to the proxy object, which then will extract only the relevant
> cells from the full rasters when writing (or plotting) the result. At least
> that's how I understand it.
>
> The following example has both approaches and their timings. I borrowed
> heavily from https://r-spatial.github.io/stars/articles/stars7.html ,
> other stars articles, and from the terra help pages! Questions below the
> example.
>
> ```
> library(terra)
> library(stars)
> library(sf)
> library(microbenchmark)
>
> ## setup
> tif = system.file("tif/L7_ETMs.tif", package = "stars")
> L7_orig = read_stars(tif, proxy = TRUE) %>% split()
> # create a shape within which we will want to make the prediction
> circle = st_sfc(st_buffer(st_point(c(293749.5, 9115745)), 600), crs =
> st_crs(L7_orig))
> # simple set of points for training the model
> trainCoords <- data.frame(
>   X = c(290050, 290100, 291700, 293000, 294500, 294600, 298000,
>         289500, 290100, 292500, 294000, 296000, 293600, 294800),
>   Y = c(9117000, 9118500, 9117100, 9118000, 9112890, 9111700, 9120000,
>         9112700, 9114000, 9115700, 9118100, 9120000, 9111650, 9116000))
>
> trainPts <- st_as_sf(x = trainCoords, coords = c("X","Y"), crs =
> st_crs(L7_orig))
> trainAttr <- st_drop_geometry(st_as_sf(st_extract(L7_orig, trainPts)))
> names(trainAttr) <- c("X1","X2","X3","X4","X5","X6")
> trainAttr$pa <- c(rep(1,7), rep(0,7)) #add presence-absence attribute
>
> # time both approaches
> mbm <- microbenchmark("starsMethod" = {
>   L7_orig_inside = read_stars(tif, proxy = TRUE) %>% split()
>   st_model <- glm(formula=pa~.,data = trainAttr)
>   L7_crop <- L7_orig_inside[circle]
>   st_out = predict(L7_crop, st_model)
>  # st_out is still a proxy with processing steps, writing it out forces
> those steps to be applied
>   write_stars(st_out,
> file.path(tempdir(),"stars_prediction.tif"),overwrite = TRUE)
> },
> "terraMethod" = {
>   ras = terra::rast(tif)
>   # crop, mask, write out to temp dir
>   c_ras <- terra::crop(ras, circle)
>   msk <- rasterize(vect(circle), c_ras)
>   maskRas <- file.path(tempdir(),"L7_ETMs_crop.tif")
>   m_ras <- terra::mask(c_ras, msk, filename = maskRas, overwrite = TRUE)
>   model <- glm(formula=pa~.,data = trainAttr)
>   # read the cropped raster for predict (already in RAM, but this would be
> protocol)
>   mras <- rast(maskRas)
>   names(mras) <- c("X1","X2","X3","X4","X5","X6")
>   terra_out <- predict(mras, model, filename =
> file.path(tempdir(),"terra_prediction.tif"), overwrite = TRUE)
> }, times = 10)
>
> # > mbm
> # Unit: milliseconds
> # expr      min       lq     mean   median       uq      max neval
> # starsMethod 411.8035 442.4342 701.2864 505.4522 630.8252 1699.737    10
> # terraMethod  94.0480  99.4800 560.6102 106.9317 132.1853 4207.067    1
> ```
>
> As run above, using stars proxies takes a fair amount longer than
> cropping, saving, and predicting on the cropped rasters. I have found this
> to be the case for larger (more real-life) examples as well, to the level
> that for larger data sets and more predictors, I run out of RAM using stars
> proxies and that approach fails.
>
> Questions:
> 1. Am I missing something? Is there a more efficient way for running a
> model and predicting within a smaller area than the full extent of your
> predictors?
> 2. On the surface of things, I would have expected the proxy approach to
> be faster, especially if using COG rasters (which allow read by chunk, not
> line). I'm not seeing that here, nor with larger datasets. Any thoughts on
> the potential bottlenecks?
>
> Thanks in advance,
> Tim
>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

	[[alternative HTML version deleted]]