[R-sig-Geo] Raster math on a stack of large rasters

Christoph Raab craab at uni-goettingen.de
Sat Apr 8 14:33:28 CEST 2017


You can try to run the calc process based on tiles in parallel (pblapply 
package ) and delete all files in the tempdir of the raster package 
after each iteration. In the end you need to mosaic the tiles back together.


library(rgdal)
library(raster)
library(pbapply)
library(GSIF)

#set the tempdir for the raster package
#the tempdir can be cleared after a tile is processed
tempdir <- 'path_to_your_desired_tempdir'
rasterOptions(tmpdir=tempdir)

#detect number of cores
#you should adjust this according to your memory
n.cores <- detectCores()

out.file <- 'path_to_result_folder_where_the_tiles_will_be_saved'
fn <- "path_to_your_in_raster.tif"
obj <- GDALinfo(fn)

#create tiles with e.g. 2000*2000 pixels and no overlap
#you can adjust the block size in x and y direction
ras.lst <- GSIF::getSpatialTiles(obj, block.x=2000, overlap.percent = 0)

#use pbapply in order to run the processing in parallel
out.list <- pblapply(1:length(ras.lst[,1]),function(i){
   #credit goes to stackoverflow or so, but I couldn't find the source I 
got it from anymore
   offset <- c(ras.lst$offset.y[i], ras.lst$offset.x[i])
   region.dim <- c(ras.lst$region.dim.y[i],
                   ras.lst$region.dim.x[i])
   ## read a tile
   s <- readGDAL(fn, offset=offset,
                   region.dim=region.dim)

   #perfome calc
   out.rast <- calc(s, function(x){exp(sum(mod.coefs * x))}, filename = 
paste0(out.file,toString(i),".tif"))
   #clear the tempdir
   do.call(file.remove, list(list.files(tempdir, full.names = TRUE)))

} , cl = n.cores)

#settings for the mosaic process
out.list$fun <- mean
out.list$filename <- 'final_result.tif'
#mosaic each tile in the list from pblapply
mos <- do.call(mosaic, out.list)


On 08.04.2017 12:00, r-sig-geo-request at r-project.org wrote:
> Send R-sig-Geo mailing list submissions to
> 	r-sig-geo at r-project.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> or, via email, send a message with subject or body 'help' to
> 	r-sig-geo-request at r-project.org
>
> You can reach the person managing the list at
> 	r-sig-geo-owner at r-project.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-sig-Geo digest..."
>
>
> Today's Topics:
>
>     1. Re: Raster math on a stack of large rasters (Benjamin Leutner)
>     2. Re: Raster math on a stack of large rasters (Melanie Bacou)
>     3. how to get weights for inverse distance weighting
>        (Waichler, Scott R)
>     4. Re: how to get weights for inverse distance weighting
>        (Edzer Pebesma)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 7 Apr 2017 15:50:06 +0200
> From: Benjamin Leutner <benjamin.leutner at uni-wuerzburg.de>
> To: R-mailing list <r-sig-geo at r-project.org>
> Subject: Re: [R-sig-Geo] Raster math on a stack of large rasters
> Message-ID: <205e8b66-4307-5dea-73ea-63bcd464241d at uni-wuerzburg.de>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> You could stick to the native raster format (grd), in which case calc()
> writes to your final file directly.
> Of course that doesn't change the file size issue, but saves you the
> translation step to geotiff.
>
> For the file size you could consider restricting the datatype to integer
> (see ?dataType).
> If I have reflectance data [0,1] for example; I scale them by a factor
> 10000 and then save as "INT4U" or "INT2U" (depending on your maximally
> expected value range), or "INT4S" or "INT2S" if you have negative
> values. That can bring down the filesize quite a bit, while retaining
> most of the relevant precission (beware: remaining decimal places will
> be cut-off,)
>
> e.g. calc(..., function(x) { yourcalculations(x) * 10000 }, datatype =
> "INT4S")
>
> pro tip: the argument in calc() (or more precisely in writeRaster()) is
> called datatype, not to be confused with the stand-alone function
> dataType() with a capital T. That one has bitten me many times, because
> due to the "..." argument there will be no warning if you mistype it ;-)
>
>
>
> On 06.04.2017 19:27, Gregovich, Dave P (DFG) wrote:
>> Hi,
>> I am performing a math operation on a stack of large rasters. The code below uses smaller files for illustration and reproducibility.
>> Any alternative way of performing this task that does not create huge temporary files, and perhaps cuts down on processing time, would be greatly appreciated. The 'calc'  process creates a couple of temporary raster files that are huge. The first one is 142 GB, and I don't have hard drive space for that one and the second one that begins writing during the process.
>> Thanks kindly for any advice!
>> Dave.
>>
>> #create raster stack and coefficients...
>> library(raster)
>> mod.coefs <- rnorm(10)
>> s <- stack()
>> r <- raster(nrow = 100, ncol = 100)
>> #actual rasters I am working with are about 40000, pixels square, with each GeoTiff raster in the stack taking about 2.5 GB on disk
>>
>> for(i in 1:10){
>>     r[] <- rnorm(10000)
>>     s <- addLayer(s, r)
>> }
>>
>> #attempt to perform raster math...
>> out.file <- 'C:/ out.rast.tif'
>> out.rast <- calc(s, function(x){exp(sum(mod.coefs * x))}, filename = out.file)
>>
>> #at this point, the temporary files in the \AppData\Local\Temp\RtmpqQYzfS\raster folder eventually become quite large,
>> #with one .GRI file reaching 142 GB, and another now growing to 8 GB before I ended the process
>> #the file 'out.file' has not been created at that point.
>> #_____________end____________________________________________________________________
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>



More information about the R-sig-Geo mailing list