[R-sig-Geo] Tiled processing...

Thu Oct 30 20:17:54 CET 2008

Ok, I'm nearly there using only RGDAL and R-base commands, if I can get 
a bit of feedback I might have a decent tiled (row-by-row) processing 
structure worked up.  Couple of things -- first, I was mistakenly 
thinking all raster formats can even really support line-by-line 
writing, which is not neccessarily true (consider the various compressed 
image formats).  Let's assume that the user just wants a flat-binary 
type file, either ENVI or ESRI format.  We can use writebin and one of 
the writeGDAL(...,drivername='EHdr',...) or similar "header-only" write 
commands:

elev is a DEM, but it can be any raster format GDAL can read.

***

library(rgdal)
infile='elev'
outfile_base='testout'
outfile_ext='.bil'
outfile=paste(outfile_base,outfile_ext,sep='')
outcon <- file(outfile, "wb")

infile_info=GDALinfo(infile)
nl=infile_info[[1]]
ns=infile_info[[2]]

for (row in 1:nl) {
    templine <- readGDAL(infile,region.dim=c(1,ns),offset=c(row-1,0))
    writeBin(templine[[1]], outcon,size=4)
}
close(outcon)
writeGDAL(templine,outfile_base,drivername='EHdr',type="Float32")

***

Right now, this ALMOST works except, as you can see from the final line 
that the output header will incorrectly set the number of lines in the 
output to 1 (because I'm only reading one line at a time).  I'm using 
templine purely as a way to carry over the header info, it won't write 
any actual data out, as far as I know (or will it?)  How do I modify the 
"metadata" of the templine to reflect the correct "header" info (e.g. 
set the number of rows back to the total number of rows in the image, 
and reset the geographic position correctly.

--j

Alexander Brenning wrote:
> Hi,
> 
> maybe the RSAGA package has the solution to your problem; there are 
> actually two ways of applying functions to grids in RSAGA, (1) by 
> row-by-row processing (special form of tiles), or (2) using the SAGA 
> binaries.
> 
> 1) local.function
> 
> local.function and focal.function are very flexible tools, but they are 
> slow because they are written in R. They work with ASCII grids (see e.g. 
> write.ascii.grid in RSAGA, or the appropriate GDAL export functions that 
> work with your GeoTIFFs).
> 
> In your case:
> 
> local.function("ingrid", varnames = "outgrid",
>      fun = function(x) x + 1000)
> 
> (or use focal.function with radius = 0). This will of course work with 
> much more general R functions (even with predict methods if you look at 
> grid.predict and multi.focal.function).
> 
> The RSAGA package technically depends on Windows (because most of its 
> functions use SAGA GIS Windows binaries) but the local.function and 
> focal.function are (supposed to be) platform-independent; I can provide 
> you with the source code if you work on a non-Windows system and can't 
> get it from CRAN.
> 
> 2) rsaga.grid.calculus
> 
> Another approach, which involves SAGA itself and therefore (currently) 
> depends on Windows, uses SAGA's grid calculator. SAGA is able to process 
> large grids efficiently. E.g.:
> 
> # first convert the ASCII grid to a SAGA grid (.sgrd):
> rsaga.esri.to.sgrd("ingrid")
> rsaga.grid.calculus("ingrid", "outgrid", formula = "a+1000")
>   # 'ingrid' is treated as 'a' in the formula
> rsaga.sgrd.to.esri("outgrid", prec = 2)
> 
> I hope this helps...
> 
> Cheers
>  Alex
> 
> 
> Jonathan Greenberg wrote:
>> I've recently got back into using R to perform spatial analyses, and 
>> I'm trying to figure out how to perform "true" tiled processing, e.g. 
>> controlled reading of subsets of an input file, performing a function 
>> on this subset, and writing the output, subset by subset, to an output 
>> file and, finally, setting up the appropriate "header" info (the 
>> metadata).
>>
>> Using suggestions Tim Keitt and Roger Bivand gave me some years back, 
>> I wrote this simple code (all it should do is take an elevation 
>> geotiff and add 1000 to the elevations, writing the output):
>>
>> ***
>>
>> library(rgdal)
>> ds1 <- GDAL.open("elev.tif")
>> driver <- new('GDALDriver', 'GTiff')
>> ds2 <- new('GDALTransientDataset', driver, nrow(ds1), ncol(ds1), type 
>> = "Float32")
>>  for (row in nrow(ds1)) {
>>    row
>>    x <- getRasterData(ds1, offset = c(row - 1, 0), region = c(1, 
>> ncol(ds1)))
>>    elevnew <- x+1000
>>    putRasterData(ds2, elevnew, offset = c(row - 1, 0))
>>
>>  }
>> saveDataset(ds2, "out.tif")
>> closeDataset(ds2)
>> closeDataset(ds1)
>>
>> ***
>>
>> Couple of issues/questions:
>> 1) The code doesn't actually seem to work -- I get a tiny, unreadable 
>> TIFF out of the back of this algorithm -- what is wrong?
>> 2) I'm actually unclear about exactly what this is doing -- am I 
>> really just creating a large in-memory matrix (ds2) and then writing 
>> the entire output image out to disk, or is this somehow writing 
>> line-by-line?  If the former is true, how do I modify this to write 
>> within the loop, rather than all-at-once?
>>
>> Thanks!
>>
>> --j
>>
>>
> 

-- 

Jonathan A. Greenberg, PhD
Postdoctoral Scholar
Center for Spatial Technologies and Remote Sensing (CSTARS)
University of California, Davis
One Shields Avenue
The Barn, Room 250N
Davis, CA 95616
Cell: 415-794-5043
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307