[R-sig-Geo] Making GIS and R play nicely - some tips?

Roger Bivand Roger.Bivand at nhh.no
Wed Mar 2 18:16:55 CET 2005


On Wed, 2 Mar 2005, abunn wrote:

> Great. Thanks, Roger and the others who replied.
> 
> > But isn't the real challenge going to be shoehorning 250 by 1000 by 1000
> > by 8 into R at once - or are the data layers needed just one after the
> > other? If I could grasp the workflow better, the advice above could become
> > more focussed, I think.
> 
> I'm not certain about the workflow yet! That's one of the problems. I'm
> finding it tough to wrap my head around the data since they vary in both
> time and space.
> 
> The data are not as huge as I thought at first. Each layer has 567 rows and
> 409 columns (231903 elements) but 137113 are no data. So, that makes the
> actual amount of data to crunch much more manageable at 94790 elements per
> layer.
> 
> The goal is to predict satellite reflectance as a function of climate. The
> response data are monthly satellite reflectance over 19 years (so that's
> 228 images). The predictors are interpolated climate data: there will be a
> minimum of three, and I'd like to use two more if they prove reliable. So,
> assuming three predictors, the data would only run to 500 MB or so.
> 
> R > object.size(rep(runif(94790), times = 19*12*3)) / 2^20
> [1] 494.6622
> 
> And if I can get all five then I'd be under a gig still.
> 
> R > object.size(rep(runif(94790), times = 19*12*5)) / 2^20
> [1] 824.437
> 
> That's cumbersome but possible.
> 
> I've managed to read in a test example from ERDAS .img files using rgdal
> (thanks Tim Keitt):
> 
> R > library(rgdal)
> R > junk <- list()
> R > try1 <- GDAL.open("test.img")
> R > getDriverLongName(getDriver(try1))
> [1] "Erdas Imagine Images (.img)"
> R > junk[[1]] <- c(getRasterData(try1))
> R > GDAL.close(try1)
> Closing GDAL dataset handle 00CACBF0...  destroyed ... done.
> R > try2 <- GDAL.open("test2.img")
> R > getDriverLongName(getDriver(try2))
> [1] "Erdas Imagine Images (.img)"
> R > junk[[2]] <- c(getRasterData(try2))
> R > GDAL.close(try2)
> Closing GDAL dataset handle 00C7F180...  destroyed ... done.
> R > str(junk)
> List of 2
>  $ : int [1:231903] 0 0 0 0 0 0 0 0 0 0 ...
>  $ : int [1:231903] 0 0 0 0 0 0 0 0 0 0 ...
> 
> 
> So, to read all this in, I can script Arc to convert the grids to .img
> files and then loop through all the data in R. Does that sound like a
> reasonable plan?
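
A minimal sketch of that loop (untested; the file names refl_001.img ...
refl_228.img are placeholders for whatever Arc writes out, and I'm assuming
no-data cells are coded as 0, as in your test output) might look like:

library(rgdal)
fnames <- sprintf("refl_%03d.img", 1:228)     # placeholder file names
layers <- vector("list", length(fnames))
for (i in seq_along(fnames)) {
  ds <- GDAL.open(fnames[i])
  layers[[i]] <- c(getRasterData(ds))         # one layer as a 231903-long vector
  GDAL.close(ds)
}
refl <- do.call(cbind, layers)                # 231903 cells x 228 months
## drop the no-data cells once, using the first layer as a mask
refl <- refl[layers[[1]] != 0, ]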

What are the volumes going back?

Another itch to scratch: how much more precision in the coefficient point
estimates will you actually gain by forcing all of this data down the
throat of lm() or whatever? Would it be possible to reduce the load by
subsetting in rgdal, given that you had the *.img and model-based files
lined up (or by just using locations for which you have observed climate
data)? Won't the interpolation errors in the predictors feed through? I
think the Furlanello et al. paper may be very helpful (they use
randomForest regression trees with climate/topography drivers).
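
By subsetting I mean something along these lines (an untested sketch; the
window offset and size are purely illustrative): getRasterData() can read a
sub-window of a grid, so you never need the whole stack in memory at once:

library(rgdal)
ds <- GDAL.open("test.img")
## read a 100 x 100 cell window starting at row 200, column 50
## (offset and region.dim are given as c(rows, columns))
win <- getRasterData(ds, offset = c(200, 50), region.dim = c(100, 100))
GDAL.close(ds)
str(win)    # a 100 x 100 integer matrix for this window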

Interesting problem.

Roger

> 
> Thanks for all the help, Andy

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no



