[R-sig-Geo] Advice on raster workflow: subs -> resample

Fri Oct 29 19:58:59 CEST 2010

Dear Robert,

Many thanks for your advice.  I hadn't considered the differences in
resolutions, so am inserting that step (it works nicely).

Much appreciated.

Cheers, Lyndon

On Fri, Oct 29, 2010 at 12:52 PM, Robert J. Hijmans <r.hijmans at gmail.com> wrote:
> Dear Lyndon,
>
> That looks good to me, except that I do not think it is wise to
> resample when the resolutions are different by an order of magnitude.
> I think it is better to first use aggregate to get a similar
> resolution and then use resample (although I never formally compared
> these different strategies). As in the below.
>
> lt.q.test <- raster(paste(path.in, "lt_quin_test1.img", sep = ""))
> # Assign values from sum.stats (in y.mn column) to yld.test,
> outputting values as integers.
> setwd(path.out)
> yld.test = subs(lt.q.test, sum.stats[[1]], by="VALUE", which="y.mn",
> subsWithNA = TRUE, filename="yld.rst4.img", format='HFA',
> datatype="INT2S")
>
> # perhaps first aggregate
> agg <- aggregate(yld.test, 10, fun=mean, na.rm=TRUE)
>
> # Resample agg to resolution of modtest
> modtest <- raster("MOD_SA_NDVI_amp_albers_test.img")
> aggres <- resample(agg, modtest, method = "bilinear", filename =
> "yld.resamp.test.img", datatype = "INT2S", format = "HFA")  # This
> seems to work
>
>
> These are functions are not very fast, but I do not think there is a
> better alternative with 'raster'. Computers are patient, fortunately.
>
> R
>
>
> On Fri, Oct 29, 2010 at 7:21 AM, Lyndon Estes <lestes at princeton.edu> wrote:
>> Hello,
>>
>> I am writing to ask for advice as to whether I am following the most
>> efficient/fastest workflow for a task using the raster package. This
>> involves 3 datasets:
>>
>> 1. modtest: A ~927 m resolution satellite-derived image:
>> nrow        : 141
>> ncol        : 189
>> ncell       : 26649
>> min value   : ?
>> max value   : ?
>> projection  : +proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24
>> +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
>> +towgs84=0,0,0
>> xmin        : 415851.2
>> xmax        : 590983.4
>> ymin        : -2670093
>> ymax        : -2539439
>> xres        : 926.6254
>> yres        : 926.6254
>>
>> 2. lt.q.test: a ~92 m resolution grid representing identifiers for
>> unique climate-soil combinations:
>> nrow        : 1403
>> ncol        : 1878
>> ncell       : 2634834
>> min value   : ?
>> max value   : ?
>> projection  : +proj=aea +lat_1=-18 +lat_2=-32 +lat_0=0 +lon_0=24
>> +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs
>> +towgs84=0,0,0
>> xmin        : 415926.7
>> xmax        : 590091.6
>> ymin        : -2670093
>> ymax        : -2539979
>> xres        : 92.73961
>> yres        : 92.73961
>>
>> 3. sum.stats: A dataframe within a list with several variables
>> relating to outputs from a crop simulations, and one variable
>> containing values corresponding to the identifiers in yld.test.
>>
>> My task is to
>> 1. assign crop yield values from sum.stats to lt.q.test
>> 2. resample the result of 1 to the resolution of modtest, so that the
>> images are lined up, and so that the yield values in yld.test are
>> averaged/aggregated to modtest's resolution.
>>
>> The solution I have come up with so far is:
>>
>> library(raster)
>>
>> lt.q.test <- raster(paste(path.in, "lt_quin_test1.img", sep = ""))
>>
>> # Assign values from sum.stats (in y.mn column) to yld.test,
>> outputting values as integers.
>> setwd(path.out)
>> subs(lt.q.test, sum.stats[[1]], by = "VALUE", which = "y.mn",
>> subsWithNA = TRUE,
>>                 filename = "yld.rst4.img", format = 'HFA', datatype = "INT2S")
>>
>> # Resample yld.test to resolution of modtest
>> yld.test <- raster("yld.rst4.img")
>> modtest <- raster("MOD_SA_NDVI_amp_albers_test.img")
>> resample(yld.test, modtest, method = "bilinear", filename =
>> "yld.resamp.test.img", datatype = "INT2S",
>>         format = "HFA")  # This seems to work
>>
>> The above is performed on fairly small test datasets, and I must now
>> apply this approach multiple times to the full size versions of the
>> input datasets, where the dimensions of lt.q.test are:
>>
>> nrow        : 15170
>> ncol        : 17364
>> ncell       : 263411880
>>
>> and modtest's dimensions are:
>>
>> nrow        : 1651
>> ncol        : 1937
>> ncell       : 3197987
>>
>> I was thus wondering if there is a more efficient approach that I
>> should adopt, or if there are any other
>> concerns of which I should be aware?
>>
>> Thanks in advance for any advice.
>>
>> Cheers, Lyndon
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>

-- 
Lyndon Estes
Research Associate
Woodrow Wilson School
Princeton University
+1-609-258-2392 (o)
+1-609-258-6082 (f)
+1-202-431-0496 (m)
lestes at princeton.edu