[R-sig-Geo] raster: stackApply problems..

Vijay Lulla v|j@y|u||@ @end|ng |rom gm@||@com
Wed Nov 27 23:22:19 CET 2019


Very nice!

And, just fyi: if you wish to get stackapply_mean to have layers in same
order as ver_mean then you can use  stackapply_mean <-
stack(stackapply_mean, layers=order(names(stackapply_mean)))

On Wed, Nov 27, 2019 at 12:43 PM Leonidas Liakos <leonidas_liakos using yahoo.gr>
wrote:

> Thank you for your help!
>
> I tried to fix stackApply according to your instructions.
>
> Now the indices of names are the same and consistent with indices
> enumeration (gist for validation and tests:
> https://gist.github.com/kokkytos/93f315a5ecf59c0b183f9788754bc170).
>
> I've attached a patch file here:
>
> https://gist.github.com/kokkytos/ca2c319134677b19900579665267a7a7
> > stackapply_mean
> class      : RasterBrick
> dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
> resolution : 500, 500  (x, y)
> extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
> crs        : NA
> source     : /tmp/Rtmp9W8UNc/raster/r_tmp_2019-11-27_191205_2929_20324.grd
> names      :  index_5,  index_6,  index_7,  index_1,  index_2,  index_3,
> index_4
> min values : 444.6946, 440.2028, 429.6900, 442.7436, 440.0467, 444.9182,
> 437.1589
> max values : 565.8671, 560.1375, 561.7972, 556.2471, 563.8341, 561.7687,
> 560.4509
>
> > ver_mean
> class      : RasterStack
> dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
> resolution : 500, 500  (x, y)
> extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
> crs        : NA
> names      :  layer.1,  layer.2,  layer.3,  layer.4,  layer.5,  layer.6,
> layer.7
> min values : 442.7436, 440.0467, 444.9182, 437.1589, 444.6946, 440.2028,
> 429.6900
> max values : 556.2471, 563.8341, 561.7687, 560.4509, 565.8671, 560.1375,
> 561.7972
>
>
>
>
>
> On 11/26/19 10:58 PM, Vijay Lulla wrote:
>
> Hmm...it appears that stackApply is using different conditions which might
> be causing this problem. Below is the snippet of the code which I think
> might be the problem.
>
> ## For canProcessInMemory
> if (rowcalc) {
>   v <- lapply(uin, function(i) fun(x[, ind == i, drop = FALSE], na.rm =
> na.rm))
> }
> else {
>   v <- lapply(uin, function(i, ...) apply(x[, ind == i, drop = FALSE], 1,
> fun, na.rm = na.rm))
> }
>
>
> ## If canProcessInMemory is not TRUE
> if (rowcalc) {
>   v <- lapply(uin, function(i) fun(a[, ind == uin[i], drop = FALSE], na.rm
> = na.rm))
> }
> else {
>   v <- lapply(uin, function(i, ...) apply(a[, ind == uin[i], drop =
> FALSE], 1, fun, na.rm = na.rm))
> }
>
> I think they should both be same but it appears that they aren't and
> that's what you've discovered.  Maybe you can try fix(stackApply) to see if
> it really is the problem and can tell us what you find.  Anyways, good
> catch...and...sorry for wasting your time.
> Cordially,
> Vijay.
>
> On Tue, Nov 26, 2019 at 2:53 PM Leonidas Liakos <leonidas_liakos using yahoo.gr>
> wrote:
>
>> Thank you!
>> The problem is not with the resulting values but with the index mapping.
>> Values are correct in all three cases.
>>
>> As I wrote in a previous post in the thread (
>> https://stat.ethz.ch/pipermail/r-sig-geo/2019-November/027821.html) ,
>> stackApply behaves inconsistently depending on whether the exported stack
>> will remain in memory or it will be stored, due to its large size, on the
>> hard disk.
>>
>> In the first case the indices are identical to my function (ver_mean) and
>> the lubridate::wday indexing system (and are correct) while in the second
>> they are shuffled.
>>
>> So, while Sunday has index 1 and while in the first case (when the result
>> is in memory) stackApply returns the correct index, in the second case
>> (when the result is stored on the hard disk) it returns index_4! So how can
>> one be sure if index_1 corresponds to Sunday or another day using
>> stackApply since it sometimes enumerates it with index_1 and sometimes
>> index_4?
>>
>>
>> Try to run this example (when the resulting stack remains in memory) to
>> see that the indexes are identical (stackApply = ver_median =
>> lubridate::wday)
>> https://gist.github.com/kokkytos/5d554b5a725bb48d2189e2d1fa0e2206
>>
>> Thank you again
>> On 11/26/19 9:00 PM, Vijay Lulla wrote:
>>
>> I'm sorry for the miscommunication.  What I meant to say is that the
>> output from stackApply and zApply are the same (because zApply uses
>> stackApply internally) except the names.  The behavior of stackApply makes
>> sense because AFAIUI R doesn't automatically reorder vectors/indices that
>> it receives.  Your observation about inconsistent result with ver_mean is
>> very valid though!  And, that's what I meant with my comment that using
>> sapply with the explicit ordering that you want is the best way to control
>> what raster package will output.  In R the input order should be maintained
>> (this is the prime difference between SQL and R) but packages/tools do not
>> always adhere to this...so it's never clear how the output will be
>> ordered.  Sorry for the confusion.
>>
>>
>> On Tue, Nov 26, 2019 at 12:22 PM Leonidas Liakos <
>> leonidas_liakos using yahoo.gr> wrote:
>>
>>> Why do they seem logical since they do not match?
>>>
>>> Check for example index 1 (Sunday). The results are different for the
>>> three processes
>>>
>>> > stackapply_mean
>>> class      : RasterBrick
>>> dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
>>> resolution : 500, 500  (x, y)
>>> extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
>>> crs        : NA
>>> source     :
>>> /tmp/RtmpkRMXLb/raster/r_tmp_2019-11-26_191359_7710_20324.grd
>>> names      :  index_5,  index_6,  index_7,  index_1,  index_2,
>>> index_3,  index_4
>>> min values : 440.0467, 444.9182, 437.1589, 444.6946, 440.2028, 429.6900,
>>> 442.7436
>>> max values : 563.8341, 561.7687, 560.4509, 565.8671, 560.1375, 561.7972,
>>> 556.2471
>>>
>>>
>>> > ver_mean
>>> class      : RasterStack
>>> dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
>>> resolution : 500, 500  (x, y)
>>> extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
>>> crs        : NA
>>> names      :  layer.1,  layer.2,  layer.3,  layer.4,  layer.5,
>>> layer.6,  layer.7
>>> min values : 442.7436, 440.0467, 444.9182, 437.1589, 444.6946, 440.2028,
>>> 429.6900
>>> max values : 556.2471, 563.8341, 561.7687, 560.4509, 565.8671, 560.1375,
>>> 561.7972
>>>
>>>
>>> > z
>>> class      : RasterBrick
>>> dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
>>> resolution : 500, 500  (x, y)
>>> extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
>>> crs        : NA
>>> source     :
>>> /tmp/RtmpkRMXLb/raster/r_tmp_2019-11-26_191439_7710_04780.grd
>>> names      :       X1,       X2,       X3,       X4,       X5,
>>> X6,       X7
>>> min values : 440.0467, 444.9182, 437.1589, 444.6946, 440.2028, 429.6900,
>>> 442.7436
>>> max values : 563.8341, 561.7687, 560.4509, 565.8671, 560.1375, 561.7972,
>>> 556.2471
>>>            : 1, 2, 3, 4, 5, 6, 7
>>>
>>>
>>> On 11/26/19 7:03 PM, Vijay Lulla wrote:
>>>
>>> If you read the code/help for `stackApply` and `zApply` you'll see that
>>> the results that you obtain make sense (at least they seem
>>> sensible/reasonable to me).  IMO, if you want to control the ordering of
>>> your layers then just use sapply, like how you've used for ver_mean.  IMO,
>>> this is the only reliable (safe?), and quite a readable, way to accomplish
>>> what you're trying to do.
>>> Just my 2 cents.
>>> -- Vijay.
>>>
>>> On Tue, Nov 26, 2019 at 11:19 AM Leonidas Liakos via R-sig-Geo <
>>> r-sig-geo using r-project.org> wrote:
>>>
>>>> I added raster::zApply in my tests to validate the results. However, the
>>>> indices of the names of the results are different now. Recall that the
>>>> goal is to calculate from a raster stack time series the mean per day of
>>>> the week. And that problem I have is that stackApply, zApply and
>>>> calc/sapply return different indices in the result names. New code is
>>>> available here:
>>>> https://gist.github.com/kokkytos/93f315a5ecf59c0b183f9788754bc170
>>>> I'm really curious about missing something.
>>>>
>>>>
>>>> On 11/20/19 3:30 AM, Frederico Faleiro wrote:
>>>> > Hi Leonidas,
>>>> >
>>>> > both results are in the same order, but the name is different.
>>>> > You can rename the first as in the second:
>>>> > names(res) <- names(res2)
>>>> >
>>>> > I provided an example to help you understand the logic.
>>>> >
>>>> > library(raster)
>>>> > beginCluster(2)
>>>> > r <- raster()
>>>> > values(r) <- 1
>>>> > # simple sequential stack from 1 to 6 in all cells
>>>> > s <- stack(r, r*2, r*3, r*4, r*5, r*6)
>>>> > s
>>>> > res <- clusterR(s, stackApply, args = list(indices=c(2,2,3,3,1,1), fun
>>>> > = mean))
>>>> > res
>>>> > res2 <- stackApply(s, c(2,2,3,3,1,1), mean)
>>>> > res2
>>>> > dif <- res - res2
>>>> > # exatly the same order because the difference is zero for all layers
>>>> > dif
>>>> > # rename
>>>> > names(res) <- names(res2)
>>>> >
>>>> > Best regards,
>>>> >
>>>> > Frederico Faleiro
>>>> >
>>>> > On Tue, Nov 19, 2019 at 4:15 PM Leonidas Liakos via R-sig-Geo
>>>> > <r-sig-geo using r-project.org <mailto:r-sig-geo using r-project.org>> wrote:
>>>> >
>>>> >     I run the example with clusterR:
>>>> >
>>>> >     no_cores <- parallel::detectCores() -1
>>>> >     raster::beginCluster(no_cores)
>>>> >     ?????? res <- raster::clusterR(inp, raster::stackApply, args =
>>>> >     list(indices=c(2,2,3,3,1,1),fun = mean))
>>>> >     raster::endCluster()
>>>> >
>>>> >     And the result is:
>>>> >
>>>> >     > res
>>>> >     class?????????? : RasterBrick
>>>> >     dimensions : 180, 360, 64800, 3?? (nrow, ncol, ncell, nlayers)
>>>> >     resolution : 1, 1?? (x, y)
>>>> >     extent???????? : -180, 180, -90, 90?? (xmin, xmax, ymin, ymax)
>>>> >     crs?????????????? : +proj=longlat +datum=WGS84 +ellps=WGS84
>>>> >     +towgs84=0,0,0
>>>> >     source???????? : memory
>>>> >     names?????????? : layer.1, layer.2, layer.3
>>>> >     min values :???????? 1.5,???????? 3.5,???????? 5.5
>>>> >     max values :???????? 1.5,???????? 3.5,???????? 5.5??
>>>> >
>>>> >
>>>> >     layer.1, layer.2, layer.3 (?)
>>>> >
>>>> >     So what corrensponds to what?
>>>> >
>>>> >
>>>> >     If I run:
>>>> >
>>>> >     res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)
>>>> >
>>>> >     The result is:
>>>> >
>>>> >     > res2
>>>> >     class      : RasterBrick
>>>> >     dimensions : 180, 360, 64800, 3  (nrow, ncol, ncell, nlayers)
>>>> >     resolution : 1, 1  (x, y)
>>>> >     extent     : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
>>>> >     crs        : +proj=longlat +datum=WGS84 +ellps=WGS84
>>>> +towgs84=0,0,0
>>>> >     source     : memory
>>>> >     names      : index_2, index_3, index_1
>>>> >     min values :     1.5,     3.5,     5.5
>>>> >     max values :     1.5,     3.5,     5.5
>>>> >
>>>> >     There is no consistency with the names of the output and obscure
>>>> >     correspondence with the indices in the case of clusterR
>>>> >
>>>> >
>>>> >             [[alternative HTML version deleted]]
>>>> >
>>>> >     _______________________________________________
>>>> >     R-sig-Geo mailing list
>>>> >     R-sig-Geo using r-project.org <mailto:R-sig-Geo using r-project.org>
>>>> >     https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo using r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>
>>>
>>>
>>
>

-- 
Vijay Lulla

Assistant Professor,
Dept. of Geography, IUPUI
425 University Blvd, CA-207C.
Indianapolis, IN-46202
vlulla using iupui.edu
ORCID: https://orcid.org/0000-0002-0823-2522
Online: http://vijaylulla.com | https://github.com/vlulla

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list