[R-sig-Geo] raster: stackApply problems..

Wed Nov 27 18:43:13 CET 2019

Thank you for your help!

I tried to fix stackApply according to your instructions.

Now the indices of names are the same and consistent with indices
enumeration (gist for validation and tests:
https://gist.github.com/kokkytos/93f315a5ecf59c0b183f9788754bc170).

I've attached a patch file here:

https://gist.github.com/kokkytos/ca2c319134677b19900579665267a7a7

> stackapply_mean
class      : RasterBrick
dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
resolution : 500, 500  (x, y)
extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
crs        : NA
source     : /tmp/Rtmp9W8UNc/raster/r_tmp_2019-11-27_191205_2929_20324.grd
names      :  index_5,  index_6,  index_7,  index_1,  index_2, 
index_3,  index_4
min values : 444.6946, 440.2028, 429.6900, 442.7436, 440.0467, 444.9182,
437.1589
max values : 565.8671, 560.1375, 561.7972, 556.2471, 563.8341, 561.7687,
560.4509

> ver_mean
class      : RasterStack
dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
resolution : 500, 500  (x, y)
extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
crs        : NA
names      :  layer.1,  layer.2,  layer.3,  layer.4,  layer.5, 
layer.6,  layer.7
min values : 442.7436, 440.0467, 444.9182, 437.1589, 444.6946, 440.2028,
429.6900
max values : 556.2471, 563.8341, 561.7687, 560.4509, 565.8671, 560.1375,
561.7972

On 11/26/19 10:58 PM, Vijay Lulla wrote:
> Hmm...it appears that stackApply is using different conditions which
> might be causing this problem. Below is the snippet of the code which
> I think might be the problem.
>
> ## For canProcessInMemory
> if (rowcalc) {
>   v <- lapply(uin, function(i) fun(x[, ind == i, drop = FALSE], na.rm
> = na.rm))
> }
> else {
>   v <- lapply(uin, function(i, ...) apply(x[, ind == i, drop = FALSE],
> 1, fun, na.rm = na.rm))
> }
>
>
> ## If canProcessInMemory is not TRUE
> if (rowcalc) {
>   v <- lapply(uin, function(i) fun(a[, ind == uin[i], drop = FALSE],
> na.rm = na.rm))
> }
> else {
>   v <- lapply(uin, function(i, ...) apply(a[, ind == uin[i], drop =
> FALSE], 1, fun, na.rm = na.rm))
> }
>
> I think they should both be same but it appears that they aren't and
> that's what you've discovered.  Maybe you can try fix(stackApply) to
> see if it really is the problem and can tell us what you find. 
> Anyways, good catch...and...sorry for wasting your time.
> Cordially,
> Vijay.
>
> On Tue, Nov 26, 2019 at 2:53 PM Leonidas Liakos
> <leonidas_liakos using yahoo.gr <mailto:leonidas_liakos using yahoo.gr>> wrote:
>
>     Thank you!
>     The problem is not with the resulting values but with the index
>     mapping. Values are correct in all three cases.
>
>     As I wrote in a previous post in the thread
>     (https://stat.ethz.ch/pipermail/r-sig-geo/2019-November/027821.html)
>     , stackApply behaves inconsistently depending on whether the
>     exported stack will remain in memory or it will be stored, due to
>     its large size, on the hard disk.
>
>     In the first case the indices are identical to my function
>     (ver_mean) and the lubridate::wday indexing system (and are
>     correct) while in the second they are shuffled.
>
>     So, while Sunday has index 1 and while in the first case (when the
>     result is in memory) stackApply returns the correct index, in the
>     second case (when the result is stored on the hard disk) it
>     returns index_4! So how can one be sure if index_1 corresponds to
>     Sunday or another day using stackApply since it sometimes
>     enumerates it with index_1 and sometimes index_4?
>
>
>     Try to run this example (when the resulting stack remains in
>     memory) to see that the indexes are identical (stackApply =
>     ver_median = lubridate::wday)
>     https://gist.github.com/kokkytos/5d554b5a725bb48d2189e2d1fa0e2206
>
>     Thank you again
>
>     On 11/26/19 9:00 PM, Vijay Lulla wrote:
>>     I'm sorry for the miscommunication.  What I meant to say is that
>>     the output from stackApply and zApply are the same (because
>>     zApply uses stackApply internally) except the names.  The
>>     behavior of stackApply makes sense because AFAIUI R doesn't
>>     automatically reorder vectors/indices that it receives.  Your
>>     observation about inconsistent result with ver_mean is very valid
>>     though!  And, that's what I meant with my comment that using
>>     sapply with the explicit ordering that you want is the best way
>>     to control what raster package will output.  In R the input order
>>     should be maintained (this is the prime difference between SQL
>>     and R) but packages/tools do not always adhere to this...so it's
>>     never clear how the output will be ordered.  Sorry for the confusion.
>>
>>
>>     On Tue, Nov 26, 2019 at 12:22 PM Leonidas Liakos
>>     <leonidas_liakos using yahoo.gr <mailto:leonidas_liakos using yahoo.gr>> wrote:
>>
>>         Why do they seem logical since they do not match?
>>
>>         Check for example index 1 (Sunday). The results are different
>>         for the three processes
>>
>>         > stackapply_mean
>>         class      : RasterBrick
>>         dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
>>         resolution : 500, 500  (x, y)
>>         extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
>>         crs        : NA
>>         source     :
>>         /tmp/RtmpkRMXLb/raster/r_tmp_2019-11-26_191359_7710_20324.grd
>>         names      :  index_5,  index_6,  index_7,  index_1, 
>>         index_2,  index_3,  index_4
>>         min values : 440.0467, 444.9182, 437.1589, 444.6946,
>>         440.2028, 429.6900, 442.7436
>>         max values : 563.8341, 561.7687, 560.4509, 565.8671,
>>         560.1375, 561.7972, 556.2471
>>
>>
>>         > ver_mean
>>         class      : RasterStack
>>         dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
>>         resolution : 500, 500  (x, y)
>>         extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
>>         crs        : NA
>>         names      :  layer.1,  layer.2,  layer.3,  layer.4, 
>>         layer.5,  layer.6,  layer.7
>>         min values : 442.7436, 440.0467, 444.9182, 437.1589,
>>         444.6946, 440.2028, 429.6900
>>         max values : 556.2471, 563.8341, 561.7687, 560.4509,
>>         565.8671, 560.1375, 561.7972
>>
>>
>>         > z
>>         class      : RasterBrick
>>         dimensions : 300, 300, 90000, 7  (nrow, ncol, ncell, nlayers)
>>         resolution : 500, 500  (x, y)
>>         extent     : 0, 150000, 0, 150000  (xmin, xmax, ymin, ymax)
>>         crs        : NA
>>         source     :
>>         /tmp/RtmpkRMXLb/raster/r_tmp_2019-11-26_191439_7710_04780.grd
>>         names      :       X1,       X2,       X3,       X4,      
>>         X5,       X6,       X7
>>         min values : 440.0467, 444.9182, 437.1589, 444.6946,
>>         440.2028, 429.6900, 442.7436
>>         max values : 563.8341, 561.7687, 560.4509, 565.8671,
>>         560.1375, 561.7972, 556.2471
>>                    : 1, 2, 3, 4, 5, 6, 7
>>
>>
>>         On 11/26/19 7:03 PM, Vijay Lulla wrote:
>>>         If you read the code/help for `stackApply` and `zApply`
>>>         you'll see that the results that you obtain make sense (at
>>>         least they seem sensible/reasonable to me).  IMO, if you
>>>         want to control the ordering of your layers then just use
>>>         sapply, like how you've used for ver_mean.  IMO, this is the
>>>         only reliable (safe?), and quite a readable, way to
>>>         accomplish what you're trying to do.
>>>         Just my 2 cents.
>>>         -- Vijay.
>>>
>>>         On Tue, Nov 26, 2019 at 11:19 AM Leonidas Liakos via
>>>         R-sig-Geo <r-sig-geo using r-project.org
>>>         <mailto:r-sig-geo using r-project.org>> wrote:
>>>
>>>             I added raster::zApply in my tests to validate the
>>>             results. However, the
>>>             indices of the names of the results are different now.
>>>             Recall that the
>>>             goal is to calculate from a raster stack time series the
>>>             mean per day of
>>>             the week. And that problem I have is that stackApply,
>>>             zApply and
>>>             calc/sapply return different indices in the result
>>>             names. New code is
>>>             available here:
>>>             https://gist.github.com/kokkytos/93f315a5ecf59c0b183f9788754bc170
>>>             I'm really curious about missing something.
>>>
>>>
>>>             On 11/20/19 3:30 AM, Frederico Faleiro wrote:
>>>             > Hi Leonidas,
>>>             >
>>>             > both results are in the same order, but the name is
>>>             different.
>>>             > You can rename the first as in the second:
>>>             > names(res) <- names(res2)
>>>             >
>>>             > I provided an example to help you understand the logic.
>>>             >
>>>             > library(raster)
>>>             > beginCluster(2)
>>>             > r <- raster()
>>>             > values(r) <- 1
>>>             > # simple sequential stack from 1 to 6 in all cells
>>>             > s <- stack(r, r*2, r*3, r*4, r*5, r*6)
>>>             > s
>>>             > res <- clusterR(s, stackApply, args =
>>>             list(indices=c(2,2,3,3,1,1), fun
>>>             > = mean))
>>>             > res
>>>             > res2 <- stackApply(s, c(2,2,3,3,1,1), mean)
>>>             > res2
>>>             > dif <- res - res2
>>>             > # exatly the same order because the difference is zero
>>>             for all layers
>>>             > dif
>>>             > # rename
>>>             > names(res) <- names(res2)
>>>             >
>>>             > Best regards,
>>>             >
>>>             > Frederico Faleiro
>>>             >
>>>             > On Tue, Nov 19, 2019 at 4:15 PM Leonidas Liakos via
>>>             R-sig-Geo
>>>             > <r-sig-geo using r-project.org
>>>             <mailto:r-sig-geo using r-project.org>
>>>             <mailto:r-sig-geo using r-project.org
>>>             <mailto:r-sig-geo using r-project.org>>> wrote:
>>>             >
>>>             >     I run the example with clusterR:
>>>             >
>>>             >     no_cores <- parallel::detectCores() -1
>>>             >     raster::beginCluster(no_cores)
>>>             >     ?????? res <- raster::clusterR(inp,
>>>             raster::stackApply, args =
>>>             >     list(indices=c(2,2,3,3,1,1),fun = mean))
>>>             >     raster::endCluster()
>>>             >
>>>             >     And the result is:
>>>             >
>>>             >     > res
>>>             >     class?????????? : RasterBrick
>>>             >     dimensions : 180, 360, 64800, 3?? (nrow, ncol,
>>>             ncell, nlayers)
>>>             >     resolution : 1, 1?? (x, y)
>>>             >     extent???????? : -180, 180, -90, 90?? (xmin, xmax,
>>>             ymin, ymax)
>>>             >     crs?????????????? : +proj=longlat +datum=WGS84
>>>             +ellps=WGS84
>>>             >     +towgs84=0,0,0
>>>             >     source???????? : memory
>>>             >     names?????????? : layer.1, layer.2, layer.3
>>>             >     min values :???????? 1.5,???????? 3.5,???????? 5.5
>>>             >     max values :???????? 1.5,???????? 3.5,???????? 5.5??
>>>             >
>>>             >
>>>             >     layer.1, layer.2, layer.3 (?)
>>>             >
>>>             >     So what corrensponds to what?
>>>             >
>>>             >
>>>             >     If I run:
>>>             >
>>>             >     res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)
>>>             >
>>>             >     The result is:
>>>             >
>>>             >     > res2
>>>             >     class      : RasterBrick
>>>             >     dimensions : 180, 360, 64800, 3  (nrow, ncol,
>>>             ncell, nlayers)
>>>             >     resolution : 1, 1  (x, y)
>>>             >     extent     : -180, 180, -90, 90  (xmin, xmax,
>>>             ymin, ymax)
>>>             >     crs        : +proj=longlat +datum=WGS84
>>>             +ellps=WGS84 +towgs84=0,0,0
>>>             >     source     : memory
>>>             >     names      : index_2, index_3, index_1
>>>             >     min values :     1.5,     3.5,     5.5
>>>             >     max values :     1.5,     3.5,     5.5
>>>             >
>>>             >     There is no consistency with the names of the
>>>             output and obscure
>>>             >     correspondence with the indices in the case of
>>>             clusterR
>>>             >
>>>             >
>>>             >             [[alternative HTML version deleted]]
>>>             >
>>>             >     _______________________________________________
>>>             >     R-sig-Geo mailing list
>>>             >     R-sig-Geo using r-project.org
>>>             <mailto:R-sig-Geo using r-project.org>
>>>             <mailto:R-sig-Geo using r-project.org
>>>             <mailto:R-sig-Geo using r-project.org>>
>>>             >     https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>             >
>>>
>>>                     [[alternative HTML version deleted]]
>>>
>>>             _______________________________________________
>>>             R-sig-Geo mailing list
>>>             R-sig-Geo using r-project.org <mailto:R-sig-Geo using r-project.org>
>>>             https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>>
>>
>

	[[alternative HTML version deleted]]