[R-sig-Geo] How to calculate climatology in rasterbricks

Vijay Lulla vijaylulla at gmail.com
Sat Jun 4 17:30:15 CEST 2016


Yes, it is because of na.rm argument.  It's stackApply that defaults
na.rm=TRUE not zApply.  You can check that from the following
interaction:

> args(mean.default)
function (x, trim = 0, na.rm = FALSE, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
> args(stackApply)
function (x, indices, fun, filename = "", na.rm = TRUE, ...)
NULL
> args(zApply)
function (x, by, fun = mean, name = "", ...)
NULL
>

It is surprising that stackApply defaults differently than most of the
R functions but it kinda makes sense too.  I suspect the defaults are
what they are because in remote sensing related image procesing work
NAs get introduced, especially when doing projection transformation
and this can trip up many users and the authors might've decided to be
helpful to them.  I don't know...I'm just speculating.

HTH,
Vijay.

On Sat, Jun 4, 2016 at 9:48 AM, Loïc Dutrieux <loic.dutrieux at wur.nl> wrote:
> Hi Thiago,
>
> Yes, I can reproduce.
> It's probably because zApply() defaults to na.rm = TRUE (in fact the na.rm =
> argument is ignored if you pass it), and
>
> sum(c(NA, NA), na.rm = TRUE)
> [1] 0 (I was surprised by that).
>
> Perhaps the stackApply() call in the zApply() function definition is missing
> an ellipsis.
>
> Cheers,
> Loïc
>
>
> On 06/03/2016 10:25 PM, Thiago V. dos Santos wrote:
>>
>> Dear Vijay and Loïc,
>>
>>
>> I have identified some artifacts in the results when using zApply on my
>> data. To reproduce it, please download this shapefile
>> (https://www.dropbox.com/s/in2z10mlerr2sfu/southern.zip?dl=0) and run the
>> code below (see the comments after the plot commands):
>>
>> -----------------------
>> library(raster)
>> library(zoo)
>>
>> # Create the date sequence
>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>
>> # Create raster stack and assign dates
>> r <- raster(ncol=24, nrow=21, xmn=-58, xmx=-47.5, ymn=-34, ymx=-22,
>> resolution=0.5)
>> s <- stack(lapply(1:length(idx), function(x) setValues(r,
>> runif(ncell(r)))))
>> s <- setZ(s, idx)
>>
>> # Load shapefile and crop data
>> shp <- shapefile('~/Downloads/southern.shp', warnPRJ=F)
>> s.c <- crop(s, extent(shp))
>> s.c <- mask(s.c, shp)
>> plot(s.c,1)
>> plot(shp,add=T) # Looks good
>>
>> # Loïc's approach using zApply
>> sYM <- zApply(s.c, by = as.yearmon, sum)
>> sM <- zApply(sYM, by = months, mean)
>> plot(sYM,1)  # Note that the "empty" space across the extent was filled
>> with 0's
>> plot(sM,1)  # The same "gray area" here
>>
>> # Vijay's approach using calc
>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>> idxM <- unique(idxYM)%%100
>> meanYM <- calc(s.c, fun=function(x) { by(x, idxYM, sum) })
>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>> plot(meanYM,1)  # Note that no 0 was added across the extent
>> plot(meanM,1)  # This is OK too, no grey area
>> -----------------------
>>
>> Basically, a portion of the raster extent was filled with 0's after using
>> zApply. It doesn't happen when using calc.
>>
>> Any ideas on what can be causing this?
>>
>>   Greetings,
>>   -- Thiago V. dos Santos
>>
>> PhD student
>> Land and Atmospheric Science
>> University of Minnesota
>>
>>
>>
>> On Friday, June 3, 2016 1:47 PM, Vijay Lulla <vijaylulla at gmail.com> wrote:
>> Looking at Loïc's answer and from reading ?zApply I learned of
>> stackApply, which is used internally by zApply.  While zApply is for
>> time series of layers, stackApply is more general and can be used for
>> applying functions to groups of layers in a stack/brick.  It is very
>> similar to base R's `tapply`.  And it is also fast!
>>
>>> system.time(meanYM1 <- stackApply(s, idxYM, mean))
>>
>>     user  system elapsed
>>     0.45    0.03    0.48
>>>
>>> system.time(meanYM <- calc(s, fun=function(x) { by(x, idxYM, mean)}))
>>
>>     user  system elapsed
>>    17.50    0.01   17.61
>>>
>>> all(meanYM[] == meanYM1[])
>>
>> [1] TRUE
>>
>> Thanks Loïc for pointing out zApply.
>>
>>
>> On Fri, Jun 3, 2016 at 11:30 AM, Thiago V. dos Santos via R-sig-Geo
>> <r-sig-geo at r-project.org> wrote:
>>>
>>> Cool Loïc, thanks for showing one more option.
>>>
>>> By the way, in this case zApply is surprisingly faster than calc:
>>>
>>>> system.time(meanYM <- calc(s,fun=function(x) { by(x, idxYM, sum) }))
>>>
>>> user  system elapsed
>>> 21.700   0.248  22.095
>>>
>>>
>>>> system.time(sYM <- zApply(s, by = as.yearmon, sum))
>>>
>>> user  system elapsed
>>> 0.811   0.047   0.866
>>>
>>>   Cheers,
>>>   -- Thiago V. dos Santos
>>>
>>> PhD student
>>> Land and Atmospheric Science
>>> University of Minnesota
>>>
>>>
>>>
>>> On Friday, June 3, 2016 4:25 AM, Loïc Dutrieux <loic.dutrieux at wur.nl>
>>> wrote:
>>> This can also be done with zApply:
>>>
>>> library(zoo)
>>>
>>> sYM <- zApply(s, by = as.yearmon, sum)
>>> sM <- zApply(sYM, by = months, mean)
>>>
>>> Cheers,
>>> Loïc
>>>
>>> On 06/03/2016 02:02 AM, Vijay Lulla wrote:
>>>>
>>>> I think the following StackOverflow question has the answer:
>>>>
>>>> http://stackoverflow.com/questions/16135877/applying-a-function-to-a-multidimensional-array-with-grouping-variable/16136775#16136775
>>>>
>>>> Following the instructions listed on that page for your case might go
>>>> something like below:
>>>>
>>>>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>>>>> idxM <- unique(idxYM)%%100
>>>>> meanYM <- calc(s,fun=function(x) { by(x, idxYM, mean) })
>>>>> meanYM
>>>>
>>>> class       : RasterBrick
>>>> dimensions  : 20, 20, 400, 360  (nrow, ncol, ncell, nlayers)
>>>> resolution  : 18, 9  (x, y)
>>>> extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
>>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>>> data source : in memory
>>>> names       : X196101, X196102, X196103, X196104, X196105, X196106,
>>>> X196107, X196108, X196109, X196110, X196111, X196112, X196201,
>>>> X196202, X196203, ...
>>>> min values  :  0.3728,  0.2725,  0.3421,  0.3652,  0.3342,  0.3185,
>>>> 0.3130,  0.3780,  0.3376,  0.3727,  0.3537,  0.3737,  0.3515,  0.3588,
>>>>    0.3334, ...
>>>> max values  :  0.6399,  0.6652,  0.6583,  0.6640,  0.6359,  0.6761,
>>>> 0.6442,  0.6800,  0.6397,  0.6769,  0.6489,  0.6388,  0.6471,  0.6661,
>>>>    0.6255, ...
>>>>
>>>>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>>>>> meanM
>>>>
>>>> class       : RasterBrick
>>>> dimensions  : 20, 20, 400, 12  (nrow, ncol, ncell, nlayers)
>>>> resolution  : 18, 9  (x, y)
>>>> extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
>>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>>> data source : in memory
>>>> names       :     X1,     X2,     X3,     X4,     X5,     X6,     X7,
>>>>      X8,     X9,    X10,    X11,    X12
>>>> min values  : 0.4645, 0.4715, 0.4768, 0.4717, 0.4749, 0.4705, 0.4697,
>>>> 0.4724, 0.4629, 0.4774, 0.4736, 0.4708
>>>> max values  : 0.5274, 0.5275, 0.5293, 0.5259, 0.5285, 0.5276, 0.5269,
>>>> 0.5260, 0.5256, 0.5281, 0.5279, 0.5286
>>>>
>>>>>
>>>>
>>>> I'm not sure how [in]efficient this is for actual (i.e. not toy
>>>> example) data.  Maybe others more experienced, and knowledgeable,
>>>> members can provide better answers.
>>>>
>>>> HTH,
>>>> Vijay.
>>>>
>>>> On Thu, Jun 2, 2016 at 4:30 PM, Thiago V. dos Santos via R-sig-Geo
>>>> <r-sig-geo at r-project.org> wrote:
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am working with daily time series of meteorological variables. This
>>>>> is an example of the dataset:
>>>>>
>>>>> library(raster)
>>>>>
>>>>> # Create date sequence
>>>>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>>>>
>>>>> # Create raster stack and assign dates
>>>>> r <- raster(ncol=20, nrow=20)
>>>>> s <- stack(lapply(1:length(idx), function(x) setValues(r,
>>>>> runif(ncell(r)))))
>>>>> s <- setZ(s, idx)
>>>>>
>>>>>
>>>>> Now, let's assume those values represent daily precipitation. What I
>>>>> need to do is to integrate daily to monthly values,
>>>>> and then take a monthly climatology. Climatology in this case means
>>>>> multi-year average of selected months, e.g., an average of the 30 Octobers
>>>>> from 1961 to 1990, an average of the 30 Novembers from 1961 to 1990 and etc.
>>>>>
>>>>> On the other hand, let's assume the raster values represent daily
>>>>> temperature. Integrating daily to monthly temperature doesn't make sense.
>>>>> Hence, instead of integrating daily values, I need to take monthly means
>>>>> (e.g. mean value of all days in every month), and then calculate the
>>>>> climatology.
>>>>>
>>>>> What would be the best approach to achieve that using the raster
>>>>> package?
>>>>>
>>>>>    Greetings,
>>>>>    -- Thiago V. dos Santos
>>>>>
>>>>> PhD student
>>>>> Land and Atmospheric Science
>>>>> University of Minnesota
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>>>
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



More information about the R-sig-Geo mailing list