[R-sig-Geo] How to calculate climatology in rasterbricks

Robert J. Hijmans r.hijmans at gmail.com
Wed Jun 8 21:44:49 CEST 2016


This (zapply passing ellipses arguments to stackApply) was fixed, I
think, in the development version. You can try it if you want:

install.packages("raster", repos="http://R-Forge.R-project.org")


On Sat, Jun 4, 2016 at 1:59 PM, Thiago V. dos Santos via R-sig-Geo
<r-sig-geo at r-project.org> wrote:
> Thanks again Loïc and Vijay for helping to identify the issue.
>
> Hopefully Robert or any of the contributors will see this question and kindly suggest a workaround.
>  Cheers,
>  -- Thiago V. dos Santos
>
> PhD student
> Land and Atmospheric Science
> University of Minnesota
>
>
>
> On Saturday, June 4, 2016 10:30 AM, Vijay Lulla <vijaylulla at gmail.com> wrote:
> Yes, it is because of na.rm argument.  It's stackApply that defaults
> na.rm=TRUE not zApply.  You can check that from the following
> interaction:
>
>> args(mean.default)
> function (x, trim = 0, na.rm = FALSE, ...)
> NULL
>> args(sum)
> function (..., na.rm = FALSE)
> NULL
>> args(stackApply)
> function (x, indices, fun, filename = "", na.rm = TRUE, ...)
> NULL
>> args(zApply)
> function (x, by, fun = mean, name = "", ...)
> NULL
>>
>
> It is surprising that stackApply defaults differently than most of the
> R functions but it kinda makes sense too.  I suspect the defaults are
> what they are because in remote sensing related image procesing work
> NAs get introduced, especially when doing projection transformation
> and this can trip up many users and the authors might've decided to be
> helpful to them.  I don't know...I'm just speculating.
>
> HTH,
> Vijay.
>
>
> On Sat, Jun 4, 2016 at 9:48 AM, Loïc Dutrieux <loic.dutrieux at wur.nl> wrote:
>> Hi Thiago,
>>
>> Yes, I can reproduce.
>> It's probably because zApply() defaults to na.rm = TRUE (in fact the na.rm =
>> argument is ignored if you pass it), and
>>
>> sum(c(NA, NA), na.rm = TRUE)
>> [1] 0 (I was surprised by that).
>>
>> Perhaps the stackApply() call in the zApply() function definition is missing
>> an ellipsis.
>>
>> Cheers,
>> Loïc
>>
>>
>> On 06/03/2016 10:25 PM, Thiago V. dos Santos wrote:
>>>
>>> Dear Vijay and Loïc,
>>>
>>>
>>> I have identified some artifacts in the results when using zApply on my
>>> data. To reproduce it, please download this shapefile
>>> (https://www.dropbox.com/s/in2z10mlerr2sfu/southern.zip?dl=0) and run the
>>> code below (see the comments after the plot commands):
>>>
>>> -----------------------
>>> library(raster)
>>> library(zoo)
>>>
>>> # Create the date sequence
>>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>>
>>> # Create raster stack and assign dates
>>> r <- raster(ncol=24, nrow=21, xmn=-58, xmx=-47.5, ymn=-34, ymx=-22,
>>> resolution=0.5)
>>> s <- stack(lapply(1:length(idx), function(x) setValues(r,
>>> runif(ncell(r)))))
>>> s <- setZ(s, idx)
>>>
>>> # Load shapefile and crop data
>>> shp <- shapefile('~/Downloads/southern.shp', warnPRJ=F)
>>> s.c <- crop(s, extent(shp))
>>> s.c <- mask(s.c, shp)
>>> plot(s.c,1)
>>> plot(shp,add=T) # Looks good
>>>
>>> # Loïc's approach using zApply
>>> sYM <- zApply(s.c, by = as.yearmon, sum)
>>> sM <- zApply(sYM, by = months, mean)
>>> plot(sYM,1)  # Note that the "empty" space across the extent was filled
>>> with 0's
>>> plot(sM,1)  # The same "gray area" here
>>>
>>> # Vijay's approach using calc
>>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>>> idxM <- unique(idxYM)%%100
>>> meanYM <- calc(s.c, fun=function(x) { by(x, idxYM, sum) })
>>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>>> plot(meanYM,1)  # Note that no 0 was added across the extent
>>> plot(meanM,1)  # This is OK too, no grey area
>>> -----------------------
>>>
>>> Basically, a portion of the raster extent was filled with 0's after using
>>> zApply. It doesn't happen when using calc.
>>>
>>> Any ideas on what can be causing this?
>>>
>>>   Greetings,
>>>   -- Thiago V. dos Santos
>>>
>>> PhD student
>>> Land and Atmospheric Science
>>> University of Minnesota
>>>
>>>
>>>
>>> On Friday, June 3, 2016 1:47 PM, Vijay Lulla <vijaylulla at gmail.com> wrote:
>>> Looking at Loïc's answer and from reading ?zApply I learned of
>>> stackApply, which is used internally by zApply.  While zApply is for
>>> time series of layers, stackApply is more general and can be used for
>>> applying functions to groups of layers in a stack/brick.  It is very
>>> similar to base R's `tapply`.  And it is also fast!
>>>
>>>> system.time(meanYM1 <- stackApply(s, idxYM, mean))
>>>
>>>     user  system elapsed
>>>     0.45    0.03    0.48
>>>>
>>>> system.time(meanYM <- calc(s, fun=function(x) { by(x, idxYM, mean)}))
>>>
>>>     user  system elapsed
>>>    17.50    0.01   17.61
>>>>
>>>> all(meanYM[] == meanYM1[])
>>>
>>> [1] TRUE
>>>
>>> Thanks Loïc for pointing out zApply.
>>>
>>>
>>> On Fri, Jun 3, 2016 at 11:30 AM, Thiago V. dos Santos via R-sig-Geo
>>> <r-sig-geo at r-project.org> wrote:
>>>>
>>>> Cool Loïc, thanks for showing one more option.
>>>>
>>>> By the way, in this case zApply is surprisingly faster than calc:
>>>>
>>>>> system.time(meanYM <- calc(s,fun=function(x) { by(x, idxYM, sum) }))
>>>>
>>>> user  system elapsed
>>>> 21.700   0.248  22.095
>>>>
>>>>
>>>>> system.time(sYM <- zApply(s, by = as.yearmon, sum))
>>>>
>>>> user  system elapsed
>>>> 0.811   0.047   0.866
>>>>
>>>>   Cheers,
>>>>   -- Thiago V. dos Santos
>>>>
>>>> PhD student
>>>> Land and Atmospheric Science
>>>> University of Minnesota
>>>>
>>>>
>>>>
>>>> On Friday, June 3, 2016 4:25 AM, Loïc Dutrieux <loic.dutrieux at wur.nl>
>>>> wrote:
>>>> This can also be done with zApply:
>>>>
>>>> library(zoo)
>>>>
>>>> sYM <- zApply(s, by = as.yearmon, sum)
>>>> sM <- zApply(sYM, by = months, mean)
>>>>
>>>> Cheers,
>>>> Loïc
>>>>
>>>> On 06/03/2016 02:02 AM, Vijay Lulla wrote:
>>>>>
>>>>> I think the following StackOverflow question has the answer:
>>>>>
>>>>> http://stackoverflow.com/questions/16135877/applying-a-function-to-a-multidimensional-array-with-grouping-variable/16136775#16136775
>>>>>
>>>>> Following the instructions listed on that page for your case might go
>>>>> something like below:
>>>>>
>>>>>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>>>>>> idxM <- unique(idxYM)%%100
>>>>>> meanYM <- calc(s,fun=function(x) { by(x, idxYM, mean) })
>>>>>> meanYM
>>>>>
>>>>> class       : RasterBrick
>>>>> dimensions  : 20, 20, 400, 360  (nrow, ncol, ncell, nlayers)
>>>>> resolution  : 18, 9  (x, y)
>>>>> extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
>>>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>>>> data source : in memory
>>>>> names       : X196101, X196102, X196103, X196104, X196105, X196106,
>>>>> X196107, X196108, X196109, X196110, X196111, X196112, X196201,
>>>>> X196202, X196203, ...
>>>>> min values  :  0.3728,  0.2725,  0.3421,  0.3652,  0.3342,  0.3185,
>>>>> 0.3130,  0.3780,  0.3376,  0.3727,  0.3537,  0.3737,  0.3515,  0.3588,
>>>>>    0.3334, ...
>>>>> max values  :  0.6399,  0.6652,  0.6583,  0.6640,  0.6359,  0.6761,
>>>>> 0.6442,  0.6800,  0.6397,  0.6769,  0.6489,  0.6388,  0.6471,  0.6661,
>>>>>    0.6255, ...
>>>>>
>>>>>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>>>>>> meanM
>>>>>
>>>>> class       : RasterBrick
>>>>> dimensions  : 20, 20, 400, 12  (nrow, ncol, ncell, nlayers)
>>>>> resolution  : 18, 9  (x, y)
>>>>> extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
>>>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>>>> data source : in memory
>>>>> names       :     X1,     X2,     X3,     X4,     X5,     X6,     X7,
>>>>>      X8,     X9,    X10,    X11,    X12
>>>>> min values  : 0.4645, 0.4715, 0.4768, 0.4717, 0.4749, 0.4705, 0.4697,
>>>>> 0.4724, 0.4629, 0.4774, 0.4736, 0.4708
>>>>> max values  : 0.5274, 0.5275, 0.5293, 0.5259, 0.5285, 0.5276, 0.5269,
>>>>> 0.5260, 0.5256, 0.5281, 0.5279, 0.5286
>>>>>
>>>>>>
>>>>>
>>>>> I'm not sure how [in]efficient this is for actual (i.e. not toy
>>>>> example) data.  Maybe others more experienced, and knowledgeable,
>>>>> members can provide better answers.
>>>>>
>>>>> HTH,
>>>>> Vijay.
>>>>>
>>>>> On Thu, Jun 2, 2016 at 4:30 PM, Thiago V. dos Santos via R-sig-Geo
>>>>> <r-sig-geo at r-project.org> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I am working with daily time series of meteorological variables. This
>>>>>> is an example of the dataset:
>>>>>>
>>>>>> library(raster)
>>>>>>
>>>>>> # Create date sequence
>>>>>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>>>>>
>>>>>> # Create raster stack and assign dates
>>>>>> r <- raster(ncol=20, nrow=20)
>>>>>> s <- stack(lapply(1:length(idx), function(x) setValues(r,
>>>>>> runif(ncell(r)))))
>>>>>> s <- setZ(s, idx)
>>>>>>
>>>>>>
>>>>>> Now, let's assume those values represent daily precipitation. What I
>>>>>> need to do is to integrate daily to monthly values,
>>>>>> and then take a monthly climatology. Climatology in this case means
>>>>>> multi-year average of selected months, e.g., an average of the 30 Octobers
>>>>>> from 1961 to 1990, an average of the 30 Novembers from 1961 to 1990 and etc.
>>>>>>
>>>>>> On the other hand, let's assume the raster values represent daily
>>>>>> temperature. Integrating daily to monthly temperature doesn't make sense.
>>>>>> Hence, instead of integrating daily values, I need to take monthly means
>>>>>> (e.g. mean value of all days in every month), and then calculate the
>>>>>> climatology.
>>>>>>
>>>>>> What would be the best approach to achieve that using the raster
>>>>>> package?
>>>>>>
>>>>>>    Greetings,
>>>>>>    -- Thiago V. dos Santos
>>>>>>
>>>>>> PhD student
>>>>>> Land and Atmospheric Science
>>>>>> University of Minnesota
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-Geo mailing list
>>>>>> R-sig-Geo at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo



More information about the R-sig-Geo mailing list