[R-sig-Geo] How to calculate climatology in rasterbricks
Robert J. Hijmans
r.hijmans at gmail.com
Wed Jun 8 21:44:49 CEST 2016
This (zapply passing ellipses arguments to stackApply) was fixed, I
think, in the development version. You can try it if you want:
install.packages("raster", repos="http://R-Forge.R-project.org")
On Sat, Jun 4, 2016 at 1:59 PM, Thiago V. dos Santos via R-sig-Geo
<r-sig-geo at r-project.org> wrote:
> Thanks again Loïc and Vijay for helping to identify the issue.
>
> Hopefully Robert or any of the contributors will see this question and kindly suggest a workaround.
> Cheers,
> -- Thiago V. dos Santos
>
> PhD student
> Land and Atmospheric Science
> University of Minnesota
>
>
>
> On Saturday, June 4, 2016 10:30 AM, Vijay Lulla <vijaylulla at gmail.com> wrote:
> Yes, it is because of na.rm argument. It's stackApply that defaults
> na.rm=TRUE not zApply. You can check that from the following
> interaction:
>
>> args(mean.default)
> function (x, trim = 0, na.rm = FALSE, ...)
> NULL
>> args(sum)
> function (..., na.rm = FALSE)
> NULL
>> args(stackApply)
> function (x, indices, fun, filename = "", na.rm = TRUE, ...)
> NULL
>> args(zApply)
> function (x, by, fun = mean, name = "", ...)
> NULL
>>
>
> It is surprising that stackApply defaults differently than most of the
> R functions but it kinda makes sense too. I suspect the defaults are
> what they are because in remote sensing related image procesing work
> NAs get introduced, especially when doing projection transformation
> and this can trip up many users and the authors might've decided to be
> helpful to them. I don't know...I'm just speculating.
>
> HTH,
> Vijay.
>
>
> On Sat, Jun 4, 2016 at 9:48 AM, Loïc Dutrieux <loic.dutrieux at wur.nl> wrote:
>> Hi Thiago,
>>
>> Yes, I can reproduce.
>> It's probably because zApply() defaults to na.rm = TRUE (in fact the na.rm =
>> argument is ignored if you pass it), and
>>
>> sum(c(NA, NA), na.rm = TRUE)
>> [1] 0 (I was surprised by that).
>>
>> Perhaps the stackApply() call in the zApply() function definition is missing
>> an ellipsis.
>>
>> Cheers,
>> Loïc
>>
>>
>> On 06/03/2016 10:25 PM, Thiago V. dos Santos wrote:
>>>
>>> Dear Vijay and Loïc,
>>>
>>>
>>> I have identified some artifacts in the results when using zApply on my
>>> data. To reproduce it, please download this shapefile
>>> (https://www.dropbox.com/s/in2z10mlerr2sfu/southern.zip?dl=0) and run the
>>> code below (see the comments after the plot commands):
>>>
>>> -----------------------
>>> library(raster)
>>> library(zoo)
>>>
>>> # Create the date sequence
>>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>>
>>> # Create raster stack and assign dates
>>> r <- raster(ncol=24, nrow=21, xmn=-58, xmx=-47.5, ymn=-34, ymx=-22,
>>> resolution=0.5)
>>> s <- stack(lapply(1:length(idx), function(x) setValues(r,
>>> runif(ncell(r)))))
>>> s <- setZ(s, idx)
>>>
>>> # Load shapefile and crop data
>>> shp <- shapefile('~/Downloads/southern.shp', warnPRJ=F)
>>> s.c <- crop(s, extent(shp))
>>> s.c <- mask(s.c, shp)
>>> plot(s.c,1)
>>> plot(shp,add=T) # Looks good
>>>
>>> # Loïc's approach using zApply
>>> sYM <- zApply(s.c, by = as.yearmon, sum)
>>> sM <- zApply(sYM, by = months, mean)
>>> plot(sYM,1) # Note that the "empty" space across the extent was filled
>>> with 0's
>>> plot(sM,1) # The same "gray area" here
>>>
>>> # Vijay's approach using calc
>>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>>> idxM <- unique(idxYM)%%100
>>> meanYM <- calc(s.c, fun=function(x) { by(x, idxYM, sum) })
>>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>>> plot(meanYM,1) # Note that no 0 was added across the extent
>>> plot(meanM,1) # This is OK too, no grey area
>>> -----------------------
>>>
>>> Basically, a portion of the raster extent was filled with 0's after using
>>> zApply. It doesn't happen when using calc.
>>>
>>> Any ideas on what can be causing this?
>>>
>>> Greetings,
>>> -- Thiago V. dos Santos
>>>
>>> PhD student
>>> Land and Atmospheric Science
>>> University of Minnesota
>>>
>>>
>>>
>>> On Friday, June 3, 2016 1:47 PM, Vijay Lulla <vijaylulla at gmail.com> wrote:
>>> Looking at Loïc's answer and from reading ?zApply I learned of
>>> stackApply, which is used internally by zApply. While zApply is for
>>> time series of layers, stackApply is more general and can be used for
>>> applying functions to groups of layers in a stack/brick. It is very
>>> similar to base R's `tapply`. And it is also fast!
>>>
>>>> system.time(meanYM1 <- stackApply(s, idxYM, mean))
>>>
>>> user system elapsed
>>> 0.45 0.03 0.48
>>>>
>>>> system.time(meanYM <- calc(s, fun=function(x) { by(x, idxYM, mean)}))
>>>
>>> user system elapsed
>>> 17.50 0.01 17.61
>>>>
>>>> all(meanYM[] == meanYM1[])
>>>
>>> [1] TRUE
>>>
>>> Thanks Loïc for pointing out zApply.
>>>
>>>
>>> On Fri, Jun 3, 2016 at 11:30 AM, Thiago V. dos Santos via R-sig-Geo
>>> <r-sig-geo at r-project.org> wrote:
>>>>
>>>> Cool Loïc, thanks for showing one more option.
>>>>
>>>> By the way, in this case zApply is surprisingly faster than calc:
>>>>
>>>>> system.time(meanYM <- calc(s,fun=function(x) { by(x, idxYM, sum) }))
>>>>
>>>> user system elapsed
>>>> 21.700 0.248 22.095
>>>>
>>>>
>>>>> system.time(sYM <- zApply(s, by = as.yearmon, sum))
>>>>
>>>> user system elapsed
>>>> 0.811 0.047 0.866
>>>>
>>>> Cheers,
>>>> -- Thiago V. dos Santos
>>>>
>>>> PhD student
>>>> Land and Atmospheric Science
>>>> University of Minnesota
>>>>
>>>>
>>>>
>>>> On Friday, June 3, 2016 4:25 AM, Loïc Dutrieux <loic.dutrieux at wur.nl>
>>>> wrote:
>>>> This can also be done with zApply:
>>>>
>>>> library(zoo)
>>>>
>>>> sYM <- zApply(s, by = as.yearmon, sum)
>>>> sM <- zApply(sYM, by = months, mean)
>>>>
>>>> Cheers,
>>>> Loïc
>>>>
>>>> On 06/03/2016 02:02 AM, Vijay Lulla wrote:
>>>>>
>>>>> I think the following StackOverflow question has the answer:
>>>>>
>>>>> http://stackoverflow.com/questions/16135877/applying-a-function-to-a-multidimensional-array-with-grouping-variable/16136775#16136775
>>>>>
>>>>> Following the instructions listed on that page for your case might go
>>>>> something like below:
>>>>>
>>>>>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>>>>>> idxM <- unique(idxYM)%%100
>>>>>> meanYM <- calc(s,fun=function(x) { by(x, idxYM, mean) })
>>>>>> meanYM
>>>>>
>>>>> class : RasterBrick
>>>>> dimensions : 20, 20, 400, 360 (nrow, ncol, ncell, nlayers)
>>>>> resolution : 18, 9 (x, y)
>>>>> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
>>>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>>>> data source : in memory
>>>>> names : X196101, X196102, X196103, X196104, X196105, X196106,
>>>>> X196107, X196108, X196109, X196110, X196111, X196112, X196201,
>>>>> X196202, X196203, ...
>>>>> min values : 0.3728, 0.2725, 0.3421, 0.3652, 0.3342, 0.3185,
>>>>> 0.3130, 0.3780, 0.3376, 0.3727, 0.3537, 0.3737, 0.3515, 0.3588,
>>>>> 0.3334, ...
>>>>> max values : 0.6399, 0.6652, 0.6583, 0.6640, 0.6359, 0.6761,
>>>>> 0.6442, 0.6800, 0.6397, 0.6769, 0.6489, 0.6388, 0.6471, 0.6661,
>>>>> 0.6255, ...
>>>>>
>>>>>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>>>>>> meanM
>>>>>
>>>>> class : RasterBrick
>>>>> dimensions : 20, 20, 400, 12 (nrow, ncol, ncell, nlayers)
>>>>> resolution : 18, 9 (x, y)
>>>>> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
>>>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>>>> data source : in memory
>>>>> names : X1, X2, X3, X4, X5, X6, X7,
>>>>> X8, X9, X10, X11, X12
>>>>> min values : 0.4645, 0.4715, 0.4768, 0.4717, 0.4749, 0.4705, 0.4697,
>>>>> 0.4724, 0.4629, 0.4774, 0.4736, 0.4708
>>>>> max values : 0.5274, 0.5275, 0.5293, 0.5259, 0.5285, 0.5276, 0.5269,
>>>>> 0.5260, 0.5256, 0.5281, 0.5279, 0.5286
>>>>>
>>>>>>
>>>>>
>>>>> I'm not sure how [in]efficient this is for actual (i.e. not toy
>>>>> example) data. Maybe others more experienced, and knowledgeable,
>>>>> members can provide better answers.
>>>>>
>>>>> HTH,
>>>>> Vijay.
>>>>>
>>>>> On Thu, Jun 2, 2016 at 4:30 PM, Thiago V. dos Santos via R-sig-Geo
>>>>> <r-sig-geo at r-project.org> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I am working with daily time series of meteorological variables. This
>>>>>> is an example of the dataset:
>>>>>>
>>>>>> library(raster)
>>>>>>
>>>>>> # Create date sequence
>>>>>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>>>>>
>>>>>> # Create raster stack and assign dates
>>>>>> r <- raster(ncol=20, nrow=20)
>>>>>> s <- stack(lapply(1:length(idx), function(x) setValues(r,
>>>>>> runif(ncell(r)))))
>>>>>> s <- setZ(s, idx)
>>>>>>
>>>>>>
>>>>>> Now, let's assume those values represent daily precipitation. What I
>>>>>> need to do is to integrate daily to monthly values,
>>>>>> and then take a monthly climatology. Climatology in this case means
>>>>>> multi-year average of selected months, e.g., an average of the 30 Octobers
>>>>>> from 1961 to 1990, an average of the 30 Novembers from 1961 to 1990 and etc.
>>>>>>
>>>>>> On the other hand, let's assume the raster values represent daily
>>>>>> temperature. Integrating daily to monthly temperature doesn't make sense.
>>>>>> Hence, instead of integrating daily values, I need to take monthly means
>>>>>> (e.g. mean value of all days in every month), and then calculate the
>>>>>> climatology.
>>>>>>
>>>>>> What would be the best approach to achieve that using the raster
>>>>>> package?
>>>>>>
>>>>>> Greetings,
>>>>>> -- Thiago V. dos Santos
>>>>>>
>>>>>> PhD student
>>>>>> Land and Atmospheric Science
>>>>>> University of Minnesota
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-Geo mailing list
>>>>>> R-sig-Geo at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
More information about the R-sig-Geo
mailing list