[R-sig-Geo] How to calculate climatology in rasterbricks
Loïc Dutrieux
loic.dutrieux at wur.nl
Sat Jun 4 15:48:45 CEST 2016
Hi Thiago,
Yes, I can reproduce.
It's probably because zApply() defaults to na.rm = TRUE (in fact the
na.rm = argument is ignored if you pass it), and
sum(c(NA, NA), na.rm = TRUE)
[1] 0 (I was surprised by that).
Perhaps the stackApply() call in the zApply() function definition is
missing an ellipsis.
Cheers,
Loïc
On 06/03/2016 10:25 PM, Thiago V. dos Santos wrote:
> Dear Vijay and Loïc,
>
>
> I have identified some artifacts in the results when using zApply on my data. To reproduce it, please download this shapefile (https://www.dropbox.com/s/in2z10mlerr2sfu/southern.zip?dl=0) and run the code below (see the comments after the plot commands):
>
> -----------------------
> library(raster)
> library(zoo)
>
> # Create the date sequence
> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>
> # Create raster stack and assign dates
> r <- raster(ncol=24, nrow=21, xmn=-58, xmx=-47.5, ymn=-34, ymx=-22, resolution=0.5)
> s <- stack(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
> s <- setZ(s, idx)
>
> # Load shapefile and crop data
> shp <- shapefile('~/Downloads/southern.shp', warnPRJ=F)
> s.c <- crop(s, extent(shp))
> s.c <- mask(s.c, shp)
> plot(s.c,1)
> plot(shp,add=T) # Looks good
>
> # Loïc's approach using zApply
> sYM <- zApply(s.c, by = as.yearmon, sum)
> sM <- zApply(sYM, by = months, mean)
> plot(sYM,1) # Note that the "empty" space across the extent was filled with 0's
> plot(sM,1) # The same "gray area" here
>
> # Vijay's approach using calc
> idxYM <- as.integer(strftime(idx,"%Y%m"))
> idxM <- unique(idxYM)%%100
> meanYM <- calc(s.c, fun=function(x) { by(x, idxYM, sum) })
> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
> plot(meanYM,1) # Note that no 0 was added across the extent
> plot(meanM,1) # This is OK too, no grey area
> -----------------------
>
> Basically, a portion of the raster extent was filled with 0's after using zApply. It doesn't happen when using calc.
>
> Any ideas on what can be causing this?
>
> Greetings,
> -- Thiago V. dos Santos
>
> PhD student
> Land and Atmospheric Science
> University of Minnesota
>
>
>
> On Friday, June 3, 2016 1:47 PM, Vijay Lulla <vijaylulla at gmail.com> wrote:
> Looking at Loïc's answer and from reading ?zApply I learned of
> stackApply, which is used internally by zApply. While zApply is for
> time series of layers, stackApply is more general and can be used for
> applying functions to groups of layers in a stack/brick. It is very
> similar to base R's `tapply`. And it is also fast!
>
>> system.time(meanYM1 <- stackApply(s, idxYM, mean))
> user system elapsed
> 0.45 0.03 0.48
>> system.time(meanYM <- calc(s, fun=function(x) { by(x, idxYM, mean)}))
> user system elapsed
> 17.50 0.01 17.61
>> all(meanYM[] == meanYM1[])
> [1] TRUE
>
> Thanks Loïc for pointing out zApply.
>
>
> On Fri, Jun 3, 2016 at 11:30 AM, Thiago V. dos Santos via R-sig-Geo
> <r-sig-geo at r-project.org> wrote:
>> Cool Loïc, thanks for showing one more option.
>>
>> By the way, in this case zApply is surprisingly faster than calc:
>>
>>> system.time(meanYM <- calc(s,fun=function(x) { by(x, idxYM, sum) }))
>> user system elapsed
>> 21.700 0.248 22.095
>>
>>
>>> system.time(sYM <- zApply(s, by = as.yearmon, sum))
>> user system elapsed
>> 0.811 0.047 0.866
>>
>> Cheers,
>> -- Thiago V. dos Santos
>>
>> PhD student
>> Land and Atmospheric Science
>> University of Minnesota
>>
>>
>>
>> On Friday, June 3, 2016 4:25 AM, Loïc Dutrieux <loic.dutrieux at wur.nl> wrote:
>> This can also be done with zApply:
>>
>> library(zoo)
>>
>> sYM <- zApply(s, by = as.yearmon, sum)
>> sM <- zApply(sYM, by = months, mean)
>>
>> Cheers,
>> Loïc
>>
>> On 06/03/2016 02:02 AM, Vijay Lulla wrote:
>>> I think the following StackOverflow question has the answer:
>>> http://stackoverflow.com/questions/16135877/applying-a-function-to-a-multidimensional-array-with-grouping-variable/16136775#16136775
>>>
>>> Following the instructions listed on that page for your case might go
>>> something like below:
>>>
>>>> idxYM <- as.integer(strftime(idx,"%Y%m"))
>>>> idxM <- unique(idxYM)%%100
>>>> meanYM <- calc(s,fun=function(x) { by(x, idxYM, mean) })
>>>> meanYM
>>> class : RasterBrick
>>> dimensions : 20, 20, 400, 360 (nrow, ncol, ncell, nlayers)
>>> resolution : 18, 9 (x, y)
>>> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>> data source : in memory
>>> names : X196101, X196102, X196103, X196104, X196105, X196106,
>>> X196107, X196108, X196109, X196110, X196111, X196112, X196201,
>>> X196202, X196203, ...
>>> min values : 0.3728, 0.2725, 0.3421, 0.3652, 0.3342, 0.3185,
>>> 0.3130, 0.3780, 0.3376, 0.3727, 0.3537, 0.3737, 0.3515, 0.3588,
>>> 0.3334, ...
>>> max values : 0.6399, 0.6652, 0.6583, 0.6640, 0.6359, 0.6761,
>>> 0.6442, 0.6800, 0.6397, 0.6769, 0.6489, 0.6388, 0.6471, 0.6661,
>>> 0.6255, ...
>>>
>>>> meanM <- calc(meanYM, fun=function(x) { by(x, idxM, mean) })
>>>> meanM
>>> class : RasterBrick
>>> dimensions : 20, 20, 400, 12 (nrow, ncol, ncell, nlayers)
>>> resolution : 18, 9 (x, y)
>>> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
>>> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
>>> data source : in memory
>>> names : X1, X2, X3, X4, X5, X6, X7,
>>> X8, X9, X10, X11, X12
>>> min values : 0.4645, 0.4715, 0.4768, 0.4717, 0.4749, 0.4705, 0.4697,
>>> 0.4724, 0.4629, 0.4774, 0.4736, 0.4708
>>> max values : 0.5274, 0.5275, 0.5293, 0.5259, 0.5285, 0.5276, 0.5269,
>>> 0.5260, 0.5256, 0.5281, 0.5279, 0.5286
>>>
>>>>
>>>
>>> I'm not sure how [in]efficient this is for actual (i.e. not toy
>>> example) data. Maybe others more experienced, and knowledgeable,
>>> members can provide better answers.
>>>
>>> HTH,
>>> Vijay.
>>>
>>> On Thu, Jun 2, 2016 at 4:30 PM, Thiago V. dos Santos via R-sig-Geo
>>> <r-sig-geo at r-project.org> wrote:
>>>> Dear all,
>>>>
>>>> I am working with daily time series of meteorological variables. This is an example of the dataset:
>>>>
>>>> library(raster)
>>>>
>>>> # Create date sequence
>>>> idx <- seq(as.Date("1961/1/1"), as.Date("1990/12/31"), by = "day")
>>>>
>>>> # Create raster stack and assign dates
>>>> r <- raster(ncol=20, nrow=20)
>>>> s <- stack(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
>>>> s <- setZ(s, idx)
>>>>
>>>>
>>>> Now, let's assume those values represent daily precipitation. What I need to do is to integrate daily to monthly values,
>>>> and then take a monthly climatology. Climatology in this case means multi-year average of selected months, e.g., an average of the 30 Octobers from 1961 to 1990, an average of the 30 Novembers from 1961 to 1990 and etc.
>>>>
>>>> On the other hand, let's assume the raster values represent daily temperature. Integrating daily to monthly temperature doesn't make sense. Hence, instead of integrating daily values, I need to take monthly means (e.g. mean value of all days in every month), and then calculate the climatology.
>>>>
>>>> What would be the best approach to achieve that using the raster package?
>>>>
>>>> Greetings,
>>>> -- Thiago V. dos Santos
>>>>
>>>> PhD student
>>>> Land and Atmospheric Science
>>>> University of Minnesota
>>>>
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>>
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
More information about the R-sig-Geo
mailing list