[R-sig-eco] patterns in weather data that could relate to pathogen prevalence

Bob O'Hara bohara at senckenberg.de
Fri Nov 30 13:58:58 CET 2012


On 11/30/2012 01:09 PM, Anto Raja wrote:
> Hi all
>
> I am searching for a tool that would help me to identify weather patterns
> that influence the prevalence of a pathogen, 'Pn'.
>
> Say, I have annual prevalence data (collected in April) and I know that the
> prevalence of 'Pn' is affected by the weather conditions from November
> onwards. I also have daily data for several weather variables.
You have a lot of weather data, but I assume you don't have so much 
prevalence data. So it's going to be difficult, whatever you do...
> The objective is to identify the relationship between the weather from
> Nov-Mar and the prevalence of Pn. We know that weather has an influence on
> Pn. The question is to find out which period's weather is relevant, or
> what kind of weather is relevant. It could be that the first two winter
> months (Nov-Dec) are the decisive factor, that a certain weather situation
> (like 20 consecutive days of below-zero conditions) occurring at any time is
> important, or a combination of both.
>
> I have tried correlations between prevalence and monthly means for Mar,
> Feb-Mar, Jan-Mar and so on, and nothing definite turned up. I could also do
> it on a weekly basis manually. But I wonder if there is a tool that uses a
> moving window of different sizes (say, from a min size of 1 week to a max
> of 4 months) and checks correlations for each of these periods.
>
> I am thinking of ARMA, but my present aim is to understand the relationship,
> not to forecast. Can it still be used? Or could ARMA be combined with
> multivariate analysis to study the relative importance of each weather
> variable?
I don't see why an ARMA model would help you, as it models covariance 
between time points (i.e. autocorrelation) in the response (i.e. 
prevalence). There are methods that allow for autocorrelation in the 
response, but I don't think that's your big problem. My reaction 
(without seeing the data, of course) is that you might be asking too 
much of your data to get anything meaningful out of it.

> Any suggestions are welcome. I have used R for basic stats analysis but
> never worked with time-series data or the advanced tools of data mining.
> So it is also possible that I am not thinking along the right lines. Feel
> free to correct me if I am looking in the wrong place.
It sounds like you're trying to mine your data for any pattern. To be 
honest, if you do that, I wouldn't trust the results unless you can 
validate them independently: you'll find some relationship if you try 
enough models, but will it make biological sense? This is particularly 
problematic when you have correlated variables, which you will have 
(especially once you start sliding windows around).
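To illustrate the multiple-testing problem, here is a minimal Python sketch (the same loops are easy to write in R) of the sliding-window search asked about in the original question. It assumes 15 hypothetical winters of daily temperature from 1 Nov to 31 Mar, and a prevalence series that is pure random noise, unrelated to weather by construction. It tests every window from 1 to 17 weeks long at every start day and reports the strongest Pearson correlation. Because the response is noise, whatever "best" window it finds is spurious. All names and numbers here are illustrative assumptions, not real data.

```python
import numpy as np


def window_means(daily, length, start):
    """Mean of each year's daily series over days [start, start + length)."""
    return daily[:, start:start + length].mean(axis=1)


def sliding_window_correlations(daily, response, lengths):
    """Pearson r between `response` and window means for every start/length.

    daily:    array of shape (n_years, n_days), one row per winter.
    response: array of shape (n_years,), e.g. annual prevalence.
    Returns a list of (start_day, length, r) tuples.
    """
    n_days = daily.shape[1]
    results = []
    for length in lengths:
        for start in range(n_days - length + 1):
            x = window_means(daily, length, start)
            r = np.corrcoef(x, response)[0, 1]
            results.append((start, length, r))
    return results


rng = np.random.default_rng(2012)
n_years, n_days = 15, 151  # hypothetical: 15 winters, 1 Nov - 31 Mar
temperature = rng.normal(0.0, 5.0, size=(n_years, n_days))
prevalence = rng.uniform(0.0, 0.5, size=n_years)  # pure noise: NO weather effect

# Window lengths of 7, 14, ..., 119 days (1 to 17 weeks).
results = sliding_window_correlations(temperature, prevalence, range(7, 121, 7))
best = max(results, key=lambda t: abs(t[2]))
print(f"{len(results)} windows tested")
print(f"strongest: start day {best[0]}, length {best[1]} days, r = {best[2]:.2f}")
```

With only 15 responses and over a thousand heavily overlapping windows, the strongest correlation can look impressive even though there is nothing to find, and changing the seed will usually move the "best" window somewhere else. That is why a result from this kind of search needs independent validation.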

I'd suggest you start by using what's known of the pathogen or its host, 
or of similar host-pathogen systems, to develop a smaller number of 
hypotheses about what sort of effects are likely. Plant ecologists often 
use GDD5 (Growing Degree Days above 5°C), which might be a useful way of 
reducing the temperature data to a much smaller summary. Of course, a 
base temperature other than 5°C might work better for you.
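As a sketch of reducing each winter's daily series to a few hypothesis-driven covariates, here is a minimal Python example. It computes growing degree days above a chosen base temperature (here as the sum of each day's mean temperature in excess of the base, one common convention; other definitions use (Tmax+Tmin)/2 or cap the daily values), and the longest spell of consecutive sub-zero days, matching the "20 consecutive days of below-zero conditions" example from the original question. The function names and the example temperatures are hypothetical.

```python
import numpy as np


def growing_degree_days(daily_mean_temp, base=5.0):
    """Sum of daily degrees above `base`; days at or below base contribute 0."""
    t = np.asarray(daily_mean_temp, dtype=float)
    return float(np.clip(t - base, 0.0, None).sum())


def longest_run_below(daily_temp, threshold=0.0):
    """Length of the longest run of consecutive days with temp < threshold."""
    longest = current = 0
    for t in daily_temp:
        current = current + 1 if t < threshold else 0
        longest = max(longest, current)
    return longest


winter = [3.0, 6.0, 8.5, -1.0, -2.0, 4.0, 7.0]  # hypothetical daily means, °C
print(growing_degree_days(winter))            # 6.5
print(growing_degree_days(winter, base=0.0))  # 28.5
print(longest_run_below(winter))              # 2
```

Applied once per winter, this gives one or two numbers per year to relate to prevalence, instead of thousands of candidate windows.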

Bob

-- 
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 7542 1863 /  +49 69 798 40226
Mobile: +49 1515 888 5440
WWW:   http://www.bik-f.de/root/index.php?page_id=219
Blog: http://blogs.nature.com/boboh
Journal of Negative Results - EEB: www.jnr-eeb.org


