[R-sig-Geo] Comparing abundances at fixed locations in space - Syrjala test

Mon Feb 11 12:26:04 CET 2008

I'll start by reposting my previous message, which was sent from a
different address and therefore probably did not reach the list. Sorry
about this:

> On 2008-February-11, at 10:19, Barry Rowlingson wrote:
>> jiho wrote:
>>> Thank you very much for this reference. However, the problem it
>>> deals with is not really similar to the one I am targeting. In this
>>> paper the authors assess the differences in the positions of neurones
>>> in a 2D plane between three groups of patients, with replicates in
>>> each group. So the data of interest are the coordinates.
>>> In my case, the positions of the sampling stations are fixed (and on
>>> a grid, if that helps [1]) and I want to assess the differences in
>>> the abundances of two groups at these positions. So the data of
>>> interest are the abundances (normalized to remove the effect of
>>> total population sizes), and more specifically the way the
>>> abundances are distributed over these points. Maybe the subject of
>>> this email is not correctly stated, then. I am not a native English
>>> speaker, and when it comes to technical terms it is even worse.
>>
>> "Spatial Point Pattern Analysis" only refers to cases where the  
>> locations of the points are 'interesting', which usually means they  
>> are generated by a stochastic process - like tree locations in a  
>> natural forest rather than rows of trees in a plantation.
>
> Thanks for clarifying these terms. Indeed I am _not_ after spatial  
> point pattern techniques. I changed the subject accordingly.
>
>> Analysis of data that comes from spatial locations that are
>> 'uninteresting' is another branch of statistics altogether. It
>> will probably end up being generalised linear modelling with
>> spatially-correlated errors, and how you deal with the correlations
>> is the interesting part.
>>
>> See if you can write down a model for your data and include a  
>> smoothly-varying spatial error term.... Then maybe we can find some  
>> R code to solve it. I don't think we'll find it in Spatstat, which  
>> I think is still exclusively spatial point pattern analysis. Have a  
>> look at geoRglm maybe...
>
> Thank you for the pointer. The vignette of geoRglm seems promising,
> though much of it is about prediction from a given model, while I am
> most interested in which terms are in the model, i.e. which variables
> have a notable influence on the distribution of the organisms. My
> scenario seems simpler than those presented, however, since the data
> are standardized by the sampling effort, meaning that the same
> Poisson distribution applies to all points.
>
> A continuous variable that would represent the spatial aspect of this
> dataset could simply be, for example, the distance from the lower-left
> corner of the sampling grid, or the distance from the island around
> which the sampling grid is designed (such a distance would have a
> biological meaning, since we expect the abundances to be inversely
> proportional to it). Is that something that could fit your definition
> of a "smoothly-varying spatial error term", or am I completely
> mistaken?
>
> Your answer and the vignette of geoRglm highlight how little I know  
> about all this (I am just a young biologist after all) and how much  
> reading I need to do. The page of geoRglm has a nice list of  
> publications:
> 	http://www.daimi.au.dk/~olefc/geoRglm/Intro/books.html
> Could you (or someone else) direct me towards the best introductory  
> text(s) on this matter please?
>
> Thank you very much for your help.

Now for the current message:

On 2008-February-11, at 11:46, Barry Rowlingson wrote:
> Jean-Olivier Irisson wrote:
>> Thank you for the pointer. The vignette of geoRglm seems promising,
>> though much of it is about prediction from a given model, while I am
>> most interested in which terms are in the model, i.e. which variables
>> have a notable influence on the distribution of the organisms. My
>> scenario seems simpler than those presented, however, since the data
>> are standardized by the sampling effort, meaning that the same
>> Poisson distribution applies to all points.
>
>  I think you still need to fit a model, and then you can test how
> useful your covariates are with standard techniques.
>
>> A continuous variable that would represent the spatial aspect of this
>> dataset could simply be, for example, the distance from the lower-left
>> corner of the sampling grid, or the distance from the island around
>> which the sampling grid is designed (such a distance would have a
>> biological meaning, since we expect the abundances to be inversely
>> proportional to it). Is that something that could fit your definition
>> of a "smoothly-varying spatial error term", or am I completely
>> mistaken?
>
>  Think about fitting a straight line through some points. You find the
> line that best fits your points. Then you look at the residual
> differences between the line and your points. All the usual linear
> model theory about predictions and significance depends on those
> residuals being uncorrelated and independent. If you are fitting a
> straight line to a curve then that won't be true, and if you then say
> something about your straight line based on the linear model theory
> you'll be wrong.
>
>  Now, you could fit a non-spatial generalised linear model to your
> data using glm() in R and then map the residuals. If the residual map
> shows structure, then there's something else going on that your model
> hasn't accounted for. Perhaps there is an obvious trend due to a
> covariate you've not included, such as elevation above sea level. You
> could then add this to your model. If the residual surface looks like
> random noise then you can use standard linear model theory to make
> conclusions about your covariate parameters.
>
>  If the residual surface doesn't look like random noise then that's
> when you get into geoRglm functions which (I think) fit a GLM where
> the error surface (that's your residuals) is defined by a Gaussian
> random field with a fitted covariance structure. Once that's done, the
> geoRglm code will tell you about your covariate parameter significance
> (I think! It's been a while since I've used it. Maybe Paulo and Ole
> can expand on this).
>
>  So what I'd do is:
>
>  * fit a simple GLM using glm.
>  * Look at parameter estimates and significance.
>  * Draw a map of residuals.
>  * Then worry about spatial correlation.
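The first three steps of that recipe can be sketched in a few lines. The block below is a minimal Python stand-in for what glm(..., family = poisson) does in R, not code anyone in this thread used: it assumes a Poisson GLM with a log link and a single covariate (a hypothetical distance from the island), fits it by iteratively reweighted least squares, reports the estimates with Wald z values, and prints the Pearson residuals laid out on the grid as a crude map. The 5 x 5 grid, the island sitting at one corner, and the counts are all made up for illustration.

```python
import math

# Minimal sketch, assuming a Poisson GLM with log link and ONE covariate.
# The grid, island position and counts in the demo are hypothetical.


def _weighted_sums(w, x):
    """Return sum(w), sum(w*x), sum(w*x^2)."""
    return (sum(w),
            sum(wi * xi for wi, xi in zip(w, x)),
            sum(wi * xi * xi for wi, xi in zip(w, x)))


def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares.

    Returns (b0, b1, se0, se1); standard errors assume dispersion = 1."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Log link: working response z = eta + (y - mu) / mu, working weight = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        sw, swx, swxx = _weighted_sums(mu, x)
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * swxx - swx * swx
        # Solve the 2x2 weighted least-squares normal equations for (b0, b1)
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    # Standard errors from the inverse Fisher information at the final estimates
    sw, swx, swxx = _weighted_sums([math.exp(b0 + b1 * xi) for xi in x], x)
    det = sw * swxx - swx * swx
    return b0, b1, math.sqrt(swxx / det), math.sqrt(sw / det)


def pearson_residuals(x, y, b0, b1):
    """Pearson residuals (y - mu) / sqrt(mu) of the fitted Poisson model."""
    out = []
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        out.append((yi - mu) / math.sqrt(mu))
    return out


if __name__ == "__main__":
    n = 5  # hypothetical 5 x 5 sampling grid, island at station (0, 0)
    cells = [(r, c) for r in range(n) for c in range(n)]
    dist = [math.hypot(r, c) for r, c in cells]
    # Made-up counts: decline with distance, plus an east-west trend that
    # the model below deliberately leaves out
    counts = [round(math.exp(3 - 0.5 * d + 0.3 * (c - 2)))
              for d, (r, c) in zip(dist, cells)]
    b0, b1, se0, se1 = fit_poisson_glm(dist, counts)
    print(f"intercept {b0:.3f} (z = {b0 / se0:.1f})")
    print(f"distance  {b1:.3f} (z = {b1 / se1:.1f})")
    res = pearson_residuals(dist, counts, b0, b1)
    # Crude text "map": one row of the sampling grid per line
    for r in range(n):
        print(" ".join(f"{v:+5.2f}" for v in res[r * n:(r + 1) * n]))
```

Because the made-up counts contain a trend the model omits, runs of same-signed residuals across the printed grid are the kind of structure described above. The z values rely on independent residuals, so they should not be trusted until that structure has been checked.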

Thank you very much for such a detailed explanation. This is very
clear and helps me a lot. I had already fitted a GLM with spatial
variables in it to inspect potential spatial effects, but I had never
thought about mapping the residuals. I will refit the model excluding
the spatial variables and check whether there is structure in the
residuals, as you advise. Then including the spatial variables may
tell me something, depending on how they affect the structure of the
residuals.
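One common way to put a number on "structure in the residuals" is Moran's I, computed on the residuals laid out on the sampling grid. It is not something suggested in this thread; the sketch below is a plain Python illustration assuming a rectangular grid and binary rook weights (two stations are neighbours when they share an edge). Values near zero are what random noise gives (the expected value under no autocorrelation is -1/(n-1)), positive values mean similar residuals cluster together, and negative values mean they alternate. It does no significance testing; a permutation test or a dedicated package would be needed for that.

```python
def morans_i(grid):
    """Moran's I of values on a rectangular grid, using binary rook weights
    (w_ij = 1 when stations i and j share an edge, 0 otherwise)."""
    nrow, ncol = len(grid), len(grid[0])
    n = nrow * ncol
    mean = sum(v for row in grid for v in row) / n
    dev = [[v - mean for v in row] for row in grid]
    cross = 0.0  # sum over ordered neighbour pairs of dev_i * dev_j
    s0 = 0       # total weight: number of ordered neighbour pairs
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    cross += dev[r][c] * dev[rr][cc]
                    s0 += 1
    ss = sum(d * d for row in dev for d in row)
    return (n / s0) * (cross / ss)


if __name__ == "__main__":
    # Hypothetical residual grids, not real data
    checkerboard = [[(-1) ** (r + c) for c in range(4)] for r in range(4)]
    gradient = [[float(c) for c in range(4)] for r in range(4)]
    print(morans_i(checkerboard))  # alternating signs: negative autocorrelation
    print(morans_i(gradient))      # smooth east-west trend: positive autocorrelation
```

On real data the input would be the grid of residuals from the non-spatial model; a clearly positive value would suggest the spatially-correlated error structure discussed above.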

>  Oh, I'd also, if I were you, try and find a local statistician expert!

That would probably be the hardest part :/ Unfortunately there's no  
statistics department nearby and although we have biostatisticians in  
the lab, this is far from their field of activity. This lack of local  
expertise is becoming more and more of a problem but statisticians are  
a rare species!

Thank you again for your help. Sincerely,

Jean-Olivier Irisson
---
UMR 5244 CNRS-EPHE-UPVD, 52 av Paul Alduy, 66860 Perpignan Cedex, France
+336 21 05 19 90
http://jo.irisson.free.fr/work/