[R-sig-eco] GLMM

Thu Aug 20 15:31:52 CEST 2009

Dear Maarten,

as I understand it, your question goes beyond the technical questions 
r-sig-ecology is usually confronted with.
If I get it right, you want to control for the spatial overlap (causing 
spatial autocorrelation) AND at the same time analyse data points as 
nested within subpopulations, without, however, knowing where the 
absences for the subpopulations end.
My blunt approach would be to ask: is your evidence of subpopulations 
based only on the spatial clustering of occurrences, or is there "hard" 
data (DNA, song variations, plumage patterns, whatever)? If no "hard" 
data is present, I would ignore the subpopulations (for the time being) 
and "simply" run a GLMM with a spatial correlation structure. If you have 
the hard evidence, I would try to tessellate the study area based on the 
presence data.
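
One simple way to sketch the tessellation idea in base R is to assign every absence point to the nearest subpopulation centroid, which is the same as asking which Voronoi cell of the centroids it falls into. The data below are simulated, and the names pres, absent and subpop are hypothetical; whether centroids are a sensible basis for the tessellation depends on the actual study area.

```r
# Simulated stand-in data (hypothetical): presences carry a subpopulation
# label from "hard" evidence, absences do not.
set.seed(1)
centres <- data.frame(subpop = c("a", "b", "c"),
                      cx = c(0.2, 0.8, 0.5),
                      cy = c(0.2, 0.2, 0.8))
pres <- data.frame(subpop = factor(rep(centres$subpop, each = 10)))
pres$X <- rep(centres$cx, each = 10) + rnorm(30, sd = 0.05)
pres$Y <- rep(centres$cy, each = 10) + rnorm(30, sd = 0.05)
absent <- data.frame(X = runif(45), Y = runif(45))

# Centroid of each subpopulation, computed from the presences only
cen <- aggregate(cbind(X, Y) ~ subpop, data = pres, FUN = mean)

# Index of the nearest centroid for every absence point
# (squared Euclidean distance is enough for finding the minimum)
nearest <- apply(absent[, c("X", "Y")], 1, function(p) {
  which.min((cen$X - p["X"])^2 + (cen$Y - p["Y"])^2)
})
absent$subpop <- cen$subpop[nearest]

table(absent$subpop)
```

The resulting subpop column could then serve as the grouping factor for both presences and absences.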

Technically, glmmPQL (in MASS) accepts a "corStruct" object through its 
correlation argument, such as corExp(form=~X+Y), where X and Y are 
geographical coordinates (see nlme: ?corExp). In my experience, you also 
have to give a random effect (although there isn't any), which I do by 
defining a variable group <- as.factor(rep("a", nrow(mydata))) and then 
use that in the glmmPQL call:
attach(mydata) # I think that is bizarrely necessary (!?)
glmmPQL(PA ~ env.variables, random=~1|group, correlation=corExp(form=~X+Y), 
data=mydata, family=binomial)
In the "hard" evidence case, group could be a different letter for each 
tessellated area, but the rest should remain similar.
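
For illustration, here is a minimal self-contained sketch of the single-group call on simulated data. Everything in it is made up (the data frame, the predictor env, the spatial trend used to induce some autocorrelation); only glmmPQL, corExp and fixef are real functions from MASS and nlme. It assumes that, with all variables stored in the data frame, the attach() step can be skipped in current versions of the packages, which I have not checked across versions. Note also that coordinates must be unique, since the spatial correlation structures do not accept zero distances.

```r
library(MASS)  # glmmPQL
library(nlme)  # corExp, fixef

# Simulated presence/absence data with one environmental predictor
# and a smooth spatial trend that is NOT included in the model.
set.seed(42)
n <- 150
mydata <- data.frame(X = runif(n, 0, 10),
                     Y = runif(n, 0, 10),
                     env = rnorm(n))
eta <- -0.3 + 1.2 * mydata$env + 1.5 * sin(mydata$X / 2)
mydata$PA <- rbinom(n, 1, plogis(eta))

# Dummy grouping factor with a single level, as described above
mydata$group <- factor(rep("a", n))

fit <- glmmPQL(PA ~ env,
               random = ~ 1 | group,
               correlation = corExp(form = ~ X + Y),
               family = binomial,
               data = mydata,
               verbose = FALSE)

fixef(fit)  # fixed-effect estimates on the logit scale
```

As far as I understand nlme, a correlation structure without its own grouping factor is applied within the levels of the random-effect grouping. With one dummy level all points are allowed to be correlated; with one level per tessellated area, points in different areas would be treated as uncorrelated.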

NOTE: This advice comes without warranty. There may be statistical 
depths lurking which I am simply unaware of (but not intentionally unaware).

HTH,

Carsten

Maarten de Groot wrote:
> Dear R list,
>
> I am working on land use parameters affecting the habitat selection of 
> a bird species. In GIS, I calculated the land use parameters 100 m 
> around the individuals. In addition, I also calculated land use in 45 
> random points which are not covering each other in the area where the 
> species was not found.
> Because the individuals are aggregated in 9 sub populations, many 
> samples are overlapping. When I would use a simple logistic regression 
> for this would mean that there is pseudo replication. One way to 
> handle this is Generalized linear mixed models (GLMM). Now is my 
> question, whether I can use GLMM also when there are distinct groups 
> for the presence data and not for the absence data. And how should I 
> incorporate this in the data set? Are there any other methods to avoid 
> this problem beside GLMM or averaging the parameters within the sub 
> populations?
>
> Looking forward to your help,
>
> Maarten

-- 
Dr. Carsten F. Dormann
Department of Computational Landscape Ecology
Helmholtz Centre for Environmental Research-UFZ 
Permoserstr. 15
04318 Leipzig
Germany

Tel: ++49(0)341 2351946
Fax: ++49(0)341 2351939
Email: carsten.dormann at ufz.de
internet: http://www.ufz.de/index.php?de=4205