[R-sig-eco] GLS, GEE or LMM ??

Fri Apr 16 14:09:08 CEST 2010

Dear Jens,

A random effect with only three levels is not a good idea. You are
estimating a variance on only three numbers. Have a look at the plot
below. It gives the confidence interval of the ratio between the
estimated variance and the true variance. Note that with three levels,
the estimated variance can be from 40 times smaller up to 3.7 times
larger than the true variance. If you have 30 (thirty) levels, this
range is reduced: from 1.8 times smaller up to 1.5 times larger.

n <- seq(2, 100)
low <- qchisq(p = 0.025, df = n - 1) / (n - 1)
high <- qchisq(p = 0.975, df = n - 1) / (n - 1)
plot(n, high, type = "l", ylim = c(0, 5))
lines(n, low)
abline(h = 1, lty = 2)

Therefore I recommend that you add the site variable to the fixed
effects and drop the random effects.

A) Centering continuous data will mostly only affect the estimates of
the intercept. The intercept is the expected value of your respons when
all variables are zero (or at their reference level). So if you have a
timeseries ranging from 2000 to 2010, then the intercept is the value in
the year 0. When you center year to 2000 (year = 2000 --> cyear = 0),
then the intercept will be the expected value in the year 2000. The
first is non sense given your time series, the latter has a practical
interpretation. Note that both model will be mathematically identical
but just use a different parametrisation.

B) Given that you have only three levels, neither a LMM nor GEE will be
a good model. So comparing them is not a good idea.

C) Lower AIC is always better. So -10 is better than -5. AIC = 2 k - 2
log(L) with k = number of parameters, L = likelihood. Models with a high
likelihood will have a lower AIC (if the number of parameters are
equal).

HTH,

Thierry

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

> -----Oorspronkelijk bericht-----
> Van: r-sig-ecology-bounces at r-project.org 
> [mailto:r-sig-ecology-bounces at r-project.org] Namens Jens Oldeland
> Verzonden: vrijdag 16 april 2010 12:50
> Aan: r-sig-ecology at r-project.org
> Onderwerp: [R-sig-eco] GLS, GEE or LMM ??
> 
> Dear All, 
> 
> I have run into a number of questions, and thus I hope you 
> could help me out. I am modelling the effect of oyster 
> density and nutrients on the bodyweight of mussels 
> (population average).
> Data was sampled at three different stations over 8 years, 
> with values measured in springtime once per year.
> 
> I was following Zuur et al 2009 Mixed Effects Models 
> (wonderful book!), but got lost at some points since
> different models lead to totally different results.
> 
> a) the first question is about "centring data". Zuur suggest 
> to center parameters (p.334) if they are highly correlated 
> with the intercept. 
> When I apply a lme (family=gaussian, random ~ 1 | bank,  
> correlation = corAR1(form = ~ daycount)) I have to center 
> nearly all the values. When I apply a GEE then there is no 
> correlation at all (r=0.14).
> Actually, centring the data leads to the same output at the 
> end (for the
> lme)
> 
> b) Choosing GEE, the effect of one parameter (salinity) is 
> highly significant, while using the LMM approach it is not, 
> which would be better for our interpretation...
> But why? Is it because GEE should not be used on normally 
> distributed data? I know that GEE uses sandwich estimator and 
> LMM uses ML. Which one would be more "trustworthy" or conservative?
> 
> c) one last qeustion: negative AICs, which one is better. 
> AIC: -10 or -5 ? I have read contrasting statements. Is there 
> any proof?? Does it hold for BIC as well?
> 
> thank you in advance!
> Jens
> 
> -- 
> +++++++++++++++++++++++++++++++++++++++++
> Dipl.Biol. Jens Oldeland
> Biodiversity of Plants
> Biocentre Klein Flottbek and Botanical Garden University of 
> Hamburg Ohnhorststr. 18
> 22609 Hamburg,
> Germany
> 
> Tel:    0049-(0)40-42816-407
> Fax:    0049-(0)40-42816-543
> Mail: 	Oldeland at botanik.uni-hamburg.de
>         Oldeland at gmx.de 	(for attachments > 2mb!!)
> Skype:	jens.oldeland
> http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
> +++++++++++++++++++++++++++++++++++++++++
> 
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> 

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.