[R-sig-eco] GLS, GEE or LMM ??

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Fri Apr 16 15:05:40 CEST 2010


Jens,

Please do read my e-mail again (and replace "three levels" with "four levels"). I was trying to explain that you have not enough levels of bank to use a mixed model.

And read chapter 6 of Zuur et al again. Do you have a trend in the observed values or in the residuals. The first requires time as a covariate in the fixed effects (or as a random slope). The latter requires a correlation structure along time.

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
  

> -----Oorspronkelijk bericht-----
> Van: Jens Oldeland [mailto:oldeland at gmx.de] 
> Verzonden: vrijdag 16 april 2010 14:59
> Aan: Ben Bolker
> CC: ONKELINX, Thierry; r-sig-ecology at r-project.org
> Onderwerp: Re: [R-sig-eco] GLS, GEE or LMM ??
> 
> Dear Ben,
> 
> > How many levels of "bank" are there? 
> 
> there are four banks, thus four levels and ...
> 
> > How many observations do you have overall?
> 
> fourty values overall which are roughly equally distributed 
> i.e. 10, 7,10,13.
> 
> > I take it that there are not trends in date?
> There is a clear trend in date. That´s why I used the 
> AR1-correlation structure.
> 
> 
> > What is the nature of aWert?  Counts, continuous measurements?
> the aWert is a continous unitless measure of the 
> "fleshweight" of a mussel. Originally we had many many single 
> values for aWert but since we had always the same values pH 
> etc. for each bank at each date, we averaged the aWert to a 
> population average per bank.
> 
> 
> 
> > The usual point of random-effects models is to analyze data where 
> > there are a *large* number of groups, possibly with 
> relatively small 
> > numbers of samples per group.
> 
> Hmmm, I see that this is might not be the case in our data. 
> But what is "large" and what is relatively small... is 4 
> groups with 10 samples a large number of groups, or not? 
> Sorry, I am lacking experience in this question :(
> 
> best
> jens
> 
> 
> 
> Ben Bolker schrieb:
> > Jens Oldeland wrote:
> >   
> >> Dear Thierry,
> >>
> >> thank you very much for your help! However, I think I have not 
> >> explained my approach very good.
> >> I am using this formula
> >>
> >> M1.1.lme <- lme(aWert ~ Salinity +  pH + chl.a + NO3 + oyster_qm + 
> >> meanspring,  random = ~ 1 | bank,  na.action=na.omit, method="ML", 
> >> data=mussels,  correlation = corAR1(form = ~ datumszahl))
> >>
> >> hence six variables for the fixed effect, bank (station) as the 
> >> location effect and "datumszahl" for the time effect. 
> Datumszahl is a 
> >> numeric that replaces a certain date. For example 35932  
> would be 17. May 98.
> >> Hence I am not using year 2000 but day..35000? oops :-)
> >>     
> >
> >   How many levels of "bank" are there?  That's the critical 
> question.
> >   I take it that there are not trends in date?  If so, you 
> should have 
> > 'datumszahl' in the fixed effects as well as in the 
> correlation structure.
> >    What is the nature of aWert?  Counts, continuous 
> measurements?  Are 
> > the counts small numbers?
> >
> >   How many observations do you have overall?
> >
> >   
> >> Do you still think that six variables are not enough to 
> calculate a 
> >> LMM or GEE?
> >> But than...what is the purpose of such models when they do 
> not work 
> >> with a small set of variables?
> >>     
> >
> >   The usual point of random-effects models is to analyze data where 
> > there are a *large* number of groups, possibly with 
> relatively small 
> > numbers of samples per group.
> >   
> >> thinking,
> >> Jens
> >>
> >>
> >>
> >> ONKELINX, Thierry schrieb:
> >>     
> >>> Dear Jens,
> >>>
> >>> A random effect with only three levels is not a good 
> idea. You are 
> >>> estimating a variance on only three numbers. Have a look 
> at the plot 
> >>> below. It gives the confidence interval of the ratio between the 
> >>> estimated variance and the true variance. Note that with three 
> >>> levels, the estimated variance can be from 40 times smaller up to 
> >>> 3.7 times larger than the true variance. If you have 30 (thirty) 
> >>> levels, this range is reduced: from 1.8 times smaller up 
> to 1.5 times larger.
> >>>
> >>> n <- seq(2, 100)
> >>> low <- qchisq(p = 0.025, df = n - 1) / (n - 1) high <- qchisq(p = 
> >>> 0.975, df = n - 1) / (n - 1) plot(n, high, type = "l", 
> ylim = c(0, 
> >>> 5)) lines(n, low) abline(h = 1, lty = 2)
> >>>
> >>> Therefore I recommend that you add the site variable to the fixed 
> >>> effects and drop the random effects.
> >>>
> >>> A) Centering continuous data will mostly only affect the 
> estimates 
> >>> of the intercept. The intercept is the expected value of your 
> >>> respons when all variables are zero (or at their 
> reference level). 
> >>> So if you have a timeseries ranging from 2000 to 2010, then the 
> >>> intercept is the value in the year 0. When you center 
> year to 2000 
> >>> (year = 2000 --> cyear = 0), then the intercept will be 
> the expected 
> >>> value in the year 2000. The first is non sense given your time 
> >>> series, the latter has a practical interpretation. Note that both 
> >>> model will be mathematically identical but just use a 
> different parametrisation.
> >>>
> >>> B) Given that you have only three levels, neither a LMM 
> nor GEE will 
> >>> be a good model. So comparing them is not a good idea.
> >>>
> >>> C) Lower AIC is always better. So -10 is better than -5. 
> AIC = 2 k - 
> >>> 2
> >>> log(L) with k = number of parameters, L = likelihood. 
> Models with a 
> >>> high likelihood will have a lower AIC (if the number of 
> parameters 
> >>> are equal).
> >>>
> >>> HTH,
> >>>
> >>> Thierry
> >>>
> >>>
> >>> 
> --------------------------------------------------------------------
> >>> ----
> >>> ----
> >>> ir. Thierry Onkelinx
> >>> Instituut voor natuur- en bosonderzoek team Biometrie & 
> >>> Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium
> >>>
> >>> Research Institute for Nature and Forest team Biometrics 
> & Quality 
> >>> Assurance Gaverstraat 4 9500 Geraardsbergen Belgium
> >>>
> >>> tel. + 32 54/436 185
> >>> Thierry.Onkelinx at inbo.be
> >>> www.inbo.be
> >>>
> >>> To call in the statistician after the experiment is done 
> may be no 
> >>> more than asking him to perform a post-mortem 
> examination: he may be 
> >>> able to say what the experiment died of.
> >>> ~ Sir Ronald Aylmer Fisher
> >>>
> >>> The plural of anecdote is not data.
> >>> ~ Roger Brinner
> >>>
> >>> The combination of some data and an aching desire for an 
> answer does 
> >>> not ensure that a reasonable answer can be extracted from a given 
> >>> body of data.
> >>> ~ John Tukey
> >>>   
> >>>
> >>>   
> >>>       
> >>>> -----Oorspronkelijk bericht-----
> >>>> Van: r-sig-ecology-bounces at r-project.org
> >>>> [mailto:r-sig-ecology-bounces at r-project.org] Namens Jens Oldeland
> >>>> Verzonden: vrijdag 16 april 2010 12:50
> >>>> Aan: r-sig-ecology at r-project.org
> >>>> Onderwerp: [R-sig-eco] GLS, GEE or LMM ??
> >>>>
> >>>> Dear All,
> >>>>
> >>>> I have run into a number of questions, and thus I hope you could 
> >>>> help me out. I am modelling the effect of oyster density and 
> >>>> nutrients on the bodyweight of mussels (population average).
> >>>> Data was sampled at three different stations over 8 years, with 
> >>>> values measured in springtime once per year.
> >>>>
> >>>> I was following Zuur et al 2009 Mixed Effects Models (wonderful 
> >>>> book!), but got lost at some points since different 
> models lead to 
> >>>> totally different results.
> >>>>
> >>>> a) the first question is about "centring data". Zuur suggest to 
> >>>> center parameters (p.334) if they are highly correlated with the 
> >>>> intercept.
> >>>> When I apply a lme (family=gaussian, random ~ 1 | bank, 
> correlation 
> >>>> = corAR1(form = ~ daycount)) I have to center nearly all the 
> >>>> values. When I apply a GEE then there is no correlation at all 
> >>>> (r=0.14).
> >>>> Actually, centring the data leads to the same output at the end 
> >>>> (for the
> >>>> lme)
> >>>>
> >>>> b) Choosing GEE, the effect of one parameter (salinity) 
> is highly 
> >>>> significant, while using the LMM approach it is not, 
> which would be 
> >>>> better for our interpretation...
> >>>> But why? Is it because GEE should not be used on normally 
> >>>> distributed data? I know that GEE uses sandwich
> estimator and LMM 
> >>>> uses ML. Which one would be more "trustworthy" or conservative?
> >>>>
> >>>> c) one last qeustion: negative AICs, which one is better. 
> >>>> AIC: -10 or -5 ? I have read contrasting statements. Is 
> there any 
> >>>> proof?? Does it hold for BIC as well?
> >>>>
> >>>> thank you in advance!
> >>>> Jens
> >>>>
> >>>> --
> >>>> +++++++++++++++++++++++++++++++++++++++++
> >>>> Dipl.Biol. Jens Oldeland
> >>>> Biodiversity of Plants
> >>>> Biocentre Klein Flottbek and Botanical Garden University 
> of Hamburg 
> >>>> Ohnhorststr. 18
> >>>> 22609 Hamburg,
> >>>> Germany
> >>>>
> >>>> Tel:    0049-(0)40-42816-407
> >>>> Fax:    0049-(0)40-42816-543
> >>>> Mail: 	Oldeland at botanik.uni-hamburg.de
> >>>>         Oldeland at gmx.de 	(for attachments > 2mb!!)
> >>>> Skype:	jens.oldeland
> >>>> http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
> >>>> +++++++++++++++++++++++++++++++++++++++++
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-ecology mailing list
> >>>> R-sig-ecology at r-project.org
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
> >>>>
> >>>>     
> >>>>         
> >>> Druk dit bericht a.u.b. niet onnodig af.
> >>> Please do not print this message unnecessarily.
> >>>
> >>> Dit bericht en eventuele bijlagen geven enkel de visie van de 
> >>> schrijver weer en binden het INBO onder geen enkel beding, zolang 
> >>> dit bericht niet bevestigd is door een geldig ondertekend 
> document. 
> >>> The views expressed in  this message and any annex are 
> purely those 
> >>> of the writer and may not be regarded as stating an official 
> >>> position of INBO, as long as the message is not confirmed 
> by a duly signed document.
> >>>
> >>>   
> >>>       
> >>     
> >
> >
> >   
> 
> 
> -- 
> +++++++++++++++++++++++++++++++++++++++++
> Dipl.Biol. Jens Oldeland
> Biodiversity of Plants
> Biocentre Klein Flottbek and Botanical Garden University of 
> Hamburg Ohnhorststr. 18
> 22609 Hamburg,
> Germany
> 
> Tel:    0049-(0)40-42816-407
> Fax:    0049-(0)40-42816-543
> Mail: 	Oldeland at botanik.uni-hamburg.de
>         Oldeland at gmx.de 	(for attachments > 2mb!!)
> Skype:	jens.oldeland
> http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
> http://jensoldeland.wordpress.com
> +++++++++++++++++++++++++++++++++++++++++
> 
> 

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.



More information about the R-sig-ecology mailing list