[R-sig-eco] GLS, GEE or LMM ??

Fri Apr 16 14:58:48 CEST 2010

Dear Ben,

> How many levels of "bank" are there? 

there are four banks, thus four levels and ...

> How many observations do you have overall?

fourty values overall which are roughly equally distributed i.e. 10, 
7,10,13.

> I take it that there are not trends in date?
There is a clear trend in date. That´s why I used the AR1-correlation structure.

> What is the nature of aWert?  Counts, continuous measurements?
the aWert is a continous unitless measure of the "fleshweight" of a mussel. Originally we had many many single values for aWert but since we had always the same values pH etc. for each bank at each date, we averaged the aWert to a population average per bank.

> The usual point of random-effects models is to analyze data where
> there are a *large* number of groups, possibly with relatively small
> numbers of samples per group.

Hmmm, I see that this is might not be the case in our data. But what is "large" and what is relatively small... is 4 groups with 10 samples a large number of groups, or not? Sorry, I am lacking experience in this question :(

best
jens

Ben Bolker schrieb:
> Jens Oldeland wrote:
>   
>> Dear Thierry,
>>
>> thank you very much for your help! However, I think I have not explained 
>> my approach very good.
>> I am using this formula
>>
>> M1.1.lme <- lme(aWert ~ Salinity +  pH + chl.a + NO3 + oyster_qm + 
>> meanspring,  random = ~ 1 | bank,  na.action=na.omit, method="ML",  
>> data=mussels,  correlation = corAR1(form = ~ datumszahl))
>>
>> hence six variables for the fixed effect, bank (station) as the location 
>> effect and "datumszahl" for the time effect. Datumszahl is a numeric 
>> that replaces a certain date. For example 35932  would be 17. May 98. 
>> Hence I am not using year 2000 but day..35000? oops :-)
>>     
>
>   How many levels of "bank" are there?  That's the critical question.
>   I take it that there are not trends in date?  If so, you should have
> 'datumszahl' in the fixed effects as well as in the correlation structure.
>    What is the nature of aWert?  Counts, continuous measurements?  Are
> the counts small numbers?
>
>   How many observations do you have overall?
>
>   
>> Do you still think that six variables are not enough to calculate a LMM 
>> or GEE?
>> But than...what is the purpose of such models when they do not work with 
>> a small set of variables?
>>     
>
>   The usual point of random-effects models is to analyze data where
> there are a *large* number of groups, possibly with relatively small
> numbers of samples per group.
>   
>> thinking,
>> Jens
>>
>>
>>
>> ONKELINX, Thierry schrieb:
>>     
>>> Dear Jens,
>>>
>>> A random effect with only three levels is not a good idea. You are
>>> estimating a variance on only three numbers. Have a look at the plot
>>> below. It gives the confidence interval of the ratio between the
>>> estimated variance and the true variance. Note that with three levels,
>>> the estimated variance can be from 40 times smaller up to 3.7 times
>>> larger than the true variance. If you have 30 (thirty) levels, this
>>> range is reduced: from 1.8 times smaller up to 1.5 times larger.
>>>
>>> n <- seq(2, 100)
>>> low <- qchisq(p = 0.025, df = n - 1) / (n - 1)
>>> high <- qchisq(p = 0.975, df = n - 1) / (n - 1)
>>> plot(n, high, type = "l", ylim = c(0, 5))
>>> lines(n, low)
>>> abline(h = 1, lty = 2)
>>>
>>> Therefore I recommend that you add the site variable to the fixed
>>> effects and drop the random effects.
>>>
>>> A) Centering continuous data will mostly only affect the estimates of
>>> the intercept. The intercept is the expected value of your respons when
>>> all variables are zero (or at their reference level). So if you have a
>>> timeseries ranging from 2000 to 2010, then the intercept is the value in
>>> the year 0. When you center year to 2000 (year = 2000 --> cyear = 0),
>>> then the intercept will be the expected value in the year 2000. The
>>> first is non sense given your time series, the latter has a practical
>>> interpretation. Note that both model will be mathematically identical
>>> but just use a different parametrisation.
>>>
>>> B) Given that you have only three levels, neither a LMM nor GEE will be
>>> a good model. So comparing them is not a good idea.
>>>
>>> C) Lower AIC is always better. So -10 is better than -5. AIC = 2 k - 2
>>> log(L) with k = number of parameters, L = likelihood. Models with a high
>>> likelihood will have a lower AIC (if the number of parameters are
>>> equal).
>>>
>>> HTH,
>>>
>>> Thierry
>>>
>>>
>>> ------------------------------------------------------------------------
>>> ----
>>> ir. Thierry Onkelinx
>>> Instituut voor natuur- en bosonderzoek
>>> team Biometrie & Kwaliteitszorg
>>> Gaverstraat 4
>>> 9500 Geraardsbergen
>>> Belgium
>>>
>>> Research Institute for Nature and Forest
>>> team Biometrics & Quality Assurance
>>> Gaverstraat 4
>>> 9500 Geraardsbergen
>>> Belgium
>>>
>>> tel. + 32 54/436 185
>>> Thierry.Onkelinx at inbo.be
>>> www.inbo.be
>>>
>>> To call in the statistician after the experiment is done may be no more
>>> than asking him to perform a post-mortem examination: he may be able to
>>> say what the experiment died of.
>>> ~ Sir Ronald Aylmer Fisher
>>>
>>> The plural of anecdote is not data.
>>> ~ Roger Brinner
>>>
>>> The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of
>>> data.
>>> ~ John Tukey
>>>   
>>>
>>>   
>>>       
>>>> -----Oorspronkelijk bericht-----
>>>> Van: r-sig-ecology-bounces at r-project.org 
>>>> [mailto:r-sig-ecology-bounces at r-project.org] Namens Jens Oldeland
>>>> Verzonden: vrijdag 16 april 2010 12:50
>>>> Aan: r-sig-ecology at r-project.org
>>>> Onderwerp: [R-sig-eco] GLS, GEE or LMM ??
>>>>
>>>> Dear All, 
>>>>
>>>> I have run into a number of questions, and thus I hope you 
>>>> could help me out. I am modelling the effect of oyster 
>>>> density and nutrients on the bodyweight of mussels 
>>>> (population average).
>>>> Data was sampled at three different stations over 8 years, 
>>>> with values measured in springtime once per year.
>>>>
>>>> I was following Zuur et al 2009 Mixed Effects Models 
>>>> (wonderful book!), but got lost at some points since 
>>>> different models lead to totally different results.
>>>>
>>>> a) the first question is about "centring data". Zuur suggest 
>>>> to center parameters (p.334) if they are highly correlated 
>>>> with the intercept. 
>>>> When I apply a lme (family=gaussian, random ~ 1 | bank,  
>>>> correlation = corAR1(form = ~ daycount)) I have to center 
>>>> nearly all the values. When I apply a GEE then there is no 
>>>> correlation at all (r=0.14).
>>>> Actually, centring the data leads to the same output at the 
>>>> end (for the
>>>> lme)
>>>>
>>>> b) Choosing GEE, the effect of one parameter (salinity) is 
>>>> highly significant, while using the LMM approach it is not, 
>>>> which would be better for our interpretation...
>>>> But why? Is it because GEE should not be used on normally 
>>>> distributed data? I know that GEE uses sandwich estimator and 
>>>> LMM uses ML. Which one would be more "trustworthy" or conservative?
>>>>
>>>> c) one last qeustion: negative AICs, which one is better. 
>>>> AIC: -10 or -5 ? I have read contrasting statements. Is there 
>>>> any proof?? Does it hold for BIC as well?
>>>>
>>>> thank you in advance!
>>>> Jens
>>>>
>>>> -- 
>>>> +++++++++++++++++++++++++++++++++++++++++
>>>> Dipl.Biol. Jens Oldeland
>>>> Biodiversity of Plants
>>>> Biocentre Klein Flottbek and Botanical Garden University of 
>>>> Hamburg Ohnhorststr. 18
>>>> 22609 Hamburg,
>>>> Germany
>>>>
>>>> Tel:    0049-(0)40-42816-407
>>>> Fax:    0049-(0)40-42816-543
>>>> Mail: 	Oldeland at botanik.uni-hamburg.de
>>>>         Oldeland at gmx.de 	(for attachments > 2mb!!)
>>>> Skype:	jens.oldeland
>>>> http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>
>>>> _______________________________________________
>>>> R-sig-ecology mailing list
>>>> R-sig-ecology at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>
>>>>     
>>>>         
>>> Druk dit bericht a.u.b. niet onnodig af.
>>> Please do not print this message unnecessarily.
>>>
>>> Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
>>> en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
>>> door een geldig ondertekend document. The views expressed in  this message 
>>> and any annex are purely those of the writer and may not be regarded as stating 
>>> an official position of INBO, as long as the message is not confirmed by a duly 
>>> signed document.
>>>
>>>   
>>>       
>>     
>
>
>   

-- 
+++++++++++++++++++++++++++++++++++++++++
Dipl.Biol. Jens Oldeland
Biodiversity of Plants
Biocentre Klein Flottbek and Botanical Garden
University of Hamburg 
Ohnhorststr. 18
22609 Hamburg,
Germany

Tel:    0049-(0)40-42816-407
Fax:    0049-(0)40-42816-543
Mail: 	Oldeland at botanik.uni-hamburg.de
        Oldeland at gmx.de 	(for attachments > 2mb!!)
Skype:	jens.oldeland
http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
http://jensoldeland.wordpress.com
+++++++++++++++++++++++++++++++++++++++++