[R-sig-eco] GLS, GEE or LMM ??

Fri Apr 16 15:58:59 CEST 2010

Dear Ben and Thierry,

thank you very much for your advice! I think you are perfectly right 
concering the overcomplication thing.
I have checked again the residuals of the "lm" and, everything is fine. 
Now, back to the roots...

thank you very much!

Jens

Ben Bolker schrieb:
>   My advice: I think you are overcomplicating the analysis.  There is a
> rule of thumb that says that you should have *at least* 10 samples per
> parameter you hope to estimate (e.g. see Ellison and Gotelli _Primer of
> Ecology_ or Harrell _Applied Regression Analysis_).  Correlations are
> also harder to estimate than 'fixed effects'.  With 5 fixed effects
> (salinity, pH, chlorophylla, NO3, oyster_qm) you are already pushing the
> limits here, and that's not including 'bank effects' (which would add
> another 3 parameters, for deviations of banks), or temporal trends.
>
>   If you use plain old lm(), you don't really need to worry about
> centering for computational reasons, although it still may be useful to
> center for reasons of interpretation, e.g. to replace your time variable
> with (time - (time at beginning of observations)) so that the intercept
> parameter in your model is the expected response at the beginning of
> your experiment rather than (e.g.) in AD 0.
>
>    Are there any of these predictor variables you can bear to *not* analyze?
>
>   I would suggest a model *something* like
>
>   lm(aWert ~ bank + datumszahl + Salinity + pH + chl.a + NO3 + oyster_qm
> + meanspring, data = mussels)
>
>   although it would be much better to drop some predictors if you can.
>
>   Draw plots of your data.
>
>   Inspect the residuals for evidence of autocorrelation, etc. -- I doubt
> you will see much.
>
>  **Do not** be tempted by stepwise regression.
>
>   good luck,
>    Ben Bolker
>
>
>   
>>>> M1.1.lme <- lme(aWert ~ Salinity +  pH + chl.a + NO3 + oyster_qm
>>>> + meanspring,  random = ~ 1 | bank,  na.action=na.omit,
>>>> method="ML", data=mussels,  correlation = corAR1(form = ~
>>>> datumszahl))
>>>>         
>
> Jens Oldeland wrote:
>   
>> Dear Ben,
>>
>>     
>>> How many levels of "bank" are there?
>>>       
>> there are four banks, thus four levels and ...
>>
>>     
>>> How many observations do you have overall?
>>>       
>> fourty values overall which are roughly equally distributed i.e. 10,
>>  7,10,13.
>>
>>     
>>> I take it that there are not trends in date?
>>>       
>> There is a clear trend in date. That´s why I used the AR1-correlation
>> structure.
>>
>>
>>     
>>> What is the nature of aWert?  Counts, continuous measurements?
>>>       
>> the aWert is a continous unitless measure of the "fleshweight" of a
>> mussel. Originally we had many many single values for aWert but since
>> we had always the same values pH etc. for each bank at each date, we
>> averaged the aWert to a population average per bank.
>>
>>
>>
>>     
>>> The usual point of random-effects models is to analyze data where 
>>> there are a *large* number of groups, possibly with relatively
>>> small numbers of samples per group.
>>>       
>> Hmmm, I see that this is might not be the case in our data. But what
>> is "large" and what is relatively small... is 4 groups with 10
>> samples a large number of groups, or not? Sorry, I am lacking
>> experience in this question :(
>>
>> best jens
>>
>>
>>
>> Ben Bolker schrieb:
>>     
>>> Jens Oldeland wrote:
>>>
>>>       
>>>> Dear Thierry,
>>>>
>>>> thank you very much for your help! However, I think I have not
>>>> explained my approach very good. I am using this formula
>>>>
>>>> M1.1.lme <- lme(aWert ~ Salinity +  pH + chl.a + NO3 + oyster_qm
>>>> + meanspring,  random = ~ 1 | bank,  na.action=na.omit,
>>>> method="ML", data=mussels,  correlation = corAR1(form = ~
>>>> datumszahl))
>>>>
>>>> hence six variables for the fixed effect, bank (station) as the
>>>> location effect and "datumszahl" for the time effect. Datumszahl
>>>> is a numeric that replaces a certain date. For example 35932
>>>> would be 17. May 98. Hence I am not using year 2000 but
>>>> day..35000? oops :-)
>>>>
>>>>         
>>> How many levels of "bank" are there?  That's the critical question.
>>>  I take it that there are not trends in date?  If so, you should
>>> have 'datumszahl' in the fixed effects as well as in the
>>> correlation structure. What is the nature of aWert?  Counts,
>>> continuous measurements?  Are the counts small numbers?
>>>
>>> How many observations do you have overall?
>>>
>>>
>>>       
>>>> Do you still think that six variables are not enough to calculate
>>>> a LMM or GEE? But than...what is the purpose of such models when
>>>> they do not work with a small set of variables?
>>>>
>>>>         
>>> The usual point of random-effects models is to analyze data where 
>>> there are a *large* number of groups, possibly with relatively
>>> small numbers of samples per group.
>>>
>>>       
>>>> thinking, Jens
>>>>
>>>>
>>>>
>>>> ONKELINX, Thierry schrieb:
>>>>
>>>>         
>>>>> Dear Jens,
>>>>>
>>>>> A random effect with only three levels is not a good idea. You
>>>>> are estimating a variance on only three numbers. Have a look at
>>>>> the plot below. It gives the confidence interval of the ratio
>>>>> between the estimated variance and the true variance. Note that
>>>>> with three levels, the estimated variance can be from 40 times
>>>>> smaller up to 3.7 times larger than the true variance. If you
>>>>> have 30 (thirty) levels, this range is reduced: from 1.8 times
>>>>> smaller up to 1.5 times larger.
>>>>>
>>>>> n <- seq(2, 100) low <- qchisq(p = 0.025, df = n - 1) / (n - 1)
>>>>>  high <- qchisq(p = 0.975, df = n - 1) / (n - 1) plot(n, high,
>>>>> type = "l", ylim = c(0, 5)) lines(n, low) abline(h = 1, lty =
>>>>> 2)
>>>>>
>>>>> Therefore I recommend that you add the site variable to the
>>>>> fixed effects and drop the random effects.
>>>>>
>>>>> A) Centering continuous data will mostly only affect the
>>>>> estimates of the intercept. The intercept is the expected value
>>>>> of your respons when all variables are zero (or at their
>>>>> reference level). So if you have a timeseries ranging from 2000
>>>>> to 2010, then the intercept is the value in the year 0. When
>>>>> you center year to 2000 (year = 2000 --> cyear = 0), then the
>>>>> intercept will be the expected value in the year 2000. The 
>>>>> first is non sense given your time series, the latter has a
>>>>> practical interpretation. Note that both model will be
>>>>> mathematically identical but just use a different
>>>>> parametrisation.
>>>>>
>>>>> B) Given that you have only three levels, neither a LMM nor GEE
>>>>> will be a good model. So comparing them is not a good idea.
>>>>>
>>>>> C) Lower AIC is always better. So -10 is better than -5. AIC =
>>>>> 2 k - 2 log(L) with k = number of parameters, L = likelihood.
>>>>> Models with a high likelihood will have a lower AIC (if the
>>>>> number of parameters are equal).
>>>>>
>>>>> HTH,
>>>>>
>>>>> Thierry
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>  ---- ir. Thierry Onkelinx Instituut voor natuur- en
>>>>> bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500
>>>>> Geraardsbergen Belgium
>>>>>
>>>>> Research Institute for Nature and Forest team Biometrics &
>>>>> Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium
>>>>>
>>>>> tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be
>>>>>
>>>>> To call in the statistician after the experiment is done may be
>>>>> no more than asking him to perform a post-mortem examination:
>>>>> he may be able to say what the experiment died of. ~ Sir Ronald
>>>>> Aylmer Fisher
>>>>>
>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>>
>>>>> The combination of some data and an aching desire for an answer
>>>>> does not ensure that a reasonable answer can be extracted from
>>>>> a given body of data. ~ John Tukey
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> -----Oorspronkelijk bericht----- Van:
>>>>>> r-sig-ecology-bounces at r-project.org 
>>>>>> [mailto:r-sig-ecology-bounces at r-project.org] Namens Jens
>>>>>> Oldeland Verzonden: vrijdag 16 april 2010 12:50 Aan:
>>>>>> r-sig-ecology at r-project.org Onderwerp: [R-sig-eco] GLS, GEE
>>>>>> or LMM ??
>>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> I have run into a number of questions, and thus I hope you 
>>>>>> could help me out. I am modelling the effect of oyster 
>>>>>> density and nutrients on the bodyweight of mussels 
>>>>>> (population average). Data was sampled at three different
>>>>>> stations over 8 years, with values measured in springtime
>>>>>> once per year.
>>>>>>
>>>>>> I was following Zuur et al 2009 Mixed Effects Models 
>>>>>> (wonderful book!), but got lost at some points since 
>>>>>> different models lead to totally different results.
>>>>>>
>>>>>> a) the first question is about "centring data". Zuur suggest
>>>>>>  to center parameters (p.334) if they are highly correlated 
>>>>>> with the intercept. When I apply a lme (family=gaussian,
>>>>>> random ~ 1 | bank, correlation = corAR1(form = ~ daycount)) I
>>>>>> have to center nearly all the values. When I apply a GEE then
>>>>>> there is no correlation at all (r=0.14). Actually, centring
>>>>>> the data leads to the same output at the end (for the lme)
>>>>>>
>>>>>> b) Choosing GEE, the effect of one parameter (salinity) is 
>>>>>> highly significant, while using the LMM approach it is not, 
>>>>>> which would be better for our interpretation... But why? Is
>>>>>> it because GEE should not be used on normally distributed
>>>>>> data? I know that GEE uses sandwich estimator and LMM uses
>>>>>> ML. Which one would be more "trustworthy" or conservative?
>>>>>>
>>>>>> c) one last qeustion: negative AICs, which one is better. 
>>>>>> AIC: -10 or -5 ? I have read contrasting statements. Is there
>>>>>>  any proof?? Does it hold for BIC as well?
>>>>>>
>>>>>> thank you in advance! Jens
>>>>>>
>>>>>> -- +++++++++++++++++++++++++++++++++++++++++ Dipl.Biol. Jens
>>>>>> Oldeland Biodiversity of Plants Biocentre Klein Flottbek and
>>>>>> Botanical Garden University of Hamburg Ohnhorststr. 18 22609
>>>>>> Hamburg, Germany
>>>>>>
>>>>>> Tel:    0049-(0)40-42816-407 Fax:    0049-(0)40-42816-543 
>>>>>> Mail: 	Oldeland at botanik.uni-hamburg.de Oldeland at gmx.de 	(for
>>>>>> attachments > 2mb!!) Skype:	jens.oldeland 
>>>>>> http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm 
>>>>>> +++++++++++++++++++++++++++++++++++++++++
>>>>>>
>>>>>> _______________________________________________ R-sig-ecology
>>>>>> mailing list R-sig-ecology at r-project.org 
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> Druk dit bericht a.u.b. niet onnodig af. Please do not print
>>>>> this message unnecessarily.
>>>>>
>>>>> Dit bericht en eventuele bijlagen geven enkel de visie van de
>>>>> schrijver weer en binden het INBO onder geen enkel beding,
>>>>> zolang dit bericht niet bevestigd is door een geldig
>>>>> ondertekend document. The views expressed in  this message and
>>>>> any annex are purely those of the writer and may not be
>>>>> regarded as stating an official position of INBO, as long as
>>>>> the message is not confirmed by a duly signed document.
>>>>>
>>>>>
>>>>>
>>>>>           
>>>       
>>     
>
>
>   

-- 
+++++++++++++++++++++++++++++++++++++++++
Dipl.Biol. Jens Oldeland
Biodiversity of Plants
Biocentre Klein Flottbek and Botanical Garden
University of Hamburg 
Ohnhorststr. 18
22609 Hamburg,
Germany

Tel:    0049-(0)40-42816-407
Fax:    0049-(0)40-42816-543
Mail: 	Oldeland at botanik.uni-hamburg.de
        Oldeland at gmx.de 	(for attachments > 2mb!!)
Skype:	jens.oldeland
http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
http://jensoldeland.wordpress.com
+++++++++++++++++++++++++++++++++++++++++