[R-sig-ME] AIC Comparison for MLM with Different Distributions

Thu Mar 5 10:01:18 CET 2020

Dear Kate,

Models 1 and 2 differ only in the log transformation of the response;
models 3 and 4 differ only in the link (log versus identity). That is IMHO
a design issue. You get additive effects with an identity link
(Y = \beta_0 + \beta_1 X) and multiplicative effects with the log link
(Y = e^{\beta_0} e^{\beta_1 X}). Use domain knowledge to make and motivate
that choice.
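
To make the additive versus multiplicative contrast concrete, here is a
minimal sketch in base R. The coefficient values are made up purely for
illustration; they are not estimates from your models.

```r
# Hypothetical coefficients, chosen only for illustration
beta0 <- 1
beta1 <- 0.5
x <- 0:3

# Identity link: each unit step in x ADDS beta1 to the expected response
mu_identity <- beta0 + beta1 * x
step_identity <- diff(mu_identity)
step_identity

# Log link: each unit step in x MULTIPLIES the expected response by exp(beta1)
mu_log <- exp(beta0 + beta1 * x)
ratio_log <- mu_log[-1] / mu_log[-length(mu_log)]
ratio_log
```

In a mixed model the same reading applies conditional on the random
effects.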

Gaussian or gamma? You could start with Gaussian and check the assumptions.
If they all hold you can stick with Gaussian. If they don't hold and you
get indications that gamma might be better, then try gamma and check its
assumptions.
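
One possible way to check the Gaussian assumptions is sketched below. I
don't have your data, so the sketch uses the sleepstudy data that ships
with lme4 as a stand-in; the model formula is an assumption and should be
replaced with your own.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for the poster's data here
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

res <- residuals(fit)  # residuals conditional on the random effects
fv  <- fitted(fit)

# Residuals vs fitted: a funnel shape hints at variance growing with the mean
plot(fv, res, xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals
qqnorm(res)
qqline(res)

# Normal Q-Q plot of the estimated random intercepts
re <- ranef(fit)$Subject[["(Intercept)"]]
qqnorm(re)
qqline(re)
```

A clear funnel in the first plot or strong right skew in the Q-Q plot
would be the kind of indication that makes gamma worth trying.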

The same holds for Poisson or negative binomial. Start with Poisson, check
assumptions. If they hold, you are good to go. If they don't, think about
what would improve the model (negative binomial, zero-inflation, missing
variables, missing correlation structure, ...)
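
As a rough sketch of one such check, the ratio of the sum of squared
Pearson residuals to the residual degrees of freedom is a common first
look at overdispersion in a Poisson fit. The data below are simulated and
deliberately overdispersed; all variable names are hypothetical.

```r
library(lme4)

set.seed(1)
# Simulated counts with a random intercept per id (hypothetical data)
n_id  <- 30
n_obs <- 10
d <- data.frame(id = factor(rep(seq_len(n_id), each = n_obs)),
                x  = rnorm(n_id * n_obs))
u <- rnorm(n_id, sd = 0.3)
eta <- 1 + 0.3 * d$x + u[as.integer(d$id)]
d$y <- rnbinom(nrow(d), mu = exp(eta), size = 1)  # size = 1: overdispersed

fit_pois <- glmer(y ~ x + (1 | id), data = d, family = poisson)

# Sum of squared Pearson residuals over residual degrees of freedom;
# values well above 1 suggest overdispersion
ratio <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
ratio

# One candidate improvement: negative binomial on the same response and data
fit_nb <- glmer.nb(y ~ x + (1 | id), data = d)
AIC(fit_pois, fit_nb)
```

Both fits here use the same untransformed response, the same data and the
same package, which is the situation where comparing their AIC values is
least problematic.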

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx using inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

<https://www.inbo.be>

On Wed, 4 Mar 2020 at 17:39, Kate R <kr.gitcode using gmail.com> wrote:

> Hi Thierry,
>
> Thank you for your response!
>
> We are running different models - some have ordered factors as the
> response variable and others have continuous or count data as the
> response variable, and so I would still be curious to learn how to
> compare the AIC for models 1-4.
>
> One post suggested that in order to compare normal with log-normal, you
> would adjust the AIC of the log-normal model with the following code:
> AIC + 2*sum(log(anxious)). I am still unsure how to compare the
> lmer/normal models with the glmer/gamma models, as well as between
> glmer/gamma models with different link functions.
>
> For the ordered factor, I'd prefer to use the clmm for this, but it's
> unfortunately common practice in the journals we publish in to use
> continuous models (for ease of interpretation and convention), and so I'd
> like to be able to show that the model fit is best with the clmm. In Burnham
> & Anderson's book, they compare continuous models with count models, so I
> hope it's possible to compare continuous with ordinal?
>
> For the models with count data (frequency of use) as the response
> variable, I suppose that we might also want to be able to compare poisson
> and negative binomial distributions...
>
> Overall, I'd like to learn how to compare models with different
> distributions and/or links for my general knowledge and future use with
> different research questions.
>
> Many thanks again for your help!
> Katie
>
> On Wed, Mar 4, 2020 at 6:25 AM Thierry Onkelinx <thierry.onkelinx using inbo.be>
> wrote:
>
>> Dear Kate,
>>
>> If your response variable is an ordered factor, then use the clmm model
>> as that is one with the most appropriate distribution. All other models are
>> workarounds. Hence the AIC comparison is not relevant.
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>>
>> On Tue, 3 Mar 2020 at 23:30, Kate R <kr.gitcode using gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Thank you in advance for your time and consideration! I am a
>>> non-mathematically-inclined graduate student in communication just
>>> learning multilevel modeling.
>>>
>>> I am trying to compare the AIC for 5 different models:
>>>
>>>
>>>    1. model.mn5 <- lmer(anxious ~ num.cm + num.pmc + (1|userid),
>>>       data = df, REML = FALSE)
>>>    2. model.mn5.log <- lmer(log(anxious) ~ num.cm + num.pmc + (1|userid),
>>>       data = df, REML = FALSE)
>>>    3. model.mn5.gamma.log <- glmer(anxious ~ num.cm + num.pmc + (1|userid),
>>>       data = df, family = Gamma(link = "log"))
>>>    4. model.mn5.gamma.id <- glmer(anxious ~ num.cm + num.pmc + (1|userid),
>>>       data = df, family = Gamma(link = "identity"))
>>>    5. model.ord5 <- clmm(anxious ~ num.cm + num.pmc + (1|userid),
>>>       data = df, na.action = na.omit)
>>>
>>> (num.cm is the group mean and num.pmc is the group-mean-centered score
>>> of the predictor)
>>>
>>> From many posts on various help forums, I understand that it's possible
>>> to compare non-nested models with different distributions as long as all
>>> terms, including constants, are retained (see Burnham & Anderson, Ch 6.7
>>> <https://www.springer.com/gp/book/9780387953649>), but that different R
>>> packages or model classes might handle constants differently or use
>>> different algorithms (see point 7
>>> <https://robjhyndman.com/hyndsight/aic/>), thus making it difficult to
>>> directly compare AIC values. To avoid this non-comparability pitfall, one
>>> post suggested calculating your own log-likelihood (though I'm having
>>> trouble finding this post again).
>>>
>>> Please could you help with the following:
>>>
>>>    - What is the best practice for comparing the AICs for these 5 models?
>>>    - What is the R code for manually calculating the log-likelihood
>>>    and/or the AIC to retain all terms, including constants?
>>>    - Can you compare ordinal models (clmm) with the continuous models?
>>>    - Do you recommend any other methods and/or packages for comparing
>>>    models with different distributions and/or links?
>>>
>>> Many thanks in advance for your time and consideration! I greatly
>>> appreciate any suggestions.
>>>
>>> Kind regards,
>>> K