[R-sig-ME] lme4, cloglog vs. binomial link

Tue Jun 5 01:32:07 CEST 2012

On Mon, 4 Jun 2012, Tibor Kiss wrote:

> In the following mixed models, *target_noun_lemma* is the representation 
> of the noun in the construction, its categorical value being one of the 
> 712 different nouns in the sample. The sample contains 6,841 different 
> instances of the construction: 810 instances of determiner omission and 
> 6,031 instances of determiner realization.
>
> The distribution of *target_noun_lemma* is highly skewed (which is 
> standard for language samples): the top five nouns occur 1,225, 466, 
> 443, 414, and 304 times.
>
> The model seems to be worse in terms of AIC (2360 compared to 2302),
>
> 1. Is it correct to assume that given a cloglog link, the less frequent 
> response should be considered the success?
> 2. Is it correct to conclude that the changes in the model have led to 
> less influence of the random factor?
> 3. How shall I react to the increase in AIC?
>

I found all this quite dizzying.  I would first look for an optimal link 
function in a fixed-effects GLM fitted to a dataset of just your top five 
nouns. I don't think you can read much into the scale of the random-effects 
estimates under different link functions.  The other way of approaching 
this is to change the distribution of the random effects: for a single 
random effect like this there are nonparametric/mixture models (you could 
interpret this as clustering your nouns into families).
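
On the question of which response to code as the success: a minimal 
sketch (in Python rather than R, with illustrative helper names that come 
from no package) of why the coding matters under cloglog but not under 
logit. The logit link is symmetric, so swapping success and failure only 
flips the sign of the linear predictor; the cloglog link is not, so the 
swap gives a genuinely different model. The only data used are the counts 
quoted above (810 omissions out of 6841 instances).

```python
import math

def logit(p):
    """Logit link: log odds of p."""
    return math.log(p / (1.0 - p))

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Overall omission rate from the sample described in the thread.
p_omit = 810 / 6841

# If a link is symmetric, link(p) + link(1 - p) == 0, i.e. recoding the
# response merely changes the sign of the linear predictor.
logit_swap_gap = logit(p_omit) + logit(1.0 - p_omit)
cloglog_swap_gap = cloglog(p_omit) + cloglog(1.0 - p_omit)

print("logit gap:  ", logit_swap_gap)    # numerically zero
print("cloglog gap:", cloglog_swap_gap)  # clearly non-zero
```

This says nothing about which coding fits better for these data; it only 
shows that under cloglog the two codings are not equivalent, so both are 
worth fitting and comparing.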

Whether the AICs can be compared depends on how the log-likelihood is 
computed internally for the different links.  They should be comparable, 
in which case the logit model is preferred over the cloglog one.
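
To make the comparability point concrete, here is a rough sketch (Python, 
illustrative function names, toy numbers that are not from the poster's 
data). The Bernoulli log-likelihood depends only on the responses and the 
fitted probabilities; the link merely determines how those probabilities 
are obtained. As long as both fits report the full log-likelihood, the 
AICs are on the same scale. The last lines assume, as an assumption only, 
that the two models in the thread have the same number of parameters.

```python
import math

def bernoulli_loglik(y, p):
    """Full Bernoulli log-likelihood for 0/1 responses y and fitted probabilities p."""
    return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2.0 * loglik

# Toy responses and hypothetical fitted probabilities from two models
# that differ only in the link (made-up values for illustration).
y = [1, 0, 0, 1, 0]
p_a = [0.7, 0.2, 0.1, 0.6, 0.3]
p_b = [0.6, 0.3, 0.2, 0.5, 0.4]

ll_a = bernoulli_loglik(y, p_a)
ll_b = bernoulli_loglik(y, p_b)
aic_a = aic(ll_a, n_params=2)
aic_b = aic(ll_b, n_params=2)

# With equal parameter counts, the AIC gap is twice the log-likelihood gap.
# Applied to the AICs reported in the thread (assumption: same number of
# parameters in both models):
delta_aic = 2360 - 2302
loglik_gap = delta_aic / 2.0
print(aic_a, aic_b, delta_aic, loglik_gap)
```

Under that assumption, the reported AIC difference corresponds to a 
log-likelihood difference of 29 units between the two fits.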

> I am most curious to learn about possible
> modifications of the model so that an observed random effect can be 
> minimized

You can sometimes get rid of a random effect completely by transformation. 
The examples I know of are for continuous Y and crossed factors (additive 
and dominant genetic variances), where one factor can be removed.

Cheers, David Duffy.