[R-sig-ME] Fixing singularity in a generalized linear mixed effect model

Tue Mar 26 19:55:20 CET 2019

Hi René

To anwer your questions
1. offset is a continuous variable because it's fishing effort, measured in
km of net x day. The experiment has a paired design but within the same set
it could happen that the control net is longer than the experimental net.
So count is the number of individuals captured in the net that will be
later standardised using effort.
2. the offset is in fixed effects because we use the model to predict catch
per unit effort (CPUE) and the predict function will return a CPUE (and not
a count) if the offset is included in the formula:

pred.data <- with(Data, #your data
                  expand.grid(
                    CE=c("C","E"),
                    Effort= 1,
                    Count=0))

mm <- model.matrix(terms(m1),pred.data)

pred.data$Count <- predict(m1,pred.data,re.form=NA,type="response")

I have just tried removing the offset and running glmer(Count ~
CE +  (1|Lance.N) but the model is still singular, with Lance variance = 0
I must be missing something!

Alessandra

On Tue, Mar 26, 2019 at 4:03 PM René <bimonosom using gmail.com> wrote:

> Dear Alessandra,
>
> First, I think this is a related post (considering the separation between
> residual variance from random effect variance, with few/one random effect
> observation per design cell).
>
> https://stackoverflow.com/questions/19713228/lmer-error-grouping-factor-must-be-number-of-observations
>
> Considering your experimental set up:
>
> "each [Lance] consists of a pair of nets: one control net and one
> experimental net" (This must be the 'CE' variable)
>
> So this model alone should indeed work:
> m <- glmer(Count ~ CE +  (1|Lance.N), data =Data, glmerControl(optimizer =
> "bobyqa"), family= "poisson")
>
> but you defined it this way...
> m <- glmer(Count ~ CE + offset(log(Effort))+ (1|Lance.N) data =Data,
> glmerControl(optimizer = "bobyqa"), family= "poisson")
> And there are two odd things:
> First, offset seems to be a continuous variable, and now the above
> post-link definitely matters, as residual variance falls together with
> random effect variance -> not identifiable.
> Second, why is the offset in the fixed effects (I am not aware of this
> kind of application)?
> Usually, I think, the offset in poisson models is defined in the options
> (see ?glm), and also the offset usually is the number of overall
> observations (from which Count is the number of successes), but effort
> seems to be something else.
>
>
> Best, René
>
>
>
>
> Am Di., 26. März 2019 um 15:37 Uhr schrieb Alessandra Bielli <
> bielli.alessandra using gmail.com>:
>
>> Dear René and Thierry
>>
>> Thank you very much for your answer.
>> I have to check about sharing the data.
>>
>> In the mean time, I'll explain a bit more the experiment.
>> Lance means "set" and is the code given to a fishing set.
>> In the experiment, within each fishing trip there are multiple fishing
>> sets (i.e the fishermen soak their nets multiple times, each time on a
>> different day, usually consecutive days) and each fishing set consists of a
>> pair of nets: one control net and one experimental net. This is why I
>> believe I must include "Lance" as a random effect, because it is really
>> important for the design to have a paired experiment.
>> So an example from my dataset would be
>>
>> Trip.Code      Lance.N      Observer.Name  start date      Effort    CE
>>   Turtles.TOT
>> EC062 159       Alexis Lopez 1/7/2015 0.443103     C 0
>> EC062 159       Alexis Lopez 1/7/2015 0.398793     E 0
>> EC062 160       Alexis Lopez 1/8/2015 0.474345     C 0
>> EC062 160       Alexis Lopez 1/8/2015 0.426911      E 0
>> What confuses me is that, even if I leave "Lance" as the only random
>> effect, the model is still singular.
>> Does this happen because the number of levels for Lance is too high
>> compared to the number of observations?
>> Would it be better to have repeated Lance (set number) for each trip,
>> i.e. trip EC062 set 1,2,3 etc then trip EC063 set 1,2,3 etc...?
>>
>> Thanks again for your help!
>>
>> Alessandra
>>
>>
>> On Tue, Mar 26, 2019 at 9:08 AM René <bimonosom using gmail.com> wrote:
>>
>>> Hi Allessandra,
>>>
>>> your model output says:
>>> "Number of obs: 292"
>>> and your model has 2 fixed effects and 5 (!) random effects.
>>>  - If -  all these random effects are fully crossed. Then assuming you
>>> have 19 participants (e.g. 1|observers), and 4 random effects crossed on
>>> them with two levels each (2*2*2*2 = 16 cells), would make about 292
>>> observations.
>>> Now you see this math uncovers the most likely problem: A random effect
>>> (intercept) factor is urgently recommended to have least! 6 levels to make
>>> such a model meaningful).
>>> If all these random effects are not fully crossed, then the model is
>>> misspecified, i.e. defining random intercepts for factor 1 separately from
>>> random intercepts for another factor 2, when factor 1 is nested in factor
>>> 2, is over-identifying the randomness in your model -> singular.
>>>
>>> So,
>>> only define random intercepts for factors with more then 6 levels (move
>>> those factors with less levels to the fixed effects instead to control for
>>> them)
>>> only define separate random intercepts for factors that are crossed; for
>>> instance the factors boat name and lance seem suspicious. I guess, there is
>>> a world in which a 'lance 1' can only be on boat 'atlantis' to be used for
>>> fishing, but not on boat 'moby dick'. In this case, having a random
>>> intercept for "boat name" in addition to 'lance' would not add anything to
>>> the model, since lances would already cover the variance of boats (lances
>>> nested in boats).
>>> Get more observations
>>> Rerun the model
>>> Should be fine :)
>>>
>>> If singularities still occur use Bayesian models or come back here :))
>>>
>>> Best, René
>>>
>>>
>>> Am Di., 26. März 2019 um 09:30 Uhr schrieb Thierry Onkelinx via
>>> R-sig-mixed-models <r-sig-mixed-models using r-project.org>:
>>>
>>>> Dear Alessandra,
>>>>
>>>> Your problem is hard to diagnose without the data. Can you make the data
>>>> available? Does the combination of factors lead to unique observations?
>>>> Or
>>>> do some combinations have only zero's?
>>>>
>>>> Best regards,
>>>>
>>>> ir. Thierry Onkelinx
>>>> Statisticus / Statistician
>>>>
>>>> Vlaamse Overheid / Government of Flanders
>>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>>> AND
>>>> FOREST
>>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>>> thierry.onkelinx using inbo.be
>>>> Havenlaan 88 bus 73, 1000 Brussel
>>>> www.inbo.be
>>>>
>>>>
>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>> To call in the statistician after the experiment is done may be no more
>>>> than asking him to perform a post-mortem examination: he may be able to
>>>> say
>>>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>> The combination of some data and an aching desire for an answer does not
>>>> ensure that a reasonable answer can be extracted from a given body of
>>>> data.
>>>> ~ John Tukey
>>>>
>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>
>>>> <https://www.inbo.be>
>>>>
>>>>
>>>> Op wo 20 mrt. 2019 om 23:19 schreef Alessandra Bielli <
>>>> bielli.alessandra using gmail.com>:
>>>>
>>>> > Dear List
>>>> >
>>>> > I am fitting this model using the lme4 package, in order to obtain
>>>> catch
>>>> > estimates using the predict function
>>>> >
>>>> > m1 <- glmer(Count ~ CE + offset(log(Effort)) + (1|SetYear)
>>>> +(1|Season) +
>>>> >                   (1|Lance.N) + (1|Boat.Name) + (1|Observer.Name),
>>>> data =
>>>> > Data,                     glmerControl(optimizer = "bobyqa"), family=
>>>> > "poisson")
>>>> >
>>>> >
>>>> > where: CE is a categorical (control or treatment), Effort is numerical
>>>> > (fishing effort), and all the other variables are random effects.
>>>> >
>>>> > *My problem is that I get a warning message saying that the model is
>>>> > singular*
>>>> >
>>>> > *>summary(m1)*
>>>> >
>>>> > Generalized linear mixed model fit by maximum likelihood (Laplace
>>>> > Approximation) [glmerMod]
>>>> >  Family: poisson  ( log )
>>>> > Formula: Count ~ CE + offset(log(Effort)) + (1 | SetYear) + (1 |
>>>> >     Season) + (1 | Lance.N) + (1 | Boat.Name) + (1 | Observer.Name)
>>>> >    Data: Data
>>>> > Control: glmerControl(optimizer = "bobyqa")
>>>> >
>>>> >      AIC      BIC   logLik deviance df.resid
>>>> >    148.6    174.3    -67.3    134.6      285
>>>> >
>>>> > Scaled residuals:
>>>> >     Min      1Q  Median      3Q     Max
>>>> > -0.4852 -0.1758 -0.1339 -0.1227  3.5980
>>>> >
>>>> > Random effects:
>>>> >  Groups        Name        Variance  Std.Dev.
>>>> >  Lance.N       (Intercept) 2.259e+00 1.503e+00
>>>> >  Boat.Name     (Intercept) 0.000e+00 0.000e+00
>>>> >  Observer.Name (Intercept) 0.000e+00 0.000e+00
>>>> >  Season        (Intercept) 4.149e-17 6.442e-09
>>>> >  SetYear       (Intercept) 0.000e+00 0.000e+00
>>>> > Number of obs: 292, groups:
>>>> > Lance.N, 146; Boat.Name, 21; Observer.Name, 5; Season, 4; SetYear, 4
>>>> >
>>>> > Fixed effects:
>>>> >             Estimate Std. Error z value Pr(>|z|)
>>>> > (Intercept)  -2.5751     0.6612  -3.895 9.83e-05 ***
>>>> > CEE          -0.5878     0.5003  -1.175     0.24
>>>> > ---
>>>> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>> >
>>>> > Correlation of Fixed Effects:
>>>> >     (Intr)
>>>> > CEE -0.257
>>>> > *convergence code: 0*
>>>> > *singular fit*
>>>> >
>>>> > I am aware that there are a lot of random effects and some of them
>>>> have a
>>>> > number of levels <5. However, this study was carried out under real
>>>> fishery
>>>> > conditions, so these random effects seemed all important to me.
>>>> >
>>>> > I removed the random effects with variance zero as suggested here
>>>> >
>>>> >
>>>> https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1
>>>> > until I removed them all and found myself with a glm instead.
>>>> >
>>>> > My questions are
>>>> >
>>>> > - why the variance of Lance.N, initially positive, becomes zero after
>>>> I
>>>> > remove the other random effects that had variance equal zero?
>>>> > - is it acceptable to fit a glm just because all the random effect
>>>> > variances were zero?
>>>> >
>>>> > I hope I gave all the information you need.
>>>> >
>>>> > Thanks for any advice!
>>>> >
>>>> > Alessandra
>>>> >
>>>> >         [[alternative HTML version deleted]]
>>>> >
>>>> > _______________________________________________
>>>> > R-sig-mixed-models using r-project.org mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>> >
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>

	[[alternative HTML version deleted]]