[R-sig-ME] Fixing singularity in a generalized linear mixed effect model

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Tue Mar 26 21:23:00 CET 2019


  A quick comment to René: specifying offset as part of the
fixed-effects formula, rather than as a separate argument, is fairly
standard (see the "offset" description in ?glm). As a developer I
actually slightly prefer it -- it makes it a little easier to keep track
of the offset information.

  cheers
    Ben Bolker


On 2019-03-26 2:55 p.m., Alessandra Bielli wrote:
> Hi René
> 
> To anwer your questions
> 1. offset is a continuous variable because it's fishing effort, measured in
> km of net x day. The experiment has a paired design but within the same set
> it could happen that the control net is longer than the experimental net.
> So count is the number of individuals captured in the net that will be
> later standardised using effort.
> 2. the offset is in fixed effects because we use the model to predict catch
> per unit effort (CPUE) and the predict function will return a CPUE (and not
> a count) if the offset is included in the formula:
> 
> pred.data <- with(Data, #your data
>                   expand.grid(
>                     CE=c("C","E"),
>                     Effort= 1,
>                     Count=0))
> 
> mm <- model.matrix(terms(m1),pred.data)
> 
> pred.data$Count <- predict(m1,pred.data,re.form=NA,type="response")
> 
> I have just tried removing the offset and running glmer(Count ~
> CE +  (1|Lance.N) but the model is still singular, with Lance variance = 0
> I must be missing something!
> 
> Alessandra
> 
> 
> 
> On Tue, Mar 26, 2019 at 4:03 PM René <bimonosom using gmail.com> wrote:
> 
>> Dear Alessandra,
>>
>> First, I think this is a related post (considering the separation between
>> residual variance from random effect variance, with few/one random effect
>> observation per design cell).
>>
>> https://stackoverflow.com/questions/19713228/lmer-error-grouping-factor-must-be-number-of-observations
>>
>> Considering your experimental set up:
>>
>> "each [Lance] consists of a pair of nets: one control net and one
>> experimental net" (This must be the 'CE' variable)
>>
>> So this model alone should indeed work:
>> m <- glmer(Count ~ CE +  (1|Lance.N), data =Data, glmerControl(optimizer =
>> "bobyqa"), family= "poisson")
>>
>> but you defined it this way...
>> m <- glmer(Count ~ CE + offset(log(Effort))+ (1|Lance.N) data =Data,
>> glmerControl(optimizer = "bobyqa"), family= "poisson")
>> And there are two odd things:
>> First, offset seems to be a continuous variable, and now the above
>> post-link definitely matters, as residual variance falls together with
>> random effect variance -> not identifiable.
>> Second, why is the offset in the fixed effects (I am not aware of this
>> kind of application)?
>> Usually, I think, the offset in poisson models is defined in the options
>> (see ?glm), and also the offset usually is the number of overall
>> observations (from which Count is the number of successes), but effort
>> seems to be something else.
>>
>>
>> Best, René
>>
>>
>>
>>
>> Am Di., 26. März 2019 um 15:37 Uhr schrieb Alessandra Bielli <
>> bielli.alessandra using gmail.com>:
>>
>>> Dear René and Thierry
>>>
>>> Thank you very much for your answer.
>>> I have to check about sharing the data.
>>>
>>> In the mean time, I'll explain a bit more the experiment.
>>> Lance means "set" and is the code given to a fishing set.
>>> In the experiment, within each fishing trip there are multiple fishing
>>> sets (i.e the fishermen soak their nets multiple times, each time on a
>>> different day, usually consecutive days) and each fishing set consists of a
>>> pair of nets: one control net and one experimental net. This is why I
>>> believe I must include "Lance" as a random effect, because it is really
>>> important for the design to have a paired experiment.
>>> So an example from my dataset would be
>>>
>>> Trip.Code      Lance.N      Observer.Name  start date      Effort    CE
>>>   Turtles.TOT
>>> EC062 159       Alexis Lopez 1/7/2015 0.443103     C 0
>>> EC062 159       Alexis Lopez 1/7/2015 0.398793     E 0
>>> EC062 160       Alexis Lopez 1/8/2015 0.474345     C 0
>>> EC062 160       Alexis Lopez 1/8/2015 0.426911      E 0
>>> What confuses me is that, even if I leave "Lance" as the only random
>>> effect, the model is still singular.
>>> Does this happen because the number of levels for Lance is too high
>>> compared to the number of observations?
>>> Would it be better to have repeated Lance (set number) for each trip,
>>> i.e. trip EC062 set 1,2,3 etc then trip EC063 set 1,2,3 etc...?
>>>
>>> Thanks again for your help!
>>>
>>> Alessandra
>>>
>>>
>>> On Tue, Mar 26, 2019 at 9:08 AM René <bimonosom using gmail.com> wrote:
>>>
>>>> Hi Allessandra,
>>>>
>>>> your model output says:
>>>> "Number of obs: 292"
>>>> and your model has 2 fixed effects and 5 (!) random effects.
>>>>  - If -  all these random effects are fully crossed. Then assuming you
>>>> have 19 participants (e.g. 1|observers), and 4 random effects crossed on
>>>> them with two levels each (2*2*2*2 = 16 cells), would make about 292
>>>> observations.
>>>> Now you see this math uncovers the most likely problem: A random effect
>>>> (intercept) factor is urgently recommended to have least! 6 levels to make
>>>> such a model meaningful).
>>>> If all these random effects are not fully crossed, then the model is
>>>> misspecified, i.e. defining random intercepts for factor 1 separately from
>>>> random intercepts for another factor 2, when factor 1 is nested in factor
>>>> 2, is over-identifying the randomness in your model -> singular.
>>>>
>>>> So,
>>>> only define random intercepts for factors with more then 6 levels (move
>>>> those factors with less levels to the fixed effects instead to control for
>>>> them)
>>>> only define separate random intercepts for factors that are crossed; for
>>>> instance the factors boat name and lance seem suspicious. I guess, there is
>>>> a world in which a 'lance 1' can only be on boat 'atlantis' to be used for
>>>> fishing, but not on boat 'moby dick'. In this case, having a random
>>>> intercept for "boat name" in addition to 'lance' would not add anything to
>>>> the model, since lances would already cover the variance of boats (lances
>>>> nested in boats).
>>>> Get more observations
>>>> Rerun the model
>>>> Should be fine :)
>>>>
>>>> If singularities still occur use Bayesian models or come back here :))
>>>>
>>>> Best, René
>>>>
>>>>
>>>> Am Di., 26. März 2019 um 09:30 Uhr schrieb Thierry Onkelinx via
>>>> R-sig-mixed-models <r-sig-mixed-models using r-project.org>:
>>>>
>>>>> Dear Alessandra,
>>>>>
>>>>> Your problem is hard to diagnose without the data. Can you make the data
>>>>> available? Does the combination of factors lead to unique observations?
>>>>> Or
>>>>> do some combinations have only zero's?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> ir. Thierry Onkelinx
>>>>> Statisticus / Statistician
>>>>>
>>>>> Vlaamse Overheid / Government of Flanders
>>>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>>>> AND
>>>>> FOREST
>>>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>>>> thierry.onkelinx using inbo.be
>>>>> Havenlaan 88 bus 73, 1000 Brussel
>>>>> www.inbo.be
>>>>>
>>>>>
>>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>> To call in the statistician after the experiment is done may be no more
>>>>> than asking him to perform a post-mortem examination: he may be able to
>>>>> say
>>>>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>> The combination of some data and an aching desire for an answer does not
>>>>> ensure that a reasonable answer can be extracted from a given body of
>>>>> data.
>>>>> ~ John Tukey
>>>>>
>>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>>
>>>>> <https://www.inbo.be>
>>>>>
>>>>>
>>>>> Op wo 20 mrt. 2019 om 23:19 schreef Alessandra Bielli <
>>>>> bielli.alessandra using gmail.com>:
>>>>>
>>>>>> Dear List
>>>>>>
>>>>>> I am fitting this model using the lme4 package, in order to obtain
>>>>> catch
>>>>>> estimates using the predict function
>>>>>>
>>>>>> m1 <- glmer(Count ~ CE + offset(log(Effort)) + (1|SetYear)
>>>>> +(1|Season) +
>>>>>>                   (1|Lance.N) + (1|Boat.Name) + (1|Observer.Name),
>>>>> data =
>>>>>> Data,                     glmerControl(optimizer = "bobyqa"), family=
>>>>>> "poisson")
>>>>>>
>>>>>>
>>>>>> where: CE is a categorical (control or treatment), Effort is numerical
>>>>>> (fishing effort), and all the other variables are random effects.
>>>>>>
>>>>>> *My problem is that I get a warning message saying that the model is
>>>>>> singular*
>>>>>>
>>>>>> *>summary(m1)*
>>>>>>
>>>>>> Generalized linear mixed model fit by maximum likelihood (Laplace
>>>>>> Approximation) [glmerMod]
>>>>>>  Family: poisson  ( log )
>>>>>> Formula: Count ~ CE + offset(log(Effort)) + (1 | SetYear) + (1 |
>>>>>>     Season) + (1 | Lance.N) + (1 | Boat.Name) + (1 | Observer.Name)
>>>>>>    Data: Data
>>>>>> Control: glmerControl(optimizer = "bobyqa")
>>>>>>
>>>>>>      AIC      BIC   logLik deviance df.resid
>>>>>>    148.6    174.3    -67.3    134.6      285
>>>>>>
>>>>>> Scaled residuals:
>>>>>>     Min      1Q  Median      3Q     Max
>>>>>> -0.4852 -0.1758 -0.1339 -0.1227  3.5980
>>>>>>
>>>>>> Random effects:
>>>>>>  Groups        Name        Variance  Std.Dev.
>>>>>>  Lance.N       (Intercept) 2.259e+00 1.503e+00
>>>>>>  Boat.Name     (Intercept) 0.000e+00 0.000e+00
>>>>>>  Observer.Name (Intercept) 0.000e+00 0.000e+00
>>>>>>  Season        (Intercept) 4.149e-17 6.442e-09
>>>>>>  SetYear       (Intercept) 0.000e+00 0.000e+00
>>>>>> Number of obs: 292, groups:
>>>>>> Lance.N, 146; Boat.Name, 21; Observer.Name, 5; Season, 4; SetYear, 4
>>>>>>
>>>>>> Fixed effects:
>>>>>>             Estimate Std. Error z value Pr(>|z|)
>>>>>> (Intercept)  -2.5751     0.6612  -3.895 9.83e-05 ***
>>>>>> CEE          -0.5878     0.5003  -1.175     0.24
>>>>>> ---
>>>>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>>>>
>>>>>> Correlation of Fixed Effects:
>>>>>>     (Intr)
>>>>>> CEE -0.257
>>>>>> *convergence code: 0*
>>>>>> *singular fit*
>>>>>>
>>>>>> I am aware that there are a lot of random effects and some of them
>>>>> have a
>>>>>> number of levels <5. However, this study was carried out under real
>>>>> fishery
>>>>>> conditions, so these random effects seemed all important to me.
>>>>>>
>>>>>> I removed the random effects with variance zero as suggested here
>>>>>>
>>>>>>
>>>>> https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1
>>>>>> until I removed them all and found myself with a glm instead.
>>>>>>
>>>>>> My questions are
>>>>>>
>>>>>> - why the variance of Lance.N, initially positive, becomes zero after
>>>>> I
>>>>>> remove the other random effects that had variance equal zero?
>>>>>> - is it acceptable to fit a glm just because all the random effect
>>>>>> variances were zero?
>>>>>>
>>>>>> I hope I gave all the information you need.
>>>>>>
>>>>>> Thanks for any advice!
>>>>>>
>>>>>> Alessandra
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>>
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>>
>>>>
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>



More information about the R-sig-mixed-models mailing list