[R-sig-ME] Fixing singularity in a generalized linear mixed effect model
René
b|mono@om @end|ng |rom gm@||@com
Tue Mar 26 17:02:58 CET 2019
Dear Alessandra,
First, I think this is a related post (considering the separation between
residual variance from random effect variance, with few/one random effect
observation per design cell).
https://stackoverflow.com/questions/19713228/lmer-error-grouping-factor-must-be-number-of-observations
Considering your experimental set up:
"each [Lance] consists of a pair of nets: one control net and one
experimental net" (This must be the 'CE' variable)
So this model alone should indeed work:
m <- glmer(Count ~ CE + (1|Lance.N), data =Data, glmerControl(optimizer =
"bobyqa"), family= "poisson")
but you defined it this way...
m <- glmer(Count ~ CE + offset(log(Effort))+ (1|Lance.N) data =Data,
glmerControl(optimizer = "bobyqa"), family= "poisson")
And there are two odd things:
First, offset seems to be a continuous variable, and now the above
post-link definitely matters, as residual variance falls together with
random effect variance -> not identifiable.
Second, why is the offset in the fixed effects (I am not aware of this kind
of application)?
Usually, I think, the offset in poisson models is defined in the options
(see ?glm), and also the offset usually is the number of overall
observations (from which Count is the number of successes), but effort
seems to be something else.
Best, René
Am Di., 26. März 2019 um 15:37 Uhr schrieb Alessandra Bielli <
bielli.alessandra using gmail.com>:
> Dear René and Thierry
>
> Thank you very much for your answer.
> I have to check about sharing the data.
>
> In the mean time, I'll explain a bit more the experiment.
> Lance means "set" and is the code given to a fishing set.
> In the experiment, within each fishing trip there are multiple fishing
> sets (i.e the fishermen soak their nets multiple times, each time on a
> different day, usually consecutive days) and each fishing set consists of a
> pair of nets: one control net and one experimental net. This is why I
> believe I must include "Lance" as a random effect, because it is really
> important for the design to have a paired experiment.
> So an example from my dataset would be
>
> Trip.Code Lance.N Observer.Name start date Effort CE
> Turtles.TOT
> EC062 159 Alexis Lopez 1/7/2015 0.443103 C 0
> EC062 159 Alexis Lopez 1/7/2015 0.398793 E 0
> EC062 160 Alexis Lopez 1/8/2015 0.474345 C 0
> EC062 160 Alexis Lopez 1/8/2015 0.426911 E 0
> What confuses me is that, even if I leave "Lance" as the only random
> effect, the model is still singular.
> Does this happen because the number of levels for Lance is too high
> compared to the number of observations?
> Would it be better to have repeated Lance (set number) for each trip, i.e.
> trip EC062 set 1,2,3 etc then trip EC063 set 1,2,3 etc...?
>
> Thanks again for your help!
>
> Alessandra
>
>
> On Tue, Mar 26, 2019 at 9:08 AM René <bimonosom using gmail.com> wrote:
>
>> Hi Allessandra,
>>
>> your model output says:
>> "Number of obs: 292"
>> and your model has 2 fixed effects and 5 (!) random effects.
>> - If - all these random effects are fully crossed. Then assuming you
>> have 19 participants (e.g. 1|observers), and 4 random effects crossed on
>> them with two levels each (2*2*2*2 = 16 cells), would make about 292
>> observations.
>> Now you see this math uncovers the most likely problem: A random effect
>> (intercept) factor is urgently recommended to have least! 6 levels to make
>> such a model meaningful).
>> If all these random effects are not fully crossed, then the model is
>> misspecified, i.e. defining random intercepts for factor 1 separately from
>> random intercepts for another factor 2, when factor 1 is nested in factor
>> 2, is over-identifying the randomness in your model -> singular.
>>
>> So,
>> only define random intercepts for factors with more then 6 levels (move
>> those factors with less levels to the fixed effects instead to control for
>> them)
>> only define separate random intercepts for factors that are crossed; for
>> instance the factors boat name and lance seem suspicious. I guess, there is
>> a world in which a 'lance 1' can only be on boat 'atlantis' to be used for
>> fishing, but not on boat 'moby dick'. In this case, having a random
>> intercept for "boat name" in addition to 'lance' would not add anything to
>> the model, since lances would already cover the variance of boats (lances
>> nested in boats).
>> Get more observations
>> Rerun the model
>> Should be fine :)
>>
>> If singularities still occur use Bayesian models or come back here :))
>>
>> Best, René
>>
>>
>> Am Di., 26. März 2019 um 09:30 Uhr schrieb Thierry Onkelinx via
>> R-sig-mixed-models <r-sig-mixed-models using r-project.org>:
>>
>>> Dear Alessandra,
>>>
>>> Your problem is hard to diagnose without the data. Can you make the data
>>> available? Does the combination of factors lead to unique observations?
>>> Or
>>> do some combinations have only zero's?
>>>
>>> Best regards,
>>>
>>> ir. Thierry Onkelinx
>>> Statisticus / Statistician
>>>
>>> Vlaamse Overheid / Government of Flanders
>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>> AND
>>> FOREST
>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>> thierry.onkelinx using inbo.be
>>> Havenlaan 88 bus 73, 1000 Brussel
>>> www.inbo.be
>>>
>>>
>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>> To call in the statistician after the experiment is done may be no more
>>> than asking him to perform a post-mortem examination: he may be able to
>>> say
>>> what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>> The plural of anecdote is not data. ~ Roger Brinner
>>> The combination of some data and an aching desire for an answer does not
>>> ensure that a reasonable answer can be extracted from a given body of
>>> data.
>>> ~ John Tukey
>>>
>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>
>>> <https://www.inbo.be>
>>>
>>>
>>> Op wo 20 mrt. 2019 om 23:19 schreef Alessandra Bielli <
>>> bielli.alessandra using gmail.com>:
>>>
>>> > Dear List
>>> >
>>> > I am fitting this model using the lme4 package, in order to obtain
>>> catch
>>> > estimates using the predict function
>>> >
>>> > m1 <- glmer(Count ~ CE + offset(log(Effort)) + (1|SetYear) +(1|Season)
>>> +
>>> > (1|Lance.N) + (1|Boat.Name) + (1|Observer.Name),
>>> data =
>>> > Data, glmerControl(optimizer = "bobyqa"), family=
>>> > "poisson")
>>> >
>>> >
>>> > where: CE is a categorical (control or treatment), Effort is numerical
>>> > (fishing effort), and all the other variables are random effects.
>>> >
>>> > *My problem is that I get a warning message saying that the model is
>>> > singular*
>>> >
>>> > *>summary(m1)*
>>> >
>>> > Generalized linear mixed model fit by maximum likelihood (Laplace
>>> > Approximation) [glmerMod]
>>> > Family: poisson ( log )
>>> > Formula: Count ~ CE + offset(log(Effort)) + (1 | SetYear) + (1 |
>>> > Season) + (1 | Lance.N) + (1 | Boat.Name) + (1 | Observer.Name)
>>> > Data: Data
>>> > Control: glmerControl(optimizer = "bobyqa")
>>> >
>>> > AIC BIC logLik deviance df.resid
>>> > 148.6 174.3 -67.3 134.6 285
>>> >
>>> > Scaled residuals:
>>> > Min 1Q Median 3Q Max
>>> > -0.4852 -0.1758 -0.1339 -0.1227 3.5980
>>> >
>>> > Random effects:
>>> > Groups Name Variance Std.Dev.
>>> > Lance.N (Intercept) 2.259e+00 1.503e+00
>>> > Boat.Name (Intercept) 0.000e+00 0.000e+00
>>> > Observer.Name (Intercept) 0.000e+00 0.000e+00
>>> > Season (Intercept) 4.149e-17 6.442e-09
>>> > SetYear (Intercept) 0.000e+00 0.000e+00
>>> > Number of obs: 292, groups:
>>> > Lance.N, 146; Boat.Name, 21; Observer.Name, 5; Season, 4; SetYear, 4
>>> >
>>> > Fixed effects:
>>> > Estimate Std. Error z value Pr(>|z|)
>>> > (Intercept) -2.5751 0.6612 -3.895 9.83e-05 ***
>>> > CEE -0.5878 0.5003 -1.175 0.24
>>> > ---
>>> > Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>> >
>>> > Correlation of Fixed Effects:
>>> > (Intr)
>>> > CEE -0.257
>>> > *convergence code: 0*
>>> > *singular fit*
>>> >
>>> > I am aware that there are a lot of random effects and some of them
>>> have a
>>> > number of levels <5. However, this study was carried out under real
>>> fishery
>>> > conditions, so these random effects seemed all important to me.
>>> >
>>> > I removed the random effects with variance zero as suggested here
>>> >
>>> >
>>> https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1
>>> > until I removed them all and found myself with a glm instead.
>>> >
>>> > My questions are
>>> >
>>> > - why the variance of Lance.N, initially positive, becomes zero after I
>>> > remove the other random effects that had variance equal zero?
>>> > - is it acceptable to fit a glm just because all the random effect
>>> > variances were zero?
>>> >
>>> > I hope I gave all the information you need.
>>> >
>>> > Thanks for any advice!
>>> >
>>> > Alessandra
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > _______________________________________________
>>> > R-sig-mixed-models using r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>> >
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list