[R-sig-ME] Estimation of variance components in random- and mixed-effects models
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Sun Jul 4 02:49:56 CEST 2021
Very briefly (and tentatively: someone should feel free to correct
me if I got something wrong).
Analysis of variance doesn't usually try to *estimate* the difference
variance components, it just computes the mean sums of squares
associated with different factors/levels, e.g.
> summary(aov(angle ~ recipe/temperature/replicate, data = cake))
Df Sum Sq Mean Sq
recipe 2 135 67.54
recipe:temperature 15 2306 153.75
recipe:temperature:replicate 252 15702 62.31
These provide a framework for *testing the significance* of different
components. By avoiding explicit estimation, this avoids a lot of the
problems of instability of estimates. You will still generally have low
power for a significance test against the null hypothesis that a
particular variance component is zero ...
In the variance decomposition/partitioning literature, *negative*
variance components are the analog of the "what to do about singular
fits?" conversation from mixed models. There is a big literature on neg
https://scholar.google.com/scholar?q="negative+variance+components"
On 7/3/21 10:59 AM, kalman.toth wrote:
> Sorry for butting into this conversation but there is something I cannot get my head around. I often encounter situation where B grouping variable is nested into A grouping variable. B has around 10-20 levels and A only has around 3-5. Conceptually, everything is a random effect. If my understanding is correct in this case I should use the following lmer model due to the limited number of levels of B:
> R ~ A+(1|B)
> Even though, scientifically R ~ (1|A/B) would make more sense.
> Variance of B of the first term is not equal to variance of A:B in the second.
>
> I am most interested in a reasonable estimation on the variance of B while we know that it is nested in A.
>
> In general, we use the classical ANOVA to do that: aov(R~A/B). This gives the same variance estimate for A:B interaction as lmer(R ~ (1|A/B)) if it the design is relatively well balanced. In a way I am not surprised because the Cross Validated post suggested by Ben Bolker also mentions that in this case mixed effect model behaves similarly to classical ones. But does it mean that this ANOVA variance estimate is also biased/unreliable? Would the variance estimation of B from the first lmer model really be a better estimation?
>
> Best Regards,
> Kalman Toth
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Tuesday, June 29th, 2021 at 2:12 AM, Ben Bolker <bbolker using gmail.com> wrote:
>
>> A couple of quick responses.
>>
>> - I don't recommend dropping non-significant predictors, this is a
>>
>> good way to overfit models.
>> - Are the temperatures for your three cohorts in a strictly linear
>>
>> sequence? i.e., temperature (cohort 1) = T1, temp (2) = T1 + delta, temp
>>
>> (3) = T1 + 2*delta ? In that case, the two effects are indeed
>>
>> identical/confounded. In principle, your original model (using cohort
>>
>> as a random effect and temperature as fixed) is the right way to handle
>>
>> this, but for the size of data set you can't really identify
>>
>> among-cohort variation beyond the effect of temperature.
>>
>> A nice way to handle this is to treat cohort as an ordered
>>
>> categorical fixed effect (see ?ordered), and leave out temperature (this
>>
>> is assuming that the temperatures are as suggested above). If you do
>>
>> this (i.e. convert cohort to 'ordered' type), R will fit two parameters,
>>
>> one labeled .L and the other labeled .Q, which together explain all of
>>
>> the among-cohort variation; if you like (although it is quite a big
>>
>> assumption, and you must be explicit about it), you can ascribe the
>>
>> linear (".L") variation to temperature and the other (".Q" or quadratic)
>>
>> to non-temperature effects. However, given your experimental design,
>>
>> the following two explanations would be equally well supported:
>> - none of the between-cohort variation is due to temperature;
>> - temperature has a quadratic effect, so all of the between-cohort
>>
>> variation is due to temperature.
>>
>> cheers
>>
>> Ben Bolker
>>
>> On 6/28/21 3:20 PM, Amy Huang wrote:
>>
>>> Thank you very much for your responses and references. Sorry that I missed
>>>
>>> mentioning a lot of information.
>>>
>>> I am using lme4, and the fixed predictors are all numeric. Only having 3
>>>
>>> levels of cohorts is indeed the major issue. After removing insignificant
>>>
>>> predictors in the 2nd model, the only factor left is temperature: offspring
>>>
>>> trait ~ temperature + (1 | cohort/female/clutch), which gives the
>>>
>>> convergence warning.
>>>
>>> Now I treat cohort as a fixed effect, but when I include both cohort and
>>>
>>> temperature as fixed effects (in the 2nd model), a warning appears:
>>>
>>> "fixed-effect model matrix is rank deficient so dropping 1 column /
>>>
>>> coefficient". When I remove cohort (2nd model), the two models become very
>>>
>>> similar and give similar results.
>>>
>>> offspring trait ~ cohort + (1 | female/clutch)
>>>
>>> offspring trait ~ temperature + (1 | female/clutch)
>>>
>>> But am I not introducing pseudoreplicates if I do not include cohort as a
>>>
>>> factor?
>>>
>>> PS. The section "How do I compute a coefficient of determination (R2), or
>>>
>>> an analogue, for (G)LMMs?" in the GLMM FAQ also gives me some insight.
>>>
>>> However, the links provided there seem to be not working.
>>>
>>> Best regards,
>>>
>>> Amy Huang
>>>
>>> Am Mo., 28. Juni 2021 um 19:32 Uhr schrieb Ben Bolker bbolker using gmail.com:
>>>
>>>> See also:
>>>>
>>>>
>>>> https://stats.stackexchange.com/questions/37647/what-is-the-minimum-recommended-number-of-groups-for-a-random-effects-factor
>>>>
>>>> https://www.biorxiv.org/content/10.1101/2021.05.03.442487v2
>>>>
>>>> (I should these links, and the blog post link, to the GLMM FAQ ...)
>>>>
>>>> On 6/28/21 1:17 PM, Thierry Onkelinx wrote:
>>>>
>>>>> Another issue is that you have too few levels to fit "cohort" as a
>>>>>
>>>>> random effect. I wrote a blogpost on this a few years ago:
>>>>>
>>>>> https://www.muscardinus.be/2018/09/number-random-effect-levels/
>>>>>
>>>>> https://www.muscardinus.be/2018/09/number-random-effect-levels/
>>>>>
>>>>> Best regards,
>>>>>
>>>>> ir. Thierry Onkelinx
>>>>>
>>>>> Statisticus / Statistician
>>>>>
>>>>> Vlaamse Overheid / Government of Flanders
>>>>>
>>>>> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
>>>>>
>>>>> AND FOREST
>>>>>
>>>>> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
>>>>>
>>>>> thierry.onkelinx using inbo.be mailto:thierry.onkelinx using inbo.be
>>>>>
>>>>> Havenlaan 88 bus 73, 1000 Brussel
>>>>>
>>>>> www.inbo.be http://www.inbo.be
>>>>
>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>
>>>>> To call in the statistician after the experiment is done may be no more
>>>>>
>>>>> than asking him to perform a post-mortem examination: he may be able to
>>>>>
>>>>> say what the experiment died of. ~ Sir Ronald Aylmer Fisher
>>>>>
>>>>> The plural of anecdote is not data. ~ Roger Brinner
>>>>>
>>>>> The combination of some data and an aching desire for an answer does not
>>>>>
>>>>> ensure that a reasonable answer can be extracted from a given body of
>>>>>
>>>>> data. ~ John Tukey
>>>>
>>>> ///////////////////////////////////////////////////////////////////////////////////////////
>>>>
>>>>> https://www.inbo.be
>>>>>
>>>>> Op ma 28 jun. 2021 om 16:31 schreef Ben Bolker <bbolker using gmail.com
>>>>>
>>>>> mailto:bbolker using gmail.com>:
>>>>>
>>>>> Are you using lme4? (I'm 99% sure you are, but it's good to be
>>>>> explicit.)
>>>>>
>>>>> Are all of your fixed predictors numeric (rather than
>>>>> factor/categorical) ?
>>>>>
>>>>> Note that a convergence warning is a *warning*, not an error:
>>>>>
>>>>
>>>> have
>>>>
>>>>> you checked the troubleshooting steps in ?lme4::convergence (in
>>>>> particular, scaling and centering your predictor variables might
>>>>> help ...)
>>>>>
>>>>> cheers
>>>>> Ben Bolker
>>>>>
>>>>>
>>>>> On 6/28/21 10:17 AM, Amy Huang wrote:
>>>>> > Dear all,
>>>>> >
>>>>> > I am examining maternal effects, and my data have three hierarchy
>>>>> levels:
>>>>> > clutches of the same female, females, and cohorts. My explanatory
>>>>> variables
>>>>> > are at the female level (female length, age) and at the cohort
>>>>>
>>>>
>>>> level
>>>>
>>>>> > (temperature).
>>>>> >
>>>>> > I would like to estimate the variance components of each
>>>>> hierarchy level
>>>>> > (i.e. relative amount of variance at each level) and then to find
>>>>> out which
>>>>> > factors (female length, age, temperature) explain most of the
>>>>> variance. For
>>>>> > these, I have two models:
>>>>> > offspring trait ~ 1 + (1 | cohort/female/clutch)
>>>>> > offspring trait ~ temperature + female length + age + (1 |
>>>>> > cohort/female/clutch)
>>>>> >
>>>>> > The major problem is that I only have 3 cohorts (and so 3
>>>>> temperatures).
>>>>> > From the first model I am able to get the information, but from
>>>>> the second
>>>>> > one there is an error message: "Model failed to converge with 1
>>>>> negative
>>>>> > eigenvalue: -2.0e+01". The error pops up probably because I have
>>>>>
>>>>
>>>> both
>>>>
>>>>> > temperature (fixed) and cohort (random) included. Is my approach
>>>>> correct?
>>>>> > And is there a way to fix this error?
>>>>> >
>>>>> > Thank you so much for your time.
>>>>> >
>>>>> > Best regards,
>>>>> > Amy Huang
>>>>> >
>>>>> > [[alternative HTML version deleted]]
>>>>> >
>>>>> > _______________________________________________
>>>>> > R-sig-mixed-models using r-project.org
>>>>> <mailto:R-sig-mixed-models using r-project.org> mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>> >
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-mixed-models using r-project.org
>>>>> <mailto:R-sig-mixed-models using r-project.org> mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>> <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
>>>>>
>>>>
>>>> R-sig-mixed-models using r-project.org mailing list
>>>>
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> R-sig-mixed-models using r-project.org mailing list
>>>
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>> R-sig-mixed-models using r-project.org mailing list
>>
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics
More information about the R-sig-mixed-models
mailing list