[R-sig-ME] reference category in varIdent function (once again)

Mon Dec 30 15:50:01 CET 2024

Regarding your P.S., I've added a callout to the post noting that the bug
has been squashed.
https://jepusto.com/posts/bug-in-nlme-getVarCov/

Kind Regards,
James

On Sun, Dec 29, 2024 at 5:23 PM Sebastian Meyer <seb.meyer using fau.de> wrote:

> Glad to hear that helped. Your exemplification of current varIdent
> behaviour is also helpful, thanks. I'll revisit the bug with a closer
> look next year.
>
> In principle, it would be better for the reference level to be based on
> the factor levels, but as you say, nlme's code base is very old, which
> means many publications (implicitly) rely on the current behaviour, so
> semantics or defaults won't be changed easily. Still, bugs should be
> fixed -- and it should be possible to choose the reference level.
>
> Best regards,
>
>         Sebastian Meyer
>
> PS: The getVarCov() bug that is linked in your report (to another post)
> was fixed a while ago in 2020 (nlme version 3.1-150), i.e., the
> `all.equal(V_raw, V_sorted)` tests now give TRUE. It might be worth
> noting this somewhere.
>
>
> Am 29.12.24 um 22:40 schrieb James Pustejovsky:
> > Hi Sebastian,
> >
> > Thanks for your reply. This helped me figure out how to control the
> > reference level for the particular model I'm working on. I wrote up what
> > I've learned (in an excessive degree of detail) here:
> > https://jepusto.com/posts/varIdent-function-in-nlme/
> > <https://jepusto.com/posts/varIdent-function-in-nlme/>
> >
> > James
> >
> > On Mon, Dec 23, 2024 at 10:40 AM Sebastian Meyer <seb.meyer using fau.de
> > <mailto:seb.meyer using fau.de>> wrote:
> >
> >     I think it is unfortunate that varIdent() selects the reference
> >     category
> >     based on how the data is sorted (using the first level that occurs).
> >     There is also a corresponding bug report
> >     <https://bugs.r-project.org/show_bug.cgi?id=18505
> >     <https://bugs.r-project.org/show_bug.cgi?id=18505>>.
> >
> >     Your example illustrates an additional issue with this behaviour,
> >     because lme() internally reorders the data by the grouping factor(s)
> >     before initialization, so the order of that factor ('Subject' in your
> >     example, with first level "M16", so male) will indirectly determine
> the
> >     reference level of varIdent(), regardless of how your data or strata
> >     levels are ordered originally.
> >
> >     It seems that only if there are more than two strata, the reference
> >     level for varIdent() can be chosen via a named initial parameter
> >     'value', leaving out the desired reference level, but I haven't
> tested
> >     if this works as intended with both gls() and lme(). It would then
> make
> >     sense to support such a choice also in the case of only two strata,
> so
> >     for a single parameter.
> >
> >     Best wishes,
> >
> >              Sebastian Meyer
> >
> >
> >     Am 20.12.24 um 20:39 schrieb James Pustejovsky:
> >      > Greetings,
> >      >
> >      > I'm trying to fit a linear mixed effect model using nlme::lme()
> that
> >      > includes heteroskedastic variances by group, where the variance
> >     for one
> >      > group is fixed to a specific value. I *think* I can do this using
> >      > varIdent() and setting lmeControl(sigma = x) for a specified
> value x.
> >      > However, I've run into what is evidently an old problem: that
> >     there doesn't
> >      > appear to be any way to control the reference level of varIdent().
> >      >
> >      > As was noted in a very old post to R-SIG-mixed-models (
> >      >
> >     https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html
> >     <
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020582.html>),
> the
> >      > reference level of varIdent() within an lme() call is set to the
> most
> >      > common level of the grouping factor. For reasons that are too
> >     involved to
> >      > get into, I need to set the reference level to something else (in
> >     fact, the
> >      > least frequently occurring level), so it seems I'm stuck.
> >      >
> >      > Curiously, the behavior of varIdent() within a gls() call is a bit
> >      > different. As documented in this StackOverflow (
> >      >
> >
> https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order
> <
> https://stackoverflow.com/questions/75929847/reference-category-in-varident-function-nlme-package-depends-on-data-order
> >),
> >      > the reference level used within gls() depends on the sort order
> >     of the
> >      > data. I've included below a minimal working example that
> >     demonstrates the
> >      > difference in behavior.
> >      >
> >      > Does anyone know of any work-around for setting the reference
> >     level of
> >      > varIdent() within an lme() call? I'd be grateful for any
> >     leads/suggestions
> >      > on what I could do (up to and including to just use a different
> >     package
> >      > already).
> >      >
> >      > Thanks,
> >      > James
> >      >
> >      >
> >      >
> >
>  ----------------------------------------------------------------------------
> >      > library(nlme)
> >      >
> >      > # Make reversed factor
> >      > table(Orthodont$Sex)
> >      > Orthodont$Sex_rev <- Orthodont$Sex
> >      > levels(Orthodont$Sex_rev) <- rev(levels(Orthodont$Sex))
> >      > table(Orthodont$Sex_rev)
> >      >
> >      > Ortho_male_first <- Orthodont[order(Orthodont$Sex),]
> >      > Ortho_female_first <- Orthodont[order(Orthodont$Sex, decreasing =
> >     TRUE),]
> >      >
> >      > # Fit gls models with heteroskedastic variance
> >      > gls1 <- gls(
> >      >    distance ~ age + Sex,
> >      >    weights = varIdent(form = ~ 1 | Sex),
> >      >    data = Ortho_male_first
> >      > )
> >      > gls2 <- update(gls1, data = Ortho_female_first)
> >      > gls3 <- update(gls1, weights = varIdent(form = ~ 1 |Sex_rev))
> >      > gls4 <- update(gls3, data = Ortho_female_first)
> >      >
> >      > # Parameterization depends on sort order
> >      > coef(gls1$modelStruct$varStruct)
> >      > coef(gls2$modelStruct$varStruct)
> >      > # But not on factor level order
> >      > coef(gls3$modelStruct$varStruct)
> >      > coef(gls4$modelStruct$varStruct)
> >      >
> >      > # Fit lme models with heteroskedastic level-1 variance
> >      >
> >      > lme1 <- lme(
> >      >    distance ~ age + Sex,
> >      >    random = ~ 1,
> >      >    weights = varIdent(form = ~ 1 | Sex),
> >      >    data = Ortho_male_first
> >      > )
> >      > lme2 <- update(lme1, data = Ortho_female_first)
> >      > lme3 <- update(lme1, weights = varIdent(form = ~ 1 |Sex_rev))
> >      > lme4 <- update(lme3, data = Ortho_female_first)
> >      >
> >      > # Parameterization depends neither on sort order nor on factor
> >     level order
> >      > coef(lme1$modelStruct$varStruct)
> >      > coef(lme2$modelStruct$varStruct)
> >      > coef(lme3$modelStruct$varStruct)
> >      > coef(lme4$modelStruct$varStruct)
> >      >
> >      >       [[alternative HTML version deleted]]
> >      >
> >      > _______________________________________________
> >      > R-sig-mixed-models using r-project.org
> >     <mailto:R-sig-mixed-models using r-project.org> mailing list
> >      > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> >
>

	[[alternative HTML version deleted]]