[R-sig-ME] Implications of modeling residuals in multilevel models

Mon Dec 25 05:55:47 CET 2023

Dear Ben,

A question regarding your response is inline below.

### I think it would be fair to call this a location-scale model since
you are modeling both the location (mean) and the scale (SD/variance),
although often people use the term for models where there is a random
effect in both the location and the scale model.

Q: Usually for the scale part of a location-scale model, the linear model
uses a logarithmic link to guarantee that the estimate of scale is positive:

log(scale_i) = a_0 + b_1*x_i1+ ... + b_n*x_ip  (for p predictors of scale)

But in the MODEL that I sketched below, how such a guarantee is made?

MODEL <- nlme::lme(y ~ X1_categorical + X2_numeric ...,
         random = ~1| subject,
         data = data,
         correlation = corSymm(~1|subject),
         weights = varComb(varIdent(form = ~ 1 |  X1_categorical ),
                                          varPower(form = ~  X2_numeric )))

Thanks, Simon

On Fri, Dec 22, 2023 at 12:30 PM Ben Bolker <bbolker using gmail.com> wrote:

>    This slipped through the cracks.
>
> On 2023-12-22 1:10 p.m., Simon Harmel wrote:
> > Hello All,
> >
> > Just a follow-up, can we say the model I sketched above is a
> location-scale
> > model?
> >
> > Thanks,
> > Simon
> >
> > On Sat, Dec 16, 2023 at 9:30 PM Simon Harmel <sim.harmel using gmail.com>
> wrote:
> >
> >> Hello All,
> >>
> >> I have a highly skewed dataset.
>
>    As may have been said before, the *marginal* distribution of your
> data set (e.g., what you get if you plot hist(y)) is irrelevant/doesn't
> matter at all (the models make no assumptions about the marginal
> distribution)
>
>   But, my MODEL of choice below shows
> >> drastically improved, normally distributed residuals (and predicted
> values)
> >> compared to other models whose residuals are not modeled.
> >>
> >> Three quick questions:
> >>
> >> 1- Is MODEL below assuming that my data come from a population that
> looks
> >> normal once "X1_categorical" and "X2_numeric" are taken into account
> >> as modeled in MODEL?
>
>    Yes, the *conditional distribution* of y is Gaussian (but not IID
> because you have specified correlation and variance (weights) structures).
>
>
> >> 2- Do these normally distributed predictions work better for a subject
> >> randomly drawn from a similarly skewed population with a
> >> known "X1_categorical" and "X2_numeric"?
>
>    Don't think I understand what this means.
>
> >> 3- I think the distribution of residuals mirrors that of the data. If
> so,
> >> is it correct to say MODEL below is actually **trimming** my highly
> skewed
> >> data as if it was distributed as:
>
>    That seems an odd way to put it.  I would say that, if the residuals
> look Normally distributed, that means that the covariates in the model
> are accounting for the skew.
>
> Here's an example of a dataset where the marginal distribution is
> heavily skewed but the conditional distribution is (exactly) Normal:
>
> set.seed(101)
> x <- rgamma(1000, shape = 1)
> y <- rnorm(1000, mean = x, sd = 0.1)
> hist(y)
>
> qqnorm(residuals(lm(y~x)))
>
>    Heteroscedasticity (but not autocorrelation) will also change the
> distribution of the residuals, but I don't think it's necessary to
> illustrate the point here.
>
>    I think it would be fair to call this a location-scale model since
> you are modeling both the location (mean) and the scale (SD/variance),
> although often people use the term for models where there is a random
> effect in both the location and the scale model.
>
>
>
>
>
> >> ```
> >> hist(resid(MODEL, type = "normalized"))
> >> ```
> >>
> >>   MODEL <- nlme::lme(y ~ X1_categorical + X2_numeric ...,
> >>           random = ~1| subject,
> >>           data = data,
> >>           correlation = corSymm(~1|subject),
> >>           weights = varComb(varIdent(form = ~ 1 |  X1_categorical ),
> >>                                            varPower(form = ~
> X2_numeric )))
> >>
> >> Thanks,
> >> Simon
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
>  > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]