[R-sig-ME] Implications of modeling residuals in multilevel models
Simon Harmel
@|m@h@rme| @end|ng |rom gm@||@com
Mon Dec 25 05:55:47 CET 2023
Dear Ben,
A question regarding your response is inline below.
### I think it would be fair to call this a location-scale model since
you are modeling both the location (mean) and the scale (SD/variance),
although often people use the term for models where there is a random
effect in both the location and the scale model.
Q: Usually for the scale part of a location-scale model, the linear model
uses a logarithmic link to guarantee that the estimate of scale is positive:
log(scale_i) = a_0 + b_1*x_i1+ ... + b_n*x_ip (for p predictors of scale)
But in the MODEL that I sketched below, how such a guarantee is made?
MODEL <- nlme::lme(y ~ X1_categorical + X2_numeric ...,
random = ~1| subject,
data = data,
correlation = corSymm(~1|subject),
weights = varComb(varIdent(form = ~ 1 | X1_categorical ),
varPower(form = ~ X2_numeric )))
Thanks, Simon
On Fri, Dec 22, 2023 at 12:30 PM Ben Bolker <bbolker using gmail.com> wrote:
> This slipped through the cracks.
>
> On 2023-12-22 1:10 p.m., Simon Harmel wrote:
> > Hello All,
> >
> > Just a follow-up, can we say the model I sketched above is a
> location-scale
> > model?
> >
> > Thanks,
> > Simon
> >
> > On Sat, Dec 16, 2023 at 9:30 PM Simon Harmel <sim.harmel using gmail.com>
> wrote:
> >
> >> Hello All,
> >>
> >> I have a highly skewed dataset.
>
> As may have been said before, the *marginal* distribution of your
> data set (e.g., what you get if you plot hist(y)) is irrelevant/doesn't
> matter at all (the models make no assumptions about the marginal
> distribution)
>
> But, my MODEL of choice below shows
> >> drastically improved, normally distributed residuals (and predicted
> values)
> >> compared to other models whose residuals are not modeled.
> >>
> >> Three quick questions:
> >>
> >> 1- Is MODEL below assuming that my data come from a population that
> looks
> >> normal once "X1_categorical" and "X2_numeric" are taken into account
> >> as modeled in MODEL?
>
> Yes, the *conditional distribution* of y is Gaussian (but not IID
> because you have specified correlation and variance (weights) structures).
>
>
> >> 2- Do these normally distributed predictions work better for a subject
> >> randomly drawn from a similarly skewed population with a
> >> known "X1_categorical" and "X2_numeric"?
>
> Don't think I understand what this means.
>
> >> 3- I think the distribution of residuals mirrors that of the data. If
> so,
> >> is it correct to say MODEL below is actually **trimming** my highly
> skewed
> >> data as if it was distributed as:
>
> That seems an odd way to put it. I would say that, if the residuals
> look Normally distributed, that means that the covariates in the model
> are accounting for the skew.
>
> Here's an example of a dataset where the marginal distribution is
> heavily skewed but the conditional distribution is (exactly) Normal:
>
> set.seed(101)
> x <- rgamma(1000, shape = 1)
> y <- rnorm(1000, mean = x, sd = 0.1)
> hist(y)
>
> qqnorm(residuals(lm(y~x)))
>
> Heteroscedasticity (but not autocorrelation) will also change the
> distribution of the residuals, but I don't think it's necessary to
> illustrate the point here.
>
> I think it would be fair to call this a location-scale model since
> you are modeling both the location (mean) and the scale (SD/variance),
> although often people use the term for models where there is a random
> effect in both the location and the scale model.
>
>
>
>
>
> >> ```
> >> hist(resid(MODEL, type = "normalized"))
> >> ```
> >>
> >> MODEL <- nlme::lme(y ~ X1_categorical + X2_numeric ...,
> >> random = ~1| subject,
> >> data = data,
> >> correlation = corSymm(~1|subject),
> >> weights = varComb(varIdent(form = ~ 1 | X1_categorical ),
> >> varPower(form = ~
> X2_numeric )))
> >>
> >> Thanks,
> >> Simon
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
> > E-mail is sent at my convenience; I don't expect replies outside of
> working hours.
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list