[R-sig-ME] Implications of modeling residuals in multilevel models
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Fri Dec 22 19:30:01 CET 2023
This slipped through the cracks.
On 2023-12-22 1:10 p.m., Simon Harmel wrote:
> Hello All,
>
> Just a follow-up, can we say the model I sketched above is a location-scale
> model?
>
> Thanks,
> Simon
>
> On Sat, Dec 16, 2023 at 9:30 PM Simon Harmel <sim.harmel using gmail.com> wrote:
>
>> Hello All,
>>
>> I have a highly skewed dataset.
As may have been said before, the *marginal* distribution of your
data set (e.g., what you get if you plot hist(y)) is irrelevant/doesn't
matter at all (the models make no assumptions about the marginal
distribution)
But, my MODEL of choice below shows
>> drastically improved, normally distributed residuals (and predicted values)
>> compared to other models whose residuals are not modeled.
>>
>> Three quick questions:
>>
>> 1- Is MODEL below assuming that my data come from a population that looks
>> normal once "X1_categorical" and "X2_numeric" are taken into account
>> as modeled in MODEL?
Yes, the *conditional distribution* of y is Gaussian (but not IID
because you have specified correlation and variance (weights) structures).
>> 2- Do these normally distributed predictions work better for a subject
>> randomly drawn from a similarly skewed population with a
>> known "X1_categorical" and "X2_numeric"?
Don't think I understand what this means.
>> 3- I think the distribution of residuals mirrors that of the data. If so,
>> is it correct to say MODEL below is actually **trimming** my highly skewed
>> data as if it was distributed as:
That seems an odd way to put it. I would say that, if the residuals
look Normally distributed, that means that the covariates in the model
are accounting for the skew.
Here's an example of a dataset where the marginal distribution is
heavily skewed but the conditional distribution is (exactly) Normal:
set.seed(101)
x <- rgamma(1000, shape = 1)
y <- rnorm(1000, mean = x, sd = 0.1)
hist(y)
qqnorm(residuals(lm(y~x)))
Heteroscedasticity (but not autocorrelation) will also change the
distribution of the residuals, but I don't think it's necessary to
illustrate the point here.
I think it would be fair to call this a location-scale model since
you are modeling both the location (mean) and the scale (SD/variance),
although often people use the term for models where there is a random
effect in both the location and the scale model.
>> ```
>> hist(resid(MODEL, type = "normalized"))
>> ```
>>
>> MODEL <- nlme::lme(y ~ X1_categorical + X2_numeric ...,
>> random = ~1| subject,
>> data = data,
>> correlation = corSymm(~1|subject),
>> weights = varComb(varIdent(form = ~ 1 | X1_categorical ),
>> varPower(form = ~ X2_numeric )))
>>
>> Thanks,
>> Simon
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of
working hours.
More information about the R-sig-mixed-models
mailing list