[R-sig-ME] Implications of modeling residuals in multilevel models

Fri Dec 22 23:07:13 CET 2023


On 2023-12-22 4:50 p.m., Simon Harmel wrote:
> Dear Ben,
> 
> These are exactly what I was looking for! Thank you so much. Please 
> allow me to clarify my 2nd question.
> 
> When I compare predict(MODEL) with predict(model_with_no_resid_modeled), 
> the former looks quite normal and unimodal in shape but the latter looks 
> multi-model (see the illustration below).
> 
> This made me think: should a good-fitting model produce normally 
> distributed predicted values? And does such a model do a better job of 
> predicting the value of random draw from the y population?

     Ideally the *marginal* (overall) distribution of *predictions* 
should match the marginal distribution of the response variable (this is 
what the 'posterior predictive check' panel of 
performance::check_model() does).  If the marginal distribution of your 
data is skewed, then the marginal distribution of your prediction should 
be skewed in the same way -- if it's not, then it's not doing a good job 
describing the data.

m <- lm(mpg ~ cyl + disp, mtcars)
performance::check_model(m, check = "pp_check")

   This is a different story from the *residuals* (which tell you 
something about the conditional distribution, not the marginal 
distribution ...)


>                 *
>              * * *
>         * * * * * * *
>     * * * * * * * * * *
> * * * * * * * * * * * *
> -------------------------  vs:
>                          *
>                        * *
>        *             * * *
>     * *   * *     * * * *
> * * * * * * * * * * * *
> -------------------------
> 
> On Fri, Dec 22, 2023 at 12:30 PM Ben Bolker <bbolker using gmail.com 
> <mailto:bbolker using gmail.com>> wrote:
> 
>         This slipped through the cracks.
> 
>     On 2023-12-22 1:10 p.m., Simon Harmel wrote:
>      > Hello All,
>      >
>      > Just a follow-up, can we say the model I sketched above is a
>     location-scale
>      > model?
>      >
>      > Thanks,
>      > Simon
>      >
>      > On Sat, Dec 16, 2023 at 9:30 PM Simon Harmel
>     <sim.harmel using gmail.com <mailto:sim.harmel using gmail.com>> wrote:
>      >
>      >> Hello All,
>      >>
>      >> I have a highly skewed dataset.
> 
>         As may have been said before, the *marginal* distribution of your
>     data set (e.g., what you get if you plot hist(y)) is irrelevant/doesn't
>     matter at all (the models make no assumptions about the marginal
>     distribution)
> 
>        But, my MODEL of choice below shows
>      >> drastically improved, normally distributed residuals (and
>     predicted values)
>      >> compared to other models whose residuals are not modeled.
>      >>
>      >> Three quick questions:
>      >>
>      >> 1- Is MODEL below assuming that my data come from a population
>     that looks
>      >> normal once "X1_categorical" and "X2_numeric" are taken into account
>      >> as modeled in MODEL?
> 
>         Yes, the *conditional distribution* of y is Gaussian (but not IID
>     because you have specified correlation and variance (weights)
>     structures).
> 
> 
>      >> 2- Do these normally distributed predictions work better for a
>     subject
>      >> randomly drawn from a similarly skewed population with a
>      >> known "X1_categorical" and "X2_numeric"?
> 
>         Don't think I understand what this means.
> 
>      >> 3- I think the distribution of residuals mirrors that of the
>     data. If so,
>      >> is it correct to say MODEL below is actually **trimming** my
>     highly skewed
>      >> data as if it was distributed as:
> 
>         That seems an odd way to put it.  I would say that, if the
>     residuals
>     look Normally distributed, that means that the covariates in the model
>     are accounting for the skew.
> 
>     Here's an example of a dataset where the marginal distribution is
>     heavily skewed but the conditional distribution is (exactly) Normal:
> 
>     set.seed(101)
>     x <- rgamma(1000, shape = 1)
>     y <- rnorm(1000, mean = x, sd = 0.1)
>     hist(y)
> 
>     qqnorm(residuals(lm(y~x)))
> 
>         Heteroscedasticity (but not autocorrelation) will also change the
>     distribution of the residuals, but I don't think it's necessary to
>     illustrate the point here.
> 
>         I think it would be fair to call this a location-scale model since
>     you are modeling both the location (mean) and the scale (SD/variance),
>     although often people use the term for models where there is a random
>     effect in both the location and the scale model.
> 
> 
> 
> 
> 
>      >> ```
>      >> hist(resid(MODEL, type = "normalized"))
>      >> ```
>      >>
>      >>   MODEL <- nlme::lme(y ~ X1_categorical + X2_numeric ...,
>      >>           random = ~1| subject,
>      >>           data = data,
>      >>           correlation = corSymm(~1|subject),
>      >>           weights = varComb(varIdent(form = ~ 1 |  X1_categorical ),
>      >>                                            varPower(form = ~ 
>     X2_numeric )))
>      >>
>      >> Thanks,
>      >> Simon
>      >>
>      >
>      >       [[alternative HTML version deleted]]
>      >
>      > _______________________________________________
>      > R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>      > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> 
>     -- 
>     Dr. Benjamin Bolker
>     Professor, Mathematics & Statistics and Biology, McMaster University
>     Director, School of Computational Science and Engineering
>     (Acting) Graduate chair, Mathematics & Statistics
>       > E-mail is sent at my convenience; I don't expect replies outside of
>     working hours.
> 
>     _______________________________________________
>     R-sig-mixed-models using r-project.org
>     <mailto:R-sig-mixed-models using r-project.org> mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>     <https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models>
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
 > E-mail is sent at my convenience; I don't expect replies outside of 
working hours.