[R-sig-ME] [EXT] Re: Graphical validation of residuals of two models or their AICc?

Andrew Robinson apro at unimelb.edu.au
Tue Jan 17 02:39:19 CET 2023


Hi Tim,

Presumably the inference will be across all levels, and the model reports SEs based on all the data.  In that case, I think you need to look at the whole model at once.

I'm afraid that I can't offer any further insight without a deeper dive into the data and model than I have time for.

In passing, I'd suggest that the lack of correlation within subject is a testable claim.
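One way to test that claim is to compare a diagonal random-effects covariance against an unstructured one. A minimal sketch (untested against the actual data, and omitting the variance weights for brevity; `dat` is the data frame from the code further down the thread):

```r
library(nlme)

# Diagonal covariance: per-subject cal effects assumed uncorrelated.
m_diag <- lme(score ~ time*cal,
              random = list(id = pdDiag(~cal + 0)),
              data = dat)

# Unstructured covariance: all pairwise correlations estimated.
# With 7 cal levels this adds 21 correlation parameters, so it may
# be slow or fail to converge.
m_full <- lme(score ~ time*cal,
              random = list(id = pdSymm(~cal + 0)),
              data = dat)

# pdDiag is nested in pdSymm and the fixed effects are identical, so
# the two REML fits can be compared with a likelihood-ratio test
# (subject to the usual boundary-parameter caveats):
anova(m_diag, m_full)
```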

Cheers

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On 16 Jan 2023 at 8:42 AM +1100, Timothy MacKenzie <fswfswt at gmail.com> wrote:
Dear Andrew,

Thank you for the thoughtful comments. I'm mainly trying to estimate parameters with uncertainty.

The model is a multivariate multilevel model in which `cal` defines the dependent variables. The `cal` variables (there are 7 of them) don't seem to correlate much with one another within subjects (`id`), so I used `random = list(id = pdDiag(~cal + 0))` to honor this feature.

The problem I'm facing is that the `cal` variables have very different descriptive statistics (e.g., their ranges vary substantially), so I did expect the model's residuals to be wild. To partly mitigate this, I let the residual variance differ across `cal:time` combinations, that is, `weights = varIdent(form = ~ cal:time)`; correction: it must have been `varIdent(form = ~ 1 | cal*time)`.

On the issue of residuals, one thought comes to mind: for a multivariate multilevel model like mine, is it even fair to look at residuals the way we do for univariate multilevel models, i.e., simply `plot(fitted(m2), resid(m2, type="normalized"))`? Or should we instead break the residuals down by `cal` levels across `id` levels and then look at the residuals' pattern?

My suggestion in the previous paragraph is based on the following post from our archives: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2020q2/028684.html
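For what it's worth, `plot.lme` accepts a display formula, so the normalized residuals can be panelled by `cal` directly rather than pooled. A sketch (assuming `m2` as fitted in the code further down the thread):

```r
# Normalized residuals vs fitted values, one panel per cal level:
plot(m2, resid(., type = "normalized") ~ fitted(.) | cal, abline = 0)

# Boxplots of normalized residuals by subject within each cal level:
plot(m2, id ~ resid(., type = "normalized") | cal)
```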

Thanks,
Tim M


On Sun, Jan 15, 2023 at 2:32 PM Andrew Robinson <apro at unimelb.edu.au> wrote:
Neither residual plot really floats my boat.  Presumably you have reasons to compare these model specifications that are unrelated to the data as we see them.  Ultimately it depends on the use to which the model will be put. Are you trying to estimate parameters with uncertainty or to make a prediction of some future state or something else?

I'd note that AIC is computed from the log-likelihood at its maximum, conditional on the model.  If the model is wrong, then the MLE is unknowably wrong, and the AIC is unknowably wrong. So, absent other context, I focus on the information that is available about the model, namely the residual plots.

I'd add that making time a factor is a curious decision.
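To illustrate the alternative (a hypothetical sketch, not something fitted in the thread): if `time` is left numeric, the trend costs one slope per `cal` level instead of a parameter for every time point:

```r
# Recover numeric time from the factor created earlier in the thread's code
dat$time_num <- as.numeric(as.character(dat$time))

# Hypothetical variant of m1 with a linear time trend per cal level
m1_num <- lme(score ~ time_num * cal,
              random = list(id = pdDiag(~cal + 0)),
              data = dat,
              control = lmeControl(msMaxIter = 2e2))
```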

Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro@unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects to their Elders.
On 15 Jan 2023 at 6:45 AM +1100, Timothy MacKenzie <fswfswt at gmail.com> wrote:
Dear All,

I'm deciding between two candidate models (m1 and m2). AICc shows that m2
is to be preferred over m1.

However, when I plot the residuals of m1 and m2, residuals of m1 look more
reasonable than those of m2.

I wonder which criteria to rely on to choose my model: graphical validation
of residuals of the two models or their AICc?

Thank you,
Tim M
#-----------------------------------------#
dat <- read.csv("https://raw.githubusercontent.com/fpqq/w/main/c.csv")
dat <- transform(dat, time = factor(time))

library(nlme)

# Note: Tim corrects this later in the thread: the weights argument
# should read varIdent(form = ~ 1 | cal*time).
m1 <- lme(score ~ time*cal,
          random = list(id = pdDiag(~cal + 0)),
          data = dat,
          weights = varIdent(form = ~ cal:time),
          control = lmeControl(msMaxIter = 2e2))

m2 <- lme(score ~ time*cal,
          random = list(id = pdDiag(~cal:time + 0)),
          data = dat,
          weights = varIdent(form = ~ cal:time),
          control = lmeControl(msMaxIter = 2e2))

library(bbmle)

AICctab(m1, m2)

plot(m1)
plot(m2)
#-----------------------------------------#


_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

