[R-meta] What to do if the residuals are not normally distributed?
Jonas Duus Stevens Lekfeldt
jdsl at plen.ku.dk
Fri Nov 10 12:35:10 CET 2017
Thanks again!
I tried both robust(model, cluster=dat$ExpName) and robust(model, cluster=dat$ControlName) and they both essentially gave the same result (see attached PDF).
This plot was drawn using:
>meta_resid <- rstandard.rma.mv(robust_model)
>z_val <- meta_resid$z
>qqnorm(z_val)
>qqline(z_val)
I am unsure how to interpret this result. Now the "se" of the estimates are closely following the residuals obtained which results in what looks like a bi-modal distribution of the z-values (around -1 and +1, or am I mistaken?).
So really my question to all of you is: do you believe that I need to do any further correction or could I use the original model (using the var-covar matrix) even though the distribution of residuals is too broad? With the note of caution that I may be underestimating the width of the CIs? As far as I understood the former post by Wolfgang the non-normality may not be such a big problem regarding the fixed effects, after all.
And then again: I have not yet removed outliers from the dataset which may also help to correct the problem. The question is of course if one would like to do that, but none the less it could be used as a sensitivity analysis.
Given my extremely limited knowledge of Bayesian statistics, JAGS and related subjects implementing the last suggestion by Wolfgang is unfortunately not a feasible way to go forward for me.
Best
Jonas
Ps. How can I best acknowledge your contribution to my paper, Wolfgang?
-----Oprindelig meddelelse-----
Fra: Viechtbauer Wolfgang (SP) [mailto:wolfgang.viechtbauer at maastrichtuniversity.nl]
Sendt: 10 November 2017 10:40
Til: Jonas Duus Stevens Lekfeldt; r-sig-meta-analysis at r-project.org
Emne: RE: [R-meta] What to do if the residuals are not normally distributed?
Thanks, the plot came through. Indeed, this looks like a fat-tailed distribution.
So, you could try:
robust(model, cluster=dat$ExpName)
or:
library(clubSandwich)
coef_test(model, vcov="CR2")
as one potential approach to deal with this. One could also try bootstrapping (http://www.metafor-project.org/doku.php/tips:bootstrapping_with_ma), although for multilevel models, this becomes increasingly more complex to do properly.
If the non-normality comes from the random effects, there are models that allow for non-normal distributions for the random effects; for example:
Baker, R., & Jackson, D. (2008). A new approach to outliers in meta-analysis. Health Care Management Science, 11(2), 121-131.
Beath, K. J. (2016). metaplus: An R package for the analysis of robust meta-analysis and meta-regression. The R Journal, 8(1), 5-16.
Branscum, A. J., & Hanson, T. E. (2008). Bayesian nonparametric meta-analysis using Polya tree mixture models. Biometrics, 64(3), 825-833.
Burr, D., & Doss, H. (2005). A Bayesian semiparametric model for random-effects meta-analysis. Journal of the American Statistical Association, 100(469), 242-251.
Lee, K. J., & Thompson, S. G. (2008). Flexible parametric models for random-effects distributions. Statistics in Medicine, 27(3), 418-434.
With 'metaplus', there is an even an R package for this, but it does not cover multilevel models like you are fitting. For the other approaches, I am not aware of any out-of-the-box implementations. Using BUGS/JAGS/Stan, one could implement things like t-distributed random effects quite easily. No idea whether this would much of a difference though.
Best,
Wolfgang
>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-
>project.org] On Behalf Of Jonas Duus Stevens Lekfeldt
>Sent: Friday, 10 November, 2017 9:09
>To: r-sig-meta-analysis at r-project.org
>Subject: Re: [R-meta] What to do if the residuals are not normally
>distributed?
>
>ATTACHMENT(S) REMOVED: QQnorm_no moderators.pdf
>
>Dear all
>
>Thanks again to Wolfgang for his reply.
>
>I have tried to attach the plot as a PDF. The model is based on 945
>data pairs and the output from rma.mv() looks like this:
>
>Multivariate Meta-Analysis Model (k = 945; method: REML)
>
>Variance Components:
>
> estim sqrt nlvls fixed factor
>sigma^2.1 0.0316 0.1779 114 no ExpName
>sigma^2.2 0.0074 0.0858 290 no ExpName/ControlName
>sigma^2.3 0.0101 0.1006 945 no ExpName/ControlName/VarID
>
>Test for Heterogeneity:
>Q(df = 944) = 14276.4311, p-val < .0001
>
>Model Results:
>
>estimate se tval pval ci.lb ci.ub
> 0.0893 0.0192 4.6489 <.0001 0.0516 0.1270 ***
>
>---
>Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>Best regards
>
>Jonas
>
>-----Oprindelig meddelelse-----
>Fra: Viechtbauer Wolfgang (SP)
>[mailto:wolfgang.viechtbauer at maastrichtuniversity.nl]
>Sendt: 08 November 2017 17:19
>Til: Jonas Duus Stevens Lekfeldt; r-sig-meta-analysis at r-project.org
>Emne: RE: What to do if the residuals are not normally distributed?
>
>Can you show us the plot? You can try attaching it, but the R project
>mailing lists are rather picky as to what attachments are allowed (for
>good reasons!). A png/pdf file should usually go through though.
>Otherwise post it somewhere online where we can look at it.
>
>Regardless, assessing (non)normality in more complex mixed-effects
>models is a tricky business. See, for example:
>
>Lange, N., & Ryan, L. (1989). Assessing normality in random effects
>models. Annals of Statistics, 17(2), 624-642.
>
>Nobre, J. S., & Singer, J. M. (2007). Residual analysis for linear
>mixed models. Biometrical Journal, 49(6), 863-875.
>
>In your case, there are the sampling distributions of the estimates
>themselves which may be non-normal (esp. in small samples), but also
>three sets of random effects, and each one of those sets may be non-
>normal. rstandard() gives you 'marginal' residuals, so it is not clear
>where the non-normality is coming from.
>
>Usually though, non-normality is not a major concern when it comes to
>your fixed effects (although the standard errors of the fixed effects
>may be adversely affected). These studies (focused on more simple RE
>models) suggest that non-normality is not a major issue:
>
>Kontopantelis, E., & Reeves, D. (2012). Performance of statistical
>methods for meta-analysis when true study effects are non-normally
>distributed: A simulation study. Statistical Methods in Medical
>Research, 21(4), 409-426.
>
>van den Noortgate, W., & Onghena, P. (2003). Multilevel meta-analysis:
>A comparison with traditional meta-analytic procedures. Educational and
>Psychological Measurement, 63(5), 765-790.
>
>More generally, there are also theoretical results that indicate that
>non-normality in mixed-effects models is not a major concern, although
>we may have to adjust how we compute the standard errors of the fixed
>effects:
>
>Verbeke, G., & Lesaffre, E. (1997). The effect of misspecifying the
>random-effects distribution in linear mixed models for longitudinal data.
>Computational Statistics & Data Analysis, 23, 541-556.
>
>This is also backed up by this article:
>
>Maas, C. J. M., & Hox, J. J. (2004). Robustness issues in multilevel
>regression analysis. Statistica Neerlandica, 58(2), 127-137.
>
>Using robust standard errors may help to deal with non-normality.
>
>Best,
>Wolfgang
>
>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-
>project.org] On Behalf Of Jonas Duus Stevens Lekfeldt
>Sent: Wednesday, 08 November, 2017 15:37
>To: r-sig-meta-analysis at r-project.org
>Subject: [R-meta] What to do if the residuals are not normally
>distributed?
>
>Thanks to both Wolfgang and Michael for their replies
>
>I am immensely impressed with the work Wolfgang is doing in sharing his
>knowledge in this forum!
>
>Now I have drawn a qqnorm-plot of my first model using the following
>code:
>
>meta_list <- rma.mv(yi=meta_filter_1$yi,
> V=VarC,
> random = ~1|ExpName/ControlName/VarID,
> test="t")
>
>meta_resid <- rstandard.rma.mv(meta_list)
z_val <- meta_resid$z
>qqnorm(z_val)
>qqline(z_val)
>
>Now comes the big problem: the data do not seem normal at all. They
>diverge counterclockwise in both ends (indicating longer tails than the
>normal distribution).
>
>So what to do now? Remove outliers? Is it a serious issue that the
>residuals are not normally distributed or do I not have to worry?
>
>Is there something I am missing?
>
>Best wishes and thanks again
>
>Jonas
>
>-----Oprindelig meddelelse-----
>Fra: Viechtbauer Wolfgang (SP)
>[mailto:wolfgang.viechtbauer at maastrichtuniversity.nl]
>Sendt: 08 November 2017 12:26
>Til: Jonas Duus Stevens Lekfeldt; r-sig-meta-analysis at r-project.org
>Emne: RE: [R-meta] Calculating the var-covar matrix for dependent
>effect sizes for ROM
>
>As far as I can tell, your calculation of the V matrix is correct. If
>you need a reference for the equation for the covariance, see:
>
>Lajeunesse, M. J. (2011). On the meta-analysis of response ratios for
>studies with correlated and multi-group designs. Ecology, 92(11), 2049-
>2055.
>
>One thing to be careful about: Make sure that your dataset is first
>ordered by the variable that you are splitting on with split().
>Otherwise, your V matrix is not ordered in the correct way. To
>illustrate:
>
>dat <- data.frame(study=c(3,3,2,1,1), yi=c(.1,.2,.4,.3,.5),
>vi=c(.02,.01,.07,.05,.03)) dat sav <- lapply(split(dat$vi, dat$study),
>function(x) diag(x, nrow=length(x))) sav
>bldiag(sav)
>
>So do:
>
>dat <- dat[order(dat$study),]
>dat
>sav <- lapply(split(dat$vi, dat$study), function(x) diag(x,
>nrow=length(x))) sav
>bldiag(sav)
>
>Let's just say that this has bitten me in the behind once or twice
>before.
>
>Also, since you are splitting on "ControlName", I assume that values of
>"ControlName" are never the same across different values of "ExpName"
>(otherwise, you would be computing covariances across different
>experiments).
>
>Your rma.mv() code looks fine.
>
>I haven't gotten around to implementing qqnorm() for 'rma.mv' objects.
>But since rstandard() (and rstudent() if you install the devel version
>of
>metafor) work for 'rma.mv' objects, you can just feed the standardized
>residuals to qqnorm() directly.
>
>Best,
>Wolfgang
>
>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-
>project.org] On Behalf Of Jonas Duus Stevens Lekfeldt
>Sent: Wednesday, 08 November, 2017 9:19
>To: r-sig-meta-analysis at r-project.org
>Subject: [R-meta] Calculating the var-covar matrix for dependent effect
>sizes for ROM
>
>Thank you, Wolfgang for the fast and clear reply!
>
>In an earlier reply by Wolfgang I found the following formula for
>calculating the covariance for ROM (log response ratio) when controls
>are shared among some of the datasets (if I have understood it correctly):
>
>Covariance = sd^2/(n*mean^2), from the group whose data is being re-used.
>
>I have calculated a new covariance column in the dataset data of the
>individual effect sizes based on the control data in the following way:
>
>data <- data %>%
> dplyr::mutate (covar=((data$sd2i)^2)/(data$n2i*(data$m2i^2))),
>
>where:
>sd2i is the standard deviation of the control group n2i is the sample
>size of the control group m2i is the mean of the control group
>
>Subsequently I have calculated the variance-covariance matrix (here
>called VarC) using the following code (again inspired by Wolfgang):
>
>calc.v.ROM <- function(x) {
> v <- matrix(x$covar[1],nrow=nrow(x),ncol=nrow(x))
> diag(v) <- x$vi
> v
>}
>
>covar_list <- lapply(split(data,data$ControlName),calc.v.ROM)
>VarC <- bldiag(covar_list)
>
>Where "ControlName" is the column in "data" where the names of the
>control groups are stored.
>
>Using VarC as the argument to V in the following code gives meaningful
>results so it seems to work, but I would like to ask if it seems correct?
>
>meta_list <- rma.mv(yi=data$yi,
> V=VarC,
> random = ~1|ExpName/ControlName/ID,
> test="t")
>
>Another question: drawing a qqnorm-plot does not seem to be implemented
>for rma.mv(). Is that right?
>
>Best regards
>
>Jonas Duus Stevens Lekfeldt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: QQnorm_robust.pdf
Type: application/pdf
Size: 60333 bytes
Desc: QQnorm_robust.pdf
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20171110/a30fc2a4/attachment-0001.pdf>
More information about the R-sig-meta-analysis
mailing list