[R-meta] What to do if the residuals are not normally distributed?

Wed Nov 8 17:18:51 CET 2017

Can you show us the plot? You can try attaching it, but the R project mailing lists are rather picky as to what attachments are allowed (for good reasons!). A png/pdf file should usually go through though. Otherwise post it somewhere online where we can look at it.

Regardless, assessing (non)normality in more complex mixed-effects models is a tricky business. See, for example:

Lange, N., & Ryan, L. (1989). Assessing normality in random effects models. Annals of Statistics, 17(2), 624-642.

Nobre, J. S., & Singer, J. M. (2007). Residual analysis for linear mixed models. Biometrical Journal, 49(6), 863-875.

In your case, there are the sampling distributions of the estimates themselves which may be non-normal (esp. in small samples), but also three sets of random effects, and each one of those sets may be non-normal. rstandard() gives you 'marginal' residuals, so it is not clear where the non-normality is coming from.

Usually though, non-normality is not a major concern when it comes to your fixed effects (although the standard errors of the fixed effects may be adversely affected). These studies (focused on more simple RE models) suggest that non-normality is not a major issue:

Kontopantelis, E., & Reeves, D. (2012). Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: A simulation study. Statistical Methods in Medical Research, 21(4), 409-426.

van den Noortgate, W., & Onghena, P. (2003). Multilevel meta-analysis: A comparison with traditional meta-analytic procedures. Educational and Psychological Measurement, 63(5), 765-790.

More generally, there are also theoretical results that indicate that non-normality in mixed-effects models is not a major concern, although we may have to adjust how we compute the standard errors of the fixed effects:

Verbeke, G., & Lesaffre, E. (1997). The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data. Computational Statistics & Data Analysis, 23, 541-556.

This is also backed up by this article:

Maas, C. J. M., & Hox, J. J. (2004). Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58(2), 127-137.

Using robust standard errors may help to deal with non-normality.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Jonas Duus Stevens Lekfeldt
Sent: Wednesday, 08 November, 2017 15:37
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] What to do if the residuals are not normally distributed?

Thanks to both Wolfgang and Michael for their replies

I am immensely impressed with the work Wolfgang is doing in sharing his knowledge in this forum!

Now I have drawn a qqnorm-plot of my first model using the following code:

meta_list <- rma.mv(yi=meta_filter_1$yi,
                    V=VarC, 
                    random = ~1|ExpName/ControlName/VarID,
                    test="t")

meta_resid <- rstandard.rma.mv(meta_list)
z_val <- meta_resid$z
qqnorm(z_val)
qqline(z_val)

Now comes the big problem: the data do not seem normal at all. They diverge counterclockwise in both ends (indicating longer tails than the normal distribution).

So what to do now? Remove outliers? Is it a serious issue that the residuals are not normally distributed or do I not have to worry?

Is there something I am missing?

Best wishes and thanks again

Jonas

-----Oprindelig meddelelse-----
Fra: Viechtbauer Wolfgang (SP) [mailto:wolfgang.viechtbauer at maastrichtuniversity.nl] 
Sendt: 08 November 2017 12:26
Til: Jonas Duus Stevens Lekfeldt; r-sig-meta-analysis at r-project.org
Emne: RE: [R-meta] Calculating the var-covar matrix for dependent effect sizes for ROM

As far as I can tell, your calculation of the V matrix is correct. If you need a reference for the equation for the covariance, see:

Lajeunesse, M. J. (2011). On the meta-analysis of response ratios for studies with correlated and multi-group designs. Ecology, 92(11), 2049-2055.

One thing to be careful about: Make sure that your dataset is first ordered by the variable that you are splitting on with split(). Otherwise, your V matrix is not ordered in the correct way. To illustrate:

dat <- data.frame(study=c(3,3,2,1,1), yi=c(.1,.2,.4,.3,.5), vi=c(.02,.01,.07,.05,.03)) dat sav <- lapply(split(dat$vi, dat$study), function(x) diag(x, nrow=length(x))) sav
bldiag(sav)

So do:

dat <- dat[order(dat$study),]
dat
sav <- lapply(split(dat$vi, dat$study), function(x) diag(x, nrow=length(x))) sav
bldiag(sav)

Let's just say that this has bitten me in the behind once or twice before.

Also, since you are splitting on "ControlName", I assume that values of "ControlName" are never the same across different values of "ExpName" (otherwise, you would be computing covariances across different experiments).

Your rma.mv() code looks fine.

I haven't gotten around to implementing qqnorm() for 'rma.mv' objects. But since rstandard() (and rstudent() if you install the devel version of metafor) work for 'rma.mv' objects, you can just feed the standardized residuals to qqnorm() directly.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Jonas Duus Stevens Lekfeldt
Sent: Wednesday, 08 November, 2017 9:19
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Calculating the var-covar matrix for dependent effect sizes for ROM

Thank you, Wolfgang for the fast and clear reply!

In an earlier reply by Wolfgang I found the following formula for calculating the covariance for ROM (log response ratio) when controls are shared among some of the datasets (if I have understood it correctly):

Covariance = sd^2/(n*mean^2), from the group whose data is being re-used.

I have calculated a new covariance column in the dataset data of the individual effect sizes based on the control data in the following way:

data <- data %>%
  dplyr::mutate (covar=((data$sd2i)^2)/(data$n2i*(data$m2i^2))),

where:
sd2i is the standard deviation of the control group n2i is the sample size of the control group m2i is the mean of the control group

Subsequently I have calculated the variance-covariance matrix (here called VarC) using the following code (again inspired by Wolfgang):

calc.v.ROM <- function(x) {
  v <- matrix(x$covar[1],nrow=nrow(x),ncol=nrow(x))
  diag(v) <- x$vi
  v
}

covar_list <- lapply(split(data,data$ControlName),calc.v.ROM)
VarC <- bldiag(covar_list)

Where "ControlName" is the column in "data" where the names of the control groups are stored.

Using VarC as the argument to V in the following code gives meaningful results so it seems to work, but I would like to ask if it seems correct?

meta_list <- rma.mv(yi=data$yi,
                    V=VarC, 
                    random = ~1|ExpName/ControlName/ID,
                    test="t")

Another question: drawing a qqnorm-plot does not seem to be implemented for rma.mv(). Is that right? 

Best regards

Jonas Duus Stevens Lekfeldt