[R-meta] meta-analysis with proportion data and nested random terms

Thu May 23 11:43:08 CEST 2019

Dear Wolfgang,
Thanks again for your help. I was not dead set on using a beta distribution; my concern was rather that peer reviewers would frown upon data transformations when potentially more suitable distributions exist. Your answer made it clear that a logit transformation is probably the most reasonable way to analyse this dataset. It works and all output makes sense, so thanks a lot for your help.
Jens

________________________________________
From: Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl>
Sent: Monday, 20 May 2019 19:03
To: Jens Joschinski; r-sig-meta-analysis using r-project.org
Subject: RE: meta-analysis with proportion data and nested random terms

Hi Jens,

I keep coming back to this question (also saw it on SE) thinking I am going to answer it but then I kinda give up (because thinking this through all the way might take a lot of time). However, instead of leaving you completely hanging here, some thoughts.

You have values between 0 and 1 that are not binomial proportions. So it seems intuitive to think of them as beta distributed. But just because something falls between 0 and 1 doesn't imply that a beta distribution is appropriate. It still might be a sensible approximation.

Alternatively, you could consider applying a logit transformation to those proportions (which should help to make the assumption of roughly normal sampling distributions more reasonable) and analyze them using a 'standard' model. For example:

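library(metafor)

# read the data directly from the GitHub repository
dat <- read.table("https://raw.githubusercontent.com/JensJoschi/variability_timing/master/tempdata.txt", header=TRUE)

# model the raw proportions with nested random effects (order/spec/pop)
res <- rma.mv(yi ~ p, vi, random = ~ 1 | order/spec/pop, data=dat)
res
# plot the standardized residuals
plot(rstandard(res)$z, ylim=c(-3,3), pch=19)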

Yeah, that looks pretty screwy. So:

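# logit-transform the proportions and rescale the sampling variances accordingly
dat$yti <- transf.logit(dat$yi)
dat$vti <- dat$vi / (dat$yi * (1 - dat$yi))^2

# same model as before, now on the logit scale
res <- rma.mv(yti ~ p, vti, random = ~ 1 | order/spec/pop, data=dat)
res
plot(rstandard(res)$z, ylim=c(-3,3), pch=19)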

Much better. Still a bit of a cluster of points at the top right in the plot of the standardized residuals, but I wouldn't lose sleep over that. I would be more worried about that one data point at the bottom right being influential:

plot(dat$p, dat$yti, pch=19)

but one can do sensitivity analyses for that.
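
A note on the 'vti' line above: dividing the sampling variance by (yi * (1 - yi))^2 is the usual delta-method approximation for the variance of a logit-transformed value. The following is a minimal base-R sketch that checks this approximation by simulation. The values of 'true_p' and 'se' are made up for illustration and are not taken from the dataset, and the simulated estimates are assumed to be roughly normal on the raw scale.

```r
# Delta-method check with hypothetical values (not from the dataset)
set.seed(42)
true_p <- 0.3    # assumed true proportion
se     <- 0.02   # assumed standard error on the raw scale

logit <- function(x) log(x / (1 - x))

# simulate estimates on the raw scale (all well inside (0, 1) here)
y <- rnorm(1e5, mean = true_p, sd = se)

# empirical variance of the logit-transformed estimates
v_emp <- var(logit(y))

# delta-method approximation: vi / (yi * (1 - yi))^2
v_delta <- se^2 / (true_p * (1 - true_p))^2

round(c(empirical = v_emp, delta = v_delta), 5)
```

The two numbers should agree closely as long as the standard error is small relative to the distance of the proportion from 0 and 1; for values near the boundaries the approximation can be expected to get worse.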

If you really want to go with beta regression, then I am trying to wrap my head around the meaning of those variances. The mean and variance of beta distributed values are linked, so I don't know what it even means if we try to fix the variances and I don't know what glmmTMB actually does if you specify 'weights' with family=beta_family(). Maybe these are then multiplicative factors for the variances (which in turn are a function of the means). Leaving 'dispformula = ~1' at the default:

library(glmmTMB)

# beta mixed model with the same nested random effects; 'vi' passed as weights
tmp <- glmmTMB(yi ~ p + (1|order/spec/pop), weights = vi, data=dat, family=beta_family())
summary(tmp)

This gives rather different results (even though the proportions are also analyzed on the logit scale). Especially the estimated variance components are totally different. Leaving out the variances in both models gives more similar results:

# both models without the sampling variances (V = 0 in rma.mv, no weights in glmmTMB)
res <- rma.mv(yti ~ p, 0, random = ~ 1 | order/spec/pop, data=dat)
res
tmp <- glmmTMB(yi ~ p + (1|order/spec/pop), data=dat, family=beta_family())
summary(tmp)
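
To make the link between mean and variance of beta distributed values concrete, here is a minimal base-R sketch. It assumes the mean/precision parameterization, Var = mu * (1 - mu) / (1 + phi), which to my knowledge is also the one documented for beta_family() in glmmTMB (treat that as an assumption and check the package documentation). The values of 'mu' and 'phi' are hypothetical.

```r
# Mean-variance link of the beta distribution (hypothetical values)
set.seed(1)
phi <- 10                  # assumed precision parameter
mu  <- c(0.1, 0.3, 0.5)    # assumed means

# theoretical variance under the mean/precision parameterization
beta_var <- function(mu, phi) mu * (1 - mu) / (1 + phi)

# simulate with shape1 = mu * phi and shape2 = (1 - mu) * phi
sim_var <- sapply(mu, function(m) {
  var(rbeta(1e5, shape1 = m * phi, shape2 = (1 - m) * phi))
})

data.frame(mu = mu, theoretical = beta_var(mu, phi), simulated = sim_var)
```

Once the mean and the precision are given, the variance is determined, which is why fixing a separate, known sampling variance per estimate does not fit naturally into this model.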

Anyway, I don't really have any good suggestions on how to do this 'properly' with beta regression, but at least the 'standard' approach above (with logit-transformed proportions) should give you results that might be sensible.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Jens Joschinski
Sent: Monday, 20 May, 2019 11:04
To: r-sig-meta-analysis using r-project.org
Subject: Re: [R-meta] meta-analysis with proportion data and nested random terms

Hello everyone,
I posted a question two weeks ago about suitable R packages for a meta-analysis of beta-distributed data. Before conducting a full-blown Bayesian analysis (with which I have no experience), I would like to ask once again whether someone knows an easier solution, or knows where else to ask.

Thanks for any help!
Jens Joschinski
________________________________________
From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org> on behalf of Jens Joschinski <Jens.Joschinski using UGent.be>
Sent: Monday, 6 May 2019 14:46
To: r-sig-meta-analysis using r-project.org
Subject: [R-meta] meta-analysis with proportion data and nested random terms

Hi all,

I ran into some problems with a meta-analysis, and would really appreciate some help. I have posted a question on CrossValidated (https://stats.stackexchange.com/questions/405178/r-package-for-meta-analysis-of-beta-distributed-data), but have received no reply so far and was advised to ask on this mailing list for help.

In short, I searched for studies that measure insect diapause under multiple day lengths and extracted the raw data. These data follow a logistic form and are bounded between 0 and 100%. Depending on the study, 4-21 data points per curve were available (mean 7), so the data are quite sparse. I used an MCMC approach to fit a logistic curve through these data, and report some properties of these curves along with a credible interval. In a second step, these reaction norm properties are correlated with climate data; for example, I correlated the inflection point of the curves with latitude of origin (https://stats.stackexchange.com/questions/402631/use-of-nested-random-terms-in-meta-analysis-with-a-moderator). One of the properties is, however, a proportion (proportion of variance within environments : variance among environments), and I do not know how to analyse it properly. Using the glmmTMB function with a beta family does not work when the dispersion parameter is set to 0, and metafor does not (to my knowledge) support beta-distributed data. Does anyone know a meta-analysis package that can model beta-distributed data, or is there an alternative approach for my data?
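
To illustrate what is meant by a curve property such as the inflection point, here is a minimal sketch with made-up numbers. It uses a plain binomial glm() rather than the MCMC approach described above, and a simple two-parameter logistic curve; the day lengths and counts are hypothetical and not taken from any of the studies.

```r
# Made-up data: individuals entering diapause at each day length
day_length <- c(10, 11, 12, 13, 14, 15, 16)   # hours of light (hypothetical)
n_total    <- rep(50, 7)                      # individuals tested (hypothetical)
n_diapause <- c(49, 47, 41, 26, 10, 3, 1)     # individuals in diapause (hypothetical)

# two-parameter logistic curve fitted by maximum likelihood (not MCMC)
fit <- glm(cbind(n_diapause, n_total - n_diapause) ~ day_length,
           family = binomial)

# inflection point: day length at which the predicted proportion is 0.5
inflection <- unname(-coef(fit)[1] / coef(fit)[2])
inflection
```

In the meta-analysis, one such estimate per curve (together with its uncertainty) would then be related to a moderator such as latitude.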

Kind regards

Jens Joschinski