[R-meta] multilevel bivariate metaanalysis in the presence of phylogenetic correlation

Thu Sep 12 14:08:36 CEST 2024

Dear Wolfgang, thank you for the quick response! Just to clarify, yes I used struct = "UN" because I wanted to allow for a correlation between the two traits across species (due phylogeny and species-specific effects unrelated to phylogeny). In fact, this is the main objective of the analysis. But as I understand from your answer this may not be possible using metafor. And I should definitely not use the model I suggested. That is certainly good to know!

Thanks,
Sigurd

>-----Original Message-----
>From: Viechtbauer, Wolfgang (NP)
><wolfgang.viechtbauer using maastrichtuniversity.nl>
>Sent: Thursday, September 12, 2024 12:53 PM
>To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis using r-
>project.org>
>Cc: Sigurd Einum <sigurd.einum using ntnu.no>
>Subject: RE: [R-meta] multilevel bivariate metaanalysis in the presence of
>phylogenetic correlation
>
>Dear Sigurd,
>
>The syntax for allowing the random effects for a particular variable to be
>correlated according to a matrix specified via 'R' is:
>
>rma.mv(..., random = ~ 1 | variable, R=list(variable=A), ...)
>
>but since you used '~ trait | phylo' in 'random', rma.mv() does not pick up the
>A matrix (and gives you the warning that something appears to be off).
>Instead, '~ trait | phylo' will add random effects for each level of 'trait' within
>'phylo' (i.e., species) using by default a compound symmetry structure
>(struct="CS"), but given the output, you seem to have changed this to
>struct="UN" for an unstructured var-cov matrix (which, for two levels of 'trait'
>is identical to struct="HCS" for a heteroscedastic CS structure).
>
>Going back to the case of a single trait (and at the risk of explaining things you
>are more familiar with than I am), the amount to which phylogeny plays a role
>is typically quantified in such a model in terms of something like Pagel's
>lambda, which we can estimate with:
>
>rma.mv(..., random = list(~ 1 | species, ~ 1 | phylo), R=list(phylo=A), ...)
>
>where 'species' and 'phylo' are the same variable, just with different names,
>and then computing:
>
>lambda = sigma^2_phylo / (sigma^2_species + sigma^2_phylo).
>
>See Cinar et al. (2021), https://doi.org/10.1111/2041-210X.13760.
>
>Now with two traits, we can easily let sigma^2_species differ for the two traits
>with:
>
>rma.mv(..., random = list(~ trait | species, ~ 1 | phylo), R=list(phylo=A),
>struct="DIAG", ...)
>
>which will give you tau^2_trait1 and tau^2_trait2, based on which we
>compute:
>
>lambda_trait1 = sigma^2_phylo / (tau^2_trait1 + sigma^2_phylo)
>
>and
>
>lambda_trait2 = sigma^2_phylo / (tau^2_trait2 + sigma^2_phylo).
>
>Note that for now I used struct="DIAG" above, just letting the two variances to
>differ but assuming no correlation. One could also use struct="UN"/"HCS",
>which allows for correlation between the random effects for trait 1 and trait 2
>within species. Not sure if I would call this the 'phylogenetic correlation
>between the two traits' (because it is not sigma^2_species or tau^2_trait1 and
>tau^2_trait2 that indicate the degree to which phylogeny matters, but the ratios
>given above).
>
>Leaving terminology aside, we also might want to let sigma^2_phylo differ for
>the two traits. Your attempt with 'random = ~ trait | phylo, R = list(phylo = A)'
>seems logical, but as mentioned above, rma.mv() does not work that way. The
>question is also what we really mean by this and how this translates into the
>correlation matrix for the full dataset.
>
>Say we have these data:
>
>study trait species
>-------------------
>1     1     s1
>1     2     s1
>2     1     s2
>3     1     s3
>4     1     s2
>4     2     s2
>
>and the phylogeny is like this (use a fixed-width font if things don't line up):
>
>       +-- s1
>       |
>    +--+
>    |  |
>+---+  +-- s2
>|   |
>|   +----- s3
>|
>+--------- s4
>
>so A is something like:
>
>   s1 s2 s3 s4
>s1  1 .7 .4  0
>s2     1 .4  0
>s3        1  0
>s4           1
>
>Let's create the dataset and A matrix for this:
>
>dat <- data.frame(study=c(1,1,2,3,4,4),
>                  trait=c(1,2,1,1,1,2),
>                  species=c("s1","s1","s2","s3","s2","s2"))
>dat$species.trait <- paste0(dat$species, ".", dat$trait) dat
>
>A <- matrix(c(1,.7,.4,.7,1,.4,.4,.4,1), nrow=3)
>rownames(A) <- colnames(A) <- c("s1", "s2", "s3") A
>
>Then the correlation matrix for the six rows in the dataset is:
>
>A[dat$species, dat$species]
>
>Let me add the species.trait labels to this:
>
>M <- A[dat$species, dat$species]
>rownames(M) <- colnames(M) <- dat$species.trait M
>
>which looks like this:
>
>     s1.1 s1.2 s2.1 s3.1 s2.1 s2.2
>s1.1  1.0  1.0  0.7  0.4  0.7  0.7
>s1.2  1.0  1.0  0.7  0.4  0.7  0.7
>s2.1  0.7  0.7  1.0  0.4  1.0  1.0
>s3.1  0.4  0.4  0.4  1.0  0.4  0.4
>s2.1  0.7  0.7  1.0  0.4  1.0  1.0
>s2.2  0.7  0.7  1.0  0.4  1.0  1.0
>
>Now if we say that the variance component for this matrix differs for trait 1 and
>trait 2, then I get how this would affect the diagonal (it would alternately be
>tau2^1_trait1 and tau^2_trait2) and I get how this would affect element M[1,3]
>(= M[3,1]) (it would be multiplied by tau2^1_trait1) or M[2,6] (= M[6,2]) (it
>would be multiplied by tau2^1_trait2), but what about element M[1,2], a cross-
>trait correlation for the same species, or element M[1,5], a cross-trait
>correlation for different species? One might have to start adding more variance
>components as multipliers for these types, but this starts to get tricky to ensure
>that the sum of these matrices (times their variance component) stays positive
>definite. I am sure that some people much cleverer than I am have thought
>about this stuff, but I am not sure what is typically done in phylogenetic
>comparative methods to do this kind of analysis.
>
>I am stopping here because this is starting to get quite long, although this is an
>intriguing problem.
>
>Best,
>Wolfgang
>
>> -----Original Message-----
>> From: R-sig-meta-analysis <r-sig-meta-analysis-bounces using r-project.org>
>> On Behalf Of Sigurd Einum via R-sig-meta-analysis
>> Sent: Thursday, September 12, 2024 10:42
>> To: R-sig-meta-analysis using r-project.org
>> Cc: Sigurd Einum <sigurd.einum using ntnu.no>
>> Subject: [R-meta] multilevel bivariate metaanalysis in the presence of
>> phylogenetic correlation
>>
>> Dear all, I have a question regarding multilevel bivariate
>> metaanalysis in the presence of a known correlation structure
>> (phylogenetic relatedness) using metafor.
>>
>> Each study has one or more experiments ( "id"), where each experiment
>> has estimates of two different parameters ("trait", ARR and lambda),
>> together with their variances and covariances (providing
>> variance-covariance matrix for the errors in the parameter estimates,
>> V). Experiments are done on different species, and I have a phylogeny
>> which is used to create the phylogenetic variance-covariance matrix.
>>
>> One potential model I thought could be possible to use, which I
>> thought would allow for different phylogenetic correlations for the
>> different outcomes, as well as different species-level random effects
>> (independent of phylogeny) for the two traits:
>> mod1 <- rma.mv(yi = yi, V = V, mods = ~trait,  data = new_data, random
>> = list(~
>> 1 | study, ~1|id, ~ trait | species,~ trait | phylo), R = list(phylo =
>> A))
>>
>> However, this gives the warning: Argument 'R' specified, but list
>> name(s) not in 'random'.
>> So, on the one hand this suggests to me that the phylogenetic
>> variance- covariance matrix is ignored, and that phylo here is treated
>> as a species random effect independent of phylogenetic correlation
>> ("species" and "phylo" in new_data are identical species names).
>> However, on the other hand the model output does give separate estimates
>for the two levels:
>>
>>
>> Multivariate Meta-Analysis Model (k = 394; method: REML)
>>
>>      logLik     Deviance          AIC          BIC         AICc
>> -37620.5596   75241.1192   75269.1192   75324.5733   75270.2452
>>
>> Variance Components:
>>
>>             estim    sqrt  nlvls  fixed  factor
>> sigma^2.1  0.0001  0.0079     37     no   study
>> sigma^2.2  0.0007  0.0265    197     no      id
>>
>> outer factor: species (nlvls = 77)
>> inner factor: trait   (nlvls = 2)
>>
>>             estim    sqrt  k.lvl  fixed   level
>> tau^2.1    0.0121  0.1100    197     no     ARR
>> tau^2.2    0.0001  0.0117    197     no  lambda
>>
>>         rho.ARR  rho.lmbd    ARR  lmbd
>> ARR           1                -    77
>> lambda  -0.9852         1     no     -
>>
>> outer factor: phylo (nlvls = 77)
>> inner factor: trait (nlvls = 2)
>>
>>               estim    sqrt  k.lvl  fixed   level
>> gamma^2.1    0.0099  0.0994    197     no     ARR
>> gamma^2.2    0.0002  0.0155    197     no  lambda
>>
>>         phi.ARR  phi.lmbd    ARR  lmbd
>> ARR           1   -0.4692      -    77
>> lambda  -0.4692         1     no     -
>>
>> Test for Residual Heterogeneity:
>> QE(df = 388) = 8590750.5336, p-val < .0001
>>
>> Test of Moderators (coefficients 1:6):
>> QM(df = 6) = 34669.2288, p-val < .0001
>>
>> So, my questions are
>> (1) which of these interpretations are correct? If the model does
>> include the phylogenetic correlation structure, (2) is the value
>> -0.4692 the phylogenetic correlation between the two traits? (3) What
>> does the value -0.9852 from the
>> trait|species term represent? And (4) how do one calculate I2 for such
>> trait|a model
>> (I could only find examples from models that are either multilevel or
>> bivariate, and not both at the same time?)
>>
>> Any insights will be appreciated!
>> Best wishes,
>> Sigurd
>>
>> Prof. Sigurd Einum
>> Dept. Biology, NTNU, Norwayhttps://www.ntnu.no/ansatte/sigurd.einum