[R-meta] rma.mv predict() with newmods

Fri May 3 10:21:55 CEST 2019

Thanks a lot Wolfgang,

Both methods now indeed work on my full model, which indeed has much more data (3941 rows) than the example (50 rows).

One question though, since I’m using rma.mv and my random structure has ~ inner | outer terms, I understand I should use the use the tau2.levels and gamma2.levels arguments to specify the levels of the inner factors IF I’m interested in the credibility/prediction intervals.

mod = rma.mv(yi, vi, method = 'ML', struct="UN", sparse=TRUE, data=dat,
             mods = ~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + classMR:classOR:time:I(totalN^2) + cropSys + idF,
             random = list(~1|ref, ~1|idRow, ~ treatment|idSite, ~ treatment|idSite.time))

> levels(dat$treatment)
[1] "Control" "MR"      "OR"      "ORMR”

But the arguments need to be vectors of the same length as the predicted values, i.e. 813240 x 54 (more levels than in the example), so this doesn’t work:
pred = predict(mod, newmods = X, tau2.levels = c('Control', 'OR', 'MR', 'ORMR'), gamma2.levels = c('Control', 'OR', 'MR', 'ORMR’))

How should I specify the tau2/gamma2.levels arguments?
And I don’t understand how I then could choose for which level combination(s) the credibility/prediction interval should be provided?

In case it might help you help me, this is the full model as an .rds R object: https://we.tl/t-BsfxyI9eBu

with much appreciation for your work and help.

Gil Gram
PhD researcher  |  Natural resource management/CCAFS
International Institute of Tropical Agriculture (IITA)
East African Hub/Kampala/Country office
Address: IITA-Uganda Plot 15B, Naguru East Road. P.O Box 7878, Kampala
Mobile: +256 755 315236  |  Belgium Mobile: +32 484 981200
Skype: gil.gram

On 2 May 2019, at 19:32, Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl<mailto:wolfgang.viechtbauer using maastrichtuniversity.nl>> wrote:

Hi Gil,

Thanks for the reproducible example (but I guess 'dtRMA_rate' should have been 'dat').

First method: This can't work for various reasons. First of all, X is a character matrix:

head(X)

Also, 'newmods' doesn't automatically process factors, polynomials, interactions, and so on for you. You have to do that yourself beforehand (or use model.matrix() for that).

Second method: The model is overparameterized so several coefficients are being dropped to multicollinearity. See:

mod$coef.na

model.matrix() doesn't check for that, so it still includes those coefficients.

This will work:

X <- model.matrix(~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + classMR:classOR:time:I(totalN^2) + cropSys + idF, data=dat)
X <- X[,!mod$coef.na]
X <- X[,-1]
predict(mod, newmods = X)

But going back to the first approach, this will work:

X <- expand.grid(totalN = 0:250
                          , classOR = levels(dat$classOR)
                          , classMR = levels(dat$classMR)
                          , time = 0:26
                          , cropSys = levels(dat$cropSys)
                          , idF = levels(dat$idF))

X <- model.matrix(~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + classMR:classOR:time:I(totalN^2) + cropSys + idF, data=X)
X <- X[,!mod$coef.na]
X <- X[,-1]
pred <- predict(mod, newmods = X)

model.matrix() does not work for factors with a single level (which I cannot do anything about, since this is a function from the stats package and not metafor), so I removed the [1] part to make this work.

If you reinstall the 'devel' version, you can also replace that long model.matrix() line with:

X <- model.matrix(mod$formula.mods, data=X)

Finally, I hope you actually have much more data than you are showing. Otherwise, the random effects structure is also way overparameterized. Some combinations of levels only occur twice, so those correlations are essentially non-identified.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Gram, Gil (IITA)
Sent: Thursday, 02 May, 2019 17:44
To: r-sig-meta-analysis using r-project.org<mailto:r-sig-meta-analysis using r-project.org>
Subject: Re: [R-meta] rma.mv predict() with newmods

Hi Wolfgang,

Thanks for your reply.

Installing the devel version of Metafor did not completely solve the issue, only for the second option described below. There it could not find a certain variable in the model. I would however be more interested in the first option as I can see better for what variable levels I want the predictions.

This time, indeed, I pasted a reproducible example. I hope it works conveniently for you...

Reproducible example:

library(metafor) # packageVersion("metafor”) '2.1.0'

dat = structure(list(idRow = structure(1:50, .Label = c("1", "2", "5",
"6", "9", "10", "13", "14", "17", "18", "21", "22", "25", "26",
"29", "30", "37", "38", "41", "42", "119", "120", "121", "122",
"123", "124", "125", "126", "127", "128", "141", "153", "154",
"155", "156", "157", "158", "159", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "170", "171"), class = "factor"),
   ref = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
   3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
   5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1992 Smaling et al",
   "1994 Horst and Hardter", "1996 Mariki et al", "1997 Murwira et al",
   "1998 Kihanda et al"), class = "factor"), idSite = structure(c(1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
   3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L,
   6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
   6L, 6L, 6L, 6L), .Label = c("1992 Smaling et al_Kenya_Homa Bay",
   "1992 Smaling et al_Kenya_Shimba Hills", "1994 Horst and Hardter_Ghana_Nyankpala",
   "1996 Mariki et al_Tanzania_Selian arusha", "1997 Murwira et al_Zimbabwe_Chinyika",
   "1998 Kihanda et al_Kenya_Embu Kihanda"), class = "factor"),
   idSite.time = structure(c(1L, 1L, 6L, 6L, 10L, 10L, 12L,
   12L, 2L, 2L, 7L, 7L, 11L, 11L, 13L, 13L, 3L, 3L, 8L, 8L,
   4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 5L, 14L, 14L, 14L,
   14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
   15L, 16L, 16L, 16L), .Label = c("1992 Smaling et al_Kenya_Homa Bay.1",
   "1992 Smaling et al_Kenya_Shimba Hills.1", "1994 Horst and Hardter_Ghana_Nyankpala.1",
   "1996 Mariki et al_Tanzania_Selian arusha.1", "1997 Murwira et al_Zimbabwe_Chinyika.1",
   "1992 Smaling et al_Kenya_Homa Bay.2", "1992 Smaling et al_Kenya_Shimba Hills.2",
   "1994 Horst and Hardter_Ghana_Nyankpala.2", "1996 Mariki et al_Tanzania_Selian arusha.2",
   "1992 Smaling et al_Kenya_Homa Bay.3", "1992 Smaling et al_Kenya_Shimba Hills.3",
   "1992 Smaling et al_Kenya_Homa Bay.4", "1992 Smaling et al_Kenya_Shimba Hills.4",
   "1998 Kihanda et al_Kenya_Embu Kihanda.8", "1998 Kihanda et al_Kenya_Embu Kihanda.9",
   "1998 Kihanda et al_Kenya_Embu Kihanda.10"), class = "factor"),
   nRep = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
   2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
   3L, 3L, 3L, 6L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), treatment = structure(c(1L,
   2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
   1L, 2L, 1L, 2L, 1L, 2L, 3L, 4L, 4L, 1L, 2L, 3L, 4L, 4L, 1L,
   1L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 1L, 2L, 3L, 3L, 3L, 4L, 4L,
   4L, 1L, 2L, 3L), .Label = c("Control", "MR", "OR", "ORMR"
   ), class = "factor"), classOR = structure(c(1L, 1L, 1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
   1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 3L,
   3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
   3L), .Label = c("zero", "three", "Manure"), class = "factor"),
   classMR = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
   2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
   2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
   1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L), .Label = c("0",
   "1"), class = "factor"), time = c(0, 0, 1, 1, 2, 2, 3, 3,
   0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1,
   1, 1, 1, 0, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8,
   8, 9, 9, 9), cropSys = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("biomass",
   "rotation"), class = "factor"), idF = structure(c(1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
   2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
   1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
   1L, 1L, 1L), .Label = c("NPK", "blanket"), class = "factor"),
   totalN = c(0, 50, 0, 50, 0, 50, 0, 50, 0, 50, 0, 50, 0, 50,
   0, 50, 0, 80, 0, 80, 0, 40, 27, 54, 67, 0, 40, 27, 54, 67,
   0, 0, 50, 32.5, 65, 97.5, 82.5, 115, 147.5, 0, 50, 32.5,
   65, 97.5, 82.5, 115, 147.5, 0, 50, 32.5), yield = c(3.2,
   6.3, 3, 5.4, 3.6, 6.9, 4.6, 7.1, 1.6, 2.1, 2.1, 3.1, 1.3,
   2.5, 1.3, 2.9, 1.8, 2.44, 2.2, 2.55, 4.37, 4.76, 4.34, 4.86,
   4.2, 4.34, 5.54, 5.08, 3.97, 4.05, 0.75, 1.81, 4.69, 1.69,
   2.93, 2.94, 3.96, 5.22, 4.76, 1.48, 2.29, 2.14, 3.09, 3.81,
   2.36, 2.83, 6.61, 1.35, 2.13, 2.06), sdYield = c(0.201, 0.201,
   0.158, 0.158, 0.232, 0.232, 0.272, 0.272, 0.07, 0.07, 0.095,
   0.095, 0.17, 0.17, 0.407, 0.407, 0.108, 0.108, 0.11, 0.11,
   0.634, 0.634, 0.634, 0.634, 0.634, 0.239, 0.239, 0.239, 0.239,
   0.239, 0.512, 0.204, 0.204, 0.204, 0.203, 0.204, 0.204, 0.204,
   0.204, 0.203, 0.203, 0.203, 0.203, 0.203, 0.203, 0.203, 0.203,
   0.145, 0.145, 0.145)), row.names = c(NA, -50L), class = "data.frame")

dat = escalc(measure="MN", mi=yield, sdi=sdYield, ni=nRep, data=dat)

#model takes a minute
mod = rma.mv(yi, vi, method = 'ML', struct="UN", sparse=TRUE, data=dat,
            mods = ~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + classMR:classOR:time:I(totalN^2) + cropSys + idF,
            random = list(~1|ref, ~1|idRow, ~ treatment|idSite, ~ treatment|idSite.time))
summary(mod)

# 1. First method
X <- as.matrix(expand.grid(totalN = 0:250
                          , classOR = levels(dat$classOR)
                          , classMR = levels(dat$classMR)
                          , time = 0:26
                          , cropSys = levels(dat$cropSys)[1]
                          , idF = levels(dat$idF)[1]))
# X <- X[,-1] #not sure about the intercept
predict(mod, newmods = X, tau2.levels = c('Control', 'OR', 'MR', 'ORMR'), gamma2.levels = c('Control', 'OR', 'MR', 'ORMR')) #levels(dat$treatment)
#Error Multiple matches for the same variable name

# 2. Second method
X <- model.matrix(~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + classMR:classOR:time:I(totalN^2) + cropSys + idF, data=dtRMA_rate)
X <- X[,-1] #by default adds the intercept automatically, it needs to be removed from what you feed to newmods
predict(mod, newmods = X, tau2.levels = c('Control', 'OR', 'MR', 'ORMR'), gamma2.levels = c('Control', 'OR', 'MR', 'ORMR'))
#Error: Could not find variable 'cropSysintercrop' in the model —> but the variable not found keeps on changing if I run it again

Kind regards!

Gil Gram
PhD researcher  |  Natural resource management/CCAFS
International Institute of Tropical Agriculture (IITA)
East African Hub/Kampala/Country office
Address: IITA-Uganda Plot 15B, Naguru East Road. P.O Box 7878, Kampala
Mobile: +256 755 315236  |  Belgium Mobile: +32 484 981200
Skype: gil.gram

On 30 Apr 2019, at 21:49, Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl<mailto:wolfgang.viechtbauer using maastrichtuniversity.nl><mailto:wolfgang.viechtbauer using maastrichtuniversity.nl>> wrote:

Dear Gil,

I have a hard time reading this code. A small and fully reproducible example that illustrates the problem would be much easier to work with.

But did you read the rest of the thread you posted? Especially:

https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-February/000585.html

So, in case you haven't done so, make sure you (re)install the latest 'devel' version of the metafor package.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Gram, Gil (IITA)
Sent: Wednesday, 17 April, 2019 15:40
To: r-sig-meta-analysis using r-project.org<mailto:r-sig-meta-analysis using r-project.org>
Subject: [R-meta] rma.mv predict() with newmods

Dear Wolfgang and the Metafor community,

I’m running in the same problem Cesar Terrer Moreno ran into a while ago without response (https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-February/000582.html). When running the predict(newmods) function on my rma.mv model, I get the error "Multiple matches for the same variable name.” I’ve tried 2 different ways, and also tried looking at the function code on Github, in vain.

# My model based on a dataset of 3941 rows:
# ML_intRate_TiQ2 = rma.mv(yi, vi, method = 'ML', struct="UN", sparse=TRUE, data=dat, # mods = ~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + cropSys + idF, # random = list(~1|ref, ~1|idRow, ~ treatment|idSite, ~ treatment|idSite.time))
# where
# classOR —> "zero" "one" "two" "three" "four" "Manure"
# classMR —> "0" “1"
# totalN —> numerical value ranging [0:720]
# time —> numerical value ranging [0:26]
# First level of cropSys = ‘biomass'
# First level of idF = ’NPK’
# treatment —> "Control" "MR" "OR" "ORMR"

# 1. First method X <- expand.grid(totalN = 0:250 , classOR = levels(dtRMA$classOR) , classMR = levels(dtRMA$classMR) # , time = 0:26 , cropSys = levels(dtRMA$cropSys)[1] , idF = levels(dtRMA$idF)[1]) predict(MOD, newmods = X, tau2.levels = c('Control', 'OR', 'MR', 'ORMR'), gamma2.levels = c('Control', 'OR', 'MR', 'ORMR'))
# 2. Second method X <- model.matrix(~ classOR:totalN + classMR:totalN + classMR:classOR:totalN + classOR:I(totalN^2) + classMR:I(totalN^2) + classMR:classOR:I(totalN^2) + classMR:classOR:time:totalN + cropSys + idF, data=dtRMA_rate) predict(MOD, newmods = X, tau2.levels = c('Control', 'OR', 'MR', 'ORMR'), gamma2.levels = c('Control', 'OR', 'MR', 'ORMR'))

Looking forward to some feedback!

Gil Gram
PhD researcher  |  Natural resource management/CCAFS
International Institute of Tropical Agriculture (IITA)
East African Hub/Kampala/Country office
Address: IITA-Uganda Plot 15B, Naguru East Road. P.O Box 7878, Kampala
Mobile: +256 755 315236  |  Belgium Mobile: +32 484 981200
Skype: gil.gram

	[[alternative HTML version deleted]]