[R-meta] R-help: rma() in metafor drops levels in variables that correlates perfectly with levels in another variable

Wed Oct 25 21:40:04 CEST 2017

Dear Wey Wen,

I think your example doesn't quite show what you meant it to show. For the data you provided, all levels are estimable. I think you meant 'Study C' to have the values '1 2 1'. And then indeed, the coefficient for level 3 of variable 2 is not estimable. This is not something specific to metafor, but applies to linear models in general. For example, you will find lm() to behave in the same way (except that it shows the coefficient as NA, while metafor drops it from the output).

To illustrate:

dat <- read.table(header=TRUE, text = "
study x1 x2 x3
'Study A'   1    1     2
'Study B'   1    2     2
'Study C'   1    2     1
'Study D'   2    3     3
'Study E'   2    3     1
'Study F'   2    3     2
'Study G'   3    1     3
'Study H'   3    1     3
'Study I'   3    2     2")

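set.seed(1234)   # make the simulated outcome reproducible
dat$y <- rnorm(9)
res <- lm(y ~ factor(x1) + factor(x2) + factor(x3), data=dat)
summary(res)     # the coefficient for factor(x2)3 is shown as NA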

This happens because the model matrix is not of full rank. Take a look at:

model.matrix(res)

You will see that the columns 'factor(x1)2' and 'factor(x2)3' are identical, so their coefficients cannot be estimated separately.
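The rank deficiency can also be checked numerically. The following is only a sketch: it rebuilds the design from the table above (with 'Study C' set to '1 2 1') and compares the number of columns in the model matrix with its rank. The object names (X, n_coef, n_rank, same_cols) are just chosen for illustration.

```r
# Same design as in the example, with 'Study C' set to 1 2 1
dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)

# Model matrix for the three factors (intercept plus two dummies each)
X <- model.matrix(~ factor(x1) + factor(x2) + factor(x3), data = dat)

n_coef <- ncol(X)      # number of coefficients the model asks for
n_rank <- qr(X)$rank   # number of linearly independent columns

# Do the two dummy columns coincide row by row?
same_cols <- all(X[, "factor(x1)2"] == X[, "factor(x2)3"])

print(c(coefficients = n_coef, rank = n_rank))
print(same_cols)
```

If the rank is smaller than the number of columns, at least one coefficient is not estimable, regardless of which fitting function is used.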

So, to answer your questions:

1) Yes, this is intentional.

2) It is indeed because of collinearity.

I don't know why you think that "If collinearity is an issue, it only applies to the specific level within a variable, not between variables." It may be worth reviewing: https://en.wikipedia.org/wiki/Multicollinearity

3) There are ways of estimating all coefficients, but (a) this requires taking a generalized inverse and (b) it leads to a non-unique solution, so one could argue that the results will be arbitrary. I don't think this should be done (and apparently neither do the authors of lm(), which I think says a lot).
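To make the non-uniqueness in point 3 concrete, here is a minimal sketch. It assumes the MASS package is available (it ships with standard R installations) and uses its ginv() function for the generalized inverse. The shift of 5 is arbitrary and only for illustration, as are the object names.

```r
library(MASS)  # assumed to be installed; provides ginv()

set.seed(1234)
dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)
dat$y <- rnorm(9)
X <- model.matrix(~ factor(x1) + factor(x2) + factor(x3), data = dat)

# One least-squares solution via the Moore-Penrose generalized inverse
b1 <- drop(ginv(X) %*% dat$y)
names(b1) <- colnames(X)

# A second solution: move an arbitrary amount (here 5) from one of the
# two identical columns to the other
b2 <- b1
b2["factor(x1)2"] <- b2["factor(x1)2"] + 5
b2["factor(x2)3"] <- b2["factor(x2)3"] - 5

# Different coefficients, but the same fitted values
same_fit <- isTRUE(all.equal(drop(X %*% b1), drop(X %*% b2)))

# Both also reproduce the fitted values from lm()
res <- lm(y ~ factor(x1) + factor(x2) + factor(x3), data = dat)
matches_lm <- isTRUE(all.equal(unname(drop(X %*% b1)), unname(fitted(res))))

print(cbind(b1, b2))
print(c(same_fit = same_fit, matches_lm = matches_lm))
```

Since any amount can be shifted between the two coefficients without changing the fit, the data cannot tell these solutions apart, which is why reporting a single value for the dropped level would be arbitrary.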

Best,
Wolfgang

-- 
Wolfgang Viechtbauer, Ph.D., Statistician | Department of Psychiatry and 
Neuropsychology | Maastricht University | P.O. Box 616 (VIJV1) | 6200 MD 
Maastricht, The Netherlands | +31 (43) 388-4170 | http://www.wvbauer.com 

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of LIM, Wey Wen
Sent: Wednesday, 25 October, 2017 19:04
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] R-help: rma() in metafor drops levels in variables that correlates perfectly with levels in another variable

Dear R-sig-meta-analysis list members,

I would like to seek your help/advice in solving the problem I have
encountered recently.

When using the rma(... mods=~var1+var2+var3...) command for
meta-regression on subsets of datasets, metafor has a convenient
feature which automatically drops levels within variables that do
not have studies/data. However, I have found that it also drops
some of the levels within variables that correspond perfectly
with levels of other variables.

For example, a dataset like the one below will result in the dropping
of level 3 in variable 2 (because it corresponds to only level 2 of
variable 1) despite the level being represented in the dataset.

Variable  [,1] [,2] [,3]
Study A    1    1    2
Study B    1    2    2
Study C    1    3    1
Study D    2    3    3
Study E    2    3    1
Study F    2    3    2
Study G    3    1    3
Study H    3    1    3
Study I    3    2    2

As there is still data for level 3 in variable 2, we would still like
to report the regression coefficient for that level.

Thus, my questions are:
1) Is the dropping of levels an intentional feature in metafor due to
statistical/computational reasons?
2) If it is, what is the reason behind this feature? (If collinearity
is an issue, it only applies to the specific level within a variable,
not between variables.)
3) If it is not, what is the possible reason behind this and can we
program rma() to keep the level despite it being perfectly correlated
with a level of only one other variable?

Has any of you encountered this problem before? If so, how did you
overcome it?

Warm regards,
Wey Wen