[R-meta] R-help: rma() in metafor drops levels in variables that correlates perfectly with levels in another variable

Thu Nov 2 05:49:29 CET 2017

Dear Dr Viechtbauer,

Thank you very much for your explanation. Indeed I have found issues with
collinearity and will try other means to deal with the missing data in
those levels.

Warm regards,
Wey Wen

On Thu, Oct 26, 2017 at 3:40 AM, Viechtbauer Wolfgang (SP) <
wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:

> Dear Wey Wen,
>
> I think your example doesn't quite show what you meant it to show. For the
> data you provided, all levels are estimable. I think you meant 'Study C' to
> have the values '1 2 1'. And then indeed, the coefficient for level 3 of
> variable 2 is not estimable. This is not something specific to metafor, but
> applies to linear models in general. For example, you will find lm() to
> behave in the same way (except that it shows the coefficient as NA, while
> metafor drops it from the output).
>
> To illustrate:
>
> dat <- read.table(header=TRUE, text = "
> study x1 x2 x3
> 'Study A'   1    1     2
> 'Study B'   1    2     2
> 'Study C'   1    2     1
> 'Study D'   2    3     3
> 'Study E'   2    3     1
> 'Study F'   2    3     2
> 'Study G'   3    1     3
> 'Study H'   3    1     3
> 'Study I'   3    2     2")
>
> dat$y <- rnorm(9)
> res <- lm(y ~ factor(x1) + factor(x2) + factor(x3), data=dat)
> summary(res)
>
> This happens because the model matrix is not of full rank. Take a look at:
>
> model.matrix(res)
>
> You will see that the columns 'factor(x1)2' and 'factor(x2)3' are identical.
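
The rank deficiency can also be checked numerically. The following is only a minimal sketch, assuming the corrected data above (with 'Study C' set to '1 2 1'); the object name X is chosen here for illustration and is not part of the original example.

```r
# Rebuild the predictors from the corrected example data (Study A to Study I)
dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)

# Model matrix for the three factors (treatment contrasts, R's default)
X <- model.matrix(~ factor(x1) + factor(x2) + factor(x3), data = dat)

ncol(X)     # number of coefficients the model asks for
qr(X)$rank  # numerical rank of the model matrix

# TRUE if the two dummy columns carry exactly the same information
identical(X[, "factor(x1)2"], X[, "factor(x2)3"])
```

If the rank is smaller than the number of columns, at least one coefficient cannot be estimated separately from the others.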
>
> So, to answer your questions:
>
> 1) Yes, this is intentional.
>
> 2) It is indeed because of collinearity.
>
> I don't know why you think that "If collinearity is an issue, it only
> applies to the specific level within a variable, not between variables." It
> may be worth reviewing: https://en.wikipedia.org/wiki/Multicollinearity
>
> 3) There are ways of estimating all coefficients, but (a) this requires
> taking a generalized inverse and (b) leads to a non-unique solution, so one
> could argue that the results will be arbitrary. I don't think this should
> be done (and apparently neither do the authors of lm(), which I think says
> a lot).
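
To make the non-uniqueness concrete, here is a rough sketch that assumes the corrected example data and uses ginv() from the MASS package as one possible generalized inverse. The names b_mn, b_alt and v, the seed, and the shift of 5 are arbitrary choices for illustration only.

```r
library(MASS)  # provides ginv(), the Moore-Penrose generalized inverse

dat <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x2 = c(1, 2, 2, 3, 3, 3, 1, 1, 2),
  x3 = c(2, 2, 1, 3, 1, 2, 3, 3, 2)
)
X <- model.matrix(~ factor(x1) + factor(x2) + factor(x3), data = dat)

set.seed(42)   # arbitrary seed, only to make the sketch reproducible
y <- rnorm(9)

# One least-squares solution: the minimum-norm one from the generalized inverse
b_mn <- drop(ginv(X) %*% y)
names(b_mn) <- colnames(X)

# Because the two columns are identical, moving any amount from one
# coefficient to the other leaves X %*% b unchanged
v <- setNames(rep(0, ncol(X)), colnames(X))
v["factor(x1)2"] <- 1
v["factor(x2)3"] <- -1
b_alt <- b_mn + 5 * v

# Different coefficients, same fitted values
cbind(b_mn, b_alt)
all.equal(drop(X %*% b_mn), drop(X %*% b_alt))
```

Both coefficient vectors fit the data equally well, so the data give no basis for preferring one over the other.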
>
> Best,
> Wolfgang
>
> --
> Wolfgang Viechtbauer, Ph.D., Statistician | Department of Psychiatry and
> Neuropsychology | Maastricht University | P.O. Box 616 (VIJV1) | 6200 MD
> Maastricht, The Netherlands | +31 (43) 388-4170 | http://www.wvbauer.com
>
> -----Original Message-----
> From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org]
> On Behalf Of LIM, Wey Wen
> Sent: Wednesday, 25 October, 2017 19:04
> To: r-sig-meta-analysis at r-project.org
> Subject: [R-meta] R-help: rma() in metafor drops levels in variables that
> correlates perfectly with levels in another variable
>
> Dear R-sig-meta-analysis list members,
>
> I would like to seek your help/advice in solving the problem I have
> encountered recently.
>
> When using the rma(... mods=~var1+var2+var3...) command for
> meta-regression on subsets of datasets, metafor has a convenient
> feature which automatically drops levels within variables that
> do not have studies/data. However, I have found that it also
> drops some of the levels within variables that correspond perfectly
> with levels of other variables.
>
> For example, a dataset like the one below will result in the dropping
> of level 3 in variable 2 (because it corresponds to only level 2 of
> variable 1) despite the level being represented in the dataset.
>
> Variable  [,1] [,2] [,3]
> Study A    1    1    2
> Study B    1    2    2
> Study C    1    3    1
> Study D    2    3    3
> Study E    2    3    1
> Study F    2    3    2
> Study G    3    1    3
> Study H    3    1    3
> Study I    3    2    2
>
> As there is still data for level 3 in variable 2, we would still like
> to report the regression coefficient for that level.
>
> Thus, my questions are:
> 1) Is the dropping of levels an intentional feature in metafor due to
> statistical/computational reasons?
> 2) If it is, what is the reason behind this feature? (If collinearity
> is an issue, it only applies to the specific level within a variable,
> not between variables.)
> 3) If it is not, what is the possible reason behind this and can we
> program rma() to keep the level despite it being perfectly correlated
> with a level of only one other variable?
>
> Has any of you encountered this problem before? If so, how did you
> overcome it?
>
> Warm regards,
> Wey Wen