[R-meta] Moderator analysis: Subsample analysis vs. model without intercept in CHE models.

Röhl, Sebastian sebastian.roehl at uni-tuebingen.de
Mon May 1 20:13:10 CEST 2023


Hi James,

thank you for your very detailed and helpful answer.

“So which approach is correct? I would argue that this is a question that requires context knowledge to answer. Is it *theoretically* reasonable to partially pool across subjects? Or to partially pool across teaching aspects? Or both (i.e., do all subjects and all teaching aspects in one model)? “
I do not find the question of the theoretical justifiability of "borrowing" information across effect sizes or subsamples easy to answer. Even the theoretical justification for "correlated correlations" is difficult for me to grasp in terms of the content of the included studies.

Here are some more details about the structure of our data: We are analyzing studies that report correlations between teaching characteristics and learning outcomes. Most of the studies report several teaching characteristics and their correlations with one outcome Y for one subject. We coded the teaching characteristics using categories a, b, c…. We are interested in the overall correlation between teaching characteristics and Y. Additionally, we want to know whether the correlations differ with regard to the teaching categories (a, b, c…) and with regard to the subject of the class.
Most of the studies reported only one subject or sample and only one effect per teaching category (a, b, c…). The ES within one study are usually measured with a multidimensional questionnaire administered to the same sample, so they are dependent within the sample. Typically, a, b, and c are correlated with each other (average r = .50). We therefore assumed that the correlations/ES within one sample (a ~~ Y, b ~~ Y, c ~~ Y, …) are correlated with rho = 0.5 and estimated a correlated-and-hierarchical effects model.
Is it at all correct to assume that if a, b, c… correlate with each other, then the correlations with Y (i.e., a ~~ Y, b ~~ Y, …) also correlate with each other? Or would a hierarchical effects model (possibly with CRVE) be sufficient?
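
A quick simulation sketch can illustrate the point (purely illustrative; all names and population values here are assumptions, not our data): if a and b correlate at r = .5 and each correlates with Y, the estimates cor(a, Y) and cor(b, Y) do co-vary across samples, i.e., the ES are dependent.

## illustrative sketch: correlated predictors lead to correlated
## correlation estimates within a sample
library(MASS)  # for mvrnorm()

set.seed(42)
n_reps <- 5000  # simulated samples
n      <- 100   # observations per sample

## assumed population values: cor(a, b) = .5, cor(a, Y) = cor(b, Y) = .3
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.3,
                  0.3, 0.3, 1.0), nrow = 3)

est <- t(replicate(n_reps, {
  dat <- mvrnorm(n, mu = rep(0, 3), Sigma = Sigma)
  c(r_aY = cor(dat[, 1], dat[, 3]),
    r_bY = cor(dat[, 2], dat[, 3]))
}))

cor(est[, "r_aY"], est[, "r_bY"])  # clearly > 0: the sampling errors covary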

Different subjects were measured in different samples. Most of the studies included measures for only one subject; a few studies (n = 4) measured several subjects within one study, using the same questionnaire (and different learning achievement tests). One study reported effects for several samples of the same subject, using the same questionnaire.
Therefore, I also tested a 4-level random-effects model with study / sample(subject) / ES, but it fit the data only very slightly better, and the distribution of variance across the levels was not stable. The best model fit was reached using correlated effects within samples, nested in a hierarchical “study / ES” structure without a separate sample level in the hierarchy.
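
In code, the two candidate specifications looked roughly like this (a sketch; V_mat stands for a covariance matrix imputed over the full data set, the other names are as in my syntax below):

## 4-level model: study / sample / ES (only slightly better fit,
## unstable variance components)
model_4lvl <- rma.mv(r_gesamt_z, V_mat,
                     random = ~ 1 | studynr / samplenr / nummer,
                     data = daten)

## retained model: correlated effects within samples via V_mat,
## hierarchical structure study / ES without a separate sample level
model_che <- rma.mv(r_gesamt_z, V_mat,
                    random = ~ 1 | studynr / nummer,
                    data = daten)

fitstats(model_che, model_4lvl)  # compare logLik / AIC / BIC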

So, in conclusion, one could argue both for and against joint estimation/pooling based on the theoretical assumptions and the data structure. After some consideration, I tend towards the following: the different teaching subjects use similar or even identical instruments to measure the teaching characteristics. Both instructional aspects and learning achievement are assessed with similar methods (student questionnaires and achievement tests), and the former have been found to be correlated with each other (see above). So pooling should in fact be possible. The only argument against it is that, based on some theoretical considerations, we suspect differential effects for some aspects of instruction and some subjects, which, however, can be demonstrated in only a few cases.
Is this line of argument for the choice of model comprehensible and sound?

Thank you!
Kind regards,
Sebastian


From: James Pustejovsky <jepusto at gmail.com>
Sent: Friday, 28 April 2023 17:12
To: R Special Interest Group for Meta-Analysis <r-sig-meta-analysis at r-project.org>
Cc: Röhl, Sebastian <sebastian.roehl at uni-tuebingen.de>
Subject: Re: [R-meta] Moderator analysis: Subsample analysis vs. model without intercept in CHE models.

Hi Sebastian,

The two approaches you're looking at differ in multiple respects.

Your first approach looks at the subset where tq_ca == 1 and estimates a model with both math and language outcomes. The effect sizes for different subjects are modeled as correlated due to the study-level random effect (which is common across outcomes) and to the assumed correlation between the ES estimates as represented in V_mat_ca. Because of the correlation between ES for different subjects, the average effects will be estimated by "borrowing information" (or partially pooling) across subjects, which has the effect of pulling the estimates towards each other a bit.

If you wanted to estimate average ES for each subject without the borrowing of information, based only on the ES for that subject, you could do:

# Impute a block-diagonal V matrix, allowing covariances only between
# ES from the same subject within a study
V_mat_ca_sub <- impute_covariance_matrix(daten_ca$class_var,
                                         cluster = daten_ca$studynr,
                                         subgroup = daten_ca$subject,
                                         r = rho,
                                         smooth_vi = TRUE)
# Separate (diagonal) random-effects structures per subject, so no
# information is shared across subjects
model_ca_sub <- rma.mv(r_gesamt_z, V_mat_ca_sub,
                       random = list(~ subject | studynr, ~ subject | nummer),
                       struct = c("DIAG", "DIAG"),
                       mods = ~ -1 + subject_math + subject_langall,
                       data = daten_ca)
robust(model_ca_sub, daten_ca$studynr)

Comparing this to your first approach would let you isolate the consequences of borrowing information.
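
For example, once both models have been fit (model_ca from your Syntax A):

# average ES per subject, with vs. without borrowing of information
round(cbind(pooled = coef(model_ca), separate = coef(model_ca_sub)), 4)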

Your second approach looks at each subject area in separate models, but includes data from multiple teaching aspects. The effect sizes for different teaching aspects are modeled as correlated due to the study-level random effect (which is now common across teaching aspects) and to the assumed correlation between ES estimates. Again, because of these correlations, the average effects for each teaching aspect will be estimated by borrowing information across teaching aspects. You could adapt the code above but use subgroups by tq instead of by subject to isolate how the borrowing of information affects the estimates from your second set of models.
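
Sketched for the math subset, that might look like this (assuming a single factor, say tq_cat, that combines your tq_* dummies into one categorical variable):

# Hypothetical: tq_cat encodes the teaching-aspect categories as one factor
V_mat_math_sub <- impute_covariance_matrix(daten_math$class_var,
                                           cluster = daten_math$studynr,
                                           subgroup = daten_math$tq_cat,
                                           r = rho,
                                           smooth_vi = TRUE)
model_math_sub <- rma.mv(r_gesamt_z, V_mat_math_sub,
                         random = list(~ tq_cat | studynr, ~ tq_cat | nummer),
                         struct = c("DIAG", "DIAG"),
                         mods = ~ -1 + tq_cat,
                         data = daten_math)
robust(model_math_sub, daten_math$studynr)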

So which approach is correct? I would argue that this is a question that requires context knowledge to answer. Is it *theoretically* reasonable to partially pool across subjects? Or to partially pool across teaching aspects? Or both (i.e., do all subjects and all teaching aspects in one model)?

James

On Fri, Apr 28, 2023 at 5:38 AM Röhl, Sebastian via R-sig-meta-analysis <r-sig-meta-analysis at r-project.org> wrote:
Hello,
I am currently conducting an analysis of about 500 ES from 50 teaching studies. Two categorical moderators appear here: the teaching subject (e.g., math, language, …) and the teaching aspect (e.g., atmosphere, clarity, …). I'm using a correlated-and-hierarchical effects model with "impute_covariance_matrix" from the clubSandwich package.
I have a problem with estimates that differ depending on how the moderator analysis is conducted.
For example, I would like to analyze how the ES for atmosphere differ between math and language subjects. Simply selecting the subset via "subset =" in the rma.mv function is not possible, because the imputed covariance matrix then no longer has the correct dimensions.

I have compared two possibilities (syntaxes at the end of this message):
A: I select all ES related to atmosphere in the math and language subjects and impute the covariance matrix for them. Then I analyze them with the subject as moderator, using a model without intercept.
Result:
                  estimate      se     tval   df    pval     ci.lb     ci.ub
subject_math        0.1262  0.0204   6.1888   12   <.0001    0.0818    0.1707   ***
subject_langall    -0.0809  0.0102  -7.9658   12   <.0001   -0.1030   -0.0588   ***
(These are nearly the same estimates as when I form two subsets for the ES with "math & atmosphere" and "language & atmosphere" and conduct separate analyses.)

B: I form two subsets for the subjects math and language, impute a covariance matrix for each, and analyze "atmosphere" for the two subsets separately, with the teaching aspects as moderators, using a model without intercept.

Math:       tq_ca    0.2088  0.0378   5.5213     17   <.0001    0.1290   0.2885   ***

Language:   tq_ca   -0.0227  0.0230  -0.9863   2.42   0.4120   -0.1065   0.0614

The numbers of effect sizes and studies are correct in the respective subsets and analyses.
Noteworthy: an analysis of all teaching aspects across all subjects in a model without intercept yields an estimate of about 0.20 for "atmosphere". In contrast, if I select only the subset with atmosphere ES, the result is an estimate of 0.13.
Where do these large differences in the estimates come from and what would be the correct approach?

Thanks a lot for your help!

Best,
Sebastian


Syntax A:
# subset first, then impute V so its dimensions match the data
daten_ca <- subset(daten, tq_ca == 1 & (subject_math == 1 | subject_langall == 1))
V_mat_ca <- impute_covariance_matrix(daten_ca$class_var,
                                     cluster = daten_ca$studynr,
                                     r = rho,
                                     smooth_vi = TRUE)
model_ca <- rma.mv(r_gesamt_z, V_mat_ca,
                   random = ~ 1 | studynr / nummer,
                   mods = ~ -1 + subject_math + subject_langall,
                   data = daten_ca)
robust(model_ca, daten_ca$studynr)

Syntax B:
# one subset per subject; V imputed within samples of each subset
daten_math    <- subset(daten, subject_math == 1)
daten_langall <- subset(daten, subject_langall == 1)
V_mat_math <- impute_covariance_matrix(daten_math$class_var,
                                       cluster = daten_math$samplenr,
                                       r = rho,
                                       smooth_vi = TRUE)
V_mat_langall <- impute_covariance_matrix(daten_langall$class_var,
                                          cluster = daten_langall$samplenr,
                                          r = rho,
                                          smooth_vi = TRUE)
model_math <- rma.mv(r_gesamt_z, V_mat_math,
                     random = ~ 1 | studynr / nummer,
                     mods = ~ -1 + tq_ca + tq_cm + tq_cont + tq_pract + tq_assess +
                       tq_sup_em + tq_sup_learn + tq_adapt + tq_srl + tq_all + tq_other,
                     data = daten_math)
robust(model_math, daten_math$studynr)

model_langall <- rma.mv(r_gesamt_z, V_mat_langall,
                        random = ~ 1 | studynr / nummer,
                        mods = ~ -1 + tq_ca + tq_cm + tq_cont + tq_pract + tq_assess +
                          tq_sup_em + tq_sup_learn + tq_adapt + tq_srl + tq_all + tq_other,
                        data = daten_langall)
robust(model_langall, daten_langall$studynr)

****************************
Dr. Sebastian Röhl
Eberhard Karls Universität Tübingen
Institute for Educational Science
Tübingen School of Education (TüSE)
Wilhelmstraße 31 / Room 302
D-72074 Tübingen
Germany

Phone: +49 7071 29-75527
Fax: +49 7071 29-35309
Email: sebastian.roehl at uni-tuebingen.de
Twitter: @sebastian_roehl  @ResTeacherEdu


