[R-sig-ME] Including a correlation matrix with the metafor package
Eike-Lena Neuschulz
Eike-Lena.Neuschulz at senckenberg.de
Thu May 7 16:20:44 CEST 2015
Dear R mailing list,
I have a problem using the metafor package. I would like to include a spatial correlation matrix via the 'R' argument. I get an error message about levels in the random effects that are not represented in the rows/columns of the distance matrix. However, when I compare the column and row names of the matrix with the levels of the random effect, they seem to be identical. See the output below.
I very much appreciate any help. Thank you very much!
Best,
Eike Lena Neuschulz
> SMDpot = rma.mv(yi, vi, random = ~ 1 | studyChr,
+                 mods = ~ process - 1,
+                 R = list(studyChr = distMatModel),
+                 data = SMD_big, method = "REML",
+                 Rscale = FALSE)
Error in rma.mv(yi, vi, random = ~1 | studyChr, mods = ~process - 1, R = list(studyChr = distMatModel), :
There are levels in 'studyChr' for which there are no rows/columns in the corresponding 'R' matrix.
>
>
> identical(row.names(distMatModel),levels(SMD_big$studyChr))
[1] TRUE
> identical(colnames(distMatModel),levels(SMD_big$studyChr))
[1] TRUE
>
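A few further checks one could run on the same objects (a sketch; only the names already shown above are assumed):

```r
# levels of the random effect without a matching row/column name
setdiff(levels(SMD_big$studyChr), rownames(distMatModel))
setdiff(levels(SMD_big$studyChr), colnames(distMatModel))

# the matrix should carry both row and column names
str(dimnames(distMatModel))

# studyChr should be a factor (not character) in the data
class(SMD_big$studyChr)

# check for stray leading/trailing whitespace in the names
any(grepl("^\\s|\\s$", rownames(distMatModel)))
```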
>>> 07.05.15 12:01 >>>
Today's Topics:
1. Interpreting the ACME in mediation (Elizabeth Pasipanodya)
2. Re: Including an autocorrelation term dramatically reduces
random variance (lme) (Thierry Onkelinx)
3. Re: question about gls in nlme (Thierry Onkelinx)
----------------------------------------------------------------------
Message: 1
Date: Wed, 6 May 2015 17:39:10 -0400
From: Elizabeth Pasipanodya
To: "'r-sig-mixed-models at r-project.org'"
Subject: [R-sig-ME] Interpreting the ACME in mediation
Message-ID:
<8DC1EA0EB7E3FD419D9B3BE76BD403867B15439C60 at razor.psych.udel.edu>
Content-Type: text/plain; charset="UTF-8"
Hello All,
I am a doctoral student working on data from a project focusing on couples coping with breast cancer. I have a mediation model with two simultaneous predictors, X1 (a positive relationship event) and X2 (a relationship conflict), a mediator, M (a measure of intimacy), and an outcome Y (a measure of anxiety). X1 and X2 are both binary while M is continuous. Additionally, Y follows a count distribution. These variables are repeatedly measured for each individual across a number of days and, thus, I used the R packages lme4 and mediation to conduct my analyses.
I would like to double-check the meaning of the ACME. My understanding is that it represents the expected difference in the potential outcome when the mediator takes the value it would have under the treatment condition compared to the control condition, while the treatment condition itself is held constant. Since I have a count outcome, should I report and interpret my ACME in the same units as my count outcome, as one would a rate ratio? That is, would the ACME represent, in logs of expected counts, the estimated average change in Y among the treatment group (those with a relationship event) due to M (intimacy) rather than directly from X1 (positive relationship event) and X2 (negative relationship event)?
For instance, based on the mediation output below, could I say something like the following --- on a day in which participants reported experiencing at least one negative relationship event, their estimated average change in anxiety due to changes in intimacy was 1.08 times (e^0.07485) that of those without a relationship conflict, controlling for the occurrence of positive relationship events?
> med2 <- mediate(apath, bpath, treat = "X2", mediator = "M", sims=5000, control.value = -0.5, treat.value = 0.5, dropobs=TRUE, method = "boot", boot.type = "bca")
> summary(med2)
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Mediator Groups: ID
Outcome Groups: ID
Output Based on Overall Averages Across Groups
Estimate 95% CI Lower 95% CI Upper p-value
ACME (control) 0.07890 0.02639 0.15107 0.00
ACME (treated) 0.07080 0.02354 0.13856 0.00
ADE (control) -0.06645 -0.32283 0.17712 0.50
ADE (treated) -0.07455 -0.35353 0.19218 0.50
Total Effect 0.00435 -0.25810 0.26892 0.97
Prop. Mediated (control) 0.16404 -11.47515 12.32893 0.97
Prop. Mediated (treated) 0.20262 -9.82740 10.87888 0.97
ACME (average) 0.07485 0.02583 0.14364 0.00
ADE (average) -0.07050 -0.33728 0.18478 0.50
Prop. Mediated (average) 0.18333 -10.80037 11.56373 0.97
Sample Size Used: 602
Simulations: 5000
Best,
Elizabeth Pasipanodya
------------------------------
Message: 2
Date: Thu, 7 May 2015 10:08:24 +0200
From: Thierry Onkelinx
To: David Villegas Ríos ,
"r-sig-mixed-models at r-project.org"
Subject: Re: [R-sig-ME] Including an autocorrelation term dramatically
reduces random variance (lme)
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Dear David,
Please keep the mailing list in cc.
Correlation structures in nlme work on the residuals conditional on the
random effects. This implies that residuals from different levels of the
random effects are assumed to be independent (not correlated). Some of the
information can be described by both the random effect and the correlation
structure; in fact, a random intercept is equivalent to a compound
symmetry correlation structure.
You have only 3 observations per random effect level, which makes it hard
to distinguish between the group average (the random effect) and the
correlated residuals. In such cases the shrinkage kicks in very hard,
strongly reducing the random effect variance.
IMHO you need to choose a model that matches the design. The design
dictates that you need a random effect of fish, which gives 3 observations
per fish. Three observations are not enough to estimate an AR1 correlation,
and it doesn't make sense to look at 15 lags when you only have 2...
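The random-intercept / compound-symmetry equivalence mentioned above can be checked with a small sketch (assuming a hypothetical data frame `dat` with response `y` and grouping factor `g`; the equivalence holds when the within-group correlation is non-negative):

```r
library(nlme)

# random intercept per group
m1 <- lme(y ~ 1, random = ~ 1 | g, data = dat)

# no random effect, but a compound symmetry correlation within group
m2 <- gls(y ~ 1, correlation = corCompSymm(form = ~ 1 | g), data = dat)

# both describe the same marginal covariance,
# so the log-likelihoods should agree
logLik(m1)
logLik(m2)
```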
Best regards,
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
2015-05-06 14:52 GMT+02:00 David Villegas Ríos :
> Dear Thierry, thank you for your kind response.
>
> I understand that the residual variance must be different between the two
> models (with and without autocorrelation term). However, as you can see in
> my original post, the main change (3 orders of magnitude) is in the random
> variance (the between-individuals variance). I don't understand this. It
> seems that the autocorrelation term "eats" all the variance previously
> explained by the individual ID...
>
> In addition, it makes sense of course to always leave the random ID in the
> model. But to me, the only way to test whether the random variance (and
> therefore the repeatability) is different from zero is to test whether the
> model needs the random ID (AIC, anova, ... between the model with and
> without the random ID). Of course I could calculate the CI for the
> repeatability using parametric bootstrapping, but this seems a difficult
> task for such complex models according to this previous post:
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2015q2/023385.html
>
> Best,
>
> David
>
>
>
> 2015-05-06 14:21 GMT+02:00 Thierry Onkelinx :
>
>> Dear David,
>>
>> A model without correlation has $\epsilon_t \sim N(0, \sigma_1)$.
>> AR1 correlation implies $\epsilon_t = \phi \epsilon_{t-1} + a_t$ with
>> $a_t \sim N(0, \sigma_2)$ (see Pinheiro and Bates, 2000), so the marginal
>> variance of $\epsilon_t$ is $\sigma_2^2 / (1 - \phi^2)$. The residual
>> variance in this model is therefore not the same thing, and it makes
>> sense that $\sigma_2$ is smaller than $\sigma_1$.
>>
>> You should take this into account when calculating the repeatability.
>>
>> Don't bother testing the need for ID. The design of your experiment
>> dictates the need to include it in the model.
>>
>> Weights affect the residual variance somewhat similar as a correlation
>> structure does. Have a look at the formulas in Pinheiro and Bates (2000).
>>
>> Best regards,
>>
>> ir. Thierry Onkelinx
>>
>> 2015-05-06 10:07 GMT+02:00 David Villegas Ríos :
>>
>>> Dear list,
>>>
>>> My dataset includes 285 individuals for which I have measured one trait
>>> over three months. The trait has been measured at two different time
>>> scales: monthly (maximum 3 replicates per fish) and daily (maximum 90
>>> replicates per fish). For one third of the fish, the trait was measured
>>> in
>>> 2012, for another third in 2013 and for the last third in 2014. But I
>>> always measured in the same three months: June, July and August.
>>>
>>> My objective is to use a mixed modelling approach to estimate
>>> repeatability
>>> (Vrandom/(Vrandom+Vresidual)) of this trait. I want to account for three
>>> fixed factors: area (two sites), total length (tl) and time. For
>>> comparative purposes, I want to do it for the two temporal scales, i.e.,
>>> one model using month and therefore three replicates per ID, and a
>>> different model using (julian) day and 90 replicates per ID.
>>>
>>> My attempts so far for the model with month:
>>>
>>> MODEL 1: Mixed model without autocorrelation term
>>>
>>>
>>> lme1=lme(loghr~area+tl+factor(month2),random=~1|fish,data=hrm3,method="REML")
>>>
>>> > summary(lme1)
>>> Linear mixed-effects model fit by REML
>>> Data: hrm3
>>> AIC BIC logLik
>>> 1498.771 1531.537 -742.3854
>>>
>>> Random effects:
>>> Formula: ~1 | fish
>>> (Intercept) Residual
>>> StdDev: 0.4649101 0.4790193
>>>
>>> Fixed effects: loghr ~ area + tl + factor(month2)
>>> Value Std.Error DF t-value
>>> p-value
>>> (Intercept) -1.1688029 0.15189536 514 -7.694790 0.0000
>>> areatvedestrand -0.9943160 0.06490892 283 -15.318635 0.0000
>>> tl -0.0034115 0.00308169 283 -1.107031 0.2692
>>> factor(month2)7 -0.2242556 0.04074810 514 -5.503462 0.0000
>>> factor(month2)8 -0.1269165 0.04245662 514 -2.989322 0.0029
>>>
>>> Correlation:
>>> (Intr) artvds tl fc(2)7
>>> areatvedestrand -0.224
>>> tl -0.943 0.019
>>> factor(month2)7 -0.121 0.003 -0.011
>>> factor(month2)8 -0.113 -0.006 -0.011 0.475
>>>
>>> Standardized Within-Group Residuals:
>>> Min Q1 Med Q3 Max
>>> -2.7792032 -0.5127146 -0.1031226 0.3935509 4.3119462
>>>
>>> Number of Observations: 802
>>> Number of Groups: 286
>>>
>>> The estimate of repeatability from this model would be
>>> 0.46/(0.46+0.47)=0.485
>>> However, if I extract the residuals from this model with
>>> res=resid(lme1,type="normalized") and do acf(res), the plot shows
>>> high autocorrelation in the first 15 lags. So I tried the following:
>>>
>>> MODEL 2: same model including an autocorrelation term
>>>
>>> lme3=update(lme1,correlation=corAR1(form=~month2))
>>>
>>> > summary(lme3)
>>> Linear mixed-effects model fit by REML
>>> Data: hrm3
>>> AIC BIC logLik
>>> 1461.013 1498.46 -722.5066
>>>
>>> Random effects:
>>> Formula: ~1 | fish
>>> (Intercept) Residual
>>> StdDev: 0.0005272246 0.6819065
>>>
>>> Correlation Structure: ARMA(1,0)
>>> Formula: ~month2 | fish
>>> Parameter estimate(s):
>>> Phi1
>>> 0.6097222
>>> Fixed effects: loghr ~ area + tl + factor(month2)
>>> Value Std.Error DF t-value
>>> p-value
>>> (Intercept) -1.1584548 0.15740075 514 -7.359906 0.0000
>>> areatvedestrand -1.0019165 0.06730508 283 -14.886194 0.0000
>>> tl -0.0035497 0.00319449 283 -1.111192 0.2674
>>> factor(month2)7 -0.2212453 0.03628563 514 -6.097327 0.0000
>>> factor(month2)8 -0.1275377 0.04735955 514 -2.692968 0.0073
>>> Correlation:
>>> (Intr) artvds tl fc(2)7
>>> areatvedestrand -0.227
>>> tl -0.944 0.022
>>> factor(month2)7 -0.104 0.003 -0.009
>>> factor(month2)8 -0.124 -0.005 -0.013 0.611
>>>
>>> Standardized Within-Group Residuals:
>>> Min Q1 Med Q3 Max
>>> -2.62204450 -0.68663130 -0.08503995 0.58177543 3.79290975
>>>
>>> Number of Observations: 802
>>> Number of Groups: 286
>>>
>>> If I plot acf(res) now, the plot looks much nicer (almost all of the
>>> autocorrelation has been corrected).
>>>
>>> As you can see, including the autocorrelation term dramatically
>>> reduces the random variance from 0.46 to 0.0005, which in turn reduces
>>> the repeatability from 0.485 to 5.98e-07.
>>>
>>> NOW: if I repeat this exercise working at a daily scale (90 replicates
>>> per ID), the reduction in random variance and repeatability is much
>>> smaller (from 0.46 to 0.16), but still very noticeable.
>>>
>>> My questions are:
>>> - How should I interpret this? Is the autocorrelation term in model 2
>>> taking most of the random variance previously explained by the individual
>>> ID in model 1?
>>> - Model 2 yields a very low value of repeatability; however, model
>>> selection is telling me that the random effect (individual ID) has to be
>>> included in the model. I interpret this as the repeatability being
>>> significantly different from zero, even if it is really low. Correct?
>>> - With a different trait, I observe the same reduction after including a
>>> "weights" argument. And with another trait, I observe a similar reduction
>>> when I pass from an autocorrelation structure corAR1 to a correlation
>>> structure corARMA(2,2). Any thoughts?
>>> - Could I say that including explanatory variables (fixed part of the
>>> model) will reduce the residual variance, and including terms in the
>>> random part of the model (random factors, autocorrelation terms,
>>> weights, ...) will reduce the random variance?
>>>
>>> Many thanks in advance,
>>>
>>> David
>>>
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>
>>
>
------------------------------
Message: 3
Date: Thu, 7 May 2015 10:12:29 +0200
From: Thierry Onkelinx
To: Mark Leeds
Cc: "r-sig-mixed-models at r-project.org"
Subject: Re: [R-sig-ME] question about gls in nlme
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Dear Mark,
It depends on which residuals you extract from the model. See the type
argument of residuals.gls()
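A minimal sketch, assuming a hypothetical data frame `dat` with response `y`, predictors `x1`, `x2` and a time index `t`, as in the model from the question below:

```r
library(nlme)

# two-predictor regression with AR(1) errors
fit <- gls(y ~ x1 + x2, correlation = corAR1(form = ~ t), data = dat)

# residuals of the untransformed model, i.e. estimates of u_t (option B)
resid(fit, type = "response")

# residuals standardized using the estimated error covariance,
# approximately independent like epsilon_t (closest to option A)
resid(fit, type = "normalized")
```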
Best regards,
ir. Thierry Onkelinx
2015-05-06 19:57 GMT+02:00 Mark Leeds :
> Hi: I've looked around for a long while and I can't figure out something
> about
> the gls function in the nlme package. If someone wouldn't mind explaining
> it or knows of a good reference ( I have pinheiro and bates but it's 10
> states away because I lent it to someone ) that explains the following,
> that would be fine also.
>
> Suppose I am estimating a 2 predictor regression model with AR(1) errors.
>
> So the model is y_t = B_0 + B_1*x1_t + B_2*x2_t + u_t, t = 1, ..., T
>
> where u_t = phi*u_t-1 + epsilon_t
>
> where var(epsilon_t) is sigma^2.
>
> My first question is:
>
> When one runs the function gls, I can't tell what the residuals represent.
> In other words, are they
>
> A) the residuals associated with the transformed regression where the error
> term is epsilon_t with variance(sigma^2).
>
> or
>
> B) the residuals associated with the untransformed model written above,
> so with covariance matrix V and elements V_ij = phi^|i-j| * sigma^2 /
> (1 - phi^2).
>
> My second question is:
>
> Whether the result represents A or B, is there a way to use a gls related
> function to convert back and forth between the two ? Thanks a lot for any
> wisdom references etc. I've looked all over and can't find much related to
> my question.
>
> The other possible option is to consider using the systemfit package but I
> have a feeling there must be a way to do this using gls.
>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
------------------------------
End of R-sig-mixed-models Digest, Vol 101, Issue 8