[R-sig-ME] Standard errors of variances; negative variance

ben pelzer benpelzer at gmail.com
Thu Jun 22 09:19:37 CEST 2023


Hi Will,

I was wondering whether your "strange variance" result is basically comparable
to the following simple R script. The data are simulated, with no higher level
at all: for each "subject" there is only one observation. Trying to estimate
both a residual variance and a random-intercept variance with lmer (after
switching off all the mechanisms that prevent a user from fitting such a
"strange" model) and with lme leads to quite arbitrary results for the two
variance estimates. Obviously, estimating two variance components when only
one is identifiable is problematic... so why would anyone want to fit such a
model? But maybe your situation is different; a reproducible example (data +
SAS code) might be enlightening. Best regards, Ben P.

set.seed(1)                     # make the simulated data reproducible
y <- rnorm(100, 0, 1)           # one observation per "subject", no higher level
subject <- 1:100

library(lme4)
## random intercept per subject with only one observation per subject:
## the grouping-factor checks must be switched off to fit this at all
m1 <- lmer(y ~ (1|subject),
           control=lmerControl(check.nobs.vs.nlev = "ignore",
                               check.nobs.vs.rankZ = "ignore",
                               check.nobs.vs.nRE="ignore"))
summary(m1)

library(nlme)
## the same degenerate model in nlme
m2 <- lme(y ~ 1, random=~1|subject)
summary(m2)



On Thu, 22 Jun 2023 at 00:25, Will Hopkins <willthekiwi at gmail.com> wrote:

> Someone emailed me personally to point out a mistake in my SAS
> random-effects code for individual responses. I had random
> xVarExpt*SubjectID, which is correct, but it is equivalent to random
> xVarExpt/subject=SubjectID, not intercept/subject=SubjectID. Sorry about
> that. It's distressingly easy to make such simple errors, but hopefully
> they become obvious when you run the program and get silly results, an
> infinite likelihood, or a failure to converge.
>
> If you model original scores rather than change scores, the fixed effect
> is Group*Trial and there are two random effects, SubjectID and
> xVarExpt*SubjectID (or intercept and xVarExpt with /subject=SubjectID),
> along with the residual. By the way, with a big-enough sample size, you can
> estimate individual responses with this model for the original scores and
> data for only one group. The fixed effect is then simply Trial. Of course,
> there is a lot of uncertainty, depending on the sample size and the relative
> magnitudes of the between-subject SD and the error-of-measurement (residual)
> SD. It still amazes me that Proc Mixed can partition out three variances
> from two SDs on the same subjects. If you don't estimate individual
> responses directly with a dummy variable, you can estimate a different
> residual variance in the pre- and post-test. The individual-response
> variance is then the difference between the two residual variances. Does
> lme have an option for specifying different random effects and/or residuals
> in different groups? In SAS it's simply done with the /group= syntax (here
> repeated/group=Trial).
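
For what it's worth, the closest nlme analogue of repeated/group=Trial that I
know of is a varIdent() variance function on the residuals; it is estimated as
a (positive) ratio of residual SDs rather than as an extra variance. A minimal
sketch, assuming a long-format data frame dat with columns score, Group, Trial
and SubjectID (the xVarExpt random effect is omitted here for brevity):

library(nlme)
## separate residual variance for each level of Trial (cf. SAS repeated/group=Trial)
m.het <- lme(score ~ Group * Trial,
             random = ~ 1 | SubjectID,
             weights = varIdent(form = ~ 1 | Trial),
             data = dat)
summary(m.het)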
>
> Stevedrd has raised the issue of adjusting for the pretest value. Yes,
> that improves precision by accounting for regression to the mean in the
> change scores, when the error of measurement is substantial relative to the
> between-subject SD in the pretest. I routinely include the pretest when
> modeling change scores; the fixed effects are then Group and Group*PreTest.
> You thereby estimate the individual responses with improved precision, but
> doing so has no effect on their expected value, unless the pretest really
> does modify the treatment effect (which sometimes happens, especially with
> human performance). The resulting individual responses are those not
> explained by the pretest and by any other effect modifiers you put in the
> fixed-effects model. You can't include the pretest when you model original
> scores, unless you make the post-test value the dependent variable and
> adjust for the pretest; I don't normally do it that way, because modeling
> change scores is easier for people to understand and to visualize in
> scatterplots, IMHO.
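
Purely as an R-formula translation, the fixed-effects specification
"Group Group*PreTest" corresponds to group-specific intercepts and
group-specific pretest slopes, with no separate PreTest main effect:

## illustrative only; the random part of the model is described in the quoted
## message further down
f <- change ~ Group + Group:PreTest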
>
> In response to Ben Bolker... I said merDeriv's results may be
> untrustworthy, because you said previously that "the standard errors are
> often extremely poor summaries of the uncertainty of RE variances". I
> thought your comment was about the standard errors produced by merDeriv,
> not about standard errors for variances produced by mixed models in
> general. I can't remember what package Alice found for producing standard
> errors with lme, but it was around 2017, so it may have been merDeriv. As I
> said in my first message, the SEs weren't the same as SAS's, so I assumed
> "extremely poor" referred to merDeriv. As for the standard errors being
> extremely poor summaries in general, I don't know where that assessment
> comes from. In my experience with SAS they are excellent summaries. For
> simple models you get exactly what you would expect for the degrees of
> freedom, and the confidence intervals have acceptable coverage. SAS assumes
> a normal sampling distribution for the variance, centered on the observed
> variance, not on the log of the variance.
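
If I remember the merDeriv interface correctly, its vcov() method for lmer
fits can return the full parameter covariance matrix, including the variance
components, so the square roots of its diagonal are the Wald-type SEs being
discussed. A sketch on a built-in lme4 data set (treat the exact arguments as
an assumption to be checked against the package documentation):

library(lme4)
library(merDeriv)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
V <- vcov(fit, full = TRUE)   # merDeriv method: includes random-effect parameters
sqrt(diag(V))                 # Wald-type standard errors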
>
> In response to Ben Pelzer... The change score is the dependent variable in
> a controlled trial, with one observation per subject in the control group
> and one per subject in the experimental group. If the SD of the change
> scores in the experimental group is less than that in the control group,
> and you specify a model that estimates extra variance in the experimental
> group with a dummy variable, you will get a negative variance. If instead
> you specify extra variance in the control group, you will get an equal and
> opposite variance. The SE is the same. Why bother to specify extra variance
> in the experimental group, if you can see that the variance in that group
> is less than that in the control group? Because there probably are real
> individual responses in the experimental group, and you want to see how big
> they could be by estimating their upper confidence limit. The observed SD
> of change scores in the experimental group may be less than that in the
> control group simply because of sampling variation, especially for small
> sample sizes.
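
A quick base-R illustration of that last point: even when there are no true
individual responses at all (both groups' change scores simulated with SD 1),
the "extra" variance in the experimental group comes out negative in roughly
half of the samples.

set.seed(1)
extra <- replicate(10000, {
  ctrl <- rnorm(10)          # change scores, control group, n = 10
  expt <- rnorm(10)          # change scores, experimental group, n = 10
  var(expt) - var(ctrl)      # "extra" variance in the experimental group
})
mean(extra < 0)              # close to 0.5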
>
> I like John Maindonald's comparison with the square root of -1. I wonder
> whether it is possible to work with SDs as complex numbers. I can't see how
> at the moment.
>
> Will
>
> -----Original Message-----
> From: Will Hopkins <willthekiwi at gmail.com>
> Sent: Wednesday, June 21, 2023 12:08 PM
> To: r-sig-mixed-models at r-project.org
> Subject: RE: [R-sig-ME] Standard errors of variances; negative variance
>
> Thanks heaps for the seven replies from five people on this list. In
> summary, it looks like standard errors could be derived for variances in R:
> with asreml-R (which you have to pay for?); possibly with nlme; with
> merDeriv, but they may be untrustworthy; and with Bayesian packages (but
> I'd be specifying weakly informative priors that in the limit have no
> effect on the posterior, which might not work in some packages). Negative
> variance seems to be off the table, however. Ben, James and Timothee have
> indicated scenarios where negative variance made some sense.
>
> I had a particular reason for seeking advice on this list; see the bottom
> of my message. First, here's my justification for negative variance and
> standard errors of variances.
>
> My favorite example is the variance representing individual responses to a
> treatment, derived as the variance of the change scores in an experimental
> group minus the variance of the change scores in a control group. In a
> mixed model, the dependent variable is the change score, there is a fixed effect for
> group (to estimate the difference in the mean changes), other fixed-effect
> modifiers if you want them, and a random effect for the interaction of a
> dummy variable (1 in the experimental group, 0 in the control group) with
> subject identity. I usually call the dummy variable something like
> xVarExpt, to indicate extra variance in the experimental change scores, and
> the code in SAS is random xVarExpt*SubjectID; or random
> intercept/subject=SubjectID;. And of course there is the residual.
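
My attempt at the lme4 counterpart of that SAS code, as a sketch only: it
assumes a data frame dat with one change score per subject and the 0/1 dummy
xVarExpt, and it needs the same grouping-factor checks switched off as in the
lmer example at the top of this message, because there is one observation per
subject. Note that lme4 constrains the xVarExpt variance to be non-negative,
which is exactly the restriction at issue here.

library(lme4)
m.ir <- lmer(change ~ Group + (0 + xVarExpt | SubjectID), data = dat,
             control = lmerControl(check.nobs.vs.nlev  = "ignore",
                                   check.nobs.vs.rankZ = "ignore",
                                   check.nobs.vs.nRE   = "ignore"))
summary(m.ir)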
>
> Usually the variance of the change scores in the experimental group is
> greater than that in the control group, owing to individual responses. But
> with small sample sizes, the variance in the experimental group could be
> less than that in the control group, simply because of sampling variation.
> You will therefore get a negative point estimate for the variance.
> Importantly, its upper confidence limit will tell you how big and positive
> the individual responses could be. The confidence intervals for variances
> estimated by Proc Mixed in SAS have good coverage in my experience with
> many simulations over the years. (The coverage isn't quite so good in
> meta-analytic mixed models when the study-estimate sample sizes are small,
> but the coverage is generally conservative, i.e., what you specify as a
> 90%CI has greater than 90% coverage, as I recall.) The coverage works well
> ONLY when you allow negative variance for the random effects. Interestingly,
> the default in SAS is positive-only variance, and if you ask for confidence
> limits, they are then calculated from a standard error and the chi-squared
> distribution. The resulting confidence limits are unrealistic: you can get
> enormous upper limits. When you allow negative variance, they are realistic;
> quoting from the on-line documentation, "Wald Z-scores and normal quantiles
> are used to construct the limits". (See
> https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_mixed_syntax01.htm
> There is no further information about how the standard errors are
> calculated, and no mention of profile confidence limits.) The confidence
> limits for the residual(s) are always calculated from the chi-squared
> distribution, unless you specify otherwise, and these are always realistic
> and have correct coverage, in my experience.
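
Those Wald limits are easy to reproduce by hand, since they are just normal
quantiles around the variance estimate. Hypothetical numbers, purely for
illustration:

v  <- -15                           # estimated "extra" variance (can be negative)
se <- 40                            # its standard error
v + c(-1, 1) * qnorm(0.95) * se     # 90% Wald confidence limits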
>
> The variance in the experimental group could truly be less than that in
> the control group, either because the treatment compresses the responses
> towards some upper or lower limit, or, less likely, because there are
> individual responses to the control treatment that exceed those to the
> experimental treatment. Either way, negative variance is meaningful. You
> can turn the dummy variable around if you like, but you don't have to.
>
> When I get a negative variance, or a negative lower confidence limit, I
> have no hesitation in changing the sign, taking the square root, and
> calling it a negative SD. It simply means that the individual responses,
> or more generally whatever differences or variability are represented by
> the random effect, could be reduced differences or variability. I know a
> negative SD for the intercept random effect is a hard one to swallow, but
> it's better to see that and think again about your model and/or what might
> really be happening than to simply let it be set to zero.
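
The sign-preserving square root described there is simply:

sdNeg <- function(v) sign(v) * sqrt(abs(v))   # variance (or its limits) -> "negative SD"
sdNeg(c(-16, 25))                             # gives -4 and 5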
>
> I would like to emphasize just how important the random effects are in the
> analysis of repeated-measurement or other clustered data. That's where the
> individual differences and individual responses are found, which are
> crucial in this age of personalized medicine. Hence you need to make
> inferences about the random effects; they are not simply nuisance
> parameters that you have to include to make sure the uncertainty in the
> mean effects is trustworthy, then forget about. I make inferences about
> magnitudes of the random effects (Bayesian or hypothesis testing against
> smallest importants), assuming a normal distribution (chi-squared for
> residuals) and smallest importants for SDs that are half those for mean
> effects.
>
> OK, enough of the hobby horse. I wanted to find out whether R allows for
> negative variances and standard errors of variances, because I am working
> with a colleague on a manuscript focusing on the correct way to analyze,
> report and interpret errors of measurement in various kinds of reliability
> study. Example: for some measures, the first measurement in a series of
> repeated measurements can have more error, which is easily specified and
> estimated with a dummy variable, as explained above for individual
> responses. Another example: the possibility of additional error arising
> between time-separated clusters of repeated measurements needs to be
> specified as a random effect that can have negative variance. We've
> simulated data and analyzed them all with SAS Studio, and we hope to
> publish the programs as supplementary material. We're looking for a
> co-author/collaborator who has experience of measurement studies and skill
> with mixed models in R to provide the same simulations and analyses with R.
> It's contingent upon estimating negative variance and standard errors of
> variances, both of which require more than lme or nlme, by the look of it.
> Maybe it's a good opportunity for someone to get involved and figure out
> the easiest way to get the same estimates and uncertainties as in SAS,
> because that would or should be useful for mixed modeling more generally
> with R.
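
For the first-measurement example, the nearest thing I can offer in nlme is
again a varIdent() variance function, here letting the first trial have its
own residual SD; as above, this is estimated as a positive ratio, not as an
additive variance that could go negative, which is precisely the limitation
under discussion. A sketch, assuming a long-format data frame dat with columns
score, Trial (1, 2, 3, ...) and SubjectID:

library(nlme)
dat$firstTrial <- factor(dat$Trial == 1)
m.rel <- lme(score ~ factor(Trial),
             random = ~ 1 | SubjectID,
             weights = varIdent(form = ~ 1 | firstTrial),
             data = dat)
summary(m.rel)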
>
> Will
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
