[R-sig-ME] Standard errors of variances; negative variance

Will Hopkins
Wed Jun 21 02:08:14 CEST 2023


Thanks heaps for the seven replies from five people on this list. In summary, it looks like standard errors can be derived for variances in R: with asreml-R (which you have to pay for?); possibly with nlme; with merDeriv, though its standard errors may be untrustworthy; and with Bayesian packages (but I'd be specifying weakly informative priors that in the limit have no effect on the posterior, which might not work in some packages). Negative variance seems to be off the table, however, although Ben, James and Timothee have indicated scenarios where negative variance made some sense. (A sketch of the merDeriv route follows below.)
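
For anyone following along, here is a minimal, untested sketch of the merDeriv route as I read that package's documentation (lme4's built-in sleepstudy data stand in for real data):

    ## Standard errors for variance components via merDeriv (my reading of
    ## the interface; I have not verified this myself).
    library(lme4)
    library(merDeriv)

    fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

    ## full = TRUE appends the random-effect (co)variance parameters to the
    ## usual fixed-effect covariance matrix; ranpar = "var" keeps them on
    ## the variance (rather than SD) scale.
    V <- vcov(fit, full = TRUE, ranpar = "var")
    sqrt(diag(V))  # standard errors, including those of the variances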

I had a particular reason for seeking advice on this list; see the bottom of my message. First, here's my justification for negative variance and standard errors of variances.

My favorite example is the variance representing individual responses to a treatment, derived as the variance of the change scores in an experimental group minus the variance of the change scores in a control group. In a mixed model, the dependent variable is the change score; there is a fixed effect for group (to estimate the difference in the mean changes), other fixed-effect modifiers if you want them, and a random effect for the interaction of a dummy variable (1 in the experimental group, 0 in the control group) with subject identity. I usually call the dummy variable something like xVarExpt, to indicate extra variance in the experimental change scores, and the code in SAS is random xVarExpt*SubjectID; or, equivalently, random xVarExpt / subject=SubjectID;. And of course there is the residual. (A rough R analogue is sketched below.)
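
In R, the closest simple analogue I can think of for this design (one change score per subject; the data and names here are made up) is to let the residual variance differ by group with nlme and take the difference of the two variances by hand. Note that varIdent keeps each group's variance positive, so this mimics the model but not SAS's unbounded estimation:

    library(nlme)
    set.seed(1)
    ## Simulated change scores: more variable in the experimental group
    d <- data.frame(
      Group  = rep(c("Expt", "Control"), each = 20),
      Change = c(rnorm(20, mean = 2, sd = 3), rnorm(20, mean = 0, sd = 2))
    )
    ## Separate residual SD per group via varIdent
    m <- gls(Change ~ Group, data = d,
             weights = varIdent(form = ~ 1 | Group))
    ## Per-group SDs: residual sigma times all the varIdent coefficients
    sds <- m$sigma * coef(m$modelStruct$varStruct,
                          unconstrained = FALSE, allCoef = TRUE)
    ## The individual-response variance; can be negative by sampling variation
    sds[["Expt"]]^2 - sds[["Control"]]^2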

Usually the variance of the change scores in the experimental group is greater than that in the control group, owing to individual responses. But with small sample sizes, the variance in the experimental group could be less than that in the control, simply because of sampling variation. You will therefore get a negative variance point estimate. Importantly, its upper confidence limit will tell you how big and positive the individual responses could be. The confidence intervals for variances estimated by Proc Mixed in SAS have good coverage in my experience with many simulations over the years. (The coverage isn't quite so good in meta-analytic mixed models when the study-estimate sample sizes are small, but the coverage is generally conservative, i.e., what you specify as a 90% CI has greater than 90% coverage, as I recall.) The coverage works well ONLY when you allow negative variance for the random effects.

Interestingly, the default in SAS is positive-only variance, and if you ask for confidence limits, they are calculated from a standard error and the chi-squared distribution. The resulting confidence limits are unrealistic: you can get enormous upper limits. When you allow negative variance, they are realistic; quoting from the online documentation, "Wald Z-scores and normal quantiles are used to construct the limits" (see https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_mixed_syntax01.htm; there is no further information about how the standard errors are calculated and no mention of profile confidence limits). The confidence limits for the residual(s) are always calculated from the chi-squared distribution, unless you specify otherwise, and these are always realistic and have correct coverage, in my experience. Both constructions are easy to reproduce by hand; a sketch follows below.
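
Here is my understanding of the two interval constructions, with made-up numbers (v and se are hypothetical; the chi-squared version uses the usual Satterthwaite degrees of freedom for a variance estimate):

    ## Hypothetical variance estimate and its standard error
    v  <- 4.2
    se <- 3.0
    ## Wald 90% limits (normal quantiles): the lower limit can be negative
    v + qnorm(c(0.05, 0.95)) * se
    ## Chi-squared 90% limits with Satterthwaite df = 2*(v/se)^2:
    ## positive only, and the upper limit blows up when df is small
    df <- 2 * (v / se)^2
    df * v / qchisq(c(0.95, 0.05), df)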

The variance in the experimental group could truly be less than that in the control group, either because the treatment compresses the responses towards some upper or lower limit, or, less likely, because there are individual responses to the control treatment that exceed those to the experimental treatment. Either way, negative variance is meaningful. You can turn the dummy variable around if you like, but you don't have to.

When I get a negative variance, or a negative lower confidence limit, I have no hesitation in changing the sign, taking the square root, and calling it a negative SD (a one-line convention; see the sketch below). It simply means that the individual responses, or more generally whatever differences or variability are represented by the random effect, could be reduced differences or variability. I know a negative SD for the intercept random effect is a hard one to swallow, but it's better to see that and think again about your model and/or what might really be happening than to simply let it be set to zero.
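
For the record, the sign-flip convention amounts to this one-liner (the function name is mine):

    ## Sign-preserving square root: report a negative variance as a negative SD
    signedSD <- function(v) sign(v) * sqrt(abs(v))
    signedSD(c(2.25, -2.25))  # 1.5 -1.5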

I would like to emphasize just how important the random effects are in the analysis of repeated-measurement or other clustered data. That's where the individual differences and individual responses are found, which are crucial in this age of personalized medicine. Hence you need to make inferences about the random effects; they are not simply nuisance parameters that you have to include to make sure the uncertainty in the mean effects is trustworthy, then forget about. I make inferences about magnitudes of the random effects (Bayesian or hypothesis testing against smallest importants) assuming a normal distribution (chi-squared for residuals) and smallest importants for SDs that are half those for mean effects. (One way such an inference could be implemented is sketched below.)
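
For instance, assuming a normal sampling distribution for the variance, the chance that the true variance exceeds a smallest important value could be computed like this (all numbers hypothetical, and this is only my reading of the procedure described above):

    v  <- 1.8                          # variance estimate for the random effect
    se <- 1.1                          # its standard error
    smallestMean <- 1.0                # hypothetical smallest important mean effect
    smallestSD   <- smallestMean / 2   # halved for SDs, per the rule above
    ## Chance the true variance exceeds the smallest important variance
    pnorm((v - smallestSD^2) / se)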

OK, enough of the hobby horse. I wanted to find out whether R allows for negative variances and standard errors of variances because I am working with a colleague on a manuscript focusing on the correct way to analyze, report and interpret errors of measurement in various kinds of reliability study. Example: for some measures, the first measurement in a series of repeated measurements can have more error, which is easily specified and estimated with a dummy variable, as explained above for individual responses. Another example: the possibility of additional error arising between time-separated clusters of repeated measurements needs to be specified as a random effect that can have negative variance. We've simulated data and analyzed them all with SAS Studio, and we hope to publish the programs as supplementary material. We're looking for a co-author/collaborator who has experience of measurement studies and skill with mixed models in R to provide the same simulations and analyses with R. It's contingent upon estimating negative variance and standard errors of variances, both of which require more than lme4 or nlme, by the look of it. Maybe it's a good opportunity for someone to get involved and figure out the easiest way to get the same estimates and uncertainties as in SAS, because that should be useful for mixed modeling more generally with R.

Will
