[R-sig-ME] Standard errors of variances; negative variance

Will Hopkins willthekiwi at gmail.com
Fri Jun 23 03:42:02 CEST 2023


In response to John Maindonald's latest post... John, you suggested that the variance components "cannot necessarily be interpreted as variances". I don't think so. For any real data, variance components can and should be interpreted as the squares of standard deviations, which have a direct interpretation in terms of differences between and/or changes within subjects and/or settings. You only have to simulate data to realize that: all the random effects are generated randomly with a mean of zero and a given SD. (Admittedly the data have to be generated with positive SDs. I'm not sure how to simulate data directly with a negative SD, although I can do it indirectly by adding random variability in the "wrong" or unexpected place.) To allow for sampling variation in the estimation of the SD, you have to allow negative variance. When you do that, you estimate the simulated SD without bias, and the confidence intervals have correct coverage. If you don't allow negative variance, you will get bias and unrealistically high upper confidence limits, at least with small sample sizes. Negative point estimates and negative lower confidence limits are realistic in the sense that they properly allow for sampling uncertainty, and in any case they may be real, in the sense that the extra variance could be in a different place from the one you expected and modeled.
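To see the bias concretely without SAS, here is a minimal sketch in Python rather than a mixed model: a simple moment estimator with made-up values, where the covariance of two trials across subjects estimates the between-subject variance and can come out negative. The variable names and values are illustrative, not from any real data.

```python
import random
import statistics as st

random.seed(1)
n_subj, sigma_b, sigma_e = 100, 0.0, 5.0   # true between-subject SD is zero
n_sim = 2000

def sample_cov(x, y):
    mx, my = st.fmean(x), st.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

unbounded, floored = [], []
for _ in range(n_sim):
    b  = [random.gauss(0, sigma_b) for _ in range(n_subj)]   # true subject effects
    y1 = [bi + random.gauss(0, sigma_e) for bi in b]         # trial 1
    y2 = [bi + random.gauss(0, sigma_e) for bi in b]         # trial 2
    v = sample_cov(y1, y2)   # estimates the between-subject variance; can be < 0
    unbounded.append(v)
    floored.append(max(0.0, v))

print("mean estimate, negatives allowed:", round(st.fmean(unbounded), 3))  # near 0
print("mean estimate, floored at zero  :", round(st.fmean(floored), 3))    # clearly > 0
```

With the true variance at zero, the unbounded estimator averages close to zero, while flooring it at zero drags the average well above the true value: the bias described above.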

In response to Ben Pelzer's latest post... Ben, you can't get the extra variance with only one observation per subject in one group. You can get it with two observations on the same subject in one group; so for a group of subjects measured in two trials, without necessarily anything else happening, you can estimate the variance representing true differences between subjects with random int/subject=SubjectID and two residual variances with repeated/group=Trial. And with the same data, and with exactly the same results, you can instead estimate one residual variance plus the difference between the residuals as extra variance, on either the first or the second trial; the extra variance will be positive or negative, depending on which residual was greater in the repeated/group=Trial analysis. (The variances add up exactly, but the predicted values differ very slightly between the models where the extra variance is positive and negative.)

Here is the SAS code for doing all that, written with macro variables so you can change the values of the means, SDs and sample size. Remember that the sample size has to be large to get good precision for the SDs. I've set the sample size to 100. If you make it 10, there's huge uncertainty, and sometimes you get infinite likelihood (although you can probably overcome that with suitable starting values for the covparms). Note that I have not added extra variance representing individual responses in Trial 2 (as shown by %let ir=0;), so half the simulations will produce negative extra variance in Trial 2, and you can play with estimating the extra variance in either of the trials to see that you get the same extra variance differing only in sign, and the same standard error.

*Simple test-retest reliability, allowing for different residuals in the two trials
 and individual responses in the second trial;
*Modify the program as shown in the proc mixed step to estimate one residual
 and the extra variance on one or other trial;

%let mean=100; *true grand mean in the first test;
%let sd=10; *true between-subject SD;
%let tem=5; *typical (standard) error of measurement;
%let deltamean=2; *change in mean test2-test1;
%let ir=0; *SD for individual responses (in Trial 2);
%let ss=100; *sample size;

*generate the data;
data dat1;
do SubjectID=1 to &ss;
  TruePerform=&mean+&sd*rannor(0);
  Trial=1;
  IndivResp=0;
  ObsvdPerform=TruePerform+&tem*rannor(0)+IndivResp;
  output;
  Trial=2;
  IndivResp=&ir*rannor(0);
  ObsvdPerform=TruePerform+&tem*rannor(0)+&deltamean+IndivResp;
  output;
  end;
run;

title "Raw data, first 10 observations";
proc print data=dat1(obs=10);
format TruePerform ObsvdPerform IndivResp 5.1;
run;

title "Simple stats";
proc means data=dat1 maxdec=1;
var TruePerform ObsvdPerform IndivResp;
class Trial;
run;

*Set up dummy variables for estimating extra variance in Trial 1 or Trial 2;
data dat2;
set dat1;
xVarTrial1=0;
if Trial=1 then xVarTrial1=1;
xVarTrial2=0;
if Trial=2 then xVarTrial2=1;
run;

ods select none;
title "Rely mixed model, ssize=&ss";
title2 "True mean=&mean, true between SD=&sd, TEM=&tem, delta mean=&deltamean";
proc mixed data=dat2 covtest cl alpha=0.1 nobound;
class SubjectID Trial;
model ObsvdPerform=Trial/noint ddfm=sat solution cl alpha=0.1 outp=pred alphap=0.1;
random int/subject=SubjectID s cl alpha=0.1; *This estimates true differences between subjects;
repeated/group=Trial; *This estimates different residuals in Trial 1 and 2;
*Comment out ("star") the above two lines and uncomment one or other (but not both) of the following two lines...;
*...to estimate one residual and the extra variance on either Trial 1 or Trial 2;
*random int xVarTrial1/subject=SubjectID s cl alpha=0.1; *extra variance on Trial 1;
*random int xVarTrial2/subject=SubjectID s cl alpha=0.1; *or extra variance on Trial 2;
estimate "Trial 2-1" Trial -1 1/cl alpha=0.1;
ods output classlevels=clev;
ods output covparms=cov;
ods output solutionf=solf;
ods output solutionr=solr;
ods output estimates=est;
run;
ods select all;

title3 "Class levels";
proc print data=clev;
run;

title3 "Covparms";
proc print data=cov;
run;

title3 "Predicteds and residuals, first 10 observations";
proc print data=pred(obs=10);
run;

title3 "Fixed-effects solution";
proc print data=solf;
run;

title3 "Random-effects solution, first 10 observations";
proc print data=solr(obs=10);
run;

title3 "Estimates";
proc print data=est;
run;
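If you don't have SAS, the sign flip in the extra variance can also be checked with a quick moment-based sketch in Python (not a mixed model, so no standard errors): the covariance of the two trials across subjects estimates the between-subject variance, the per-trial residual variances follow by subtraction, and placing the extra variance on one trial or the other just flips its sign. The values mirror the macro variables above.

```python
import random
import statistics as st

random.seed(2)
# the same values as the macro variables above
mean, sd, tem, deltamean, ir, ss = 100, 10, 5, 2, 0, 100

true_perf = [random.gauss(mean, sd) for _ in range(ss)]
y1 = [t + random.gauss(0, tem) for t in true_perf]
y2 = [t + random.gauss(0, tem) + deltamean + random.gauss(0, ir) for t in true_perf]

def sample_cov(x, y):
    mx, my = st.fmean(x), st.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

sb2 = sample_cov(y1, y2)        # between-subject variance
v1  = st.variance(y1) - sb2     # trial-1 residual variance
v2  = st.variance(y2) - sb2     # trial-2 residual variance

# extra variance placed on Trial 2 versus placed on Trial 1:
xtra2 = v2 - v1
xtra1 = v1 - v2
print(round(xtra2, 2), round(xtra1, 2))   # same magnitude, opposite sign
```

The two parameterizations carry exactly the same information: one residual plus the extra variance on the other trial reproduces the two-residual fit, just as with the starred random statements in the proc mixed step.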
