[R-sig-ME] Help understanding residual variance

Thu Mar 29 21:23:16 CEST 2012

On Mar 29, 2012, at 11:25 AM PDT, Ben Bolker wrote:

> Joshua Wiley <jwiley.psych at ...> writes:
> 
>> 
>> Hi Ista,
>> 
>> To me the onis is on the statistician consultant to explain *why* you
>> cannot have both random intercepts and slopes.  Does the consultant
>> have papers to reference or proofs?
>> 
>> In any case, this is hardly exclusive to 'R doing something strange'.
>> SAS and Stata happily join the gang.  See the attached file for code
>> and output from all three using a minidataset simulated in R.
>> 
>> I suppose one could bicker over whether a random intercept and slope
>> is a good idea, but possible it certainly is.  You might suggest that
>> it is poor fare to voice strong opinions about matters which one does
>> not understand.
>> 
>> Cheers,
>> 
>> Josh
> 
>  Very nice example.  Do note that while R::lme and Stata give
> the same point estimates, Stata also provides estimated confidence
> intervals, which are enormous -- suggesting that, while one can
> do this, the resulting model might be unidentifiable.
> 
>  Doug Bates commented off-list that:
> 
>> I believe that the estimates for such a model are at least ill-defined
>> if not unidentifiable.

I concur with this -- I believe that the statistician consultant is right in this case, and that with only two observations per subject (plus crucially, no pair of observations for a given subject having the same value of Days), the model is actually unidentifiable given the dataset.  Here's one way of thinking about it: imagine that we represent the Days==1 and Days==9 observations for each subject as a 2-vector, (y_1 y_9).  If the residual variance is s^2 and the covariance matrix for the subject-level means for Days==1 and Days==9 as

 | t1^2             rho*t1t9     |
 | rho*t1t9             t9^2     |

then the marginal distribution for the total contribution of subject-level and residual error to the two observations for a given subject is bivariate normal with covariance matrix

 | t1^2 + s^2       rho*t1t9 |
 | rho*t1t9         t9^2+s^2 |

But as this illustrates, there's no way of distinguishing the residual error term s^2 from the subject random effects t1^2 and t9^2.  (This problem would not arise if there were at least two observations for one value of Days in at least one subject, because for that subject one would not be able to represent the data such that there is effectively only one multivariate observation per subject.)

I'll be interested to know whether this explanation helps shed light on the matter!

Best

Roger

--

Roger Levy                      Email: rlevy at ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://idiom.ucsd.edu/~rlevy