[R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

Fri Jan 27 14:38:46 CET 2012

Charles,

I guess that SAS is using the Satterthwaite formula to approximate 
denominator degrees of freedom (google 'Satterthwaite'). See Kenward and 
Roger (1997) Biometrics 53 983-997 for another approach.

Charles Determan Jr wrote:
> Thank you John, I truly appreciate all the advice I have received.  At no
> point to I feel entitled for someone to provide this for me.  I do hope
> that others may find this thread useful as well.  If I may just as one more
> question.  Is there an accepted way to calculate the degrees of freedom in
> this case to come to the value of 25?  This has been something I have been
> trying to determine.
> 
> Again, thank you to all for providing me with all the information you have,
> 
> Charles
> 
> On Fri, Jan 27, 2012 at 12:10 AM, John Maindonald <
> john.maindonald at anu.edu.au> wrote:
> 
>> I've twigged that "unstructured" means a variance-covariance matrix
>> that ignores the time structure, i.e., each element is estimated
>> separately.  One can do this in lme4, thus:
>>
>> Orthodont$Age <- factor(Orthodont$age)
>>> orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject),
>> + data=Orthodont)
>>> anova(orth.lmer)
>> Analysis of Variance Table
>>        Df Sum Sq Mean Sq F value
>> Sex      1  9.591  9.5905 33.9645
>> Age      3 34.876 11.6252 41.1703
>> Sex:Age  3  2.930  0.9768  3.4594
>>
>> The sum of squares for the Sex*Age interaction agrees with that from SAS.
>> lmer() leaves the user to work out an appropriate df for the F-statistic.
>>  One
>> has, in agreement with the SAS output:
>>> 1-pf(2.93, 3, 25)
>> [1] 0.05318945
>>
>> The main effects SS and F's are of course not expected to agree and
>> indeed, in my strong view, true SAS type III SS's are inappropriate.
>>
>> I regard this R analysis however as an inelegant use of the data, with low
>> power.  My book with John Braun (3rd edn and if I recall correctly, 2nd)
>> has an analysis that I am prepared to defend that uses lmer() from lme4.
>> Several years ago, I placed on the web a version of the analysis that uses
>>  lme() from nlme:
>>   http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
>> I'd forgotten that it was there.
>>
>> This is a list for R users.  You will not necessarily find folk here with
>> a high
>> level of SAS expertise, maybe not even enough to understand what the
>> SAS documentation means when it uses the term "unstructured".  Charles,
>> I think you've done pretty well in the amount of free advice that you have
>> received.  I think too that Steven McKinney's point is well made.  Kevin's
>> interpretation of the EULA is probably more or less correct, but the
>> document comes across rather more ferociously than he suggests, and it
>> is not a good starting point for scientific assessment of the respective
>> methodologies.  Many of us have other reasons why we are not very
>> interested in SAS does.  The implied threat in the EULA provides, even
>> if we are not party to any such agreement, just one more reason why SAS
>> is not to any large extent in our path ahead to the future.
>>
>> John Maindonald             email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>> http://www.maths.anu.edu.au/~johnm
>>
>> On 27/01/2012, at 4:03 PM, Kevin Wright wrote:
>>
>>> Dear Charles,
>>>
>>> First, I hope you are not put off by the tone of some the responses in
>> this
>>> email thread.  Sometimes people have strong opinions and it might come
>>> across aggressively.
>>>
>>> Second, be sure to understand that reproducing a SAS analysis with lme in
>>> no way violates any legal agreements that SAS may have, if for no other
>>> reason than you never signed an agreement with SAS!  That bit in the EULA
>>> about decompiling and reverse engineering means that people are
>> prohibited
>>> from creating a new version of PROC MIXED that does the same thing.  The
>>> nlme package uses different methods than SAS. E.g. different optimizers,
>>> even uses a log-parameterization deep in the code so that negative
>> variance
>>> components cannot happen.
>>>
>>> Third, by now you've probably figured out that PROC MIXED and lme have
>> very
>>> different ideas about degrees of freedom.  Also, the loglikelihoods are
>> on
>>> different scales.  For that reason, when I try to reproduce an analysis,
>> I
>>> find the best way to compare is to look at the variance components.
>>>
>>> Here is what PROC MIXED says:
>>>
>>>    Cov Parm Estimate     Std Error       Z  Pr > |Z|
>>> UN(1,1)     5.41545455    1.53172185    3.54    0.0004
>>> UN(2,1)     2.71681818    1.09623989    2.48    0.0132
>>> UN(2,2)     4.18477273    1.18363247    3.54    0.0004
>>> UN(3,1)     3.91022727    1.41775367    2.76    0.0058
>>> UN(3,2)     2.92715909    1.19304751    2.45    0.0141
>>> UN(3,3)     6.45573864    1.82595863    3.54    0.0004
>>> UN(4,1)     2.71022727    1.17209851    2.31    0.0208
>>> UN(4,2)     3.31715909    1.12903016    2.94    0.0033
>>> UN(4,3)     4.13073864    1.40356157    2.94    0.0033
>>> UN(4,4)     4.98573864    1.41017984    3.54    0.0004
>>> Residual         1.00000000 .       .         .
>>>
>>> That's our target, and here is the lme code to get us there.
>>>
>>> library(nlme)
>>> orth <- as.data.frame(Orthodont)
>>> m1 <- gls(distance ~ Sex*age,
>>>          correlation=corSymm(form = ~ 1 | Subject),
>>>          weights = varIdent(form = ~ 1 | age),
>>>          data = orth)
>>> cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
>>> vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
>>> covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
>>> cors*covs
>>>
>>> Now, some explanations will surely help, so let's step through the code
>>> with comments.
>>>
>>> The Orthodont data is a groupedData object, which can be helpful, but
>>> sometimes confusing, so coercing to a data.frame removes the formula--you
>>> can see it with 'formula(Orthodont)'.
>>>
>>> orth <- as.data.frame(Orthodont)
>>>
>>> Since there is no "random" line in the PROC MIXED code, we don't want to
>>> use 'lme' (also why I removed the formula from the data), but instead use
>>> 'gls' for generalized least squares.  The 'corSymm' part specifies a
>>> symmetric correlation matrix with 1 on the diagonal.  Looking at the
>> MIXED
>>> parameters, the results are given a covariances, not correlations.  Note
>>> how the variances UN(1,1), UN(2,2,), etc are different, not constant
>> along
>>> the diagonal.  We use the 'weights' statement to specify different
>> stratum
>>> variances.
>>>
>>> m1 <- gls(distance ~ Sex*age,
>>>          correlation=corSymm(form = ~ 1 | Subject),
>>>          weights = varIdent(form = ~ 1 | age),
>>>          data = orth)
>>>
>>> Now we have to convert the correlations and standard deviations of lme
>> into
>>> covariance parameters of MIXED.
>>>
>>> First, extract the correlation matrix from lme.
>>>
>>> cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
>>>
>>> Now, hard-code the stratum std deviations and square them to get
>>> variances.  Multiply by the square of the residual std dev.  (There's a
>> way
>>> to extract all these values from the fitted model instead of hand-typing
>>> them, but I can't find them at the moment.)
>>>
>>> vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
>>>
>>> Now create a matrix of variances and covariances.
>>>
>>> covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
>>>
>>> Finally, multiply the correlations by the covariances.  Let's compare
>>> results:
>>>
>>> PROC MIXED:
>>>
>>> Row Col1 Col2 Col3 Col4
>>> 1 5.415 2.716 3.910 2.710
>>> 2 2.716 4.184 2.927 3.317
>>> 3 3.910 2.927 6.455 4.130
>>> 4 2.710 3.317 4.130 4.985
>>>
>>> R> round(cors*covs,3)
>>>      [,1]  [,2]  [,3]  [,4]
>>> [1,] 5.425 2.709 3.841 2.715
>>> [2,] 2.709 4.191 2.975 3.314
>>> [3,] 3.841 2.975 6.263 4.133
>>> [4,] 2.715 3.314 4.133 4.986
>>>
>>> In my experience, the small differences in the results are typical for
>>> unstructured models.
>>>
>>> Hope this helps.
>>>
>>> Kevin Wright
>>>
>>>
>>> On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu
>>> wrote:
>>>
>>>> Greetings,
>>>>
>>>> I have been working on R for some time now and I have begun the
>> endeavor of
>>>> trying to replicate some SAS code in R.  I have scoured the forums but
>>>> haven't been able to find an answer.  I hope one of you could be so
>> kind as
>>>> to enlighten me.
>>>>
>>>> I am attempting to replicate a repeated measures experiment using some
>>>> standard data.  I have posted the SAS code and output directly from a
>>>> publication as well as my attempts in R to replicate it.  My main issue
>>>> comes with the 'unstructured' component.
>>>>
>>>> The 'dental' dataset from 'mixedQF' package,
>>>> equivalent to formixed data in SAS
>>>>
>>>>   distance age Subject    Sex
>>>> 1       26.0   8     M01   Male
>>>> 2       25.0  10     M01   Male
>>>> 3       29.0  12     M01   Male
>>>> 4       31.0  14     M01   Male
>>>> 5       21.5   8     M02   Male
>>>> 6       22.5  10     M02   Male
>>>> 7       23.0  12     M02   Male
>>>> 8       26.5  14     M02   Male
>>>> 9       23.0   8     M03   Male
>>>> 10      22.5  10     M03   Male
>>>> 11      24.0  12     M03   Male
>>>> 12      27.5  14     M03   Male
>>>> 13      25.5   8     M04   Male
>>>> 14      27.5  10     M04   Male
>>>> 15      26.5  12     M04   Male
>>>> 16      27.0  14     M04   Male
>>>> 17      20.0   8     M05   Male
>>>> 18      23.5  10     M05   Male
>>>> 19      22.5  12     M05   Male
>>>> 20      26.0  14     M05   Male
>>>> 21      24.5   8     M06   Male
>>>> 22      25.5  10     M06   Male
>>>> 23      27.0  12     M06   Male
>>>> 24      28.5  14     M06   Male
>>>> 25      22.0   8     M07   Male
>>>> 26      22.0  10     M07   Male
>>>> 27      24.5  12     M07   Male
>>>> 28      26.5  14     M07   Male
>>>> 29      24.0   8     M08   Male
>>>> 30      21.5  10     M08   Male
>>>> 31      24.5  12     M08   Male
>>>> 32      25.5  14     M08   Male
>>>> 33      23.0   8     M09   Male
>>>> 34      20.5  10     M09   Male
>>>> 35      31.0  12     M09   Male
>>>> 36      26.0  14     M09   Male
>>>> 37      27.5   8     M10   Male
>>>> 38      28.0  10     M10   Male
>>>> 39      31.0  12     M10   Male
>>>> 40      31.5  14     M10   Male
>>>> 41      23.0   8     M11   Male
>>>> 42      23.0  10     M11   Male
>>>> 43      23.5  12     M11   Male
>>>> 44      25.0  14     M11   Male
>>>> 45      21.5   8     M12   Male
>>>> 46      23.5  10     M12   Male
>>>> 47      24.0  12     M12   Male
>>>> 48      28.0  14     M12   Male
>>>> 49      17.0   8     M13   Male
>>>> 50      24.5  10     M13   Male
>>>> 51      26.0  12     M13   Male
>>>> 52      29.5  14     M13   Male
>>>> 53      22.5   8     M14   Male
>>>> 54      25.5  10     M14   Male
>>>> 55      25.5  12     M14   Male
>>>> 56      26.0  14     M14   Male
>>>> 57      23.0   8     M15   Male
>>>> 58      24.5  10     M15   Male
>>>> 59      26.0  12     M15   Male
>>>> 60      30.0  14     M15   Male
>>>> 61      22.0   8     M16   Male
>>>> 62      21.5  10     M16   Male
>>>> 63      23.5  12     M16   Male
>>>> 64      25.0  14     M16   Male
>>>> 65      21.0   8     F01 Female
>>>> 66      20.0  10     F01 Female
>>>> 67      21.5  12     F01 Female
>>>> 68      23.0  14     F01 Female
>>>> 69      21.0   8     F02 Female
>>>> 70      21.5  10     F02 Female
>>>> 71      24.0  12     F02 Female
>>>> 72      25.5  14     F02 Female
>>>> 73      20.5   8     F03 Female
>>>> 74      24.0  10     F03 Female
>>>> 75      24.5  12     F03 Female
>>>> 76      26.0  14     F03 Female
>>>> 77      23.5   8     F04 Female
>>>> 78      24.5  10     F04 Female
>>>> 79      25.0  12     F04 Female
>>>> 80      26.5  14     F04 Female
>>>> 81      21.5   8     F05 Female
>>>> 82      23.0  10     F05 Female
>>>> 83      22.5  12     F05 Female
>>>> 84      23.5  14     F05 Female
>>>> 85      20.0   8     F06 Female
>>>> 86      21.0  10     F06 Female
>>>> 87      21.0  12     F06 Female
>>>> 88      22.5  14     F06 Female
>>>> 89      21.5   8     F07 Female
>>>> 90      22.5  10     F07 Female
>>>> 91      23.0  12     F07 Female
>>>> 92      25.0  14     F07 Female
>>>> 93      23.0   8     F08 Female
>>>> 94      23.0  10     F08 Female
>>>> 95      23.5  12     F08 Female
>>>> 96      24.0  14     F08 Female
>>>> 97      20.0   8     F09 Female
>>>> 98      21.0  10     F09 Female
>>>> 99      22.0  12     F09 Female
>>>> 100     21.5  14     F09 Female
>>>> 101     16.5   8     F10 Female
>>>> 102     19.0  10     F10 Female
>>>> 103     19.0  12     F10 Female
>>>> 104     19.5  14     F10 Female
>>>> 105     24.5   8     F11 Female
>>>> 106     25.0  10     F11 Female
>>>> 107     28.0  12     F11 Female
>>>> 108     28.0  14     F11 Female
>>>>
>>>> *Mixed modeling and fixed effect test*
>>>> SAS
>>>> proc mixed data=formixed;
>>>> class gender age person;
>>>> model y = gender|age;
>>>> repeated / type=cs sub=person;
>>>> run;
>>>>
>>>> output of interest to me
>>>>         Tests of Fixed Effects
>>>> Source             NDF   DDF    Type III F    Pr > F
>>>> GENDER           1        25        9.29        0.0054
>>>> AGE                  3        75       35.35       0.0001
>>>> GENDER*AGE   3        75        2.36        0.0781
>>>>
>>>> R (nlme package)
>>>> y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
>>>> anova(y)
>>>>
>>>>           numDF denDF  F-value p-value
>>>> (Intercept)     1    75 4123.156  <.0001
>>>> Sex              1    25    9.292  0.0054
>>>> age               3    75   40.032  <.0001
>>>> Sex:age        3    75    2.362  0.0781
>>>>
>>>> Now this isn't exact but it is extremely close, however when I try to
>>>> replicate the unstructured,
>>>>
>>>> SAS
>>>> proc mixed data=formixed;
>>>> class gender age person;
>>>> model y = gender|age;
>>>> repeated / type=un sub=person;
>>>> run;
>>>>
>>>>            Tests of Fixed Effects
>>>> Source          NDF DDF Type III F Pr > F
>>>> GENDER         1    25     9.29    0.0054
>>>> AGE                3    25    34.45   0.0001
>>>> GENDER*AGE 3    25     2.93    0.0532
>>>>
>>>> R
>>>> either
>>>> y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
>>>> data=dental)
>>>> anova(y)
>>>> or
>>>> z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(),
>> data=dental)
>>>> anova(z)
>>>>
>>>> gives the output
>>>>
>>>>           numDF denDF  F-value    p-value
>>>> (Intercept)     1    75     4052.028  <.0001
>>>> Sex              1    25       8.462      0.0075
>>>> age               3    75      39.022    <.0001
>>>> Sex:age        3    75       2.868      0.0421
>>>>
>>>> What am I doing wrong to replicate the unstructured linear mixed model
>> from
>>>> SAS?
>>>>
>>>> Regards,
>>>>
>>>> Charles
>>>>
>>>>       [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>>
>>> --
>>> Kevin Wright
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.