[R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R

John Maindonald john.maindonald at anu.edu.au
Fri Jan 27 07:10:18 CET 2012


I've twigged that "unstructured" means a variance-covariance matrix 
that ignores the time structure, i.e., each element is estimated 
separately.  One can do this in lme4, thus:

Orthodont$Age <- factor(Orthodont$age)
> orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject), 
+ data=Orthodont)
> anova(orth.lmer)
Analysis of Variance Table
        Df Sum Sq Mean Sq F value
Sex      1  9.591  9.5905 33.9645
Age      3 34.876 11.6252 41.1703
Sex:Age  3  2.930  0.9768  3.4594

The sum of squares for the Sex*Age interaction agrees with that from SAS.
lmer() leaves the user to work out an appropriate df for the F-statistic.  One
has, in agreement with the SAS output:
> 1-pf(2.93, 3, 25)
[1] 0.05318945

The main effects SS and F's are of course not expected to agree and 
indeed, in my strong view, true SAS type III SS's are inappropriate.

I regard this R analysis however as an inelegant use of the data, with low
power.  My book with John Braun (3rd edn and if I recall correctly, 2nd)
has an analysis that I am prepared to defend that uses lmer() from lme4.
Several years ago, I placed on the web a version of the analysis that uses
 lme() from nlme:
   http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
I'd forgotten that it was there.

This is a list for R users.  You will not necessarily find folk here with a high
level of SAS expertise, maybe not even enough to understand what the
SAS documentation means when it uses the term "unstructured".  Charles,
I think you've done pretty well in the amount of free advice that you have
received.  I think too that Steven McKinney's point is well made.  Kevin's
interpretation of the EULA is probably more or less correct, but the 
document comes across rather more ferociously than he suggests, and it
is not a good starting point for scientific assessment of the respective 
methodologies.  Many of us have other reasons why we are not very
interested in SAS does.  The implied threat in the EULA provides, even
if we are not party to any such agreement, just one more reason why SAS
is not to any large extent in our path ahead to the future.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm

On 27/01/2012, at 4:03 PM, Kevin Wright wrote:

> Dear Charles,
> 
> First, I hope you are not put off by the tone of some the responses in this
> email thread.  Sometimes people have strong opinions and it might come
> across aggressively.
> 
> Second, be sure to understand that reproducing a SAS analysis with lme in
> no way violates any legal agreements that SAS may have, if for no other
> reason than you never signed an agreement with SAS!  That bit in the EULA
> about decompiling and reverse engineering means that people are prohibited
> from creating a new version of PROC MIXED that does the same thing.  The
> nlme package uses different methods than SAS. E.g. different optimizers,
> even uses a log-parameterization deep in the code so that negative variance
> components cannot happen.
> 
> Third, by now you've probably figured out that PROC MIXED and lme have very
> different ideas about degrees of freedom.  Also, the loglikelihoods are on
> different scales.  For that reason, when I try to reproduce an analysis, I
> find the best way to compare is to look at the variance components.
> 
> Here is what PROC MIXED says:
> 
>    Cov Parm Estimate     Std Error       Z  Pr > |Z|
> UN(1,1)     5.41545455    1.53172185    3.54    0.0004
> UN(2,1)     2.71681818    1.09623989    2.48    0.0132
> UN(2,2)     4.18477273    1.18363247    3.54    0.0004
> UN(3,1)     3.91022727    1.41775367    2.76    0.0058
> UN(3,2)     2.92715909    1.19304751    2.45    0.0141
> UN(3,3)     6.45573864    1.82595863    3.54    0.0004
> UN(4,1)     2.71022727    1.17209851    2.31    0.0208
> UN(4,2)     3.31715909    1.12903016    2.94    0.0033
> UN(4,3)     4.13073864    1.40356157    2.94    0.0033
> UN(4,4)     4.98573864    1.41017984    3.54    0.0004
> Residual         1.00000000 .       .         .
> 
> That's our target, and here is the lme code to get us there.
> 
> library(nlme)
> orth <- as.data.frame(Orthodont)
> m1 <- gls(distance ~ Sex*age,
>          correlation=corSymm(form = ~ 1 | Subject),
>          weights = varIdent(form = ~ 1 | age),
>          data = orth)
> cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
> vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
> covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
> cors*covs
> 
> Now, some explanations will surely help, so let's step through the code
> with comments.
> 
> The Orthodont data is a groupedData object, which can be helpful, but
> sometimes confusing, so coercing to a data.frame removes the formula--you
> can see it with 'formula(Orthodont)'.
> 
> orth <- as.data.frame(Orthodont)
> 
> Since there is no "random" line in the PROC MIXED code, we don't want to
> use 'lme' (also why I removed the formula from the data), but instead use
> 'gls' for generalized least squares.  The 'corSymm' part specifies a
> symmetric correlation matrix with 1 on the diagonal.  Looking at the MIXED
> parameters, the results are given a covariances, not correlations.  Note
> how the variances UN(1,1), UN(2,2,), etc are different, not constant along
> the diagonal.  We use the 'weights' statement to specify different stratum
> variances.
> 
> m1 <- gls(distance ~ Sex*age,
>          correlation=corSymm(form = ~ 1 | Subject),
>          weights = varIdent(form = ~ 1 | age),
>          data = orth)
> 
> Now we have to convert the correlations and standard deviations of lme into
> covariance parameters of MIXED.
> 
> First, extract the correlation matrix from lme.
> 
> cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
> 
> Now, hard-code the stratum std deviations and square them to get
> variances.  Multiply by the square of the residual std dev.  (There's a way
> to extract all these values from the fitted model instead of hand-typing
> them, but I can't find them at the moment.)
> 
> vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
> 
> Now create a matrix of variances and covariances.
> 
> covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
> 
> Finally, multiply the correlations by the covariances.  Let's compare
> results:
> 
> PROC MIXED:
> 
> Row Col1 Col2 Col3 Col4
> 1 5.415 2.716 3.910 2.710
> 2 2.716 4.184 2.927 3.317
> 3 3.910 2.927 6.455 4.130
> 4 2.710 3.317 4.130 4.985
> 
> R> round(cors*covs,3)
>      [,1]  [,2]  [,3]  [,4]
> [1,] 5.425 2.709 3.841 2.715
> [2,] 2.709 4.191 2.975 3.314
> [3,] 3.841 2.975 6.263 4.133
> [4,] 2.715 3.314 4.133 4.986
> 
> In my experience, the small differences in the results are typical for
> unstructured models.
> 
> Hope this helps.
> 
> Kevin Wright
> 
> 
> On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu>wrote:
> 
>> Greetings,
>> 
>> I have been working on R for some time now and I have begun the endeavor of
>> trying to replicate some SAS code in R.  I have scoured the forums but
>> haven't been able to find an answer.  I hope one of you could be so kind as
>> to enlighten me.
>> 
>> I am attempting to replicate a repeated measures experiment using some
>> standard data.  I have posted the SAS code and output directly from a
>> publication as well as my attempts in R to replicate it.  My main issue
>> comes with the 'unstructured' component.
>> 
>> The 'dental' dataset from 'mixedQF' package,
>> equivalent to formixed data in SAS
>> 
>>   distance age Subject    Sex
>> 1       26.0   8     M01   Male
>> 2       25.0  10     M01   Male
>> 3       29.0  12     M01   Male
>> 4       31.0  14     M01   Male
>> 5       21.5   8     M02   Male
>> 6       22.5  10     M02   Male
>> 7       23.0  12     M02   Male
>> 8       26.5  14     M02   Male
>> 9       23.0   8     M03   Male
>> 10      22.5  10     M03   Male
>> 11      24.0  12     M03   Male
>> 12      27.5  14     M03   Male
>> 13      25.5   8     M04   Male
>> 14      27.5  10     M04   Male
>> 15      26.5  12     M04   Male
>> 16      27.0  14     M04   Male
>> 17      20.0   8     M05   Male
>> 18      23.5  10     M05   Male
>> 19      22.5  12     M05   Male
>> 20      26.0  14     M05   Male
>> 21      24.5   8     M06   Male
>> 22      25.5  10     M06   Male
>> 23      27.0  12     M06   Male
>> 24      28.5  14     M06   Male
>> 25      22.0   8     M07   Male
>> 26      22.0  10     M07   Male
>> 27      24.5  12     M07   Male
>> 28      26.5  14     M07   Male
>> 29      24.0   8     M08   Male
>> 30      21.5  10     M08   Male
>> 31      24.5  12     M08   Male
>> 32      25.5  14     M08   Male
>> 33      23.0   8     M09   Male
>> 34      20.5  10     M09   Male
>> 35      31.0  12     M09   Male
>> 36      26.0  14     M09   Male
>> 37      27.5   8     M10   Male
>> 38      28.0  10     M10   Male
>> 39      31.0  12     M10   Male
>> 40      31.5  14     M10   Male
>> 41      23.0   8     M11   Male
>> 42      23.0  10     M11   Male
>> 43      23.5  12     M11   Male
>> 44      25.0  14     M11   Male
>> 45      21.5   8     M12   Male
>> 46      23.5  10     M12   Male
>> 47      24.0  12     M12   Male
>> 48      28.0  14     M12   Male
>> 49      17.0   8     M13   Male
>> 50      24.5  10     M13   Male
>> 51      26.0  12     M13   Male
>> 52      29.5  14     M13   Male
>> 53      22.5   8     M14   Male
>> 54      25.5  10     M14   Male
>> 55      25.5  12     M14   Male
>> 56      26.0  14     M14   Male
>> 57      23.0   8     M15   Male
>> 58      24.5  10     M15   Male
>> 59      26.0  12     M15   Male
>> 60      30.0  14     M15   Male
>> 61      22.0   8     M16   Male
>> 62      21.5  10     M16   Male
>> 63      23.5  12     M16   Male
>> 64      25.0  14     M16   Male
>> 65      21.0   8     F01 Female
>> 66      20.0  10     F01 Female
>> 67      21.5  12     F01 Female
>> 68      23.0  14     F01 Female
>> 69      21.0   8     F02 Female
>> 70      21.5  10     F02 Female
>> 71      24.0  12     F02 Female
>> 72      25.5  14     F02 Female
>> 73      20.5   8     F03 Female
>> 74      24.0  10     F03 Female
>> 75      24.5  12     F03 Female
>> 76      26.0  14     F03 Female
>> 77      23.5   8     F04 Female
>> 78      24.5  10     F04 Female
>> 79      25.0  12     F04 Female
>> 80      26.5  14     F04 Female
>> 81      21.5   8     F05 Female
>> 82      23.0  10     F05 Female
>> 83      22.5  12     F05 Female
>> 84      23.5  14     F05 Female
>> 85      20.0   8     F06 Female
>> 86      21.0  10     F06 Female
>> 87      21.0  12     F06 Female
>> 88      22.5  14     F06 Female
>> 89      21.5   8     F07 Female
>> 90      22.5  10     F07 Female
>> 91      23.0  12     F07 Female
>> 92      25.0  14     F07 Female
>> 93      23.0   8     F08 Female
>> 94      23.0  10     F08 Female
>> 95      23.5  12     F08 Female
>> 96      24.0  14     F08 Female
>> 97      20.0   8     F09 Female
>> 98      21.0  10     F09 Female
>> 99      22.0  12     F09 Female
>> 100     21.5  14     F09 Female
>> 101     16.5   8     F10 Female
>> 102     19.0  10     F10 Female
>> 103     19.0  12     F10 Female
>> 104     19.5  14     F10 Female
>> 105     24.5   8     F11 Female
>> 106     25.0  10     F11 Female
>> 107     28.0  12     F11 Female
>> 108     28.0  14     F11 Female
>> 
>> *Mixed modeling and fixed effect test*
>> SAS
>> proc mixed data=formixed;
>> class gender age person;
>> model y = gender|age;
>> repeated / type=cs sub=person;
>> run;
>> 
>> output of interest to me
>>         Tests of Fixed Effects
>> Source             NDF   DDF    Type III F    Pr > F
>> GENDER           1        25        9.29        0.0054
>> AGE                  3        75       35.35       0.0001
>> GENDER*AGE   3        75        2.36        0.0781
>> 
>> R (nlme package)
>> y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
>> anova(y)
>> 
>>           numDF denDF  F-value p-value
>> (Intercept)     1    75 4123.156  <.0001
>> Sex              1    25    9.292  0.0054
>> age               3    75   40.032  <.0001
>> Sex:age        3    75    2.362  0.0781
>> 
>> Now this isn't exact but it is extremely close, however when I try to
>> replicate the unstructured,
>> 
>> SAS
>> proc mixed data=formixed;
>> class gender age person;
>> model y = gender|age;
>> repeated / type=un sub=person;
>> run;
>> 
>>            Tests of Fixed Effects
>> Source          NDF DDF Type III F Pr > F
>> GENDER         1    25     9.29    0.0054
>> AGE                3    25    34.45   0.0001
>> GENDER*AGE 3    25     2.93    0.0532
>> 
>> R
>> either
>> y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
>> data=dental)
>> anova(y)
>> or
>> z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(), data=dental)
>> anova(z)
>> 
>> gives the output
>> 
>>           numDF denDF  F-value    p-value
>> (Intercept)     1    75     4052.028  <.0001
>> Sex              1    25       8.462      0.0075
>> age               3    75      39.022    <.0001
>> Sex:age        3    75       2.868      0.0421
>> 
>> What am I doing wrong to replicate the unstructured linear mixed model from
>> SAS?
>> 
>> Regards,
>> 
>> Charles
>> 
>>       [[alternative HTML version deleted]]
>> 
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> 
> 
> 
> 
> -- 
> Kevin Wright
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models




More information about the R-sig-mixed-models mailing list