[R-sig-ME] Trouble Replicating Unstructured Mixed Procedure in R
i white
i.m.s.white at ed.ac.uk
Fri Jan 27 14:38:46 CET 2012
Charles,
I guess that SAS is using the Satterthwaite formula to approximate
denominator degrees of freedom (google 'Satterthwaite'). See Kenward and
Roger (1997) Biometrics 53 983-997 for another approach.
Charles Determan Jr wrote:
> Thank you John, I truly appreciate all the advice I have received. At no
> point to I feel entitled for someone to provide this for me. I do hope
> that others may find this thread useful as well. If I may just as one more
> question. Is there an accepted way to calculate the degrees of freedom in
> this case to come to the value of 25? This has been something I have been
> trying to determine.
>
> Again, thank you to all for providing me with all the information you have,
>
> Charles
>
> On Fri, Jan 27, 2012 at 12:10 AM, John Maindonald <
> john.maindonald at anu.edu.au> wrote:
>
>> I've twigged that "unstructured" means a variance-covariance matrix
>> that ignores the time structure, i.e., each element is estimated
>> separately. One can do this in lme4, thus:
>>
>> Orthodont$Age <- factor(Orthodont$age)
>>> orth.lmer <- lmer(distance ~ Sex*Age+((Sex*Age)|Subject),
>> + data=Orthodont)
>>> anova(orth.lmer)
>> Analysis of Variance Table
>> Df Sum Sq Mean Sq F value
>> Sex 1 9.591 9.5905 33.9645
>> Age 3 34.876 11.6252 41.1703
>> Sex:Age 3 2.930 0.9768 3.4594
>>
>> The sum of squares for the Sex*Age interaction agrees with that from SAS.
>> lmer() leaves the user to work out an appropriate df for the F-statistic.
>> One
>> has, in agreement with the SAS output:
>>> 1-pf(2.93, 3, 25)
>> [1] 0.05318945
>>
>> The main effects SS and F's are of course not expected to agree and
>> indeed, in my strong view, true SAS type III SS's are inappropriate.
>>
>> I regard this R analysis however as an inelegant use of the data, with low
>> power. My book with John Braun (3rd edn and if I recall correctly, 2nd)
>> has an analysis that I am prepared to defend that uses lmer() from lme4.
>> Several years ago, I placed on the web a version of the analysis that uses
>> lme() from nlme:
>> http://maths.anu.edu.au/~johnm/r-book/xtras/mlm-lme.pdf
>> I'd forgotten that it was there.
>>
>> This is a list for R users. You will not necessarily find folk here with
>> a high
>> level of SAS expertise, maybe not even enough to understand what the
>> SAS documentation means when it uses the term "unstructured". Charles,
>> I think you've done pretty well in the amount of free advice that you have
>> received. I think too that Steven McKinney's point is well made. Kevin's
>> interpretation of the EULA is probably more or less correct, but the
>> document comes across rather more ferociously than he suggests, and it
>> is not a good starting point for scientific assessment of the respective
>> methodologies. Many of us have other reasons why we are not very
>> interested in SAS does. The implied threat in the EULA provides, even
>> if we are not party to any such agreement, just one more reason why SAS
>> is not to any large extent in our path ahead to the future.
>>
>> John Maindonald email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473 fax : +61 2(6125)5549
>> Centre for Mathematics & Its Applications, Room 1194,
>> John Dedman Mathematical Sciences Building (Building 27)
>> Australian National University, Canberra ACT 0200.
>> http://www.maths.anu.edu.au/~johnm
>>
>> On 27/01/2012, at 4:03 PM, Kevin Wright wrote:
>>
>>> Dear Charles,
>>>
>>> First, I hope you are not put off by the tone of some the responses in
>> this
>>> email thread. Sometimes people have strong opinions and it might come
>>> across aggressively.
>>>
>>> Second, be sure to understand that reproducing a SAS analysis with lme in
>>> no way violates any legal agreements that SAS may have, if for no other
>>> reason than you never signed an agreement with SAS! That bit in the EULA
>>> about decompiling and reverse engineering means that people are
>> prohibited
>>> from creating a new version of PROC MIXED that does the same thing. The
>>> nlme package uses different methods than SAS. E.g. different optimizers,
>>> even uses a log-parameterization deep in the code so that negative
>> variance
>>> components cannot happen.
>>>
>>> Third, by now you've probably figured out that PROC MIXED and lme have
>> very
>>> different ideas about degrees of freedom. Also, the loglikelihoods are
>> on
>>> different scales. For that reason, when I try to reproduce an analysis,
>> I
>>> find the best way to compare is to look at the variance components.
>>>
>>> Here is what PROC MIXED says:
>>>
>>> Cov Parm Estimate Std Error Z Pr > |Z|
>>> UN(1,1) 5.41545455 1.53172185 3.54 0.0004
>>> UN(2,1) 2.71681818 1.09623989 2.48 0.0132
>>> UN(2,2) 4.18477273 1.18363247 3.54 0.0004
>>> UN(3,1) 3.91022727 1.41775367 2.76 0.0058
>>> UN(3,2) 2.92715909 1.19304751 2.45 0.0141
>>> UN(3,3) 6.45573864 1.82595863 3.54 0.0004
>>> UN(4,1) 2.71022727 1.17209851 2.31 0.0208
>>> UN(4,2) 3.31715909 1.12903016 2.94 0.0033
>>> UN(4,3) 4.13073864 1.40356157 2.94 0.0033
>>> UN(4,4) 4.98573864 1.41017984 3.54 0.0004
>>> Residual 1.00000000 . . .
>>>
>>> That's our target, and here is the lme code to get us there.
>>>
>>> library(nlme)
>>> orth <- as.data.frame(Orthodont)
>>> m1 <- gls(distance ~ Sex*age,
>>> correlation=corSymm(form = ~ 1 | Subject),
>>> weights = varIdent(form = ~ 1 | age),
>>> data = orth)
>>> cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
>>> vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
>>> covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
>>> cors*covs
>>>
>>> Now, some explanations will surely help, so let's step through the code
>>> with comments.
>>>
>>> The Orthodont data is a groupedData object, which can be helpful, but
>>> sometimes confusing, so coercing to a data.frame removes the formula--you
>>> can see it with 'formula(Orthodont)'.
>>>
>>> orth <- as.data.frame(Orthodont)
>>>
>>> Since there is no "random" line in the PROC MIXED code, we don't want to
>>> use 'lme' (also why I removed the formula from the data), but instead use
>>> 'gls' for generalized least squares. The 'corSymm' part specifies a
>>> symmetric correlation matrix with 1 on the diagonal. Looking at the
>> MIXED
>>> parameters, the results are given a covariances, not correlations. Note
>>> how the variances UN(1,1), UN(2,2,), etc are different, not constant
>> along
>>> the diagonal. We use the 'weights' statement to specify different
>> stratum
>>> variances.
>>>
>>> m1 <- gls(distance ~ Sex*age,
>>> correlation=corSymm(form = ~ 1 | Subject),
>>> weights = varIdent(form = ~ 1 | age),
>>> data = orth)
>>>
>>> Now we have to convert the correlations and standard deviations of lme
>> into
>>> covariance parameters of MIXED.
>>>
>>> First, extract the correlation matrix from lme.
>>>
>>> cors <- corMatrix(m1$modelStruct$corStruct)[[1]]
>>>
>>> Now, hard-code the stratum std deviations and square them to get
>>> variances. Multiply by the square of the residual std dev. (There's a
>> way
>>> to extract all these values from the fitted model instead of hand-typing
>>> them, but I can't find them at the moment.)
>>>
>>> vars <- c(1.0000000, 0.8788793, 1.0744596, 0.9586878)^2 * 2.329213^2
>>>
>>> Now create a matrix of variances and covariances.
>>>
>>> covs <- outer(vars,vars, function(x,y) sqrt(x)*sqrt(y))
>>>
>>> Finally, multiply the correlations by the covariances. Let's compare
>>> results:
>>>
>>> PROC MIXED:
>>>
>>> Row Col1 Col2 Col3 Col4
>>> 1 5.415 2.716 3.910 2.710
>>> 2 2.716 4.184 2.927 3.317
>>> 3 3.910 2.927 6.455 4.130
>>> 4 2.710 3.317 4.130 4.985
>>>
>>> R> round(cors*covs,3)
>>> [,1] [,2] [,3] [,4]
>>> [1,] 5.425 2.709 3.841 2.715
>>> [2,] 2.709 4.191 2.975 3.314
>>> [3,] 3.841 2.975 6.263 4.133
>>> [4,] 2.715 3.314 4.133 4.986
>>>
>>> In my experience, the small differences in the results are typical for
>>> unstructured models.
>>>
>>> Hope this helps.
>>>
>>> Kevin Wright
>>>
>>>
>>> On Tue, Jan 24, 2012 at 8:32 PM, Charles Determan Jr <deter088 at umn.edu
>>> wrote:
>>>
>>>> Greetings,
>>>>
>>>> I have been working on R for some time now and I have begun the
>> endeavor of
>>>> trying to replicate some SAS code in R. I have scoured the forums but
>>>> haven't been able to find an answer. I hope one of you could be so
>> kind as
>>>> to enlighten me.
>>>>
>>>> I am attempting to replicate a repeated measures experiment using some
>>>> standard data. I have posted the SAS code and output directly from a
>>>> publication as well as my attempts in R to replicate it. My main issue
>>>> comes with the 'unstructured' component.
>>>>
>>>> The 'dental' dataset from 'mixedQF' package,
>>>> equivalent to formixed data in SAS
>>>>
>>>> distance age Subject Sex
>>>> 1 26.0 8 M01 Male
>>>> 2 25.0 10 M01 Male
>>>> 3 29.0 12 M01 Male
>>>> 4 31.0 14 M01 Male
>>>> 5 21.5 8 M02 Male
>>>> 6 22.5 10 M02 Male
>>>> 7 23.0 12 M02 Male
>>>> 8 26.5 14 M02 Male
>>>> 9 23.0 8 M03 Male
>>>> 10 22.5 10 M03 Male
>>>> 11 24.0 12 M03 Male
>>>> 12 27.5 14 M03 Male
>>>> 13 25.5 8 M04 Male
>>>> 14 27.5 10 M04 Male
>>>> 15 26.5 12 M04 Male
>>>> 16 27.0 14 M04 Male
>>>> 17 20.0 8 M05 Male
>>>> 18 23.5 10 M05 Male
>>>> 19 22.5 12 M05 Male
>>>> 20 26.0 14 M05 Male
>>>> 21 24.5 8 M06 Male
>>>> 22 25.5 10 M06 Male
>>>> 23 27.0 12 M06 Male
>>>> 24 28.5 14 M06 Male
>>>> 25 22.0 8 M07 Male
>>>> 26 22.0 10 M07 Male
>>>> 27 24.5 12 M07 Male
>>>> 28 26.5 14 M07 Male
>>>> 29 24.0 8 M08 Male
>>>> 30 21.5 10 M08 Male
>>>> 31 24.5 12 M08 Male
>>>> 32 25.5 14 M08 Male
>>>> 33 23.0 8 M09 Male
>>>> 34 20.5 10 M09 Male
>>>> 35 31.0 12 M09 Male
>>>> 36 26.0 14 M09 Male
>>>> 37 27.5 8 M10 Male
>>>> 38 28.0 10 M10 Male
>>>> 39 31.0 12 M10 Male
>>>> 40 31.5 14 M10 Male
>>>> 41 23.0 8 M11 Male
>>>> 42 23.0 10 M11 Male
>>>> 43 23.5 12 M11 Male
>>>> 44 25.0 14 M11 Male
>>>> 45 21.5 8 M12 Male
>>>> 46 23.5 10 M12 Male
>>>> 47 24.0 12 M12 Male
>>>> 48 28.0 14 M12 Male
>>>> 49 17.0 8 M13 Male
>>>> 50 24.5 10 M13 Male
>>>> 51 26.0 12 M13 Male
>>>> 52 29.5 14 M13 Male
>>>> 53 22.5 8 M14 Male
>>>> 54 25.5 10 M14 Male
>>>> 55 25.5 12 M14 Male
>>>> 56 26.0 14 M14 Male
>>>> 57 23.0 8 M15 Male
>>>> 58 24.5 10 M15 Male
>>>> 59 26.0 12 M15 Male
>>>> 60 30.0 14 M15 Male
>>>> 61 22.0 8 M16 Male
>>>> 62 21.5 10 M16 Male
>>>> 63 23.5 12 M16 Male
>>>> 64 25.0 14 M16 Male
>>>> 65 21.0 8 F01 Female
>>>> 66 20.0 10 F01 Female
>>>> 67 21.5 12 F01 Female
>>>> 68 23.0 14 F01 Female
>>>> 69 21.0 8 F02 Female
>>>> 70 21.5 10 F02 Female
>>>> 71 24.0 12 F02 Female
>>>> 72 25.5 14 F02 Female
>>>> 73 20.5 8 F03 Female
>>>> 74 24.0 10 F03 Female
>>>> 75 24.5 12 F03 Female
>>>> 76 26.0 14 F03 Female
>>>> 77 23.5 8 F04 Female
>>>> 78 24.5 10 F04 Female
>>>> 79 25.0 12 F04 Female
>>>> 80 26.5 14 F04 Female
>>>> 81 21.5 8 F05 Female
>>>> 82 23.0 10 F05 Female
>>>> 83 22.5 12 F05 Female
>>>> 84 23.5 14 F05 Female
>>>> 85 20.0 8 F06 Female
>>>> 86 21.0 10 F06 Female
>>>> 87 21.0 12 F06 Female
>>>> 88 22.5 14 F06 Female
>>>> 89 21.5 8 F07 Female
>>>> 90 22.5 10 F07 Female
>>>> 91 23.0 12 F07 Female
>>>> 92 25.0 14 F07 Female
>>>> 93 23.0 8 F08 Female
>>>> 94 23.0 10 F08 Female
>>>> 95 23.5 12 F08 Female
>>>> 96 24.0 14 F08 Female
>>>> 97 20.0 8 F09 Female
>>>> 98 21.0 10 F09 Female
>>>> 99 22.0 12 F09 Female
>>>> 100 21.5 14 F09 Female
>>>> 101 16.5 8 F10 Female
>>>> 102 19.0 10 F10 Female
>>>> 103 19.0 12 F10 Female
>>>> 104 19.5 14 F10 Female
>>>> 105 24.5 8 F11 Female
>>>> 106 25.0 10 F11 Female
>>>> 107 28.0 12 F11 Female
>>>> 108 28.0 14 F11 Female
>>>>
>>>> *Mixed modeling and fixed effect test*
>>>> SAS
>>>> proc mixed data=formixed;
>>>> class gender age person;
>>>> model y = gender|age;
>>>> repeated / type=cs sub=person;
>>>> run;
>>>>
>>>> output of interest to me
>>>> Tests of Fixed Effects
>>>> Source NDF DDF Type III F Pr > F
>>>> GENDER 1 25 9.29 0.0054
>>>> AGE 3 75 35.35 0.0001
>>>> GENDER*AGE 3 75 2.36 0.0781
>>>>
>>>> R (nlme package)
>>>> y=lme(distance~Sex*age, random=(~1|Subject), data=dental)
>>>> anova(y)
>>>>
>>>> numDF denDF F-value p-value
>>>> (Intercept) 1 75 4123.156 <.0001
>>>> Sex 1 25 9.292 0.0054
>>>> age 3 75 40.032 <.0001
>>>> Sex:age 3 75 2.362 0.0781
>>>>
>>>> Now this isn't exact but it is extremely close, however when I try to
>>>> replicate the unstructured,
>>>>
>>>> SAS
>>>> proc mixed data=formixed;
>>>> class gender age person;
>>>> model y = gender|age;
>>>> repeated / type=un sub=person;
>>>> run;
>>>>
>>>> Tests of Fixed Effects
>>>> Source NDF DDF Type III F Pr > F
>>>> GENDER 1 25 9.29 0.0054
>>>> AGE 3 25 34.45 0.0001
>>>> GENDER*AGE 3 25 2.93 0.0532
>>>>
>>>> R
>>>> either
>>>> y=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(,~1|Subject),
>>>> data=dental)
>>>> anova(y)
>>>> or
>>>> z=lme(distance~Sex*age, random=(~1|Subject), corr=corSymm(),
>> data=dental)
>>>> anova(z)
>>>>
>>>> gives the output
>>>>
>>>> numDF denDF F-value p-value
>>>> (Intercept) 1 75 4052.028 <.0001
>>>> Sex 1 25 8.462 0.0075
>>>> age 3 75 39.022 <.0001
>>>> Sex:age 3 75 2.868 0.0421
>>>>
>>>> What am I doing wrong to replicate the unstructured linear mixed model
>> from
>>>> SAS?
>>>>
>>>> Regards,
>>>>
>>>> Charles
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> R-sig-mixed-models at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>>
>>>
>>>
>>> --
>>> Kevin Wright
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
More information about the R-sig-mixed-models
mailing list