[R-sig-ME] Repeated measures mixed model with AR(1) correlation structure in nlme vs SAS Proc Mixed

Mon Mar 28 00:00:47 CEST 2011

  This makes a really nice comparison, thank you.

  It makes me think that there is some difference that we haven't yet
figured out in the way that lme and PROC MIXED are defining the model,
although nothing springs to mind, although I can't see what it would be.

  As far as I can tell, both are fitting a fixed effect of Response ~
Day where Day is a categorical predictor; both are treating Subject as a
random effect (affecting the intercept only); both are assuming AR(1)
autocorrelation in the residuals, applying only within-subject; both are
using REML ... ?

  I agree with your points about taking account of the gap between
time=3 and time=6 -- but I think the main point here is the narrower one
of "what are SAS and R doing differently for this particular (even if
sub-optimal) model"?

  What is the meaning of "Columns in X 5; Columns in Z 0" in the SAS
output;  I would have thought X (fixed effect design matrix) would have
4 columns, Z (random effect design matrix) would have 10 ... ?

On 11-03-27 03:13 PM, Jim Baldwin wrote:
> 
> I've added equivalent loop in SAS to obtain the equivalent output as
> Prof. Bolker has below in R.  The SAS code is as follows:
> 
> 
> *%macro**/loop/*;
>   %doi=*0*%to*100*;
>   %letrho = %SYSEVALF(0.2 + 0.2*&i/100);
>   proc mixed data=sasdata method=REML itdetails;
>     class Subject Day;
>     model Response = Day / solution;
>     ** Set initial values for variance components and*
> *         hold the correlation coefficient to a constant;*
>     parms (&rho) (*0.06330649*) / hold=*1*;
>     repeated / subject = Subject type=ar(*1*);
>     ** Keep a copy of the fit statistics;*
>     ods output fitstatistics=fs1;
>   run;
>   ** Grab just what is needed from the fit statistics dataset;*
>   data fs1;
>     set fs1;
>     keep rho M2LogL;
>     if descr = "-2 Res Log Likelihood";
>     M2LogL = value;
>     rho = ρ
>   run;
>   ** Merge with overall summary;*
>   data fs;
>     set fs fs1;
>         keep M2LogL rho;
>         if rho = *.*then delete;
>   run;
> 
>   %end;
> *%mend*loop;
> 
> ** Initialize dataset to hold rho and -2*logL;*
>   *data*fs; *run*;
> 
> ** Run through loop and print out results;*
>   %*/loop/*;
>   *proc**print*data=fs;
>   *run*;
> 
> 
> The resulting values are
> 
> M2LogL = c(12.1794,12.1641,12.1491,12.1342,12.1196,12.1051,12.0909,12.0769,
>  
> 12.0631,12.0494,12.036,12.0229,12.0099,11.9971,11.9845,11.9722,11.96,11.9481,
> 
>  
> 11.9364,11.9249,11.9136,11.9025,11.8916,11.881,11.8706,11.8603,11.8503,11.8405,
> 
>  
> 11.831,11.8216,11.8125,11.8036,11.7949,11.7864,11.7781,11.7701,11.7623,11.7547,
> 
>  
> 11.7473,11.7402,11.7332,11.7265,11.72,11.7138,11.7078,11.7019,11.6964,11.691,
> 
>  
> 11.6859,11.681,11.6763,11.6719,11.6677,11.6637,11.6599,11.6564,11.6531,11.6501,
> 
>  
> 11.6473,11.6447,11.6423,11.6402,11.6383,11.6366,11.6352,11.634,11.6331,11.6324,
> 
>  
> 11.6319,11.6317,11.6317,11.632,11.6325,11.6332,11.6342,11.6354,11.6369,11.6386,
> 
>  
> 11.6405,11.6427,11.6452,11.6478,11.6508,11.654,11.6574,11.6611,11.665,11.6692,
> 
>  
> 11.6736,11.6783,11.6832,11.6884,11.6939,11.6996,11.7055,11.7117,11.7182,11.7249,
> 
>   11.7319,11.7391,11.7466)
> 
> rho = 0.2 + 0.2*c(0:100)/100
> 
> and these can be added to the last plot from below as
> 
>    lines(rho,M2LogL,col="red")
> 
> One can then see after adding in this curve that the -2 log L values are
> nearly idential above the estimate given by PROC MIXED but become more
> and more different when farther from that value.
> 
> Jim
> 
> Jim Baldwin
> Station Statistician
> Pacific Southwest Research Station
> USDA Forest Service
> Albany, California
> 510-559-6332
> 
> P.S. (This is probably more SAS code than some (all?) of you would have
> ever
> wanted to see in your life but such adversity maybe builds character?)
> 
> 
> 
> *Ben Bolker <bbolker at gmail.com>*
> Sent by: r-sig-mixed-models-bounces at r-project.org
> 
> 03/27/2011 09:25 AM
> 
> 	
> To
> 	r-sig-mixed-models at r-project.org
> cc
> 	
> Subject
> 	Re: [R-sig-ME] Repeated measures mixed model with AR(1) correlation    
>    structure in nlme vs SAS Proc Mixed
> 
> 
> 	
> 
> 
> 
> 
> 
> Simon Bate <simontbate at ...> writes:
> 
>>  Hi all, I don't know if anyone has any thoughts on this, I suspect
>> it is a local convergence issue but am not sure. I have been trying
>> to move from SAS Proc Mixed to R nlme and have an unusual result.  I
>> have several subjects measured at four time points. I want to model
>> the within-subject correlation using an autoregressive
>> structure. I've attached the R and SAS code I'm using along with the
>> results from SAS.  With R lme I get an estimate of the
>> autoregressive parameter phi = 0.2782601, whereas SAS gives me an
>> estimate of 0.3389 Intriguingly if I include a between subject
>> factor, a covariate or delete one of the observations, then the
>> results appear to agree.  I'm surprised the seemingly simpler model
>> if different between the two implementations whereas the more
>> complex models agree.  Any ideas would be most welcome!  Regards,
>> Simon
> 
>  Hmmm.  I do agree it could be a convergence problem of some sort.
>  I agree that it's worth trying to figure out what the difference
> is, but I think it's going to be hard.  I went through some basic
> sanity checking with R (see below) and don't see an obvious problem.
> 
> library(nlme)
> sasdata <- data.frame(Response=c(0.55,0.86,0.21,0.36,0.46,
>                     0.32,0.11,0.24,0.36,0.29,0.48,0.93,
>            0.56,0.67,0.36,0.55,0.51,0.4,0.34,0.51,
>            1,0.61,0.65,0.41,0.99,0.86,0.64,0.86,0.31,
>            0.19,0.21,0.36,0.41,0.47,0.16,0.81,0.9,0.72,0.87,0.02),
>        Subject=c(1,1,1,1,2,2,2,2,3,3,3,3,
>                  4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,
>                  8,8,8,8,9,9,9,9,10,10,10,10),
>    ## or: rep(1:10,each=4)
>         Day=c(1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,
>               1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6))
>    ## or: rep(c(1,2,4,6),4)
> 
> sasdata$Time<-factor(sasdata$Day)
> 
> fit0<-lme(Response~Time, random=~1|Subject,
>          data=sasdata, na.action = (na.omit), method = "REML")
> 
> plot(ACF(fit0),alpha=0.05,grid=TRUE)
> ## it actually looks like AR1 is not a very good model here
> ##   anyway?
> AR1<- update(fit0,
>             correlation=corAR1(form=~as.numeric(Time)|Subject,
>               fixed =FALSE))
> 
> plot(ACF(AR1),alpha=0.05,grid=TRUE)
> ## ACF(AR1) shows that the raw rho(1) is about 0.2
> 
> ## identical (based on order of observations within groups)
> AR1B<- update(fit0,
>             correlation=corAR1(form=~1|Subject,
>               fixed =FALSE))
> 
> 
> ## construct likelihood profile on rho
> rhovec <- seq(-0.99,0.99,by=0.01)
> Lvec <- sapply(rhovec,
>               function(r) {
>                 logLik(update(fit0,
>                      correlation=corAR1(form=~1|Subject,
>                        value=r,
>                        fixed =TRUE)))
>               })
> 
> plot(rhovec,-2*Lvec,type="b")
> abline(h=-2*logLik(AR1)+3.84,lty=2) ## LRT cutoff
> abline(v=c(-0.53,0.82),lty=2)  
> ## CIs on rho from intervals() -- reasonable ballpark
> 
> 
> ## zoom in
> plot(rhovec,-2*Lvec,type="b",ylim=c(11.5,15),xlim=c(-0.5,0.7))
> abline(v=c(0.278,0.339)) ## SAS/R answers
> 
> ## zoom in further
> plot(rhovec,-2*Lvec,type="b",ylim=c(11.6,11.8),xlim=c(0.2,0.4),
>     las=1,bty="l")
> abline(v=c(0.278,0.339),lty=2)
> text(c(0.278,0.339),rep(par("usr")[4],2),c("lme","SAS"),
>         pos=3,xpd=NA)
> abline(h=11.63168,lty=2)
> text(par("usr")[2],11.63168,"SAS",pos=1,xpd=NA)
> 
>  So ...  I don't see anything obviously wrong or weird about
> what R is doing.  You can try
> 
> update(AR1,control=lmeControl(msVerbose=TRUE))
> 
> to see more detail on the optimization, but you will have
> to dig into Pinheiro & Bates 2000 and/or the code and
> documentation for details on the internal parameterizations
> of rho in order to interpret the results ...
> 
> ###
> R Code:
> 
> SAS Code:
> 
> proc mixed;
> class Subject Day;
> model Response = Day / outp=pout;
> repeated Day / subject = Subject type=AR(1);
> run;
> 
> 
> SAS Results:
> 
> 
> Model Information
> 
> Data Set WORK.ALLDATA
> Dependent Variable Response
> Covariance Structure Autoregressive
> Subject Effect Subject
> Estimation Method REML
> Residual Variance Method Profile
> Fixed Effects SE Method Model-Based
> Degrees of Freedom Method Between-Within
> 
> 
> Class Level Information
> 
> Class Levels Values
> Subject 10 1 10 2 3 4 5 6 7 8 9
> Day 4 1 2 3 4
> 
> 
> Dimensions
> 
> Covariance Parameters 2
> Columns in X 5
> Columns in Z 0
> Subjects 10
> Max Obs Per Subject 4
> 
> 
> Number of Observations
> 
> Number of Observations Read 40
> Number of Observations Used 40
> Number of Observations Not Used 0
> 
> 
> Iteration History
> 
> Iteration Evaluations -2 Res Log Like Criterion
> 0 1 14.67045653
> 1 2 11.63168913 0.00000018
> 2 1 11.63168429 0.00000000
> 
> 
> Convergence criteria met.
> 
> 
> 
> Covariance Parameter Estimates
> 
> Cov Parm Subject Estimate
> AR(1) Animal1 0.3389
> Residual 0.06862
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>