[R-sig-ME] Repeated measures mixed model with AR(1) correlation structure in nlme vs SAS Proc Mixed

Mon Mar 28 03:23:00 CEST 2011

  Aha!

  In the specification described here, R includes a G-side random effect
of subject.  (lme cannot handle models with no G-side effect.)

   gls(Response~Time,
          data=sasdata, na.action = (na.omit), method = "REML",
                        correlation=corAR1(form=~1|Subject))

  gives the same answers as SAS, both for the point estimate and for the
likelihood profile.

  Ben Bolker

On 11-03-27 06:57 PM, Jim Baldwin wrote:
> 
> I agree the essential issue is as you state.
> 
> PROC MIXED has "Columns in X: 5" meaning an intercept and a
> value for each one of the time periods.  The fixed effect for the last
> time period "Day = 6" is (internally) set to zero resulting in just 4
> values to
> estimate.  (Using the NOINT - for no intercept - option results in
> "Columns in X: 4" and the same exact variance components.)  
> The "Columns in Z: 0" has to do with PROC MIXED having no
> G-side random effects and only R-side random effects for this
> particular example.  So the 2 variance components (rho and
> sigma) are modeled in E in the formulation
> 
>                Y = X*beta + Z*gamma + E
> 
> Jim
> 
> Jim Baldwin
> Station Statistician
> Pacific Southwest Research Station
> USDA Forest Service
> Albany, California
> 510-559-6332
> 
> 
> *Ben Bolker <bbolker at gmail.com>*
> 
> 03/27/2011 03:00 PM
> 
> 	
> To
> 	Jim Baldwin <jbaldwin at fs.fed.us>
> cc
> 	r-sig-mixed-models at r-project.org
> Subject
> 	Re: [R-sig-ME] Repeated measures mixed model with AR(1) correlation
> structure in nlme vs SAS Proc Mixed
> 
> 
> 	
> 
> 
> 
> 
> 
>   This makes a really nice comparison, thank you.
> 
>  It makes me think that there is some difference that we haven't yet
> figured out in the way that lme and PROC MIXED are defining the model,
> although nothing springs to mind, although I can't see what it would be.
> 
>  As far as I can tell, both are fitting a fixed effect of Response ~
> Day where Day is a categorical predictor; both are treating Subject as a
> random effect (affecting the intercept only); both are assuming AR(1)
> autocorrelation in the residuals, applying only within-subject; both are
> using REML ... ?
> 
>  I agree with your points about taking account of the gap between
> time=3 and time=6 -- but I think the main point here is the narrower one
> of "what are SAS and R doing differently for this particular (even if
> sub-optimal) model"?
> 
>  What is the meaning of "Columns in X 5; Columns in Z 0" in the SAS
> output;  I would have thought X (fixed effect design matrix) would have
> 4 columns, Z (random effect design matrix) would have 10 ... ?
> 
> On 11-03-27 03:13 PM, Jim Baldwin wrote:
>>
>> I've added equivalent loop in SAS to obtain the equivalent output as
>> Prof. Bolker has below in R.  The SAS code is as follows:
>>
>>
>> *%macro**/loop/*;
>>   %doi=*0*%to*100*;
>>   %letrho = %SYSEVALF(0.2 + 0.2*&i/100);
>>   proc mixed data=sasdata method=REML itdetails;
>>     class Subject Day;
>>     model Response = Day / solution;
>>     ** Set initial values for variance components and*
>> *         hold the correlation coefficient to a constant;*
>>     parms (&rho) (*0.06330649*) / hold=*1*;
>>     repeated / subject = Subject type=ar(*1*);
>>     ** Keep a copy of the fit statistics;*
>>     ods output fitstatistics=fs1;
>>   run;
>>   ** Grab just what is needed from the fit statistics dataset;*
>>   data fs1;
>>     set fs1;
>>     keep rho M2LogL;
>>     if descr = "-2 Res Log Likelihood";
>>     M2LogL = value;
>>     rho = ρ
>>   run;
>>   ** Merge with overall summary;*
>>   data fs;
>>     set fs fs1;
>>         keep M2LogL rho;
>>         if rho = *.*then delete;
>>   run;
>>
>>   %end;
>> *%mend*loop;
>>
>> ** Initialize dataset to hold rho and -2*logL;*
>>   *data*fs; *run*;
>>
>> ** Run through loop and print out results;*
>>   %*/loop/*;
>>   *proc**print*data=fs;
>>   *run*;
>>
>>
>> The resulting values are
>>
>> M2LogL =
> c(12.1794,12.1641,12.1491,12.1342,12.1196,12.1051,12.0909,12.0769,
>>  
>>
> 12.0631,12.0494,12.036,12.0229,12.0099,11.9971,11.9845,11.9722,11.96,11.9481,
>>
>>  
>>
> 11.9364,11.9249,11.9136,11.9025,11.8916,11.881,11.8706,11.8603,11.8503,11.8405,
>>
>>  
>>
> 11.831,11.8216,11.8125,11.8036,11.7949,11.7864,11.7781,11.7701,11.7623,11.7547,
>>
>>  
>>
> 11.7473,11.7402,11.7332,11.7265,11.72,11.7138,11.7078,11.7019,11.6964,11.691,
>>
>>  
>>
> 11.6859,11.681,11.6763,11.6719,11.6677,11.6637,11.6599,11.6564,11.6531,11.6501,
>>
>>  
>>
> 11.6473,11.6447,11.6423,11.6402,11.6383,11.6366,11.6352,11.634,11.6331,11.6324,
>>
>>  
>>
> 11.6319,11.6317,11.6317,11.632,11.6325,11.6332,11.6342,11.6354,11.6369,11.6386,
>>
>>  
>>
> 11.6405,11.6427,11.6452,11.6478,11.6508,11.654,11.6574,11.6611,11.665,11.6692,
>>
>>  
>>
> 11.6736,11.6783,11.6832,11.6884,11.6939,11.6996,11.7055,11.7117,11.7182,11.7249,
>>
>>   11.7319,11.7391,11.7466)
>>
>> rho = 0.2 + 0.2*c(0:100)/100
>>
>> and these can be added to the last plot from below as
>>
>>    lines(rho,M2LogL,col="red")
>>
>> One can then see after adding in this curve that the -2 log L values are
>> nearly idential above the estimate given by PROC MIXED but become more
>> and more different when farther from that value.
>>
>> Jim
>>
>> Jim Baldwin
>> Station Statistician
>> Pacific Southwest Research Station
>> USDA Forest Service
>> Albany, California
>> 510-559-6332
>>
>> P.S. (This is probably more SAS code than some (all?) of you would have
>> ever
>> wanted to see in your life but such adversity maybe builds character?)
>>
>>
>>
>> *Ben Bolker <bbolker at gmail.com>*
>> Sent by: r-sig-mixed-models-bounces at r-project.org
>>
>> 03/27/2011 09:25 AM
>>
>>                  
>> To
>>                  r-sig-mixed-models at r-project.org
>> cc
>>                  
>> Subject
>>                  Re: [R-sig-ME] Repeated measures mixed model with
> AR(1) correlation    
>>    structure in nlme vs SAS Proc Mixed
>>
>>
>>                  
>>
>>
>>
>>
>>
>> Simon Bate <simontbate at ...> writes:
>>
>>>  Hi all, I don't know if anyone has any thoughts on this, I suspect
>>> it is a local convergence issue but am not sure. I have been trying
>>> to move from SAS Proc Mixed to R nlme and have an unusual result.  I
>>> have several subjects measured at four time points. I want to model
>>> the within-subject correlation using an autoregressive
>>> structure. I've attached the R and SAS code I'm using along with the
>>> results from SAS.  With R lme I get an estimate of the
>>> autoregressive parameter phi = 0.2782601, whereas SAS gives me an
>>> estimate of 0.3389 Intriguingly if I include a between subject
>>> factor, a covariate or delete one of the observations, then the
>>> results appear to agree.  I'm surprised the seemingly simpler model
>>> if different between the two implementations whereas the more
>>> complex models agree.  Any ideas would be most welcome!  Regards,
>>> Simon
>>
>>  Hmmm.  I do agree it could be a convergence problem of some sort.
>>  I agree that it's worth trying to figure out what the difference
>> is, but I think it's going to be hard.  I went through some basic
>> sanity checking with R (see below) and don't see an obvious problem.
>>
>> library(nlme)
>> sasdata <- data.frame(Response=c(0.55,0.86,0.21,0.36,0.46,
>>                     0.32,0.11,0.24,0.36,0.29,0.48,0.93,
>>            0.56,0.67,0.36,0.55,0.51,0.4,0.34,0.51,
>>            1,0.61,0.65,0.41,0.99,0.86,0.64,0.86,0.31,
>>            0.19,0.21,0.36,0.41,0.47,0.16,0.81,0.9,0.72,0.87,0.02),
>>        Subject=c(1,1,1,1,2,2,2,2,3,3,3,3,
>>                  4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,
>>                  8,8,8,8,9,9,9,9,10,10,10,10),
>>    ## or: rep(1:10,each=4)
>>         Day=c(1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,
>>               1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6))
>>    ## or: rep(c(1,2,4,6),4)
>>
>> sasdata$Time<-factor(sasdata$Day)
>>
>> fit0<-lme(Response~Time, random=~1|Subject,
>>          data=sasdata, na.action = (na.omit), method = "REML")
>>
>> plot(ACF(fit0),alpha=0.05,grid=TRUE)
>> ## it actually looks like AR1 is not a very good model here
>> ##   anyway?
>> AR1<- update(fit0,
>>             correlation=corAR1(form=~as.numeric(Time)|Subject,
>>               fixed =FALSE))
>>
>> plot(ACF(AR1),alpha=0.05,grid=TRUE)
>> ## ACF(AR1) shows that the raw rho(1) is about 0.2
>>
>> ## identical (based on order of observations within groups)
>> AR1B<- update(fit0,
>>             correlation=corAR1(form=~1|Subject,
>>               fixed =FALSE))
>>
>>
>> ## construct likelihood profile on rho
>> rhovec <- seq(-0.99,0.99,by=0.01)
>> Lvec <- sapply(rhovec,
>>               function(r) {
>>                 logLik(update(fit0,
>>                      correlation=corAR1(form=~1|Subject,
>>                        value=r,
>>                        fixed =TRUE)))
>>               })
>>
>> plot(rhovec,-2*Lvec,type="b")
>> abline(h=-2*logLik(AR1)+3.84,lty=2) ## LRT cutoff
>> abline(v=c(-0.53,0.82),lty=2)  
>> ## CIs on rho from intervals() -- reasonable ballpark
>>
>>
>> ## zoom in
>> plot(rhovec,-2*Lvec,type="b",ylim=c(11.5,15),xlim=c(-0.5,0.7))
>> abline(v=c(0.278,0.339)) ## SAS/R answers
>>
>> ## zoom in further
>> plot(rhovec,-2*Lvec,type="b",ylim=c(11.6,11.8),xlim=c(0.2,0.4),
>>     las=1,bty="l")
>> abline(v=c(0.278,0.339),lty=2)
>> text(c(0.278,0.339),rep(par("usr")[4],2),c("lme","SAS"),
>>         pos=3,xpd=NA)
>> abline(h=11.63168,lty=2)
>> text(par("usr")[2],11.63168,"SAS",pos=1,xpd=NA)
>>
>>  So ...  I don't see anything obviously wrong or weird about
>> what R is doing.  You can try
>>
>> update(AR1,control=lmeControl(msVerbose=TRUE))
>>
>> to see more detail on the optimization, but you will have
>> to dig into Pinheiro & Bates 2000 and/or the code and
>> documentation for details on the internal parameterizations
>> of rho in order to interpret the results ...
>>
>> ###
>> R Code:
>>
>> SAS Code:
>>
>> proc mixed;
>> class Subject Day;
>> model Response = Day / outp=pout;
>> repeated Day / subject = Subject type=AR(1);
>> run;
>>
>>
>> SAS Results:
>>
>>
>> Model Information
>>
>> Data Set WORK.ALLDATA
>> Dependent Variable Response
>> Covariance Structure Autoregressive
>> Subject Effect Subject
>> Estimation Method REML
>> Residual Variance Method Profile
>> Fixed Effects SE Method Model-Based
>> Degrees of Freedom Method Between-Within
>>
>>
>> Class Level Information
>>
>> Class Levels Values
>> Subject 10 1 10 2 3 4 5 6 7 8 9
>> Day 4 1 2 3 4
>>
>>
>> Dimensions
>>
>> Covariance Parameters 2
>> Columns in X 5
>> Columns in Z 0
>> Subjects 10
>> Max Obs Per Subject 4
>>
>>
>> Number of Observations
>>
>> Number of Observations Read 40
>> Number of Observations Used 40
>> Number of Observations Not Used 0
>>
>>
>> Iteration History
>>
>> Iteration Evaluations -2 Res Log Like Criterion
>> 0 1 14.67045653
>> 1 2 11.63168913 0.00000018
>> 2 1 11.63168429 0.00000000
>>
>>
>> Convergence criteria met.
>>
>>
>>
>> Covariance Parameter Estimates
>>
>> Cov Parm Subject Estimate
>> AR(1) Animal1 0.3389
>> Residual 0.06862
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
> 
>