[R-sig-ME] Repeated measures mixed model with AR(1) correlation structure in nlme vs SAS Proc Mixed
Ben Bolker
bbolker at gmail.com
Mon Mar 28 03:23:00 CEST 2011
Aha!
In the specification described here, R includes a G-side random effect
of subject. (lme cannot handle models with no G-side effect.)
gls(Response~Time,
data=sasdata, na.action = (na.omit), method = "REML",
correlation=corAR1(form=~1|Subject))
gives the same answers as SAS, both for the point estimate and for the
likelihood profile.
Ben Bolker
On 11-03-27 06:57 PM, Jim Baldwin wrote:
>
> I agree the essential issue is as you state.
>
> PROC MIXED has "Columns in X: 5" meaning an intercept and a
> value for each one of the time periods. The fixed effect for the last
> time period "Day = 6" is (internally) set to zero resulting in just 4
> values to
> estimate. (Using the NOINT - for no intercept - option results in
> "Columns in X: 4" and the same exact variance components.)
> The "Columns in Z: 0" has to do with PROC MIXED having no
> G-side random effects and only R-side random effects for this
> particular example. So the 2 variance components (rho and
> sigma) are modeled in E in the formulation
>
> Y = X*beta + Z*gamma + E
>
> Jim
>
> Jim Baldwin
> Station Statistician
> Pacific Southwest Research Station
> USDA Forest Service
> Albany, California
> 510-559-6332
>
>
> *Ben Bolker <bbolker at gmail.com>*
>
> 03/27/2011 03:00 PM
>
>
> To
> Jim Baldwin <jbaldwin at fs.fed.us>
> cc
> r-sig-mixed-models at r-project.org
> Subject
> Re: [R-sig-ME] Repeated measures mixed model with AR(1) correlation
> structure in nlme vs SAS Proc Mixed
>
>
>
>
>
>
>
>
> This makes a really nice comparison, thank you.
>
> It makes me think that there is some difference that we haven't yet
> figured out in the way that lme and PROC MIXED are defining the model,
> although nothing springs to mind, although I can't see what it would be.
>
> As far as I can tell, both are fitting a fixed effect of Response ~
> Day where Day is a categorical predictor; both are treating Subject as a
> random effect (affecting the intercept only); both are assuming AR(1)
> autocorrelation in the residuals, applying only within-subject; both are
> using REML ... ?
>
> I agree with your points about taking account of the gap between
> time=3 and time=6 -- but I think the main point here is the narrower one
> of "what are SAS and R doing differently for this particular (even if
> sub-optimal) model"?
>
> What is the meaning of "Columns in X 5; Columns in Z 0" in the SAS
> output; I would have thought X (fixed effect design matrix) would have
> 4 columns, Z (random effect design matrix) would have 10 ... ?
>
> On 11-03-27 03:13 PM, Jim Baldwin wrote:
>>
>> I've added equivalent loop in SAS to obtain the equivalent output as
>> Prof. Bolker has below in R. The SAS code is as follows:
>>
>>
>> *%macro**/loop/*;
>> %doi=*0*%to*100*;
>> %letrho = %SYSEVALF(0.2 + 0.2*&i/100);
>> proc mixed data=sasdata method=REML itdetails;
>> class Subject Day;
>> model Response = Day / solution;
>> ** Set initial values for variance components and*
>> * hold the correlation coefficient to a constant;*
>> parms (&rho) (*0.06330649*) / hold=*1*;
>> repeated / subject = Subject type=ar(*1*);
>> ** Keep a copy of the fit statistics;*
>> ods output fitstatistics=fs1;
>> run;
>> ** Grab just what is needed from the fit statistics dataset;*
>> data fs1;
>> set fs1;
>> keep rho M2LogL;
>> if descr = "-2 Res Log Likelihood";
>> M2LogL = value;
>> rho = ρ
>> run;
>> ** Merge with overall summary;*
>> data fs;
>> set fs fs1;
>> keep M2LogL rho;
>> if rho = *.*then delete;
>> run;
>>
>> %end;
>> *%mend*loop;
>>
>> ** Initialize dataset to hold rho and -2*logL;*
>> *data*fs; *run*;
>>
>> ** Run through loop and print out results;*
>> %*/loop/*;
>> *proc**print*data=fs;
>> *run*;
>>
>>
>> The resulting values are
>>
>> M2LogL =
> c(12.1794,12.1641,12.1491,12.1342,12.1196,12.1051,12.0909,12.0769,
>>
>>
> 12.0631,12.0494,12.036,12.0229,12.0099,11.9971,11.9845,11.9722,11.96,11.9481,
>>
>>
>>
> 11.9364,11.9249,11.9136,11.9025,11.8916,11.881,11.8706,11.8603,11.8503,11.8405,
>>
>>
>>
> 11.831,11.8216,11.8125,11.8036,11.7949,11.7864,11.7781,11.7701,11.7623,11.7547,
>>
>>
>>
> 11.7473,11.7402,11.7332,11.7265,11.72,11.7138,11.7078,11.7019,11.6964,11.691,
>>
>>
>>
> 11.6859,11.681,11.6763,11.6719,11.6677,11.6637,11.6599,11.6564,11.6531,11.6501,
>>
>>
>>
> 11.6473,11.6447,11.6423,11.6402,11.6383,11.6366,11.6352,11.634,11.6331,11.6324,
>>
>>
>>
> 11.6319,11.6317,11.6317,11.632,11.6325,11.6332,11.6342,11.6354,11.6369,11.6386,
>>
>>
>>
> 11.6405,11.6427,11.6452,11.6478,11.6508,11.654,11.6574,11.6611,11.665,11.6692,
>>
>>
>>
> 11.6736,11.6783,11.6832,11.6884,11.6939,11.6996,11.7055,11.7117,11.7182,11.7249,
>>
>> 11.7319,11.7391,11.7466)
>>
>> rho = 0.2 + 0.2*c(0:100)/100
>>
>> and these can be added to the last plot from below as
>>
>> lines(rho,M2LogL,col="red")
>>
>> One can then see after adding in this curve that the -2 log L values are
>> nearly idential above the estimate given by PROC MIXED but become more
>> and more different when farther from that value.
>>
>> Jim
>>
>> Jim Baldwin
>> Station Statistician
>> Pacific Southwest Research Station
>> USDA Forest Service
>> Albany, California
>> 510-559-6332
>>
>> P.S. (This is probably more SAS code than some (all?) of you would have
>> ever
>> wanted to see in your life but such adversity maybe builds character?)
>>
>>
>>
>> *Ben Bolker <bbolker at gmail.com>*
>> Sent by: r-sig-mixed-models-bounces at r-project.org
>>
>> 03/27/2011 09:25 AM
>>
>>
>> To
>> r-sig-mixed-models at r-project.org
>> cc
>>
>> Subject
>> Re: [R-sig-ME] Repeated measures mixed model with
> AR(1) correlation
>> structure in nlme vs SAS Proc Mixed
>>
>>
>>
>>
>>
>>
>>
>>
>> Simon Bate <simontbate at ...> writes:
>>
>>> Hi all, I don't know if anyone has any thoughts on this, I suspect
>>> it is a local convergence issue but am not sure. I have been trying
>>> to move from SAS Proc Mixed to R nlme and have an unusual result. I
>>> have several subjects measured at four time points. I want to model
>>> the within-subject correlation using an autoregressive
>>> structure. I've attached the R and SAS code I'm using along with the
>>> results from SAS. With R lme I get an estimate of the
>>> autoregressive parameter phi = 0.2782601, whereas SAS gives me an
>>> estimate of 0.3389 Intriguingly if I include a between subject
>>> factor, a covariate or delete one of the observations, then the
>>> results appear to agree. I'm surprised the seemingly simpler model
>>> if different between the two implementations whereas the more
>>> complex models agree. Any ideas would be most welcome! Regards,
>>> Simon
>>
>> Hmmm. I do agree it could be a convergence problem of some sort.
>> I agree that it's worth trying to figure out what the difference
>> is, but I think it's going to be hard. I went through some basic
>> sanity checking with R (see below) and don't see an obvious problem.
>>
>> library(nlme)
>> sasdata <- data.frame(Response=c(0.55,0.86,0.21,0.36,0.46,
>> 0.32,0.11,0.24,0.36,0.29,0.48,0.93,
>> 0.56,0.67,0.36,0.55,0.51,0.4,0.34,0.51,
>> 1,0.61,0.65,0.41,0.99,0.86,0.64,0.86,0.31,
>> 0.19,0.21,0.36,0.41,0.47,0.16,0.81,0.9,0.72,0.87,0.02),
>> Subject=c(1,1,1,1,2,2,2,2,3,3,3,3,
>> 4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,
>> 8,8,8,8,9,9,9,9,10,10,10,10),
>> ## or: rep(1:10,each=4)
>> Day=c(1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,
>> 1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6,1,2,4,6))
>> ## or: rep(c(1,2,4,6),4)
>>
>> sasdata$Time<-factor(sasdata$Day)
>>
>> fit0<-lme(Response~Time, random=~1|Subject,
>> data=sasdata, na.action = (na.omit), method = "REML")
>>
>> plot(ACF(fit0),alpha=0.05,grid=TRUE)
>> ## it actually looks like AR1 is not a very good model here
>> ## anyway?
>> AR1<- update(fit0,
>> correlation=corAR1(form=~as.numeric(Time)|Subject,
>> fixed =FALSE))
>>
>> plot(ACF(AR1),alpha=0.05,grid=TRUE)
>> ## ACF(AR1) shows that the raw rho(1) is about 0.2
>>
>> ## identical (based on order of observations within groups)
>> AR1B<- update(fit0,
>> correlation=corAR1(form=~1|Subject,
>> fixed =FALSE))
>>
>>
>> ## construct likelihood profile on rho
>> rhovec <- seq(-0.99,0.99,by=0.01)
>> Lvec <- sapply(rhovec,
>> function(r) {
>> logLik(update(fit0,
>> correlation=corAR1(form=~1|Subject,
>> value=r,
>> fixed =TRUE)))
>> })
>>
>> plot(rhovec,-2*Lvec,type="b")
>> abline(h=-2*logLik(AR1)+3.84,lty=2) ## LRT cutoff
>> abline(v=c(-0.53,0.82),lty=2)
>> ## CIs on rho from intervals() -- reasonable ballpark
>>
>>
>> ## zoom in
>> plot(rhovec,-2*Lvec,type="b",ylim=c(11.5,15),xlim=c(-0.5,0.7))
>> abline(v=c(0.278,0.339)) ## SAS/R answers
>>
>> ## zoom in further
>> plot(rhovec,-2*Lvec,type="b",ylim=c(11.6,11.8),xlim=c(0.2,0.4),
>> las=1,bty="l")
>> abline(v=c(0.278,0.339),lty=2)
>> text(c(0.278,0.339),rep(par("usr")[4],2),c("lme","SAS"),
>> pos=3,xpd=NA)
>> abline(h=11.63168,lty=2)
>> text(par("usr")[2],11.63168,"SAS",pos=1,xpd=NA)
>>
>> So ... I don't see anything obviously wrong or weird about
>> what R is doing. You can try
>>
>> update(AR1,control=lmeControl(msVerbose=TRUE))
>>
>> to see more detail on the optimization, but you will have
>> to dig into Pinheiro & Bates 2000 and/or the code and
>> documentation for details on the internal parameterizations
>> of rho in order to interpret the results ...
>>
>> ###
>> R Code:
>>
>> SAS Code:
>>
>> proc mixed;
>> class Subject Day;
>> model Response = Day / outp=pout;
>> repeated Day / subject = Subject type=AR(1);
>> run;
>>
>>
>> SAS Results:
>>
>>
>> Model Information
>>
>> Data Set WORK.ALLDATA
>> Dependent Variable Response
>> Covariance Structure Autoregressive
>> Subject Effect Subject
>> Estimation Method REML
>> Residual Variance Method Profile
>> Fixed Effects SE Method Model-Based
>> Degrees of Freedom Method Between-Within
>>
>>
>> Class Level Information
>>
>> Class Levels Values
>> Subject 10 1 10 2 3 4 5 6 7 8 9
>> Day 4 1 2 3 4
>>
>>
>> Dimensions
>>
>> Covariance Parameters 2
>> Columns in X 5
>> Columns in Z 0
>> Subjects 10
>> Max Obs Per Subject 4
>>
>>
>> Number of Observations
>>
>> Number of Observations Read 40
>> Number of Observations Used 40
>> Number of Observations Not Used 0
>>
>>
>> Iteration History
>>
>> Iteration Evaluations -2 Res Log Like Criterion
>> 0 1 14.67045653
>> 1 2 11.63168913 0.00000018
>> 2 1 11.63168429 0.00000000
>>
>>
>> Convergence criteria met.
>>
>>
>>
>> Covariance Parameter Estimates
>>
>> Cov Parm Subject Estimate
>> AR(1) Animal1 0.3389
>> Residual 0.06862
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>
>
More information about the R-sig-mixed-models
mailing list