[R-sig-ME] corAR1 question

Thu Aug 3 19:40:35 CEST 2017

Dear list,

After some searching  I think I know which coding of the time covariate 
corAR1 expects ... and where the strange results come from in case a 
coding system like 2, 4, 6, 8 is used.

For anyone interested, here are my findings.

The thing is that in Pinheiro and Bates, it is not very explicitly told 
(and for anywhone familiar with AR1 models, this might not be necessary 
at all...) how the time variable one uses has to coded, in case of corAR1.

Suppose we have 4 time points. Then one would normally code: 1, 2, 3, 4 
(or 1, 2, 4 if a person has a missing for the third time point). One 
could also use 10,11,12,13 as time values, this would lead to exactly 
the same results. Note that adjacent integer values must be used, no 
gaps in between are allowed.

However, with the "wave2" values 2, 4, 6, 8 for each person, there are 
no observations in the data for the intermediate timepoints 3, 5, 7! And 
this probably causes a problem and unexpected results like a different 
loglikelihood and a correlation phi1=0.

My initial idea was that with the time intervals being twice as large 
for "wave2" ("wave2" is coded as 2, 4, 6, 8, whereas  "wave" was coded 
as 1, 2, 3, 4) than for "wave", the only difference between using 
"wave2" and "wave" in

correlation=corAR1(form = ~ wave|id)

versus

correlation=corAR1(form = ~ wave|id)

would be another value for the estimated correlation phi, but that both 
specification would have the same loglikelihood and fixed effects (only 
the intercept in my example). Now this is exactly what is found when 
running:

correlation=corCAR1(form = ~ wave2|id)

With this continuous-time corCAR1 specification, the loglikelihood is 
exactly equal to the one of the corAR1(form = ~wave|id) and what is 
more: the estimated phi value equals the square root of the phi value 
corAR1(form = ~wave|id). And this makes sense! If "wave3" would be coded 
as 3, 6, 9, 12
then corCAR1(form= ~wave3|id) would again lead to the same loglikelihood 
as for corAR1(form=~wave|id) and the phi value now equals the third root 
of the phi in corAR1(form = ~wave|id).

Kind regards,

Ben.

On 3-8-2017 14:07, Ben Pelzer wrote:
> Sorry list, stupid me, again a mistake, I've been working too long I 
> guess....
> Below, I'll give it a new try....
>
> The frequency table of "wave"  was again incorrect, as it should have 
> n=35 for each of its values 1, 2, 3, 4.
>
> Thansk for your patience.
>
> *-----------------------------
>
> Dear list,
>
> In dataframe "opposites" there are data from 35 individuals observed at
> each of 4 consecutive timepoints, leading to 140 cases in total. The
> time variable is called "wave", which takes values  1, 2, 3 and 4 for
> each individual.
>
> With the folowing syntax, I've tried to estimate a model with the corr.
> matrix of the residuals having a AR1 structure:
>
> opposites <-
> read.table("https://stats.idre.ucla.edu/stat/r/examples/alda/data/opposites_pp.txt",header=TRUE,sep=",") 
>
> library(nlme)
>
> table(opposites$wave)
> auto1 <- gls(opp~1,opposites, correlation=corAR1(form = ~ wave|id),
> method="REML")
> summary(auto1)
> corMatrix(auto1$modelStruct$corStruct)[[5]]
>
> This seems to work fine, and produces as output:
>
>> table(opposites$wave)
>   1  2  3  4
> 35 35 35 35
>
>> auto1 <- gls(opp~1,opposites, correlation=corAR1(form = ~ wave|id), 
> method="REML") > summary(auto1)
> Generalized least squares fit by REML
>    Model: opp ~ 1
>    Data: opposites
>         AIC      BIC    logLik
>    1405.478 1414.282 -699.7392
>
> Correlation Structure: AR(1)
>   Formula: ~wave | id
>   Parameter estimate(s):
>      Phi
> 0.75515
>
> Coefficients:
>                 Value Std.Error  t-value p-value
> (Intercept) 205.0916  7.153622 28.66962       0
>
> Standardized residuals:
>           Min           Q1          Med           Q3          Max
> -2.561070599 -0.666429529 -0.001817216  0.518961088  2.081296002
>
> Residual standard error: 50.40533
> Degrees of freedom: 140 total; 139 residual
>
>> corMatrix(auto1$modelStruct$corStruct)[[5]]
>            [,1]      [,2]      [,3]      [,4]
> [1,] 1.0000000 0.7551500 0.5702515 0.4306254
> [2,] 0.7551500 1.0000000 0.7551500 0.5702515
> [3,] 0.5702515 0.7551500 1.0000000 0.7551500
> [4,] 0.4306254 0.5702515 0.7551500 1.0000000
>
>
> Next, I multiplied the "wave" variable by 2 to try to understand which
> influence this would have on the estimate of phi parameter. This
> resulted in:
>
>> opposites$wave2 <- 2*opposites$wave > table(opposites$wave2)
>   2  4  6  8
> 35 35 35 35
>
>> auto2 <- gls(opp~1,opposites, correlation=corAR1(form = ~ wave2|id), 
> method="REML") > summary(auto2)
> Generalized least squares fit by REML
>    Model: opp ~ 1
>    Data: opposites
>       AIC      BIC    logLik
>    1473.5 1482.303 -733.7498
>
> Correlation Structure: ARMA(1,0)
>   Formula: ~wave2 | id
>   Parameter estimate(s):
> Phi1
>     0
>
> Coefficients:
>                 Value Std.Error  t-value p-value
> (Intercept) 204.8143  3.940234 51.98023       0
>
> Standardized residuals:
>           Min           Q1          Med           Q3          Max
> -2.762981486 -0.714569461  0.003983449  0.567028639  2.256164210
>
> Residual standard error: 46.62148
> Degrees of freedom: 140 total; 139 residual
>> corMatrix(auto2$modelStruct$corStruct)[[5]]
>       [,1] [,2] [,3] [,4]
> [1,]    1    0    0    0
> [2,]    0    1    0    0
> [3,]    0    0    1    0
> [4,]    0    0    0    1
>
> My question is about this second output with wave2 as the time
> indicator. I do not see why the Phi1 estimate has now changed to zero,
> as if there would be no residual correlation at all between the time
> points. I must admit that I was not sure which value to expect for phi1.
>
> A related question is about using corAR1 in case the individuals in my
> data would have been observed at different timepoints. E.g. person A at
> day 1, 2, 3, 4 but person B at day 1 , 4, 5, 8. Can corAR1 still be used
> given such unequally spaced data? Some documents on the internet say
> that in such situation corCAR1 should be used? From Bates and Pinhiero's
> book it seems to me that corCAR1 is applicable in case of continuously
> code time points, like 1.7, 2.5 etc., while for integer coded time the
> corAR1 method can still be used (this apart form the fact that corAR1
> can also give negative phi values in contrast with corCAR1).
>
> Thanks for any help!
>
> Ben Pelzer.
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models