[R] data order affects glmmPQL

Wed Jan 11 04:23:45 CET 2006

	  The correlation between the predictions from your two model fits is 
0.95.  This suggests to me that the differences between the two sets of 
answers have little practical importance, and anyone who disagrees may 
be trying to read more from the results than can actually be supported 
by the data.  It should be fairly easy to select the apparent "best" 
from among several such answers being the one that had a higher 
log(likelihood).  This pushes me to prefer "fit.bar" with a 
log(likelihood) of -32.31 to "fit.foo" with -33.05.

	  I agree that the differences are somewhat disturbing, but you are 
dealing with the output from an iterative solution of a notoriously 
difficult problem, and the standard wisdom is that it is wise to try 
several sets of starting values.  By modifying the order of the 
observations in the data.frame, you have effectively done that.

	  You could probably reduce these differences by creating a local copy 
of "glmmPQL" and modifying the code so a user could provide starting 
values and so you could restart the iteration for either fit with the 
results of the other, possibly iterating to tighter tolerances.

	  hope this helps.
	  spencer graves
p.s.  To got this correlation, I modified the latter portion of your 
script as follows:

fit.foo <- fit.model(foo, poisson)
o.bar <- order(id, score, test, coder)
bar <- foo[o.bar,] # Reorder data frame
fit.bar <- fit.model(bar, poisson)
predict.foo <- predict(fit.foo)
predict.bar <- predict(fit.bar)

cor(predict.foo[o.bar], predict.bar)
[1] 0.9544287

Jack Tanner wrote:

> Is it to be expected that the way a data frame is sorted should affect the 
> model fit by glmmPQL?
> 
> Example:
> 
> library(MASS)
> library(nlme)
> 
> fit.model <- function(il, model.family) {
>   cs <- Initialize(corSymm(form=~1|id), data=il)
>   glmmPQL(score~test+coder, random=~1|id, # 
> control=lmeControl(msMaxIter=100),
>           family=model.family, data=il, correlation=cs)
> }
> 
> score <- 
> c(1,8,1,3,4,4,2,5,3,6,0,3,1,5,0,5,1,11,1,2,4,5,2,4,1,6,1,2,8,16,5,16,3,15,3,12,4,9,2,4,1,8,2,6,4,11,2,9,3,17,2,6)
> id <- rep(1:13, rep(4, 13))
> test <- gl(2, 1, length(score), labels=c("pre", "post"))
> coder <- gl(2, 2, length(score), labels=c("two", "three"))
> 
> foo <- data.frame(id, score, test, coder) # Define data frame
> print(summary(fit.model(foo, poisson)))
> 
> bar <- foo[order(id, score, test, coder),] # Reorder data frame
> print(summary(fit.model(bar, poisson)))
> 
> The two summaries are clearly different. Is this to be expected? Is there a 
> canonical way one should order a data frame before passing it to glmmPQL?
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html