[R] data order affects glmmPQL
Spencer Graves
spencer.graves at pdf.com
Wed Jan 11 04:23:45 CET 2006
The correlation between the predictions from your two model fits is
0.95. This suggests to me that the differences between the two sets of
answers have little practical importance, and anyone who disagrees may
be trying to read more from the results than can actually be supported
by the data. It should be fairly easy to select the apparent "best"
from among several such answers being the one that had a higher
log(likelihood). This pushes me to prefer "fit.bar" with a
log(likelihood) of -32.31 to "fit.foo" with -33.05.
I agree that the differences are somewhat disturbing, but you are
dealing with the output from an iterative solution of a notoriously
difficult problem, and the standard wisdom is that it is wise to try
several sets of starting values. By modifying the order of the
observations in the data.frame, you have effectively done that.
You could probably reduce these differences by creating a local copy
of "glmmPQL" and modifying the code so a user could provide starting
values and so you could restart the iteration for either fit with the
results of the other, possibly iterating to tighter tolerances.
hope this helps.
spencer graves
p.s. To got this correlation, I modified the latter portion of your
script as follows:
fit.foo <- fit.model(foo, poisson)
o.bar <- order(id, score, test, coder)
bar <- foo[o.bar,] # Reorder data frame
fit.bar <- fit.model(bar, poisson)
predict.foo <- predict(fit.foo)
predict.bar <- predict(fit.bar)
cor(predict.foo[o.bar], predict.bar)
[1] 0.9544287
Jack Tanner wrote:
> Is it to be expected that the way a data frame is sorted should affect the
> model fit by glmmPQL?
>
> Example:
>
> library(MASS)
> library(nlme)
>
> fit.model <- function(il, model.family) {
> cs <- Initialize(corSymm(form=~1|id), data=il)
> glmmPQL(score~test+coder, random=~1|id, #
> control=lmeControl(msMaxIter=100),
> family=model.family, data=il, correlation=cs)
> }
>
> score <-
> c(1,8,1,3,4,4,2,5,3,6,0,3,1,5,0,5,1,11,1,2,4,5,2,4,1,6,1,2,8,16,5,16,3,15,3,12,4,9,2,4,1,8,2,6,4,11,2,9,3,17,2,6)
> id <- rep(1:13, rep(4, 13))
> test <- gl(2, 1, length(score), labels=c("pre", "post"))
> coder <- gl(2, 2, length(score), labels=c("two", "three"))
>
> foo <- data.frame(id, score, test, coder) # Define data frame
> print(summary(fit.model(foo, poisson)))
>
> bar <- foo[order(id, score, test, coder),] # Reorder data frame
> print(summary(fit.model(bar, poisson)))
>
> The two summaries are clearly different. Is this to be expected? Is there a
> canonical way one should order a data frame before passing it to glmmPQL?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
More information about the R-help
mailing list