[R] data order affects glmmPQL
Jack Tanner
ihok at hotmail.com
Thu Jan 12 02:41:34 CET 2006
>From: Spencer Graves
>
> The correlation between the predictions from your
>two model fits is 0.95. This suggests to me that the differences between
>the two sets of answers have little practical importance, and anyone who
>disagrees may be trying to read more from the results than can actually be
>supported by the data. It should be fairly easy to select the apparent
>"best" from among several such answers: the one with the highest
>log(likelihood). This pushes me to prefer "fit.bar", with a log(likelihood)
>of -32.31, to "fit.foo", with -33.05.
>
> I agree that the differences are somewhat disturbing, but you are
>dealing with the output from an iterative solution of a notoriously
>difficult problem, and the standard wisdom is that it is wise to try
>several sets of starting values. By modifying the order of the
>observations in the data.frame, you have effectively done that.
Spencer, thank you for setting my mind at ease. Still, I suspect there's a
bug here, as the convergence procedure halts entirely when I sort the data
yet another way. See
http://article.gmane.org/gmane.comp.lang.r.general/53559 .
Also, I wonder if it's appropriate to simply cherry-pick a model based on
logLik, since no final goodness-of-fit test is run on independent data
after one has picked a model in this way.
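
For concreteness, here is a minimal sketch of the kind of independent check
I have in mind. It is only an illustration under assumptions: it uses the
`bacteria` data and model formula from the MASS documentation as a stand-in
(not the data from this thread), the 20% row-level hold-out and the Brier
score are arbitrary choices of mine, and the names `fit_once`, `brier`,
`coef_diff` and `scores` are made up for the example. Holding out whole
subjects rather than rows would arguably be a stricter test.

```r
library(MASS)   # glmmPQL() and the bacteria data set
library(nlme)   # fixef() for the fitted object

set.seed(1)
data(bacteria)

# Hold out a random 20% of rows for an independent check (an assumption;
# any other split could be used instead).
n <- nrow(bacteria)
test_idx <- sample(n, size = round(0.2 * n))
train <- bacteria[-test_idx, ]
test  <- bacteria[test_idx, ]

# Hypothetical helper: fit the same model to whatever data frame is passed.
fit_once <- function(d) {
  glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
          family = binomial, data = d, verbose = FALSE)
}

# Same training rows, two different row orderings.
fit_a <- fit_once(train)
fit_b <- fit_once(train[sample(nrow(train)), ])

# Largest absolute difference between the two sets of fixed effects.
coef_diff <- max(abs(fixef(fit_a) - fixef(fit_b)))

# Hypothetical helper: Brier score on held-out rows, using
# population-level (level = 0) predicted probabilities.
brier <- function(fit, d) {
  p <- predict(fit, newdata = d, level = 0, type = "response")
  mean((as.numeric(d$y == "y") - p)^2)
}

scores <- c(a = brier(fit_a, test), b = brier(fit_b, test))
print(coef_diff)
print(scores)
```

If the two orderings give noticeably different held-out scores, that would
be a more defensible basis for preferring one fit than the logLik computed
on the same data used for fitting.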