[R] data order affects glmmPQL

Thu Jan 12 02:41:34 CET 2006

>From: Spencer Graves
>
>	  The correlation between the predictions from your 
>two model fits is 0.95.  This suggests to me that the differences between 
>the two sets of answers have little practical importance, and anyone who 
>disagrees may be trying to read more from the results than can actually be 
>supported by the data.  It should be fairly easy to select the apparent 
>"best" from among several such answers being the one that had a higher 
>log(likelihood).  This pushes me to prefer "fit.bar" with a log(likelihood) 
>of -32.31 to "fit.foo" with -33.05.
>
>	  I agree that the differences are somewhat disturbing, but you are 
>dealing with the output from an iterative solution of a notoriously 
>difficult problem, and the standard wisdom is that it is wise to try 
>several sets of starting values.  By modifying the order of the 
>observations in the data.frame, you have effectively done that.

Spencer, thank you for setting my mind at ease. Still, I suspect there's a 
bug here, as the convergence procedure halts entirely when I sort the data 
yet another way. See  
http://article.gmane.org/gmane.comp.lang.r.general/53559 .
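
For anyone who wants to see how much row order matters in a fit of this 
kind, here is a minimal sketch. It is not my actual data or model: it 
assumes the bacteria example that ships with MASS, and the helper name 
fit_permuted and the choice of five seeds are made up for illustration. It 
refits the same glmmPQL model on several random row orderings and compares 
the fixed-effect estimates.

```r
library(MASS)   # glmmPQL and the bacteria data; nlme must also be installed

# Hypothetical helper: refit the same model on a row-permuted copy of the data.
fit_permuted <- function(seed, data) {
  set.seed(seed)
  shuffled <- data[sample(nrow(data)), ]   # same rows, different order
  glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
          family = binomial, data = shuffled, verbose = FALSE)
}

# Five arbitrary orderings (assumption: any handful of seeds would do).
fits <- lapply(1:5, fit_permuted, data = bacteria)

# One column of fixed-effect estimates per ordering.
coefs <- sapply(fits, nlme::fixef)
print(round(coefs, 4))

# Largest difference in any single coefficient across the orderings.
spread <- max(apply(coefs, 1, function(x) diff(range(x))))
print(spread)
```

If the spread is negligible, the ordering is probably not what drives the 
result for that data set; if it is large, or if some orderings stop with an 
error, that is the kind of behaviour I am describing.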

Also, I wonder whether it's appropriate to simply cherry-pick a model based 
on logLik, since no final goodness-of-fit test is run on independent data 
after one has picked a model in this way.