[R] glmmPQL "error" message (was 'data order affects glmmPQL')

Spencer Graves spencer.graves at pdf.com
Thu Jan 12 04:03:08 CET 2006

	  1.  The function "glmmPQL" is in the MASS package, as can be seen by 
looking at the top line in the help file for "glmmPQL".  To find the 
maintainer, type 'help(package="MASS")'.  The results say, "Maintainer: 
Brian Ripley <ripley at stats.ox.ac.uk>".

	  2.  It is generally NOT "appropriate to simply cherry-pick a model 
based on logLik", as you suspected.  However, your example does NOT 
involve this issue, because you are making multiple attempts to fit the 
same model to the same data set.  With any iterative algorithm, it is 
considered legitimate to fit the same model to the same data from 
several different starting values and select the fit with the largest 
log(likelihood), treating all the others as not having adequately 
converged.  In this case, the algorithm runs and produces similar but 
different answers when the order of the observations is changed.  Since 
nothing in the model should, in theory, depend on the sort order, it 
seems to me that reordering is crudely equivalent to changing the 
starting values, as I mentioned before.  Therefore, I would consider it 
quite legitimate to pick the fit with the highest logLik.
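The multiple-starts rule in point 2 can be sketched in a few lines.  This is a toy Python illustration, not glmmPQL or lme.  As an assumption, it uses a one-parameter Cauchy location model with the scale fixed at 1.  That model was chosen only because its log-likelihood can have several local maxima.  The hill climber (hill_climb) and the driver (multi_start_fit) are hypothetical helpers written for this example.  Each starting value is run to convergence, and the fit with the largest log(likelihood) is kept:

```python
import math


def cauchy_loglik(theta, xs):
    """Log-likelihood of a Cauchy(location=theta, scale=1) model for data xs."""
    return -sum(math.log(math.pi * (1.0 + (x - theta) ** 2)) for x in xs)


def cauchy_score(theta, xs):
    """Derivative of cauchy_loglik with respect to theta."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)


def hill_climb(theta0, xs, max_iter=10_000, tol=1e-10):
    """Hypothetical fixed-step gradient ascent from a single starting value.

    Each term's second derivative is bounded by 2 in absolute value, so a
    step of 1/(2n) is small enough that no update decreases the log-likelihood.
    """
    step = 1.0 / (2 * len(xs))
    theta = theta0
    for _ in range(max_iter):
        delta = step * cauchy_score(theta, xs)
        theta += delta
        if abs(delta) < tol:
            break
    return theta


def multi_start_fit(xs, starts):
    """Fit the SAME model to the SAME data from several starts; keep the best.

    Returns (best_theta, best_loglik, fits), where fits is a list of
    (loglik, theta) pairs, one per starting value.
    """
    fits = []
    for s in starts:
        theta = hill_climb(s, xs)
        fits.append((cauchy_loglik(theta, xs), theta))
    best_ll, best_theta = max(fits)  # largest log-likelihood wins
    return best_theta, best_ll, fits


if __name__ == "__main__":
    xs = [-6.0, -5.5, 0.0, 7.0]  # illustrative data, not from this thread
    starts = [-10.0, 0.0, 10.0]
    best_theta, best_ll, fits = multi_start_fit(xs, starts)
    for s, (ll, theta) in zip(starts, fits):
        print(f"start {s:6.1f} -> theta {theta:7.3f}, logLik {ll:8.3f}")
    print(f"selected theta {best_theta:.3f} (logLik {best_ll:.3f})")
```

With these made-up data, the three starts should stop at three different local maxima.  The rule then keeps the one between -6 and -5.5.  The comparison is only meaningful because every candidate is the same model fit to the same data.  Comparing different models by logLik alone is the separate issue raised above.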

	  3.  I agree it is disturbing when glmmPQL generates "Error in 
lme.formula(fixed = zz ~ test + coder, random = ~1 | id, data =
list( :  false convergence (8)".  If it were my problem, I might make 
local copies of glmmPQL and lme.formula and trace through the code line 
by line using "debug" until I developed an idea about how I might change 
the code to get it past this error and on to something close to 

	  Hope this helps.
	  spencer graves

Jack Tanner wrote:

>> From: Spencer Graves
>>       The correlation between the predictions 
>> from your two model fits is 0.95.  This suggests to me that the 
>> differences between the two sets of answers have little practical 
>> importance, and anyone who disagrees may be trying to read more from 
>> the results than can actually be supported by the data.  It should be 
>> fairly easy to select the apparent "best" from among several such 
>> answers as the one with the higher log(likelihood).  This pushes 
>> me to prefer "fit.bar" with a log(likelihood) of -32.31 to "fit.foo" 
>> with -33.05.
>>       I agree that the differences are somewhat disturbing, but you 
>> are dealing with the output from an iterative solution of a 
>> notoriously difficult problem, and the standard wisdom is that it is 
>> wise to try several sets of starting values.  By modifying the order 
>> of the observations in the data.frame, you have effectively done that.
> Spencer, thank you for setting my mind at ease. Still, I suspect there's 
> a bug here, as the convergence procedure halts entirely when I sort the 
> data yet another way. See  
> http://article.gmane.org/gmane.comp.lang.r.general/53559 .
> Also, I wonder if it's appropriate to simply cherry-pick a model based 
> on logLik, since there's no final test of goodness of fit that 
> happens on independent data after one has picked a model in this way.
