[R] glmmPQL "error" message (was 'data order affects glmmPQL')
spencer.graves at pdf.com
Thu Jan 12 04:03:08 CET 2006
1. The function "glmmPQL" is in the MASS package, as can be seen by
looking at the top line in the help file for "glmmPQL". To find the
maintainer, type 'help(package="MASS")'. The results say, "Maintainer:
Brian Ripley <ripley at stats.ox.ac.uk>".
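For what it is worth, the same lookup can be done from the prompt without opening the help page. A minimal sketch, assuming MASS is installed; the variable names are only for illustration:

```r
# Both helpers live in the 'utils' package, which is attached by default.
m <- maintainer("MASS")            # maintainer field as a single string
d <- packageDescription("MASS")    # DESCRIPTION of the installed copy

print(m)
print(d$Maintainer)
```

The address printed will be whatever the installed version of the package declares, so it may differ from the one quoted above.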
2. It is generally NOT "appropriate to simply cherry-pick a model
based on logLik", as you suggested. However, your example does NOT
involve this issue, because you are making multiple attempts to fit the
same model to the same data set. With any iterative algorithm, it is
considered legitimate to fit the same model to the same data from
different starting values and to select the fit with the largest
log(likelihood), on the grounds that the others had not adequately
converged. In this case, the algorithm runs and produces similar but
different answers when the order is changed. Since the model does not
seem to consider anything that would theoretically be affected by the
sort order, it seems to me that this is crudely equivalent to changing
the starting values, as I mentioned before. Therefore, I would consider
it quite legitimate to pick the fit with the highest logLik.
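The general idea in point 2 (same model, same data, several starting values, keep the largest log-likelihood) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the model is a toy two-component normal mixture fitted with optim() rather than a glmmPQL fit, and the names "negll" and "starts" are made up for the example.

```r
# Simulated data (an assumption for illustration, not the poster's data).
set.seed(1)
y <- c(rnorm(60, mean = 0), rnorm(40, mean = 3))

# Negative log-likelihood of a two-component normal mixture with a common sd.
# par = (logit of the mixing weight, mean 1, mean 2, log of the sd)
negll <- function(par) {
  p <- plogis(par[1])
  s <- exp(par[4])
  -sum(log(p * dnorm(y, par[2], s) + (1 - p) * dnorm(y, par[3], s)))
}

# Several starting values for the same model and the same data.
starts <- list(c(0, -1, 1, 0), c(0, 0, 5, 0), c(2, 1, 1.5, 1))
fits <- lapply(starts, function(s) optim(s, negll))  # default Nelder-Mead

# Log-likelihood reached from each start, and optim's convergence code.
ll   <- -sapply(fits, function(f) f$value)
conv <- sapply(fits, function(f) f$convergence)
print(cbind(logLik = ll, convergence = conv))

# Keep the fit with the largest log-likelihood.
best <- fits[[which.max(ll)]]
print(best$par)
```

All of the fits target the same likelihood surface, so choosing the largest value is a statement about which run of the optimizer got furthest, not a comparison between different models.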
3. I agree it is disturbing when glmmPQL generates "Error in
lme.formula(fixed = zz ~ test + coder, random = ~1 | id, data =
list( : false convergence (8)". If it were my problem, I might make
local copies of glmmPQL and lme.formula and trace through the code line
by line using "debug" until I developed an idea about how I might change
the code to get it past this error and on to something close to
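A rough sketch of the tracing suggested in point 3. Note that, instead of making local copies, it sets the debug flag on the installed functions directly, which is a simpler variant of the same idea; it assumes MASS and nlme are installed, and the failing call itself is left as a comment because the original data are not reproduced here.

```r
library(MASS)
library(nlme)

# Flag both functions so that the next call to each one opens the browser
# and can be stepped through line by line (type 'n' for next, 'Q' to quit).
debug(glmmPQL)
debug(nlme:::lme.formula)   # S3 method, reached via the namespace

flagged <- c(isdebugged(glmmPQL), isdebugged(nlme:::lme.formula))
print(flagged)

# ... rerun the failing glmmPQL() call here in an interactive session ...

# Clear the flags when finished.
undebug(glmmPQL)
undebug(nlme:::lme.formula)
```

Local copies become useful at the next stage, once stepping through has suggested a change worth trying in the code.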
Hope this helps.
Jack Tanner wrote:
>> From: Spencer Graves
>> The correlation between the predictions
>> from your two model fits is 0.95. This suggests to me that the
>> differences between the two sets of answers have little practical
>> importance, and anyone who disagrees may be trying to read more from
>> the results than can actually be supported by the data. It should be
>> fairly easy to select the apparent "best" from among several such
>> answers being the one that had a higher log(likelihood). This pushes
>> me to prefer "fit.bar" with a log(likelihood) of -32.31 to "fit.foo"
>> with -33.05.
>> I agree that the differences are somewhat disturbing, but you
>> are dealing with the output from an iterative solution of a
>> notoriously difficult problem, and the standard wisdom is that it is
>> wise to try several sets of starting values. By modifying the order
>> of the observations in the data.frame, you have effectively done that.
> Spencer, thank you for setting my mind at ease. Still, I suspect there's
> a bug here, as the convergence procedure halts entirely when I sort the
> data yet another way. See
> http://article.gmane.org/gmane.comp.lang.r.general/53559 .
> Also, I wonder if it's appropriate to simply cherry-pick a model based
> on logLik, since there's no final test of goodness of fit that
> happens on independent data after one has picked a model in this way.