[R] Optimisation and NaN Errors using clm() and clmm()
Thomas Foxley
thomasfoxley at aol.com
Thu Apr 18 18:38:22 CEST 2013
Rune,
Thank you very much for your response.
I don't actually have the models that failed to converge from the first
(glmulti) part, as they were not saved with the confidence set. glmulti
generates thousands of models, so it seems reasonable that a few of these
may not converge.
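If the warnings from the glmulti run need to be traced back to particular models (Rune offers to look at them further down the thread), one option is to refit the candidate formulas in a loop and record the warnings for each one. The base-R sketch below is only an illustration: `capture_warnings()`, the toy data `d` and the two formulas are hypothetical, and a logistic glm() stands in for clm() so that it runs without extra packages. With the real data, the call inside the loop would be something like `clm(f, data = database, link = "probit")`.

```r
# Hypothetical helper: evaluate an expression, collect every warning it raises
# (instead of R batching them into "There were N warnings"), and return both
# the result and the warning messages.
capture_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # record the warning, then carry on
    }
  )
  list(value = value, warnings = msgs)
}

# Toy data (hypothetical): y_sep is perfectly separated by x, y_mix is not.
d <- data.frame(
  x     = 1:8,
  y_sep = c(0, 0, 0, 0, 1, 1, 1, 1),
  y_mix = c(0, 1, 0, 0, 1, 0, 1, 1)
)

# Candidate models, named so that problem fits can be identified afterwards.
formulas <- list(m_mix = y_mix ~ x, m_sep = y_sep ~ x)

# glm() stands in for clm() here; with the real data this would be, e.g.,
# clm(f, data = database, link = "probit") from the 'ordinal' package.
results <- lapply(formulas, function(f) {
  capture_warnings(glm(f, family = binomial, data = d))
})

# Names of the models that raised at least one warning, and their messages.
flagged <- names(Filter(function(r) length(r$warnings) > 0, results))
print(flagged)
print(results[["m_sep"]]$warnings)
```

Only the separated toy model is flagged; the same loop over the glmulti candidate formulas would give a list of the specific models behind the "step factor reduced below minimum" warnings.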
The clmm() model I provided was just an example - not all models have 17
parameters. Only one or two models produced errors (the example I gave
being one of them); perhaps overparameterisation is the root of the
problem.
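To put the overparameterisation question in numbers, the sketch below counts the fixed-effect terms in the clmm() example formula and divides the 103 complete observations by the total number of estimated parameters. The count of 2 thresholds is taken from the summary output quoted below ("0|1" and "1|2"); the random term is left out of the formula. This is plain arithmetic on the example, a rough indicator rather than a formal rule for how many observations a model needs.

```r
# Fixed-effects part of the clmm() example (random term omitted).
f <- dependent ~ predictor_1 + predictor_2 + predictor_3 + predictor_6 +
  predictor_7 + predictor_8 + predictor_9 +
  predictor_6:predictor_2 + predictor_7:predictor_2 + predictor_7:predictor_3 +
  predictor_8:predictor_2 + predictor_9:predictor_1 + predictor_9:predictor_2 +
  predictor_9:predictor_3 + predictor_9:predictor_6 + predictor_9:predictor_7 +
  predictor_9:predictor_8

# All predictors are continuous, so each term label is one coefficient.
n_beta  <- length(attr(terms(f), "term.labels"))
n_theta <- 2    # thresholds "0|1" and "1|2" in the summary output
n_obs   <- 103  # complete cases used in the fit

cat("regression coefficients:", n_beta, "\n")
cat("observations per parameter:", round(n_obs / (n_beta + n_theta), 1), "\n")
```

For this model it reports 17 regression coefficients and about 5.4 observations per estimated parameter.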
Regarding incomplete data: there are only 103 (of 314) records for which I
have data on every predictor. The number of observations included
obviously varies between models; models with fewer predictors include more
observations. glmulti acts as a wrapper for another function, so (in this
case) NAs are handled as they would be in clm(): each model drops the
records that have a missing value in any of its own variables. Is there a
way around this (apart from filling in the missing data)? I believe it's
possible to limit model complexity in the glmulti call, which may or may not
increase the number of observations; how would this affect interpretation
of the results?
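One way to keep the candidate models comparable is to drop incomplete records once, using every variable that any candidate model can contain, and then fit all models to that same subset: information criteria such as AICc are only comparable between models fitted to the same observations. Under that approach, limiting model complexity would not change the number of observations. The base-R sketch below uses simulated data; the column layout mimics the thread, but `database` here is a made-up stand-in and `database_cc` is a hypothetical name. It only counts rows; the fitting call itself (glmulti or clm) would simply use `data = database_cc`.

```r
set.seed(1)

# Simulated stand-in for 'database': 314 records, 9 predictors, some NAs.
n <- 314
preds <- paste0("predictor_", 1:9)
database <- as.data.frame(matrix(rnorm(n * 9), nrow = n, ncol = 9,
                                 dimnames = list(NULL, preds)))
database$dependent <- sample(0:4, n, replace = TRUE)
for (p in preds) database[sample(n, 30), p] <- NA  # 30 missing values per column

# Rows usable by a small model versus by the global model.
small_vars  <- c("dependent", "predictor_1", "predictor_2")
global_vars <- c("dependent", preds)
n_small  <- sum(complete.cases(database[, small_vars]))
n_global <- sum(complete.cases(database[, global_vars]))

# Fix the data set once: keep only rows complete for every candidate variable.
database_cc <- database[complete.cases(database[, global_vars]), ]

cat("complete for small model:", n_small, "\n")
cat("complete for global model:", n_global, "\n")
cat("rows in common subset:", nrow(database_cc), "\n")
```

The small model can use many more rows than the global model, which is exactly why its AICc is not directly comparable unless both are fitted to `database_cc`.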
Thanks again,
Tom
On 16/04/13 07:54, Rune Haubo wrote:
> On 15 April 2013 13:18, Thomas <thomasfoxley at aol.com> wrote:
>> Dear List,
>>
>> I am using both the clm() and clmm() functions from the R package 'ordinal'.
>>
>> I am fitting an ordinal dependent variable with 5 categories to 9 continuous predictors, all of which have been normalised (mean subtracted then divided by standard deviation), using a probit link function. From this global model I am generating a confidence set of 200 models using clm() and the 'glmulti' R package. This produces these errors:
>>
>> > model.2.10 <- glmulti(as.factor(dependent) ~ predictor_1*predictor_2*predictor_3*predictor_4*predictor_5*predictor_6*predictor_7*predictor_8*predictor_9, data = database, fitfunc = clm, link = "probit", method = "g", crit = aicc, confsetsize = 200, marginality = TRUE)
>> ...
>> After 670 generations:
>> Best model: as.factor(dependent)~1+predictor_1+predictor_2+predictor_3+predictor_4+predictor_5+predictor_6+predictor_8+predictor_9+predictor_4:predictor_3+predictor_6:predictor_2+predictor_8:predictor_5+predictor_9:predictor_1+predictor_9:predictor_4+predictor_9:predictor_5+predictor_9:predictor_6
>> Crit= 183.716706496392
>> Mean crit= 202.022138576506
>> Improvements in best and average IC have been below the specified goals.
>> Algorithm is declared to have converged.
>> Completed.
>> There were 24 warnings (use warnings() to see them)
>>> warnings()
>> Warning messages:
>> 1: optimization failed: step factor reduced below minimum
>> 2: optimization failed: step factor reduced below minimum
>> 3: optimization failed: step factor reduced below minimum
>> etc.
>>
>>
>> I am then re-fitting each of the 200 models with the clmm() function, with 2 random factors (family nested within order). I get this error in a few of the re-fitted models:
>>
>> > model.2.glmm.2 <- clmm(as.factor(dependent) ~ 1 + predictor_1 + predictor_2 + predictor_3 + predictor_6 + predictor_7 + predictor_8 + predictor_9 + predictor_6:predictor_2 + predictor_7:predictor_2 + predictor_7:predictor_3 + predictor_8:predictor_2 + predictor_9:predictor_1 + predictor_9:predictor_2 + predictor_9:predictor_3 + predictor_9:predictor_6 + predictor_9:predictor_7 + predictor_9:predictor_8 + (1|order/family), link = "probit", data = database)
>>> summary(model.2.glmm.2)
>>>
>> Cumulative Link Mixed Model fitted with the Laplace approximation
>>
>> formula: as.factor(dependent) ~ 1 + predictor_1 + predictor_2 + predictor_3 + predictor_6 + predictor_7 + predictor_8 + predictor_9 + predictor_6:predictor_2 + predictor_7:predictor_2 +
>> predictor_7:predictor_3 + predictor_8:predictor_2 + predictor_9:predictor_1 + predictor_9:predictor_2 +
>> predictor_9:predictor_3 + predictor_9:predictor_6 + predictor_9:predictor_7 + predictor_9:predictor_8 + (1 | order/family)
>> data: database
>>
>> link   threshold nobs logLik AIC    niter    max.grad cond.H
>> probit flexible  103  -65.56 173.13 58(3225) 8.13e-06 4.3e+03
>>
>> Random effects:
>>                    Var   Std.Dev
>> family:order 7.493e-11 8.656e-06
>> order        1.917e-12 1.385e-06
>> Number of groups: family:order 12, order 4
>>
>> Coefficients:
>>                         Estimate Std. Error z value Pr(>|z|)
>> predictor_1              0.40802    0.78685   0.519   0.6041
>> predictor_2              0.02431    0.26570   0.092   0.9271
>> predictor_3             -0.84486    0.32056  -2.636   0.0084 **
>> predictor_6              0.65392    0.34348   1.904   0.0569 .
>> predictor_7              0.71730    0.29596   2.424   0.0154 *
>> predictor_8             -1.37692    0.75660  -1.820   0.0688 .
>> predictor_9              0.15642    0.28969   0.540   0.5892
>> predictor_2:predictor_6 -0.46880    0.18829  -2.490   0.0128 *
>> predictor_2:predictor_7  4.97365    0.82692   6.015 1.80e-09 ***
>> predictor_3:predictor_7 -1.13192    0.46639  -2.427   0.0152 *
>> predictor_2:predictor_8 -5.52913    0.88476  -6.249 4.12e-10 ***
>> predictor_1:predictor_9  4.28519         NA      NA       NA
>> predictor_2:predictor_9 -0.26558    0.10541  -2.520   0.0117 *
>> predictor_3:predictor_9 -1.49790         NA      NA       NA
>> predictor_6:predictor_9 -1.31538         NA      NA       NA
>> predictor_7:predictor_9 -4.41998         NA      NA       NA
>> predictor_8:predictor_9  3.99709         NA      NA       NA
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> Threshold coefficients:
>>     Estimate Std. Error z value
>> 0|1  -0.2236     0.3072  -0.728
>> 1|2   1.4229     0.3634   3.915
>> (211 observations deleted due to missingness)
>> Warning message:
>> In sqrt(diag(vc)[1:npar]) : NaNs produced
>>
> This warning is due to a (near) singular variance-covariance matrix of
> the model parameters, which in turn is due to the fact that the model
> converged to a boundary solution: both random effects variance
> parameters are zero. If you exclude the random terms and refit the
> model with clm, the variance-covariance matrix will probably be well
> defined and standard errors can be computed.
>
> Another thing is that you are fitting 17 regression parameters and 2
> random effect terms (which in the end do not count) to only 103
> observations. I would be worried about overfitting or perhaps even
> non-fitting. I think I would also be concerned about the 211
> observations that are incomplete, and I would be careful with
> automatic model selection/averaging etc. on incomplete data (though I
> don't know how/if glmulti actually deals with that).
>
>> I have tried a number of different approaches, each with its own problems. I have fixed these using various suggestions from online forums (e.g. https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q1/015328.html, https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/016165.html) and this is as good as I can get it.
>>
>> After the first stage (generating the model set with glmulti) I tested every model in the confidence set individually - there were no errors - but there was clearly a problem during the model selection process. Should I be worried?
> I don't know - I don't use glmulti or automatic model selection
> regularly, so I don't know what the consequences might be.
>
> The question seems to be what caused the potential non-convergences
> for some of the models that were not chosen. If they didn't converge
> because the models are not identifiable, then I suppose all is ok, but
> if they are relevant models that should have converged, then there
> might be a problem. However, if a model does not converge, there is
> usually a good reason for it, so I am not particularly worried that
> there are relevant models among those that did not converge. Without
> considering a particular model, it is hard to tell why it might not
> have converged, but if you can pinpoint the models that trigger the
> warnings/errors, I would be happy to take a further look at them.
>
> Hope this helps,
> Rune
>
>> No errors appear in the top 5% of re-fitted models (which are the only ones I will be using); however, I am concerned that errors may be indicative of a problem with my approach.
>>
>> A further worry is that the errors might be removing models that could otherwise be included.
>>
>>
>> Any help would be much appreciated.
>>
>> Tom
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
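Rune's explanation of the "NaNs produced" warning can be illustrated without the real data. Standard errors are the square roots of the diagonal of the inverted Hessian; when a fit lands on a boundary (here both random-effect variances are zero), the Hessian can be (near) singular, and inverting it anyway can give negative or meaningless diagonal entries. The sketch below is a hypothetical helper, not how the ordinal package computes its standard errors: it checks the Hessian's eigenvalues first and reports their ratio (largest over smallest, which is how the ordinal documentation describes `cond.H`, assuming that definition). The name `safe_se()`, the tolerance and both toy Hessians are assumptions for illustration.

```r
# Hypothetical helper: standard errors from a Hessian, with a check for
# (near) singularity before inverting it.
safe_se <- function(H, tol = 1e-8) {
  ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
  cond <- max(ev) / min(ev)          # largest / smallest eigenvalue, cf. cond.H
  if (min(ev) <= tol * max(ev)) {    # not safely positive definite
    return(list(se = rep(NA_real_, nrow(H)), cond = cond))
  }
  list(se = sqrt(diag(solve(H))), cond = cond)
}

# Well-conditioned toy Hessian: SEs are 1/sqrt(4) and 1/sqrt(1).
good <- safe_se(diag(c(4, 1)))
print(good)

# Singular toy Hessian (two perfectly confounded parameters): no valid SEs.
bad <- safe_se(matrix(1, nrow = 2, ncol = 2))
print(bad$se)
```

When the check fails because the random-effect variances have collapsed to zero, the practical route Rune suggests is to drop the random terms and refit the model with clm(), whose Hessian should then be well defined.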