[R-sig-ME] lme4, failure to converge with a range of optimisers, trust the fitted model anyway?

Sun Apr 5 02:41:47 CEST 2015

On Sat, Apr 04, 2015 at 07:36:04PM -0400, Ben Bolker wrote:
> > So, perhaps I could aggregate the individual level data to the
> > cluster level, and do without the random term for cluster? I mean,
> > calculate the proportions of yes in each cluster and use that as
> > the dependent variable.
> > 
> > This would, I assume, require that each cluster was given a weight 
> > that corresponded to the number of individuals in it - or I would
> > not be able to say anything about probabilities at the individual
> > level, right?
> 
>  I would say pretty much any time you can aggregate without losing
> information, you should, for ease of computation (this accords with
> the Murtaugh 2007 "Simplicity and complexity" paper that I cite here
> on a regular basis).  If all of your covariates are at the cluster
> level

[ Thanks for your advice, it's much appreciated. ]

The covariates are not all at the cluster level. Actually only one
covariate is at the cluster level, the others are at the country
level.

> then this reduces to a binomial GLM (you can account for
> overdispersion either by using a quasi-binomial model in glm() or by
> staying with glmer() and keeping the random effect of cluster (now an
> observation-level random effect).

Since I still have the country level to account for, I need to use glmer().

I have created an aggregated data set with a variable for the
proportion deprived=TRUE in each cluster, and a variable for the
number of cases in that cluster (to use as weight).

Perhaps this is not a question specific for mixed models, but what
family and link function would be appropriate now? This was the best I
could come up with:

library(car)
lm1 <- lmer(logit(Prop.deprived.in.cluster) ~ (1|Country) + QoG + GDPLog + Rural * KilledPerMillion5Log, data = my.small.df, weights = my.small.df$weight)

Linear mixed model fit by REML ['lmerMod']
Formula: logit(Prop.deprived.in.cluster) ~ (1 | Country) + QoG + GDPLog +  
    Rural * KilledPerMillion5Log
   Data: my.small.df
Weights: my.small.df$weight

REML criterion at convergence: 157709.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.9983 -0.5626 -0.0622  0.4371  6.1171 

Random effects:
 Groups   Name        Variance Std.Dev.
 Country  (Intercept)   1.768   1.33   
 Residual             139.292  11.80   
Number of obs: 38028, groups:  Country, 65

Fixed effects:
                              Estimate Std. Error t value
(Intercept)                    1.25542    1.90567    0.66
QoG                            0.39120    0.40850    0.96
GDPLog                        -0.51799    0.23031   -2.25
Ruralyes                       1.91545    0.02232   85.80
KilledPerMillion5Log           0.06190    0.28557    0.22
Ruralyes:KilledPerMillion5Log  0.35303    0.03728    9.47

Correlation of Fixed Effects:
            (Intr) QoG    GDPLog Rurlys KlPM5L
QoG          0.571                            
GDPLog      -0.989 -0.486                     
Ruralyes    -0.013  0.001  0.006              
KlldPrMll5L -0.013  0.239 -0.009  0.045       
Rrlys:KPM5L  0.004 -0.001  0.000 -0.519 -0.084

my.small.df is available here if someone wants to have a go at it

http://hansekbrand.se/code/my.small.df.RData (1.1 MB)