[R-sig-ME] Specifying a (simple?) two level model

Thu Jun 30 10:41:09 CEST 2011

Dear Hans,

I would rather fit (0 + cluster | country). With 1 + cluster, the first cluster is used as the reference level and the remaining terms are the differences between each other cluster and that reference, whereas 0 + cluster directly estimates the effect of each cluster. The variance-covariance matrix of the random effects is therefore easier to interpret.
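
The difference between the two codings can be seen with model.matrix() on a small factor. This is only a sketch: the three-level factor below is made-up toy data standing in for the 22 clusters, and it assumes R's default treatment contrasts.

```r
# Hypothetical toy factor with three levels (not the real cluster variable)
cluster <- factor(c("a", "b", "c", "a", "b", "c"))

# 1 + cluster: an intercept for the reference level "a",
# plus one column per remaining level, coding the difference from "a"
mm_reference <- model.matrix(~ 1 + cluster)
colnames(mm_reference)

# 0 + cluster: one indicator column per level,
# so each coefficient is that level's own effect
mm_indicator <- model.matrix(~ 0 + cluster)
colnames(mm_indicator)
```

The same column layout is what the random-effects term uses within each country, so with 0 + cluster each variance in the matrix belongs to one cluster rather than to a contrast with the reference cluster.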

However, with 22 clusters the variance-covariance matrix will be 22x22, i.e. 253 distinct variance and covariance parameters, which is large and therefore takes a long time to fit.
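
As a rough illustration of the size of the problem, the number of distinct parameters in an unstructured 22x22 variance-covariance matrix can be counted directly. The variable names are only illustrative.

```r
# Number of levels of cluster in the random-effects term
n_levels <- 22

# One variance per level
n_variances <- n_levels

# One covariance per unordered pair of distinct levels
n_covariances <- n_levels * (n_levels - 1) / 2

# Total number of parameters in the symmetric matrix
n_varcov_params <- n_variances + n_covariances
n_varcov_params  # 253
```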

A second problem is that you have complete separation in your dataset: some clusters in some countries contain only 0s or only 1s. That creates numerical problems, since logit(0) = -Inf and logit(1) = Inf.
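
One way to look for such cells is to compute the observed proportion per cluster-by-country combination and flag those that are exactly 0 or 1. The sketch below uses a small made-up data frame (toy, with a hypothetical column poor) rather than poverty.risks; the same tapply() call should work on the real data with the column names swapped in.

```r
# Hypothetical toy data: two countries, two clusters, two observations per cell
toy <- data.frame(
  country = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  cluster = factor(c("u", "u", "e", "e", "u", "u", "e", "e")),
  poor    = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE)
)

# Observed proportion of poor in each cluster-by-country cell
prop <- with(toy, tapply(poor, list(cluster, country), mean))

# Cells where the outcome is all FALSE or all TRUE
separated <- which(prop == 0 | prop == 1, arr.ind = TRUE)
separated

# The logit of those proportions is infinite
qlogis(c(0, 1))  # -Inf  Inf
```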

Best regards,

Thierry

> -----Original message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Hans Ekbrand
> Sent: Thursday 30 June 2011 8:50
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] Specifying a (simple?) two level model
> 
> Hi, this is my first post to the list. I am new to mixed models, but I think I have
> managed to specify my rather simple modelling problem correctly. The problem I
> have is that the computation never seems to finish (I waited for 10 hours before
> giving up).
> 
> I am trying to model how risks of poverty vary with labour market position,
> while letting the effects of labour market position vary over countries.
> 
> Here is a sample of the dataset, if you want to try it out
> 
> > print(load(url("http://code.cjb.net/temp/pov.temp.RData")))
> [1] "poverty.risks"
> > str(poverty.risks)
> 'data.frame':	161348 obs. of  3 variables:
>  $ poverty.third.year: logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
>  $ country           : Factor w/ 22 levels "sweden","unitedkingdom",..: 1 1 1 1 1 1 1 1 1 1 ...
>  $ cluster           : Factor w/ 22 levels "Unemployed - Unemployed",..: 16 20 16 16 18 20 16 1 16 2 ...
> 
> Labour market position is a factor that summarises a history of labour market
> positions over three years, where "Unemployed - Unemployed"
> means that the individual was unemployed at time0 and at time1.
> 
> Here is my specification:
> 
> my.fit <- glmer(poverty.third.year ~ cluster + (1 + cluster | country), family =
> binomial("logit"), data = poverty.risks)
> 
> I saw, in Bates Chapter 2, that you could split the random terms into (1 | cluster) +
> (1 | country). Also, I am not sure whether or not to include cluster as a fixed term. If
> I split the random terms and skip cluster as a fixed term, then the computation
> takes only a few seconds.
> 
> my.fit <- glmer(poverty.third.year ~ 1 + (1 | cluster) + (1 | country), family =
> binomial("logit"), data = poverty.risks)
> 
> summary(my.fit)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: poverty.third.year ~ 1 + (1 | cluster) + (1 | country)
>    Data: poverty.risks
>     AIC    BIC logLik deviance
>  103922 103952 -51958   103916
> Random effects:
>  Groups  Name        Variance Std.Dev.
>  cluster (Intercept) 0.54046  0.73516
>  country (Intercept) 0.17247  0.41530
> Number of obs: 161348, groups: cluster, 22; country, 22
> 
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -2.0760     0.1807  -11.49   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> My understanding, which I hope is wrong, is that this model does not compute
> country-specific poverty risks for each cluster.
> 
> If the first model is the right one for me, then how long would it be
> reasonable to wait for the computation to terminate?
> 
> --
> Hans Ekbrand <hans at sociologi.cjb.net>
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models