[R-sig-ME] Specifying a (simple?) two level model
ONKELINX, Thierry
Thierry.ONKELINX at inbo.be
Thu Jun 30 10:41:09 CEST 2011
Dear Hans,
I would rather fit (0 + cluster | country). With 1 + cluster, the first cluster is used as the reference level and the remaining terms are the differences of the other clusters from it, whereas 0 + cluster directly estimates the effect of each cluster. The variance-covariance matrix of the random effects is therefore easier to interpret.
However, with 22 clusters the variance-covariance matrix will be 22 x 22 (22 variances plus 231 covariances, 253 parameters in all), which is large and therefore takes a long time to fit.
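To make the difference between the two codings concrete, here is a minimal sketch in base R. It uses a hypothetical three-level factor in place of the 22 clusters (the toy data and the helper n_vc_params are illustrative assumptions, not part of the original dataset) and also counts the parameters of an unstructured variance-covariance matrix.

```r
# Toy factor with three levels standing in for the 22 clusters
# (hypothetical data, not the poverty.risks dataset).
cluster <- factor(c("a", "b", "c", "a", "b", "c"))

# 1 + cluster: an intercept column (reference level "a") plus
# one column per remaining level, coding the difference from "a".
X_ref <- model.matrix(~ 1 + cluster)

# 0 + cluster: one indicator column per level, no reference level.
X_ind <- model.matrix(~ 0 + cluster)

colnames(X_ref)  # "(Intercept)" "clusterb" "clusterc"
colnames(X_ind)  # "clustera" "clusterb" "clusterc"

# Number of free parameters in an unstructured k x k
# variance-covariance matrix: k variances + k(k-1)/2 covariances.
n_vc_params <- function(k) k * (k + 1) / 2
n_vc_params(22)  # 253
```

Both codings span the same column space, so the fitted model is the same; only the meaning of the individual random-effect terms, and hence of their variances and covariances, changes.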
A second problem is that you have complete separation in your dataset: some clusters in some countries contain only 0s or only 1s. That creates numerical problems, because logit(0) = -Inf and logit(1) = Inf.
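One way to check for such cells is to tabulate the observed proportion per country-by-cluster combination. The sketch below uses a small hypothetical data frame (toy) with the same column names as poverty.risks; it is an illustration under that assumption, not output from the real data.

```r
# Hypothetical toy data with the same column names as in the post.
toy <- data.frame(
  poverty.third.year = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE),
  country = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  cluster = factor(c("x", "x", "y", "y", "x", "x", "y", "y"))
)

# Observed proportion of poverty in each country-by-cluster cell.
cell_rate <- with(toy, tapply(poverty.third.year, list(country, cluster), mean))

# Cells that are all FALSE (proportion 0) or all TRUE (proportion 1).
# Empty cells give NA and are skipped by which().
separated <- which(cell_rate == 0 | cell_rate == 1, arr.ind = TRUE)
print(cell_rate)
print(separated)

# The logit of these boundary proportions is infinite.
qlogis(c(0, 1))  # -Inf  Inf
```

On the real data the same call would be made with poverty.risks in place of toy; any row in separated marks a country-cluster combination without variation in the response.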
Best regards,
Thierry
> -----Original message-----
> From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On behalf of Hans Ekbrand
> Sent: Thursday 30 June 2011 8:50
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] Specifying a (simple?) two level model
>
> Hi, this is my first post to the list. I am new to mixed models, but I think I
> have managed to specify my rather simple modelling problem correctly. The
> problem I have is that the computation never seems to finish (I waited for 10
> hours before giving up).
>
> I am trying to model how risks of poverty vary with labour market position,
> while letting the effects of labour market position vary over countries.
>
> Here is a sample of the dataset, if you want to try it out:
>
> > print(load(url("http://code.cjb.net/temp/pov.temp.RData")))
> [1] "poverty.risks"
> > str(poverty.risks)
> 'data.frame': 161348 obs. of 3 variables:
> $ poverty.third.year: logi FALSE FALSE FALSE FALSE TRUE FALSE ...
> $ country : Factor w/ 22 levels "sweden","unitedkingdom",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ cluster : Factor w/ 22 levels "Unemployed - Unemployed",..: 16 20 16 16 18 20 16 1 16 2 ...
>
> Labour market position is a factor that summarises a history of labour market
> positions over three years, where "Unemployed - Unemployed"
> means that the individual was unemployed at time0 and at time1.
>
> Here is my specification:
>
> my.fit <- glmer(poverty.third.year ~ cluster + (1 + cluster | country), family =
> binomial("logit"), data = poverty.risks)
>
> I saw, in Bates Chapter 2, that you could split the random terms into
> (1 | cluster) + (1 | country). Also, I am not sure whether or not to include
> cluster as a fixed term. If I split the random terms and drop cluster as a
> fixed term, then the computation takes only a few seconds.
>
> my.fit <- glmer(poverty.third.year ~ 1 + (1 | cluster) + (1 | country), family =
> binomial("logit"), data = poverty.risks)
>
> summary(my.fit)
> Generalized linear mixed model fit by the Laplace approximation
> Formula: poverty.third.year ~ 1 + (1 | cluster) + (1 | country)
> Data: poverty.risks
>    AIC    BIC logLik deviance
> 103922 103952 -51958   103916
> Random effects:
>  Groups  Name        Variance Std.Dev.
>  cluster (Intercept) 0.54046  0.73516
>  country (Intercept) 0.17247  0.41530
> Number of obs: 161348, groups: cluster, 22; country, 22
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -2.0760     0.1807  -11.49   <2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> My understanding, which I hope is wrong, is that this model does not compute
> country-specific poverty risks for each cluster.
>
> If the first model is the right one for me, then how long would it be
> reasonable to wait for the computation to terminate?
>
> --
> Hans Ekbrand <hans at sociologi.cjb.net>
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models