[R-sig-ME] code for multiple membership models?

Mon Jun 27 21:55:36 CEST 2011

My apologies for the delay on responding to this question and to a
similar question off-list by Jeroen Ooms.  I have been immersed in
another development effort which ultimately will feed back into lme4
and, unfortunately, I don't multiplex well.

The basic approach to multiple membership models is to build the model
from one of the factors representing the membership, such as area
here, but not fit it - just return the numeric representation.
Specifying doFit=FALSE in a call to glmer in the lme4a package
provides the raw structure without doing the optimization.  Inside
that structure is the sparse matrix Z.  You then add the sparse
matrices from the other factors representing the membership,
recalculate the Cholesky factor L (because the structure of Z has
changed) and go on with the optimization.

I enclose a first cut at this.  I didn't use weights so the results
are not directly comparable to those in the manual from the Multilevel
Modeling group.

On Fri, Jun 24, 2011 at 7:18 AM, Malcolm Fairbrother
<m.fairbrother at bristol.ac.uk> wrote:
> Dear Doug, Jarrod, and/or perhaps others,
>
> A couple posters have previously asked about multiple membership models, and Doug has said he could provide code to fit such models (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q2/003860.html), if given some sample data.
>
> Jarrod has similarly suggested that MCMCglmm can handle multiple membership (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/006195.html), though I can't find any reference to multiple membership models in the MCMCglmm documentation. (Maybe I missed it, however.)
>
> Here's is a simple sample dataset (Scottish Lip Cancer):
> library(foreign); lips <- read.dta("http://www.bristol.ac.uk/cmm/media/runmlwin/lips1.dta")[,c(1,3,5,9:30)]
>
> I would be very grateful for any indication of how to fit a multiple membership model to such data. In this instance, the outcome ("obs") is Poisson-distributed (number of instances of lip cancer in an area), and "perc_aff" is a covariate (percentage of the population working in outdoor activities). Each unit ("area") is observed only once, but a mixed model could be useful for modelling the spatial dependence in the data, where each area is taken to be nested in a set of neighbours (identified in columns 4-14). Each area has anywhere from 1 to 11 neighbours, so each neighbour's share of the overall "neighbour" classification would have to be weighted somewhere from 1/11 to 1. Weights are given in columns 14-25.
>
> So a starting model is:
> summary(M1 <- glm(obs ~ 1, lips, family=poisson))
>
> Then:
> library(lme4)
> (M2 <- lmer(obs ~ 1 + (1 | neigh1), lips, family=poisson))
>
> However, the random effect in M2 should be replaced by a random effect for "neighbours" (not just "neigh1") where each of the 1 to 11 neighbours "counts equally" towards the classification, not just the area's first neighbour.
>
> Can this indeed be done in lme4 and/or MCMCglmm? If so, can you please show how? A similar analysis of this dataset, using MLwiN, is described in chapter 17 of
> http://www.bristol.ac.uk/cmm/software/mlwin/download/mcmc-09.pdf.
>
> Thanks,
> Malcolm
>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
-------------- next part --------------

R version 2.14.0 Under development (unstable) (2011-06-24 r56210)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(lme4a)
Loading required package: Matrix
Loading required package: lattice

Attaching package: ?Matrix?

The following object(s) are masked from ?package:base?:

    det

Loading required package: minqa
Loading required package: Rcpp
Loading required package: MatrixModels

Attaching package: ?MatrixModels?

The following object(s) are masked from ?package:stats?:

    getCall

> library(foreign)
> ## convert the neighXX columns to factors with a consistent set of levels
> ## drop the weights columns
> lips <- within(read.dta("http://www.bristol.ac.uk/cmm/media/runmlwin/lips1.dta")[,c(1,3,5,9:30)],
+            {
+                area <- factor(area)
+                neigh1 <- factor(neigh1, levels=1:56)
+                neigh2 <- factor(neigh2, levels=1:56)
+                neigh3 <- factor(neigh3, levels=1:56)
+                neigh4 <- factor(neigh4, levels=1:56)
+                neigh5 <- factor(neigh5, levels=1:56)
+                neigh6 <- factor(neigh6, levels=1:56)
+                neigh7 <- factor(neigh7, levels=1:56)
+                neigh8 <- factor(neigh8, levels=1:56)
+                neigh9 <- factor(neigh9, levels=1:56)
+                neigh10 <- factor(neigh10, levels=1:56)
+                neigh11 <- factor(neigh11, levels=1:56)
+            })[ , 1:14]
> ## truncate the names
> names(lips)[4:14] <- paste(sprintf("n%02d", 1:11))
> str(lips)
'data.frame':	56 obs. of  14 variables:
 $ area    : Factor w/ 56 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ obs     : int  9 39 11 9 15 8 26 7 6 20 ...
 $ perc_aff: int  16 16 10 24 10 24 10 7 7 16 ...
 $ n01     : Factor w/ 56 levels "1","2","3","4",..: 5 7 6 18 1 3 2 6 1 2 ...
 $ n02     : Factor w/ 56 levels "1","2","3","4",..: 9 10 12 20 11 8 10 NA 11 7 ...
 $ n03     : Factor w/ 56 levels "1","2","3","4",..: 11 NA NA 28 12 NA 13 NA 17 16 ...
 $ n04     : Factor w/ 56 levels "1","2","3","4",..: 19 NA NA NA 13 NA 16 NA 19 22 ...
 $ n05     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA 19 NA 17 NA 23 NA ...
 $ n06     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA 29 NA ...
 $ n07     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
 $ n08     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
 $ n09     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
 $ n10     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
 $ n11     : Factor w/ 56 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
> 
> ## Convert each nXX column to a sparse model matrix.  We can almost do
> ## this with as(lips$n01, "sparseMatrix") but that drops unused levels
> ## and we must keep them.  Instead we use the hidden function behind
> ## that coercion function.
> 
> Zl <- lapply(names(lips[4:14]), # list of sparse model matrices, one for each column
+              function(nm) Matrix:::fac2sparse(lips[[nm]], "d", drop=FALSE))
> ZZ <- Reduce("+", Zl[-1], Zl[[1]])
> str(ZZ)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:264] 4 8 10 18 6 9 5 11 17 19 ...
  ..@ p       : int [1:57] 0 4 6 8 11 16 18 23 24 30 ...
  ..@ Dim     : int [1:2] 56 56
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:56] "1" "2" "3" "4" ...
  .. ..$ : NULL
  ..@ x       : num [1:264] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()
> 
> ## generate, but do not fit, the model structure
> mstr <- glmer(obs ~ perc_aff + (1|area), lips, poisson, doFit=FALSE)
> str(mstr at re)
Formal class 'reTrms' [package "lme4a"] with 9 slots
  ..@ flist :List of 1
  .. ..$ area: Factor w/ 56 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. ..- attr(*, "assign")= int 1
  ..@ cnms  :List of 1
  .. ..$ area: chr "(Intercept)"
  ..@ L     :Formal class 'dCHMsimpl' [package "Matrix"] with 10 slots
  .. .. ..@ x       : num [1:56] 1.41 1.41 1.41 1.41 1.41 ...
  .. .. ..@ p       : int [1:57] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ i       : int [1:56] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ nz      : int [1:56] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..@ nxt     : int [1:58] 1 2 3 4 5 6 7 8 9 10 ...
  .. .. ..@ prv     : int [1:58] 57 0 1 2 3 4 5 6 7 8 ...
  .. .. ..@ colcount: int [1:56] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..@ perm    : int [1:56] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ type    : int [1:4] 2 1 0 1
  .. .. ..@ Dim     : int [1:2] 56 56
  ..@ Lambda:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:56] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ p       : int [1:57] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ Dim     : int [1:2] 56 56
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ x       : num [1:56] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..@ factors : list()
  ..@ Lind  : int [1:56] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ Zt    :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. ..@ i       : int [1:56] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ p       : int [1:57] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ Dim     : int [1:2] 56 56
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : chr [1:56] "1" "2" "3" "4" ...
  .. .. .. ..$ : NULL
  .. .. ..@ x       : num [1:56] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..@ factors : list()
  ..@ lower : num 0
  ..@ theta : num 1
  ..@ u     : num [1:56] 0 0 0 0 0 0 0 0 0 0 ...
>                                         # replace the Zt matrix and L factor
> re <- mstr at re
> re at Zt <- ZZ
> re at L <- Cholesky(tcrossprod(ZZ), LDL=FALSE, Imult=1)
> mstr at re <- re
>                                         # fit and summarize the model
> summary(ans <- lme4a:::PIRLSest(mstr, verbose=2L, control=list(), nAGQ=1L))
npt = 3 , n =  1 
rhobeg =  0.2 , rhoend =  2e-07 
   0.020:   5:      164.386;0.600000 
  0.0020:  12:      159.497;0.420054 
 0.00020:  17:      159.485;0.427453 
 2.0e-05:  19:      159.485;0.427058 
 2.0e-06:  20:      159.485;0.427026 
 2.0e-07:  22:      159.485;0.427026 
At return
 25:     159.48455: 0.427025
npt = 5 , n =  3 
rhobeg =  0.2 , rhoend =  2e-07 
   0.020:   6:      159.484;0.427025  2.45348 -0.00103278 
  0.0020:   9:      159.484;0.427025  2.45348 -0.00103278 
 0.00020:  17:      159.462;0.428080  2.43929 -0.00209994 
 2.0e-05:  51:      159.456;0.426677  2.42444 -0.00112120 
 2.0e-06:  77:      159.456;0.426957  2.42220 -0.00100983 
 2.0e-07:  94:      159.456;0.426955  2.42214 -0.00100766 
At return
110:     159.45636: 0.426955  2.42214 -0.00100739
Generalized linear mixed model fit by maximum likelihood ['summary.mer']
 Family: poisson 
Formula: obs ~ perc_aff + (1 | area) 
   Data: lips 
     AIC      BIC   logLik deviance 
165.4564 171.5324 -79.7282 159.4564 

Random effects:
 Groups Name        Variance Std.Dev.
 area   (Intercept) 0.1823   0.427   
Number of obs: 56, groups: area, 56

Fixed effects:
             Estimate Std. Error z value
(Intercept)  2.422140   0.244911   9.890
perc_aff    -0.001007   0.015505  -0.065

Correlation of Fixed Effects:
         (Intr)
perc_aff -0.646
> 
> 
> proc.time()
   user  system elapsed 
 10.730   0.110  11.611