[R-sig-ME] Zhang 2011 (re)analysis

Tue Nov 1 11:21:08 CET 2011

I compared the respiratory data with the original publication; Davis,
C.S. (1991, Statistics in Medicine, 10, 1959-1980). From this it
appears that
data("respiratory", package = "geepack")
contains exactly the data reported in Appendix I of Davis (1991).

Note that for work with lme4, it is recommended that unique id numbers
are assigned to subjects of the second center, e.g.:
respiratory$id <- ifelse(respiratory$center==1, respiratory$id,
respiratory$id+56)

The second version of these data:
data("respiratory", package = "HSAUR")
differs from the original ones in two respects:
(1) male and female labels are exchanged
(2) for record 428, the outcome/status should be "poor" (0), not "good" (1)

So now we should be ready to replicate Zhang et al. (2011) :)

Reinhold

PS: The page numbers for Zhang et al. (2011) are 2562-2572.
PPS:  The Everitt & Hothorn-Book was published 2006 (not 2002, as I
wrote earlier); the HSAUR package goes with this book.

On Mon, Oct 31, 2011 at 6:44 AM, Ben Bolker <bbolker at gmail.com> wrote:
> On 11-10-31 07:49 PM, Saang-Yoon Hyun wrote:
>> Hi,
>> Could you show the reference: e.g., authors, year, tilte, journal,
>> pages, etc.?   If you could share its PDF version of the paper, it would
>> be more helpful.
>> Thank you,
>> Saang-Yoon
>
>  DOI: 10.1002/sim.4265
> On fitting generalized linear mixed-effects models for binary responses
> using different statistical packages
> Hui Zhang, Naiji Lu, Changyong Feng, Sally W. Thurston,
> Yinglin Xia, Liang Zhua and Xin M. Tu
>
>  Statistics in Medicine.  Apparently no page numbers (online publication?)
>
>>
>> On Mon, Oct 31, 2011 at 12:15 PM, Reinhold Kliegl
>> <reinhold.kliegl at gmail.com <mailto:reinhold.kliegl at gmail.com>> wrote:
>>
>>     Here is some comparison between glm,  glmer (using lme4 with Laplace)
>>     and sabreR (which uses AGHQ, if I recall correctly).
>>     The glm analysis replicates exactly the results reported in Everitt
>>     and Hothorn (2002, Table 13.1, around page 175).  Thus, I am pretty
>>     sure I am using the correct data.
>>
>>      sabreR gives estimates both for the standard homogeneous model,
>>     replicating the glm(),  as well as a random effects model, replicating
>>     glmer() pretty closely, I think
>>
>>     Reinhold
>>
>>
>>     > # HSAUR contains respiratory data
>>     > data("respiratory", package = "HSAUR")
>>     >
>>     > # Convert pretest to covariate "baseline"
>>     > resp <- subset(respiratory, month > "0" )
>>     > resp$baseline <- rep(subset(respiratory, month == "0")$status, each=4)
>>     >
>>     > # Numeric variants of factors
>>     > resp$centr.b <- as.integer(resp$centre) - 1
>>     > resp$treat.b <- as.integer(resp$treatment) - 1
>>     > resp$sex.b <- as.integer(resp$sex)-1
>>     > resp$pre.b <- as.integer(resp$baseline) - 1
>>     > resp$status.b <- as.integer(resp$status) - 1
>>     > resp$id <- as.integer(resp$subject)
>>     >
>>     > # Pooled estimate
>>     > summary(m_glm.b <-glm(status.b ~ centr.b + treat.b + sex.b + pre.b
>>     + age, family="binomial", data=resp))
>>
>>     Call:
>>     glm(formula = status.b ~ centr.b + treat.b + sex.b + pre.b +
>>        age, family = "binomial", data = resp)
>>
>>     Deviance Residuals:
>>        Min       1Q   Median       3Q      Max
>>     -2.3146  -0.8551   0.4336   0.8953   1.9246
>>
>>     Coefficients:
>>                 Estimate Std. Error z value Pr(>|z|)
>>     (Intercept) -0.900171   0.337653  -2.666  0.00768 **
>>     centr.b      0.671601   0.239567   2.803  0.00506 **
>>     treat.b      1.299216   0.236841   5.486 4.12e-08 ***
>>     sex.b        0.119244   0.294671   0.405  0.68572
>>     pre.b        1.882029   0.241290   7.800 6.20e-15 ***
>>     age         -0.018166   0.008864  -2.049  0.04043 *
>>     ---
>>     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>>     (Dispersion parameter for binomial family taken to be 1)
>>
>>        Null deviance: 608.93  on 443  degrees of freedom
>>     Residual deviance: 483.22  on 438  degrees of freedom
>>     AIC: 495.22
>>
>>     Number of Fisher Scoring iterations: 4
>>
>>     >
>>     > # glmer
>>     > library(lme4)
>>     > print(m_glmer_4.L <- glmer(status ~ centre + treatment + sex +
>>     baseline + age + (1|subject),
>>     +       family=binomial,data=resp), cor=FALSE)
>>     Generalized linear mixed model fit by the Laplace approximation
>>     Formula: status ~ centre + treatment + sex + baseline + age + (1 |
>>     subject)
>>       Data: resp
>>      AIC   BIC logLik deviance
>>      443 471.7 -214.5      429
>>     Random effects:
>>      Groups  Name        Variance Std.Dev.
>>      subject (Intercept) 3.8647 1.9659 <tel:3.8647%20%20%201.9659>
>>     Number of obs: 444, groups: subject, 111
>>
>>     Fixed effects:
>>                       Estimate Std. Error z value Pr(>|z|)
>>     (Intercept)        -1.64438    0.75829  -2.169   0.0301 *
>>     centre2             1.04382    0.53193   1.962   0.0497 *
>>     treatmenttreatment  2.15746    0.51757   4.168 3.07e-05 ***
>>     sexmale             0.20194    0.66117   0.305   0.7600
>>     baselinegood        3.06990    0.52608   5.835 5.37e-09 ***
>>     age                -0.02540    0.01998  -1.271   0.2037
>>     ---
>>     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>     >
>>     > # Pooled and clustered
>>     > library(sabreR)
>>     > attach(resp)
>>     > m_sabreR <- sabre(status.b ~ centr.b + treat.b + sex.b + pre.b +
>>     age, case=id)
>>     > print(m_sabreR)
>>     # ... deleted some output
>>     (Standard Homogenous Model)
>>        Parameter              Estimate         Std. Err.        Z-score
>>        ____________________________________________________________________
>>        (intercept)           -0.90017          0.33765          -2.6660
>>        centr.b                0.67160          0.23957           2.8034
>>        treat.b                 1.2992          0.23684           5.4856
>>        sex.b                  0.11924          0.29467          0.40467
>>        pre.b                   1.8820          0.24129           7.7999
>>        age                   -0.18166E-01      0.88644E-02      -2.0493
>>
>>
>>
>>     (Random Effects Model)
>>        Parameter              Estimate         Std. Err.        Z-score
>>        ____________________________________________________________________
>>        (intercept)            -1.6642          0.84652          -1.9660
>>        centr.b                0.99044          0.56561           1.7511
>>        treat.b                 2.1265          0.57198           3.7177
>>        sex.b                  0.18166          0.70814          0.25653
>>        pre.b                   2.9987          0.60174           4.9834
>>        age                   -0.22949E-01      0.21337E-01      -1.0755
>>        scale                   1.9955          0.32093           6.2180
>>
>>     # ... deleted some output
>>     > detach()
>>
>>     On Mon, Oct 31, 2011 at 2:10 AM, Ben Bolker <bbolker at gmail.com
>>     <mailto:bbolker at gmail.com>> wrote:
>>     > On 11-10-31 05:22 AM, Reinhold Kliegl wrote:
>>     >> One problem appears to be that 111 id's are renumbered from 1 to 55
>>     >> (56) in the two groups.
>>     >> Unfortunately, it also appears that there is no unique mapping to
>>     >> treatment groups. So there are some subjects with 8 values
>>     assigned to
>>     >> one of the groups.
>>     >
>>     >  Thanks.  It looks like IDs are nested within center (not within
>>     > treatment).  That doesn't seem to change the story very much (as
>>     far as
>>     > , though (Zhang et al don't report estimated random-effect
>>     variances ...)
>>     >
>>     >
>>     >
>>     >>> library(geepack)
>>     >>> data(respiratory)
>>     >>> resp1 <- respiratory
>>     >>> resp1 <- transform(resp1,
>>     >> +                          center=factor(center),
>>     >> +                          id=factor(id))
>>     >>>
>>     >>> str(resp1)
>>     >> 'data.frame': 444 obs. of  8 variables:
>>     >>  $ center  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
>>     >>  $ id      : Factor w/ 56 levels "1","2","3","4",..: 1 1 1 1 2 2
>>     2 2 3 3 ...
>>     >>  $ treat   : Factor w/ 2 levels "A","P": 2 2 2 2 2 2 2 2 1 1 ...
>>     >>  $ sex     : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
>>     >>  $ age     : int  46 46 46 46 28 28 28 28 23 23 ...
>>     >>  $ baseline: int  0 0 0 0 0 0 0 0 1 1 ...
>>     >>  $ visit   : int  1 2 3 4 1 2 3 4 1 2 ...
>>     >>  $ outcome : int  0 0 0 0 0 0 0 0 1 1 ...
>>     >>> detach("package:geepack") ## allow detaching of doBy
>>     >>> detach("package:doBy")    ## allow detaching of lme4
>>     >>
>>     >> The data appear also in the HSAUR package, here the 111 subjects
>>     >> identified with 5 months (visits) each.  I suspect month 0 was
>>     used as
>>     >> baseline.
>>     >>> library(HSAUR)
>>     >>> data(respiratory)
>>     >>> resp2 <- respiratory
>>     >>>
>>     >>> str(resp2)
>>     >> 'data.frame': 555 obs. of  7 variables:
>>     >>  $ centre   : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
>>     >>  $ treatment: Factor w/ 2 levels "placebo","treatment": 1 1 1 1 1
>>     1 1 1 1 1 ...
>>     >>  $ sex      : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1
>>     1 1 ...
>>     >>  $ age      : num  46 46 46 46 46 28 28 28 28 28 ...
>>     >>  $ status   : Factor w/ 2 levels "poor","good": 1 1 1 1 1 1 1 1 1
>>     1 ...
>>     >>  $ month    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 1 2 3 4
>>     5 1 2 3 4 5 ...
>>     >>  $ subject  : Factor w/ 111 levels "1","2","3","4",..: 1 1 1 1 1
>>     2 2 2 2 2 ...
>>     >>
>>     >> Reinhold
>>     >>
>>     >> On Sun, Oct 30, 2011 at 10:00 PM, Ben Bolker <bbolker at gmail.com
>>     <mailto:bbolker at gmail.com>> wrote:
>>     >>>
>>     >>>  There's a fairly recent paper by Zhang et al (2011) of interest to
>>     >>> folks on this list
>>     >>>
>>     >>> DOI: 10.1002/sim.4265
>>     >>>
>>     >>>  In response to a post on the AD Model Builder users' list, I took a
>>     >>> quick shot at re-doing some of their results (they have extensive
>>     >>> simulation results, which I haven't tried to replicate yet, and an
>>     >>> analysis of binary data from Davis (1991) which is included (I
>>     *think*
>>     >>> it's the same data set -- the description and size of the data
>>     set match
>>     >>> exactly) in the geepack data set).
>>     >>>
>>     >>>  If anyone's interested, my results so far are posted at
>>     >>>
>>     >>> http://glmm.wikidot.com/local--files/examples/Zhang_reanalysis.Rnw
>>     >>> http://glmm.wikidot.com/local--files/examples/Zhang_reanalysis.pdf
>>     >>>
>>     >>>  So far the R approaches I've tried agree closely with each
>>     other and
>>     >>> with glmmADMB (except MASS::glmmPQL, which I expected to be
>>     different --
>>     >>> the rest all use either Laplace approx. or AGHQ).  They *don't*
>>     agree
>>     >>> with the results Zhang et al got, yet -- I'm sure there's
>>     something I'm
>>     >>> missing in the contrasts or otherwise ...
>>     >>>
>>     >>>  Suggestions or improvements are welcome.
>>     >>>
>>     >>>  cheers
>>     >>>    Ben Bolker
>>     >>>
>>     >>> _______________________________________________
>>     >>> R-sig-mixed-models at r-project.org
>>     <mailto:R-sig-mixed-models at r-project.org> mailing list
>>     >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>     >>>
>>     >
>>     >
>>
>>     _______________________________________________
>>     R-sig-mixed-models at r-project.org
>>     <mailto:R-sig-mixed-models at r-project.org> mailing list
>>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>
>