[R-sig-ME] Zhang 2011 (re)analysis

Tue Nov 1 14:05:54 CET 2011

On 11-11-01 06:21 AM, Reinhold Kliegl wrote:
> I compared the respiratory data with the original publication; Davis,
> C.S. (1991, Statistics in Medicine, 10, 1959-1980). From this it
> appears that
> data("respiratory", package = "geepack")
> contains exactly the data reported in Appendix I of Davis (1991).
> 
> Note that for work with lme4, it is recommended that unique id numbers
> are assigned to subjects of the second center, e.g.:
> respiratory$id <- ifelse(respiratory$center==1, respiratory$id,
> respiratory$id+56)

 Another way to do this is to make a new "id nested within center"
variable, i.e. idctr <- interaction(id,center) ...

> The second version of these data:
> data("respiratory", package = "HSAUR")
> differs from the original ones in two respects:
> (1) male and female labels are exchanged
> (2) for record 428, the outcome/status should be "poor" (0), not "good" (1)
> 
> So now we should be ready to replicate Zhang et al. (2011) :)

  thanks!

  Ben Bolker

> 
> Reinhold
> 
> PS: The page numbers for Zhang et al. (2011) are 2562-2572.
> PPS:  The Everitt & Hothorn-Book was published 2006 (not 2002, as I
> wrote earlier); the HSAUR package goes with this book.
> 
> On Mon, Oct 31, 2011 at 6:44 AM, Ben Bolker <bbolker at gmail.com> wrote:
>> On 11-10-31 07:49 PM, Saang-Yoon Hyun wrote:
>>> Hi,
>>> Could you show the reference: e.g., authors, year, tilte, journal,
>>> pages, etc.?   If you could share its PDF version of the paper, it would
>>> be more helpful.
>>> Thank you,
>>> Saang-Yoon
>>
>>  DOI: 10.1002/sim.4265
>> On fitting generalized linear mixed-effects models for binary responses
>> using different statistical packages
>> Hui Zhang, Naiji Lu, Changyong Feng, Sally W. Thurston,
>> Yinglin Xia, Liang Zhua and Xin M. Tu
>>
>>  Statistics in Medicine.  Apparently no page numbers (online publication?)
>>
>>>
>>> On Mon, Oct 31, 2011 at 12:15 PM, Reinhold Kliegl
>>> <reinhold.kliegl at gmail.com <mailto:reinhold.kliegl at gmail.com>> wrote:
>>>
>>>     Here is some comparison between glm,  glmer (using lme4 with Laplace)
>>>     and sabreR (which uses AGHQ, if I recall correctly).
>>>     The glm analysis replicates exactly the results reported in Everitt
>>>     and Hothorn (2002, Table 13.1, around page 175).  Thus, I am pretty
>>>     sure I am using the correct data.
>>>
>>>      sabreR gives estimates both for the standard homogeneous model,
>>>     replicating the glm(),  as well as a random effects model, replicating
>>>     glmer() pretty closely, I think
>>>
>>>     Reinhold
>>>
>>>
>>>     > # HSAUR contains respiratory data
>>>     > data("respiratory", package = "HSAUR")
>>>     >
>>>     > # Convert pretest to covariate "baseline"
>>>     > resp <- subset(respiratory, month > "0" )
>>>     > resp$baseline <- rep(subset(respiratory, month == "0")$status, each=4)
>>>     >
>>>     > # Numeric variants of factors
>>>     > resp$centr.b <- as.integer(resp$centre) - 1
>>>     > resp$treat.b <- as.integer(resp$treatment) - 1
>>>     > resp$sex.b <- as.integer(resp$sex)-1
>>>     > resp$pre.b <- as.integer(resp$baseline) - 1
>>>     > resp$status.b <- as.integer(resp$status) - 1
>>>     > resp$id <- as.integer(resp$subject)
>>>     >
>>>     > # Pooled estimate
>>>     > summary(m_glm.b <-glm(status.b ~ centr.b + treat.b + sex.b + pre.b
>>>     + age, family="binomial", data=resp))
>>>
>>>     Call:
>>>     glm(formula = status.b ~ centr.b + treat.b + sex.b + pre.b +
>>>        age, family = "binomial", data = resp)
>>>
>>>     Deviance Residuals:
>>>        Min       1Q   Median       3Q      Max
>>>     -2.3146  -0.8551   0.4336   0.8953   1.9246
>>>
>>>     Coefficients:
>>>                 Estimate Std. Error z value Pr(>|z|)
>>>     (Intercept) -0.900171   0.337653  -2.666  0.00768 **
>>>     centr.b      0.671601   0.239567   2.803  0.00506 **
>>>     treat.b      1.299216   0.236841   5.486 4.12e-08 ***
>>>     sex.b        0.119244   0.294671   0.405  0.68572
>>>     pre.b        1.882029   0.241290   7.800 6.20e-15 ***
>>>     age         -0.018166   0.008864  -2.049  0.04043 *
>>>     ---
>>>     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>
>>>     (Dispersion parameter for binomial family taken to be 1)
>>>
>>>        Null deviance: 608.93  on 443  degrees of freedom
>>>     Residual deviance: 483.22  on 438  degrees of freedom
>>>     AIC: 495.22
>>>
>>>     Number of Fisher Scoring iterations: 4
>>>
>>>     >
>>>     > # glmer
>>>     > library(lme4)
>>>     > print(m_glmer_4.L <- glmer(status ~ centre + treatment + sex +
>>>     baseline + age + (1|subject),
>>>     +       family=binomial,data=resp), cor=FALSE)
>>>     Generalized linear mixed model fit by the Laplace approximation
>>>     Formula: status ~ centre + treatment + sex + baseline + age + (1 |
>>>     subject)
>>>       Data: resp
>>>      AIC   BIC logLik deviance
>>>      443 471.7 -214.5      429
>>>     Random effects:
>>>      Groups  Name        Variance Std.Dev.
>>>      subject (Intercept) 3.8647 1.9659 <tel:3.8647%20%20%201.9659>
>>>     Number of obs: 444, groups: subject, 111
>>>
>>>     Fixed effects:
>>>                       Estimate Std. Error z value Pr(>|z|)
>>>     (Intercept)        -1.64438    0.75829  -2.169   0.0301 *
>>>     centre2             1.04382    0.53193   1.962   0.0497 *
>>>     treatmenttreatment  2.15746    0.51757   4.168 3.07e-05 ***
>>>     sexmale             0.20194    0.66117   0.305   0.7600
>>>     baselinegood        3.06990    0.52608   5.835 5.37e-09 ***
>>>     age                -0.02540    0.01998  -1.271   0.2037
>>>     ---
>>>     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>>     >
>>>     > # Pooled and clustered
>>>     > library(sabreR)
>>>     > attach(resp)
>>>     > m_sabreR <- sabre(status.b ~ centr.b + treat.b + sex.b + pre.b +
>>>     age, case=id)
>>>     > print(m_sabreR)
>>>     # ... deleted some output
>>>     (Standard Homogenous Model)
>>>        Parameter              Estimate         Std. Err.        Z-score
>>>        ____________________________________________________________________
>>>        (intercept)           -0.90017          0.33765          -2.6660
>>>        centr.b                0.67160          0.23957           2.8034
>>>        treat.b                 1.2992          0.23684           5.4856
>>>        sex.b                  0.11924          0.29467          0.40467
>>>        pre.b                   1.8820          0.24129           7.7999
>>>        age                   -0.18166E-01      0.88644E-02      -2.0493
>>>
>>>
>>>
>>>     (Random Effects Model)
>>>        Parameter              Estimate         Std. Err.        Z-score
>>>        ____________________________________________________________________
>>>        (intercept)            -1.6642          0.84652          -1.9660
>>>        centr.b                0.99044          0.56561           1.7511
>>>        treat.b                 2.1265          0.57198           3.7177
>>>        sex.b                  0.18166          0.70814          0.25653
>>>        pre.b                   2.9987          0.60174           4.9834
>>>        age                   -0.22949E-01      0.21337E-01      -1.0755
>>>        scale                   1.9955          0.32093           6.2180
>>>
>>>     # ... deleted some output
>>>     > detach()
>>>
>>>     On Mon, Oct 31, 2011 at 2:10 AM, Ben Bolker <bbolker at gmail.com
>>>     <mailto:bbolker at gmail.com>> wrote:
>>>     > On 11-10-31 05:22 AM, Reinhold Kliegl wrote:
>>>     >> One problem appears to be that 111 id's are renumbered from 1 to 55
>>>     >> (56) in the two groups.
>>>     >> Unfortunately, it also appears that there is no unique mapping to
>>>     >> treatment groups. So there are some subjects with 8 values
>>>     assigned to
>>>     >> one of the groups.
>>>     >
>>>     >  Thanks.  It looks like IDs are nested within center (not within
>>>     > treatment).  That doesn't seem to change the story very much (as
>>>     far as
>>>     > , though (Zhang et al don't report estimated random-effect
>>>     variances ...)
>>>     >
>>>     >
>>>     >
>>>     >>> library(geepack)
>>>     >>> data(respiratory)
>>>     >>> resp1 <- respiratory
>>>     >>> resp1 <- transform(resp1,
>>>     >> +                          center=factor(center),
>>>     >> +                          id=factor(id))
>>>     >>>
>>>     >>> str(resp1)
>>>     >> 'data.frame': 444 obs. of  8 variables:
>>>     >>  $ center  : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
>>>     >>  $ id      : Factor w/ 56 levels "1","2","3","4",..: 1 1 1 1 2 2
>>>     2 2 3 3 ...
>>>     >>  $ treat   : Factor w/ 2 levels "A","P": 2 2 2 2 2 2 2 2 1 1 ...
>>>     >>  $ sex     : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
>>>     >>  $ age     : int  46 46 46 46 28 28 28 28 23 23 ...
>>>     >>  $ baseline: int  0 0 0 0 0 0 0 0 1 1 ...
>>>     >>  $ visit   : int  1 2 3 4 1 2 3 4 1 2 ...
>>>     >>  $ outcome : int  0 0 0 0 0 0 0 0 1 1 ...
>>>     >>> detach("package:geepack") ## allow detaching of doBy
>>>     >>> detach("package:doBy")    ## allow detaching of lme4
>>>     >>
>>>     >> The data appear also in the HSAUR package, here the 111 subjects
>>>     >> identified with 5 months (visits) each.  I suspect month 0 was
>>>     used as
>>>     >> baseline.
>>>     >>> library(HSAUR)
>>>     >>> data(respiratory)
>>>     >>> resp2 <- respiratory
>>>     >>>
>>>     >>> str(resp2)
>>>     >> 'data.frame': 555 obs. of  7 variables:
>>>     >>  $ centre   : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
>>>     >>  $ treatment: Factor w/ 2 levels "placebo","treatment": 1 1 1 1 1
>>>     1 1 1 1 1 ...
>>>     >>  $ sex      : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1
>>>     1 1 ...
>>>     >>  $ age      : num  46 46 46 46 46 28 28 28 28 28 ...
>>>     >>  $ status   : Factor w/ 2 levels "poor","good": 1 1 1 1 1 1 1 1 1
>>>     1 ...
>>>     >>  $ month    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 1 2 3 4
>>>     5 1 2 3 4 5 ...
>>>     >>  $ subject  : Factor w/ 111 levels "1","2","3","4",..: 1 1 1 1 1
>>>     2 2 2 2 2 ...
>>>     >>
>>>     >> Reinhold
>>>     >>
>>>     >> On Sun, Oct 30, 2011 at 10:00 PM, Ben Bolker <bbolker at gmail.com
>>>     <mailto:bbolker at gmail.com>> wrote:
>>>     >>>
>>>     >>>  There's a fairly recent paper by Zhang et al (2011) of interest to
>>>     >>> folks on this list
>>>     >>>
>>>     >>> DOI: 10.1002/sim.4265
>>>     >>>
>>>     >>>  In response to a post on the AD Model Builder users' list, I took a
>>>     >>> quick shot at re-doing some of their results (they have extensive
>>>     >>> simulation results, which I haven't tried to replicate yet, and an
>>>     >>> analysis of binary data from Davis (1991) which is included (I
>>>     *think*
>>>     >>> it's the same data set -- the description and size of the data
>>>     set match
>>>     >>> exactly) in the geepack data set).
>>>     >>>
>>>     >>>  If anyone's interested, my results so far are posted at
>>>     >>>
>>>     >>> http://glmm.wikidot.com/local--files/examples/Zhang_reanalysis.Rnw
>>>     >>> http://glmm.wikidot.com/local--files/examples/Zhang_reanalysis.pdf
>>>     >>>
>>>     >>>  So far the R approaches I've tried agree closely with each
>>>     other and
>>>     >>> with glmmADMB (except MASS::glmmPQL, which I expected to be
>>>     different --
>>>     >>> the rest all use either Laplace approx. or AGHQ).  They *don't*
>>>     agree
>>>     >>> with the results Zhang et al got, yet -- I'm sure there's
>>>     something I'm
>>>     >>> missing in the contrasts or otherwise ...
>>>     >>>
>>>     >>>  Suggestions or improvements are welcome.
>>>     >>>
>>>     >>>  cheers
>>>     >>>    Ben Bolker
>>>     >>>
>>>     >>> _______________________________________________
>>>     >>> R-sig-mixed-models at r-project.org
>>>     <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>     >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>     >>>
>>>     >
>>>     >
>>>
>>>     _______________________________________________
>>>     R-sig-mixed-models at r-project.org
>>>     <mailto:R-sig-mixed-models at r-project.org> mailing list
>>>     https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>>
>>>
>>
>>