[R-sig-ME] Zhang 2011 (re)analysis
Reinhold Kliegl
reinhold.kliegl at gmail.com
Tue Nov 1 11:21:08 CET 2011
I compared the respiratory data with the original publication; Davis,
C.S. (1991, Statistics in Medicine, 10, 1959-1980). From this it
appears that
data("respiratory", package = "geepack")
contains exactly the data reported in Appendix I of Davis (1991).
Note that for work with lme4, it is recommended that unique id numbers
are assigned to subjects of the second center, e.g.:
respiratory$id <- ifelse(respiratory$center==1, respiratory$id,
respiratory$id+56)
The second version of these data:
data("respiratory", package = "HSAUR")
differs from the original ones in two respects:
(1) male and female labels are exchanged
(2) for record 428, the outcome/status should be "poor" (0), not "good" (1)
So now we should be ready to replicate Zhang et al. (2011) :)
Reinhold
PS: The page numbers for Zhang et al. (2011) are 2562-2572.
PPS: The Everitt & Hothorn-Book was published 2006 (not 2002, as I
wrote earlier); the HSAUR package goes with this book.
On Mon, Oct 31, 2011 at 6:44 AM, Ben Bolker <bbolker at gmail.com> wrote:
> On 11-10-31 07:49 PM, Saang-Yoon Hyun wrote:
>> Hi,
>> Could you show the reference: e.g., authors, year, tilte, journal,
>> pages, etc.? If you could share its PDF version of the paper, it would
>> be more helpful.
>> Thank you,
>> Saang-Yoon
>
> DOI: 10.1002/sim.4265
> On fitting generalized linear mixed-effects models for binary responses
> using different statistical packages
> Hui Zhang, Naiji Lu, Changyong Feng, Sally W. Thurston,
> Yinglin Xia, Liang Zhua and Xin M. Tu
>
> Statistics in Medicine. Apparently no page numbers (online publication?)
>
>>
>> On Mon, Oct 31, 2011 at 12:15 PM, Reinhold Kliegl
>> <reinhold.kliegl at gmail.com <mailto:reinhold.kliegl at gmail.com>> wrote:
>>
>> Here is some comparison between glm, glmer (using lme4 with Laplace)
>> and sabreR (which uses AGHQ, if I recall correctly).
>> The glm analysis replicates exactly the results reported in Everitt
>> and Hothorn (2002, Table 13.1, around page 175). Thus, I am pretty
>> sure I am using the correct data.
>>
>> sabreR gives estimates both for the standard homogeneous model,
>> replicating the glm(), as well as a random effects model, replicating
>> glmer() pretty closely, I think
>>
>> Reinhold
>>
>>
>> > # HSAUR contains respiratory data
>> > data("respiratory", package = "HSAUR")
>> >
>> > # Convert pretest to covariate "baseline"
>> > resp <- subset(respiratory, month > "0" )
>> > resp$baseline <- rep(subset(respiratory, month == "0")$status, each=4)
>> >
>> > # Numeric variants of factors
>> > resp$centr.b <- as.integer(resp$centre) - 1
>> > resp$treat.b <- as.integer(resp$treatment) - 1
>> > resp$sex.b <- as.integer(resp$sex)-1
>> > resp$pre.b <- as.integer(resp$baseline) - 1
>> > resp$status.b <- as.integer(resp$status) - 1
>> > resp$id <- as.integer(resp$subject)
>> >
>> > # Pooled estimate
>> > summary(m_glm.b <-glm(status.b ~ centr.b + treat.b + sex.b + pre.b
>> + age, family="binomial", data=resp))
>>
>> Call:
>> glm(formula = status.b ~ centr.b + treat.b + sex.b + pre.b +
>> age, family = "binomial", data = resp)
>>
>> Deviance Residuals:
>> Min 1Q Median 3Q Max
>> -2.3146 -0.8551 0.4336 0.8953 1.9246
>>
>> Coefficients:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -0.900171 0.337653 -2.666 0.00768 **
>> centr.b 0.671601 0.239567 2.803 0.00506 **
>> treat.b 1.299216 0.236841 5.486 4.12e-08 ***
>> sex.b 0.119244 0.294671 0.405 0.68572
>> pre.b 1.882029 0.241290 7.800 6.20e-15 ***
>> age -0.018166 0.008864 -2.049 0.04043 *
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>> Null deviance: 608.93 on 443 degrees of freedom
>> Residual deviance: 483.22 on 438 degrees of freedom
>> AIC: 495.22
>>
>> Number of Fisher Scoring iterations: 4
>>
>> >
>> > # glmer
>> > library(lme4)
>> > print(m_glmer_4.L <- glmer(status ~ centre + treatment + sex +
>> baseline + age + (1|subject),
>> + family=binomial,data=resp), cor=FALSE)
>> Generalized linear mixed model fit by the Laplace approximation
>> Formula: status ~ centre + treatment + sex + baseline + age + (1 |
>> subject)
>> Data: resp
>> AIC BIC logLik deviance
>> 443 471.7 -214.5 429
>> Random effects:
>> Groups Name Variance Std.Dev.
>> subject (Intercept) 3.8647 1.9659 <tel:3.8647%20%20%201.9659>
>> Number of obs: 444, groups: subject, 111
>>
>> Fixed effects:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) -1.64438 0.75829 -2.169 0.0301 *
>> centre2 1.04382 0.53193 1.962 0.0497 *
>> treatmenttreatment 2.15746 0.51757 4.168 3.07e-05 ***
>> sexmale 0.20194 0.66117 0.305 0.7600
>> baselinegood 3.06990 0.52608 5.835 5.37e-09 ***
>> age -0.02540 0.01998 -1.271 0.2037
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > # Pooled and clustered
>> > library(sabreR)
>> > attach(resp)
>> > m_sabreR <- sabre(status.b ~ centr.b + treat.b + sex.b + pre.b +
>> age, case=id)
>> > print(m_sabreR)
>> # ... deleted some output
>> (Standard Homogenous Model)
>> Parameter Estimate Std. Err. Z-score
>> ____________________________________________________________________
>> (intercept) -0.90017 0.33765 -2.6660
>> centr.b 0.67160 0.23957 2.8034
>> treat.b 1.2992 0.23684 5.4856
>> sex.b 0.11924 0.29467 0.40467
>> pre.b 1.8820 0.24129 7.7999
>> age -0.18166E-01 0.88644E-02 -2.0493
>>
>>
>>
>> (Random Effects Model)
>> Parameter Estimate Std. Err. Z-score
>> ____________________________________________________________________
>> (intercept) -1.6642 0.84652 -1.9660
>> centr.b 0.99044 0.56561 1.7511
>> treat.b 2.1265 0.57198 3.7177
>> sex.b 0.18166 0.70814 0.25653
>> pre.b 2.9987 0.60174 4.9834
>> age -0.22949E-01 0.21337E-01 -1.0755
>> scale 1.9955 0.32093 6.2180
>>
>> # ... deleted some output
>> > detach()
>>
>> On Mon, Oct 31, 2011 at 2:10 AM, Ben Bolker <bbolker at gmail.com
>> <mailto:bbolker at gmail.com>> wrote:
>> > On 11-10-31 05:22 AM, Reinhold Kliegl wrote:
>> >> One problem appears to be that 111 id's are renumbered from 1 to 55
>> >> (56) in the two groups.
>> >> Unfortunately, it also appears that there is no unique mapping to
>> >> treatment groups. So there are some subjects with 8 values
>> assigned to
>> >> one of the groups.
>> >
>> > Thanks. It looks like IDs are nested within center (not within
>> > treatment). That doesn't seem to change the story very much (as
>> far as
>> > , though (Zhang et al don't report estimated random-effect
>> variances ...)
>> >
>> >
>> >
>> >>> library(geepack)
>> >>> data(respiratory)
>> >>> resp1 <- respiratory
>> >>> resp1 <- transform(resp1,
>> >> + center=factor(center),
>> >> + id=factor(id))
>> >>>
>> >>> str(resp1)
>> >> 'data.frame': 444 obs. of 8 variables:
>> >> $ center : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
>> >> $ id : Factor w/ 56 levels "1","2","3","4",..: 1 1 1 1 2 2
>> 2 2 3 3 ...
>> >> $ treat : Factor w/ 2 levels "A","P": 2 2 2 2 2 2 2 2 1 1 ...
>> >> $ sex : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
>> >> $ age : int 46 46 46 46 28 28 28 28 23 23 ...
>> >> $ baseline: int 0 0 0 0 0 0 0 0 1 1 ...
>> >> $ visit : int 1 2 3 4 1 2 3 4 1 2 ...
>> >> $ outcome : int 0 0 0 0 0 0 0 0 1 1 ...
>> >>> detach("package:geepack") ## allow detaching of doBy
>> >>> detach("package:doBy") ## allow detaching of lme4
>> >>
>> >> The data appear also in the HSAUR package, here the 111 subjects
>> >> identified with 5 months (visits) each. I suspect month 0 was
>> used as
>> >> baseline.
>> >>> library(HSAUR)
>> >>> data(respiratory)
>> >>> resp2 <- respiratory
>> >>>
>> >>> str(resp2)
>> >> 'data.frame': 555 obs. of 7 variables:
>> >> $ centre : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
>> >> $ treatment: Factor w/ 2 levels "placebo","treatment": 1 1 1 1 1
>> 1 1 1 1 1 ...
>> >> $ sex : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1
>> 1 1 ...
>> >> $ age : num 46 46 46 46 46 28 28 28 28 28 ...
>> >> $ status : Factor w/ 2 levels "poor","good": 1 1 1 1 1 1 1 1 1
>> 1 ...
>> >> $ month : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 1 2 3 4
>> 5 1 2 3 4 5 ...
>> >> $ subject : Factor w/ 111 levels "1","2","3","4",..: 1 1 1 1 1
>> 2 2 2 2 2 ...
>> >>
>> >> Reinhold
>> >>
>> >> On Sun, Oct 30, 2011 at 10:00 PM, Ben Bolker <bbolker at gmail.com
>> <mailto:bbolker at gmail.com>> wrote:
>> >>>
>> >>> There's a fairly recent paper by Zhang et al (2011) of interest to
>> >>> folks on this list
>> >>>
>> >>> DOI: 10.1002/sim.4265
>> >>>
>> >>> In response to a post on the AD Model Builder users' list, I took a
>> >>> quick shot at re-doing some of their results (they have extensive
>> >>> simulation results, which I haven't tried to replicate yet, and an
>> >>> analysis of binary data from Davis (1991) which is included (I
>> *think*
>> >>> it's the same data set -- the description and size of the data
>> set match
>> >>> exactly) in the geepack data set).
>> >>>
>> >>> If anyone's interested, my results so far are posted at
>> >>>
>> >>> http://glmm.wikidot.com/local--files/examples/Zhang_reanalysis.Rnw
>> >>> http://glmm.wikidot.com/local--files/examples/Zhang_reanalysis.pdf
>> >>>
>> >>> So far the R approaches I've tried agree closely with each
>> other and
>> >>> with glmmADMB (except MASS::glmmPQL, which I expected to be
>> different --
>> >>> the rest all use either Laplace approx. or AGHQ). They *don't*
>> agree
>> >>> with the results Zhang et al got, yet -- I'm sure there's
>> something I'm
>> >>> missing in the contrasts or otherwise ...
>> >>>
>> >>> Suggestions or improvements are welcome.
>> >>>
>> >>> cheers
>> >>> Ben Bolker
>> >>>
>> >>> _______________________________________________
>> >>> R-sig-mixed-models at r-project.org
>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>> >>>
>> >
>> >
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org
>> <mailto:R-sig-mixed-models at r-project.org> mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>
>>
>
>
More information about the R-sig-mixed-models
mailing list