[R-sig-ME] Logistic regression with spatial autocorrelation structure

Tue Feb 1 19:08:52 CET 2011

On Tue, Feb 1, 2011 at 12:05 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Mon, Jan 31, 2011 at 11:45 AM, Dale Steele <dale.w.steele at gmail.com> wrote:
>> Dear mixed-modeling  experts,
>>
>> I'm interested in modeling the probability of appendicitis in patients
>> with abdominal pain.
>>
>> The R binary data file 'http://www.ped-em.org/appy.rda' contains
>> the following variables from a pilot study of 138 children with
>> abdominal pain.
>
> Thank you for providing the data.
>
>> 'dx'        eventual diagnosis:   0=no appendicitis, 1=appendicitis
>> 'gender'       Male/Female
>> 'wbc'       total white blood cell count
>> 'priorprob'    Clinical predicted probability of appendicitis
>> 'doc'       doctor who assigned 'priorprob'
>>
>> After taking a history and performing a physical examination, the ER doctor
>> was asked to make a vertical mark on a 100 mm horizontal line to represent
>> her estimate of the (percent) probability that the patient had appendicitis.
>>
>> My initial thought was to fit a multiple logistic regression model:
>>
>> m1 <- glm(dx ~ gender + priorprob + wbc + doc, family=binomial, data=appy)
>>
>> However, it seems likely that each doctor interpreted the probability scale
>> differently.  The 23 doctors evaluated from 1 to 17 patients each. I'm
>> not primarily interest in predictions by a specific clinician.  Thus,
>> it seems to make sense to fit a generalized linear mixed model.
>>
>> At this point I get muddled. Have I correctly specified a random
>> intercept model (m2) and a random intercept/random slope model (m3)?
>> Are there other sensible models?
>
> Yes, you have correctly specified them.  However, if you check the
> output from the fits
>> (m2 <- glmer(dx ~ priorprob + gender + wbc + (1 | doc), appy, binomial))
> Generalized linear mixed model fit by the Laplace approximation
> Formula: dx ~ priorprob + gender + wbc + (1 | doc)
>   Data: appy
>   AIC   BIC logLik deviance
>  108.1 122.7 -49.03    98.07
> Random effects:
>  Groups Name        Variance Std.Dev.
>  doc    (Intercept) 0.82922  0.91062
> Number of obs: 138, groups: doc, 23
>
> Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) -8.28862    1.45666  -5.690 1.27e-08
> priorprob    0.05025    0.01291   3.892 9.96e-05
> genderMale   1.18705    0.56752   2.092   0.0365
> wbc          0.34130    0.07090   4.814 1.48e-06
>
> Correlation of Fixed Effects:
>           (Intr) prrprb gndrMl
> priorprob  -0.775
> genderMale -0.293  0.130
> wbc        -0.790  0.353  0.040
>> (m3 <- glmer(dx ~ priorprob + gender + wbc + (priorprob | doc), appy, binomial))
> Generalized linear mixed model fit by the Laplace approximation
> Formula: dx ~ priorprob + gender + wbc + (priorprob | doc)
>   Data: appy
>   AIC BIC logLik deviance
>  111.5 132 -48.76    97.52
> Random effects:
>  Groups Name        Variance   Std.Dev. Corr
>  doc    (Intercept) 0.02897010 0.170206
>        priorprob   0.00016799 0.012961 1.000
> Number of obs: 138, groups: doc, 23
>
> Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) -8.23977    1.44408  -5.706 1.16e-08
> priorprob    0.04957    0.01328   3.733 0.000189
> genderMale   1.16593    0.57481   2.028 0.042521
> wbc          0.34485    0.07056   4.887 1.02e-06
>
> Correlation of Fixed Effects:
>           (Intr) prrprb gndrMl
> priorprob  -0.757
> genderMale -0.299  0.141
> wbc        -0.800  0.350  0.030
>> anova(m2, m3)
> Data: appy
> Models:
> m2: dx ~ priorprob + gender + wbc + (1 | doc)
> m3: dx ~ priorprob + gender + wbc + (priorprob | doc)
>   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
> m2  5 108.07 122.70 -49.034
> m3  7 111.52 132.01 -48.759 0.5501      2     0.7595
>> (m4 <- update(m2, . ~ . - gender))  # check if gender is significant, using LRT
> Generalized linear mixed model fit by the Laplace approximation
> Formula: dx ~ priorprob + wbc + (1 | doc)
>   Data: appy
>   AIC   BIC logLik deviance
>  110.5 122.2 -51.24    102.5
> Random effects:
>  Groups Name        Variance Std.Dev.
>  doc    (Intercept) 0.63458  0.7966
> Number of obs: 138, groups: doc, 23
>
> Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
> (Intercept) -7.62797    1.33760  -5.703 1.18e-08
> priorprob    0.04834    0.01229   3.934 8.35e-05
> wbc          0.34395    0.06771   5.080 3.77e-07
>
> Correlation of Fixed Effects:
>          (Intr) prrprb
> priorprob -0.786
> wbc       -0.813  0.357
>> anova(m4, m3)
> Data: appy
> Models:
> m4: dx ~ priorprob + wbc + (1 | doc)
> m3: dx ~ priorprob + gender + wbc + (priorprob | doc)
>   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
> m4  4 110.48 122.19 -51.240
> m3  7 111.52 132.01 -48.759 4.9639      3     0.1745

Sorry, that test should have been

> anova(m4, m2)
Data: appy
Models:
m4: dx ~ priorprob + wbc + (1 | doc)
m2: dx ~ priorprob + gender + wbc + (1 | doc)
   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
m4  4 110.48 122.19 -51.240
m2  5 108.07 122.70 -49.034 4.4138      1    0.03565

> you will see that the random intercept/random slope model produces a
> degenerate fit (estimated correlation of the within-doctor random
> effects is -1.000), which is not significantly better than the random
> intercept fit.  I would therefore use m2. (Because the fixed-effect
> for gender had a z-value close to 2 I performed the more reliable
> likelihood-ratio test, just to check.)
>
>
>> library(lme4)
>> m2 <- glmer(dx ~ priorprob + gender + wbc + (1 | doc),
>>            family=binomial, data=appy)
>>
>> m3 <- glmer(dx ~ priorprob + gender  + wbc + (priorprob | doc),
>>            family=binomial, data=appy)
>>
>> My ultimate goal is to estimate the probability of appendicitis
>> (and a prediction interval), given a specific 'gender', 'wbc' and
>> 'priorprob' assigned by a doctor with similar diagnostic ability to
>> those who participated in our pilot study. I'm stuck on how to code this
>> prediction.
>
> The prediction will be based on the fixed-effects only.  Because it
> applies to a doctor not in this study, we assign the random effect for
> that doctor to be zero, which is the expected value in the absence of
> any information on that doctor.
>
> The fixef extractor returns the estimates of the fixed-effects
> parameters.  For a female patient with median prior probability (55%)
> and median white blood cell count (9.45) the estimated linear
> predictor (eta = -2.299) corresponds to a probability of 9.1%
>
>
>> fixef(m2)
> (Intercept)   priorprob  genderMale         wbc
>  -8.2886183   0.0502540   1.1870539   0.3412998
>> eta <- crossprod(c(1, 55, 0, 9.45), fixef(m2))
>> eta
>          [,1]
> [1,] -2.299365
>> binomial()$linkinv(eta)
>           [,1]
> [1,] 0.09117557
>