[R-sig-ME] Logistic regression with spatial autocorrelation structure

Douglas Bates bates at stat.wisc.edu
Tue Feb 1 19:05:54 CET 2011

On Mon, Jan 31, 2011 at 11:45 AM, Dale Steele <dale.w.steele at gmail.com> wrote:
> Dear mixed-modeling  experts,
>
> I'm interested in modeling the probability of appendicitis in patients
> with abdominal pain.
>
> The R binary data file 'http://www.ped-em.org/appy.rda' contains
> the following variables from a pilot study of 138 children with
> abdominal pain.

Thank you for providing the data.

> 'dx'        eventual diagnosis:   0=no appendicitis, 1=appendicitis
> 'gender'       Male/Female
> 'wbc'       total white blood cell count
> 'priorprob'    Clinical predicted probability of appendicitis
> 'doc'       doctor who assigned 'priorprob'
>
> After taking a history and performing a physical examination, the ER doctor
> was asked to make a vertical mark on a 100 mm horizontal line to represent
> her estimate of the (percent) probability that the patient had appendicitis.
>
> My initial thought was to fit a multiple logistic regression model:
>
> m1 <- glm(dx ~ gender + priorprob + wbc + doc, family=binomial, data=appy)
>
> However, it seems likely that each doctor interpreted the probability scale
> differently.  The 23 doctors evaluated from 1 to 17 patients each. I'm
> not primarily interest in predictions by a specific clinician.  Thus,
> it seems to make sense to fit a generalized linear mixed model.
>
> At this point I get muddled. Have I correctly specified a random
> intercept model (m2) and a random intercept/random slope model (m3)?
> Are there other sensible models?

Yes, you have correctly specified them.  However, if you check the
output from the fits
> (m2 <- glmer(dx ~ priorprob + gender + wbc + (1 | doc), appy, binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: dx ~ priorprob + gender + wbc + (1 | doc)
Data: appy
AIC   BIC logLik deviance
108.1 122.7 -49.03    98.07
Random effects:
Groups Name        Variance Std.Dev.
doc    (Intercept) 0.82922  0.91062
Number of obs: 138, groups: doc, 23

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.28862    1.45666  -5.690 1.27e-08
priorprob    0.05025    0.01291   3.892 9.96e-05
genderMale   1.18705    0.56752   2.092   0.0365
wbc          0.34130    0.07090   4.814 1.48e-06

Correlation of Fixed Effects:
(Intr) prrprb gndrMl
priorprob  -0.775
genderMale -0.293  0.130
wbc        -0.790  0.353  0.040
> (m3 <- glmer(dx ~ priorprob + gender + wbc + (priorprob | doc), appy, binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: dx ~ priorprob + gender + wbc + (priorprob | doc)
Data: appy
AIC BIC logLik deviance
111.5 132 -48.76    97.52
Random effects:
Groups Name        Variance   Std.Dev. Corr
doc    (Intercept) 0.02897010 0.170206
priorprob   0.00016799 0.012961 1.000
Number of obs: 138, groups: doc, 23

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.23977    1.44408  -5.706 1.16e-08
priorprob    0.04957    0.01328   3.733 0.000189
genderMale   1.16593    0.57481   2.028 0.042521
wbc          0.34485    0.07056   4.887 1.02e-06

Correlation of Fixed Effects:
(Intr) prrprb gndrMl
priorprob  -0.757
genderMale -0.299  0.141
wbc        -0.800  0.350  0.030
> anova(m2, m3)
Data: appy
Models:
m2: dx ~ priorprob + gender + wbc + (1 | doc)
m3: dx ~ priorprob + gender + wbc + (priorprob | doc)
Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
m2  5 108.07 122.70 -49.034
m3  7 111.52 132.01 -48.759 0.5501      2     0.7595
> (m4 <- update(m2, . ~ . - gender))  # check if gender is significant, using LRT
Generalized linear mixed model fit by the Laplace approximation
Formula: dx ~ priorprob + wbc + (1 | doc)
Data: appy
AIC   BIC logLik deviance
110.5 122.2 -51.24    102.5
Random effects:
Groups Name        Variance Std.Dev.
doc    (Intercept) 0.63458  0.7966
Number of obs: 138, groups: doc, 23

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.62797    1.33760  -5.703 1.18e-08
priorprob    0.04834    0.01229   3.934 8.35e-05
wbc          0.34395    0.06771   5.080 3.77e-07

Correlation of Fixed Effects:
(Intr) prrprb
priorprob -0.786
wbc       -0.813  0.357
> anova(m4, m3)
Data: appy
Models:
m4: dx ~ priorprob + wbc + (1 | doc)
m3: dx ~ priorprob + gender + wbc + (priorprob | doc)
Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
m4  4 110.48 122.19 -51.240
m3  7 111.52 132.01 -48.759 4.9639      3     0.1745

you will see that the random intercept/random slope model produces a
degenerate fit (estimated correlation of the within-doctor random
effects is -1.000), which is not significantly better than the random
intercept fit.  I would therefore use m2. (Because the fixed-effect
for gender had a z-value close to 2 I performed the more reliable
likelihood-ratio test, just to check.)

> library(lme4)
> m2 <- glmer(dx ~ priorprob + gender + wbc + (1 | doc),
>            family=binomial, data=appy)
>
> m3 <- glmer(dx ~ priorprob + gender  + wbc + (priorprob | doc),
>            family=binomial, data=appy)
>
> My ultimate goal is to estimate the probability of appendicitis
> (and a prediction interval), given a specific 'gender', 'wbc' and
> 'priorprob' assigned by a doctor with similar diagnostic ability to
> those who participated in our pilot study. I'm stuck on how to code this
> prediction.

The prediction will be based on the fixed-effects only.  Because it
applies to a doctor not in this study, we assign the random effect for
that doctor to be zero, which is the expected value in the absence of
any information on that doctor.

The fixef extractor returns the estimates of the fixed-effects
parameters.  For a female patient with median prior probability (55%)
and median white blood cell count (9.45) the estimated linear
predictor (eta = -2.299) corresponds to a probability of 9.1%

> fixef(m2)
(Intercept)   priorprob  genderMale         wbc
-8.2886183   0.0502540   1.1870539   0.3412998
> eta <- crossprod(c(1, 55, 0, 9.45), fixef(m2))
> eta
[,1]
[1,] -2.299365