[R-sig-ME] Logistic regression with spatial autocorrelation structure

Mon Feb 7 20:55:51 CET 2011

On Mon, Feb 7, 2011 at 1:26 PM,  <Dale.W.Steele at gmail.com> wrote:
> Suspect I'm not asking the question well. I'm clear on how to get a point
> estimate for
> the predicted probability in both cases. How much uncertainty
> is inherent in these prediction(s)? Not yet clear to me how to obtain
> something like a 95% confidence
> interval. Best. --Dale

That is considerably more complex and I'm not quite sure how to answer
it in the general case.  In fact, I don't really know how the standard
error of a predicted probability would be calculated even for a
fixed-effects GLM.

> On Feb 7, 2011 1:56pm, Douglas Bates <bates at stat.wisc.edu> wrote:
>> On Mon, Feb 7, 2011 at 10:58 AM,  Dale.W.Steele at gmail.com> wrote:
>>
>> > Dear Prof. Bates - Thanks for your response. How would I calculate a 95%
>>
>> > confidence interval for
>>
>> > the probability of appendicitis in a new patient given a particular set
>> > of
>>
>> > covariates for
>>
>>
>>
>> > 1) patients enrolled by a specific clinician (eg. doc == "18")?
>>
>> > 2) some future clinician?
>>
>>
>>
>> Didn't I answer 2 in my previous message in the part about
>>
>>
>>
>> > fixef(m2)
>>
>> (Intercept)   priorprob  genderMale         wbc
>>
>>  -8.2886183   0.0502540   1.1870539   0.3412998
>>
>> > eta
>> > eta
>>
>>         [,1]
>>
>> [1,] -2.299365
>>
>> > binomial()$linkinv(eta)
>>
>>          [,1]
>>
>> [1,] 0.09117557
>>
>>
>>
>> It is certainly possible to automate the process a bit more with a
>>
>> predict method but, for the time being, I think that should be the way
>>
>> to go about it.
>>
>>
>>
>> If you have a simple random effect for doc, as in model m2, then to
>>
>> get a prediction for a specific physician you add the random effect
>>
>> for, say, doc == "18", to the value of eta before transforming it.
>>
>> You can get the conditional modes of the random effects with the ranef
>>
>> extractor.
>>
>>
>>
>> > Thanks. --Dale
>>
>> >
>>
>> > On Feb 1, 2011 1:08pm, Douglas Bates bates at stat.wisc.edu> wrote:
>>
>> >> On Tue, Feb 1, 2011 at 12:05 PM, Douglas Bates bates at stat.wisc.edu>
>> >> wrote:
>>
>> >>
>>
>> >> > On Mon, Jan 31, 2011 at 11:45 AM, Dale Steele
>> >> > dale.w.steele at gmail.com>
>>
>> >> > wrote:
>>
>> >>
>>
>> >> >> Dear mixed-modeling  experts,
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> I'm interested in modeling the probability of appendicitis in
>> >> >> patients
>>
>> >>
>>
>> >> >> with abdominal pain.
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> The R binary data file 'http://www.ped-em.org/appy.rda' contains
>>
>> >>
>>
>> >> >> the following variables from a pilot study of 138 children with
>>
>> >>
>>
>> >> >> abdominal pain.
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Thank you for providing the data.
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> >> 'dx'        eventual diagnosis:   0=no appendicitis, 1=appendicitis
>>
>> >>
>>
>> >> >> 'gender'       Male/Female
>>
>> >>
>>
>> >> >> 'wbc'       total white blood cell count
>>
>> >>
>>
>> >> >> 'priorprob'    Clinical predicted probability of appendicitis
>>
>> >>
>>
>> >> >> 'doc'       doctor who assigned 'priorprob'
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> After taking a history and performing a physical examination, the ER
>>
>> >> >> doctor
>>
>> >>
>>
>> >> >> was asked to make a vertical mark on a 100 mm horizontal line to
>>
>> >> >> represent
>>
>> >>
>>
>> >> >> her estimate of the (percent) probability that the patient had
>>
>> >> >> appendicitis.
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> My initial thought was to fit a multiple logistic regression model:
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> m1
>>
>> >> >>
>>
>> >>
>>
>> >> >> However, it seems likely that each doctor interpreted the
>> >> >> probability
>>
>> >> >> scale
>>
>> >>
>>
>> >> >> differently.  The 23 doctors evaluated from 1 to 17 patients each.
>> >> >> I'm
>>
>> >>
>>
>> >> >> not primarily interest in predictions by a specific clinician.
>> >> >>  Thus,
>>
>> >>
>>
>> >> >> it seems to make sense to fit a generalized linear mixed model.
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> At this point I get muddled. Have I correctly specified a random
>>
>> >>
>>
>> >> >> intercept model (m2) and a random intercept/random slope model (m3)?
>>
>> >>
>>
>> >> >> Are there other sensible models?
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Yes, you have correctly specified them.  However, if you check the
>>
>> >>
>>
>> >> > output from the fits
>>
>> >>
>>
>> >> >> (m2
>>
>> >> > Generalized linear mixed model fit by the Laplace approximation
>>
>> >>
>>
>> >> > Formula: dx ~ priorprob + gender + wbc + (1 | doc)
>>
>> >>
>>
>> >> >   Data: appy
>>
>> >>
>>
>> >> >   AIC   BIC logLik deviance
>>
>> >>
>>
>> >> >  108.1 122.7 -49.03    98.07
>>
>> >>
>>
>> >> > Random effects:
>>
>> >>
>>
>> >> >  Groups Name        Variance Std.Dev.
>>
>> >>
>>
>> >> >  doc    (Intercept) 0.82922  0.91062
>>
>> >>
>>
>> >> > Number of obs: 138, groups: doc, 23
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Fixed effects:
>>
>> >>
>>
>> >> >            Estimate Std. Error z value Pr(>|z|)
>>
>> >>
>>
>> >> > (Intercept) -8.28862    1.45666  -5.690 1.27e-08
>>
>> >>
>>
>> >> > priorprob    0.05025    0.01291   3.892 9.96e-05
>>
>> >>
>>
>> >> > genderMale   1.18705    0.56752   2.092   0.0365
>>
>> >>
>>
>> >> > wbc          0.34130    0.07090   4.814 1.48e-06
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Correlation of Fixed Effects:
>>
>> >>
>>
>> >> >           (Intr) prrprb gndrMl
>>
>> >>
>>
>> >> > priorprob  -0.775
>>
>> >>
>>
>> >> > genderMale -0.293  0.130
>>
>> >>
>>
>> >> > wbc        -0.790  0.353  0.040
>>
>> >>
>>
>> >> >> (m3
>>
>> >> > Generalized linear mixed model fit by the Laplace approximation
>>
>> >>
>>
>> >> > Formula: dx ~ priorprob + gender + wbc + (priorprob | doc)
>>
>> >>
>>
>> >> >   Data: appy
>>
>> >>
>>
>> >> >   AIC BIC logLik deviance
>>
>> >>
>>
>> >> >  111.5 132 -48.76    97.52
>>
>> >>
>>
>> >> > Random effects:
>>
>> >>
>>
>> >> >  Groups Name        Variance   Std.Dev. Corr
>>
>> >>
>>
>> >> >  doc    (Intercept) 0.02897010 0.170206
>>
>> >>
>>
>> >> >        priorprob   0.00016799 0.012961 1.000
>>
>> >>
>>
>> >> > Number of obs: 138, groups: doc, 23
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Fixed effects:
>>
>> >>
>>
>> >> >            Estimate Std. Error z value Pr(>|z|)
>>
>> >>
>>
>> >> > (Intercept) -8.23977    1.44408  -5.706 1.16e-08
>>
>> >>
>>
>> >> > priorprob    0.04957    0.01328   3.733 0.000189
>>
>> >>
>>
>> >> > genderMale   1.16593    0.57481   2.028 0.042521
>>
>> >>
>>
>> >> > wbc          0.34485    0.07056   4.887 1.02e-06
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Correlation of Fixed Effects:
>>
>> >>
>>
>> >> >           (Intr) prrprb gndrMl
>>
>> >>
>>
>> >> > priorprob  -0.757
>>
>> >>
>>
>> >> > genderMale -0.299  0.141
>>
>> >>
>>
>> >> > wbc        -0.800  0.350  0.030
>>
>> >>
>>
>> >> >> anova(m2, m3)
>>
>> >>
>>
>> >> > Data: appy
>>
>> >>
>>
>> >> > Models:
>>
>> >>
>>
>> >> > m2: dx ~ priorprob + gender + wbc + (1 | doc)
>>
>> >>
>>
>> >> > m3: dx ~ priorprob + gender + wbc + (priorprob | doc)
>>
>> >>
>>
>> >> >   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>>
>> >>
>>
>> >> > m2  5 108.07 122.70 -49.034
>>
>> >>
>>
>> >> > m3  7 111.52 132.01 -48.759 0.5501      2     0.7595
>>
>> >>
>>
>> >> >> (m4
>>
>> >> > Generalized linear mixed model fit by the Laplace approximation
>>
>> >>
>>
>> >> > Formula: dx ~ priorprob + wbc + (1 | doc)
>>
>> >>
>>
>> >> >   Data: appy
>>
>> >>
>>
>> >> >   AIC   BIC logLik deviance
>>
>> >>
>>
>> >> >  110.5 122.2 -51.24    102.5
>>
>> >>
>>
>> >> > Random effects:
>>
>> >>
>>
>> >> >  Groups Name        Variance Std.Dev.
>>
>> >>
>>
>> >> >  doc    (Intercept) 0.63458  0.7966
>>
>> >>
>>
>> >> > Number of obs: 138, groups: doc, 23
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Fixed effects:
>>
>> >>
>>
>> >> >            Estimate Std. Error z value Pr(>|z|)
>>
>> >>
>>
>> >> > (Intercept) -7.62797    1.33760  -5.703 1.18e-08
>>
>> >>
>>
>> >> > priorprob    0.04834    0.01229   3.934 8.35e-05
>>
>> >>
>>
>> >> > wbc          0.34395    0.06771   5.080 3.77e-07
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > Correlation of Fixed Effects:
>>
>> >>
>>
>> >> >          (Intr) prrprb
>>
>> >>
>>
>> >> > priorprob -0.786
>>
>> >>
>>
>> >> > wbc       -0.813  0.357
>>
>> >>
>>
>> >> >> anova(m4, m3)
>>
>> >>
>>
>> >> > Data: appy
>>
>> >>
>>
>> >> > Models:
>>
>> >>
>>
>> >> > m4: dx ~ priorprob + wbc + (1 | doc)
>>
>> >>
>>
>> >> > m3: dx ~ priorprob + gender + wbc + (priorprob | doc)
>>
>> >>
>>
>> >> >   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>>
>> >>
>>
>> >> > m4  4 110.48 122.19 -51.240
>>
>> >>
>>
>> >> > m3  7 111.52 132.01 -48.759 4.9639      3     0.1745
>>
>> >>
>>
>> >>
>>
>> >>
>>
>> >> Sorry, that test should have been
>>
>> >>
>>
>> >>
>>
>> >>
>>
>> >> > anova(m4, m2)
>>
>> >>
>>
>> >> Data: appy
>>
>> >>
>>
>> >> Models:
>>
>> >>
>>
>> >> m4: dx ~ priorprob + wbc + (1 | doc)
>>
>> >>
>>
>> >> m2: dx ~ priorprob + gender + wbc + (1 | doc)
>>
>> >>
>>
>> >>   Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
>>
>> >>
>>
>> >> m4  4 110.48 122.19 -51.240
>>
>> >>
>>
>> >> m2  5 108.07 122.70 -49.034 4.4138      1    0.03565
>>
>> >>
>>
>> >>
>>
>> >>
>>
>> >> > you will see that the random intercept/random slope model produces a
>>
>> >>
>>
>> >> > degenerate fit (estimated correlation of the within-doctor random
>>
>> >>
>>
>> >> > effects is -1.000), which is not significantly better than the random
>>
>> >>
>>
>> >> > intercept fit.  I would therefore use m2. (Because the fixed-effect
>>
>> >>
>>
>> >> > for gender had a z-value close to 2 I performed the more reliable
>>
>> >>
>>
>> >> > likelihood-ratio test, just to check.)
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> >> library(lme4)
>>
>> >>
>>
>> >> >> m2
>>
>> >> >>            family=binomial, data=appy)
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> m3
>>
>> >> >>            family=binomial, data=appy)
>>
>> >>
>>
>> >> >>
>>
>> >>
>>
>> >> >> My ultimate goal is to estimate the probability of appendicitis
>>
>> >>
>>
>> >> >> (and a prediction interval), given a specific 'gender', 'wbc' and
>>
>> >>
>>
>> >> >> 'priorprob' assigned by a doctor with similar diagnostic ability to
>>
>> >>
>>
>> >> >> those who participated in our pilot study. I'm stuck on how to code
>>
>> >> >> this
>>
>> >>
>>
>> >> >> prediction.
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > The prediction will be based on the fixed-effects only.  Because it
>>
>> >>
>>
>> >> > applies to a doctor not in this study, we assign the random effect
>> >> > for
>>
>> >>
>>
>> >> > that doctor to be zero, which is the expected value in the absence of
>>
>> >>
>>
>> >> > any information on that doctor.
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> > The fixef extractor returns the estimates of the fixed-effects
>>
>> >>
>>
>> >> > parameters.  For a female patient with median prior probability (55%)
>>
>> >>
>>
>> >> > and median white blood cell count (9.45) the estimated linear
>>
>> >>
>>
>> >> > predictor (eta = -2.299) corresponds to a probability of 9.1%
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> >
>>
>> >>
>>
>> >> >> fixef(m2)
>>
>> >>
>>
>> >> > (Intercept)   priorprob  genderMale         wbc
>>
>> >>
>>
>> >> >  -8.2886183   0.0502540   1.1870539   0.3412998
>>
>> >>
>>
>> >> >> eta
>>
>> >> >> eta
>>
>> >>
>>
>> >> >          [,1]
>>
>> >>
>>
>> >> > [1,] -2.299365
>>
>> >>
>>
>> >> >> binomial()$linkinv(eta)
>>
>> >>
>>
>> >> >           [,1]
>>
>> >>
>>
>> >> > [1,] 0.09117557
>>
>> >>
>>
>> >> >
>>
>> >>
>>