[R] Logistic regression : discrepancies between glm and nls ?
Emmanuel Charpentier
charpent at bacbuc.dyndns.org
Fri Dec 14 16:47:14 CET 2001
Dear list,
I'm trying to learn how to use nlme to be able to fit and analyse
mixed-model logistic regressions. In order to keep things simple, I
started with the simplest possible model : a single (fixed-effect)
continuous variable. This problem is, of course, solved by glm, but I
wanted to look at a "hand-made" nls fit, in order to be able to
"generalize" to nlme fits.
> ## Let's start with the simplest model : one fixed-effect continuous
> ## variable
>
> ## Dummy data
>
> size<-500
>
> logdata<-data.frame(x=20*runif(size)-10) # A U([-10,10]) variable
>
> alpha<-0.5
>
> beta<-2
>
> ## Simulate a response
> ## y : a boolean (0|1) with probability e^(a+bx)/(1+e^(a+bx))
>
> logdata$y<-as.numeric(runif(size)<(exp(alpha+beta*logdata$x)/
+ (1+exp(alpha+beta*logdata$x))))
>
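(As an aside : an equivalent, and perhaps tidier, way to simulate the
same response would use plogis() and rbinom(). An untested sketch, not
part of the session above :

  ## plogis(q) computes exp(q)/(1+exp(q)) ; rbinom() then draws the
  ## boolean (0|1) response directly
  logdata$y <- rbinom(size, 1, plogis(alpha + beta*logdata$x))

The runif() comparison above should do the same thing.)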
> ## Reality check : are the data "reasonably random" ?
>
> table(logdata$y)
  0   1 
251 249 
>
> by(logdata$x,logdata$y,summary)
INDICES: 0
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -9.993  -7.243  -4.594  -4.873  -2.678   1.081 
------------------------------------------------------------
INDICES: 1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -1.526   1.791   5.008   4.663   7.235   9.983 
So far, no reason to wail ...
> ## another reality check : what's the "classical" logistic regression ?
>
> logdata.glm<-glm(y~x, data=logdata, family=binomial(link=logit))
>
> ## nls should give the same estimates, up to convergence discrepancies
>
> logdata.nls<-nls(y~exp(a+b*x)/(1+exp(a+b*x)), data=logdata,
+ start=list(a=0,b=1))
>
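(Side note : if I understand nls correctly, the call above treats all
observations as having the same variance, whereas a boolean response
has variance p(1-p). If your version of nls accepts a weights argument
(recent ones do), a reweighted fit might look like the following
untested sketch :

  ## untested sketch : inverse binomial-variance weights, clamping the
  ## fitted probabilities away from 0 and 1 to keep the weights finite
  p.hat <- pmin(pmax(fitted(logdata.nls), 1e-6), 1 - 1e-6)
  logdata.nls.w <- nls(y ~ exp(a + b*x)/(1 + exp(a + b*x)),
                       data=logdata, start=coef(logdata.nls),
                       weights=1/(p.hat*(1 - p.hat)))

I have not pursued this below, though.)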
> ## let's see ...
>
> summary(logdata.glm)
Call:
glm(formula = y ~ x, family = binomial(link = logit), data = logdata)
Deviance Residuals:
       Min          1Q      Median          3Q         Max  
-2.4394149  -0.0308105  -0.0001991   0.0082031   2.0389572  
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.9044     0.2849   3.174  0.00150 ** 
x             1.8675     0.2739   6.818  9.2e-12 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 693.14 on 499 degrees of freedom
Residual deviance: 100.76 on 498 degrees of freedom
AIC: 104.76
Number of Fisher Scoring iterations: 8
>
> summary(logdata.nls)
Formula: y ~ exp(a + b * x)/(1 + exp(a + b * x))
Parameters:
  Estimate Std. Error t value Pr(>|t|)    
a   0.8979     0.1285    6.99 8.86e-12 ***
b   1.5806     0.1474   10.73  < 2e-16 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
Residual standard error: 0.1811 on 498 degrees of freedom
Correlation of Parameter Estimates:
a
b 0.5964
Hmmm ... the alpha estimates are quite close to each other, but the
beta estimates differ markedly. Furthermore, the standard errors are
quite different.
Further simulation work showed that :
a) the alpha estimates can be much more different than they are in the
present example ;
b) the "biases" (differences between estimates) do *not* depend on the
initial values chosen for the nls estimation ;
c) they depend on the size of the sample (the larger the sample, the
smaller the "biases") ;
d) the standard error estimates given by glm are larger than those given
by nls.
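For the record, the replications were along the following lines (a
sketch, not the exact code I ran ; one.sim is just an illustrative
name) :

  one.sim <- function(n) {
    x <- 20*runif(n) - 10
    y <- rbinom(n, 1, plogis(0.5 + 2*x))
    fit.glm <- glm(y ~ x, family=binomial)
    fit.nls <- nls(y ~ exp(a + b*x)/(1 + exp(a + b*x)),
                   start=list(a=0, b=1))
    c(b.glm=coef(fit.glm)[["x"]], b.nls=coef(fit.nls)[["b"]])
  }
  ## compare the slope estimates across replications and sample sizes
  t(replicate(10, one.sim(100)))
  t(replicate(10, one.sim(5000)))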
Can someone explain to me why those two methods of fitting a (quite)
simple model give such different results ?
I *think* that the two methods should end up with the same estimators
(at least asymptotically). Where and why am I wrong ?
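(To make the question concrete : as far as I understand, the two calls
minimise different criteria. Here is a sketch of both objective
functions, so that others can check my understanding ; this is not code
from the session above :

  ## what nls minimises : the unweighted residual sum of squares
  rss <- function(theta) {
    p <- plogis(theta[1] + theta[2]*logdata$x)
    sum((logdata$y - p)^2)
  }
  ## what glm minimises : the negative binomial log-likelihood
  negll <- function(theta) {
    eta <- theta[1] + theta[2]*logdata$x
    -sum(logdata$y*eta - log(1 + exp(eta)))
  }
  optim(c(0, 1), rss)$par     # should land near the nls estimates
  optim(c(0, 1), negll)$par   # should land near the glm estimates

If these two criteria have different minimisers in finite samples, that
might explain the discrepancy, but I would still expect them to agree
asymptotically.)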
Sincerely,
Emmanuel Charpentier