[R] logistic regression
Mikhail Spivakov
ensdev.box at gmail.com
Wed Jun 25 12:01:58 CEST 2008
Dear Bill,
Thanks for your reply!
I agree that the residual deviance suggests that the model odds = f(A,B,C,D)
is totally saturated given the chunk of data TYPE=="a".
But since I've actually got three chunks of data (TYPE=={"a","b","c"}) that
together form the whole, my model really looks like this:
f1(A,B,C,D) if TYPE=="a",
odds (SUCCESS|TYPE) = f2(A,B,C,D) if TYPE=="b",
f3(A,B,C,D) if TYPE=="c",
odds(SUCCESS|TYPE=="a")+odds(SUCCESS|TYPE=="b")+odds(SUCCESS|TYPE=="c") = 1.
I reasoned that a better way of dealing with this would be to model odds =
f(TYPE, A,B,C,D), but I'm not sure if I was right. What do you think?
Many thanks!
Mikhail
Bill.Venables wrote:
>
> It looks like A*B*C*D is a complete, totally saturated model, (the
> residual deviance is effectively zero, and the residual degrees of
> freedom is exactly zero - this is a clue). So when you try to put even
> more parameters into the model and even higher way interactions,
> something has to give.
>
> I find 3-factor interactions are about as much as I can think about
> without getting a bit giddy. Do you really need 4- and 5-factor
> interactions? If so, your only option is to get more data.
>
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:Bill.Venables at csiro.au
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Mikhail Spivakov
> Sent: Wednesday, 25 June 2008 9:31 AM
> To: r-help at r-project.org
> Subject: [R] logistic regression
>
>
> Hi everyone,
>
> I'm sorry if this turns out to be more a statistical question than one
> specifically about R - but would greatly appreciate your advice anyway.
>
> I've been using a logistic regression model to look at the relationship
> between a binary outcome (say, the odds of picking n white balls from a
> bag
> containing m balls in total) and a variety of other binary parameters:
>
> _________________________________________________________________
>
>> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> summary(a.fit)
>
> glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D family =
> binomial(link = "logit"), data = a)
>
> Deviance Residuals:
> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.69751 0.02697 -25.861 <2.00E-16 ***
> A -0.02911 0.05451 -0.534 0.593335
> B 0.39842 0.06871 5.798 6.70E-09 ***
> C 0.829 0.06745 12.29 <2.00E-16 ***
> D 0.05928 0.11133 0.532 0.594401
> A:B -0.44053 0.13807 -3.191 0.001419 **
> A:C -0.49596 0.13664 -3.63 0.000284 ***
> B:C -0.62194 0.14164 -4.391 1.13E-05 ***
> A:D -0.4031 0.2279 -1.769 0.076938 .
> B:D -0.60238 0.25978 -2.319 0.020407 *
> C:D -0.58467 0.27195 -2.15 0.031558 *
> A:B:C 0.5006 0.27364 1.829 0.067335 .
> A:B:D 0.51868 0.4683 1.108 0.268049
> A:C:D 0.32882 0.51226 0.642 0.520943
> B:C:D 0.56301 0.49903 1.128 0.259231
> A:B:C:D -0.32115 0.87969 -0.365 0.715059
>
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 2.2185e+02 on 15 degrees of freedom
> Residual deviance: 1.0385e-12 on 0 degrees of freedom
> AIC: 124.50
>
> Number of Fisher Scoring iterations: 3
>
> _________________________________________________________________
>
> This seems to produce sensible results given the actual data.
> However, there are actually three types of balls in the experiment and I
> need to model the relationship between the odds of picking each of the
> type
> and the parameters A,B,C,D. So what I do now is split the initial data
> table
> and just run glm three times:
>
>>all
>
> [fictional data]
>
> TYPE WHITE ALL A B C D
> a 100 400 1 0 0 0
> b 200 600 1 0 0 0
> c 10 300 1 0 0 0
> ....
> a 30 100 1 1 1 1
> b 50 200 1 1 1 1
> c 20 120 1 1 1 1
>
>> a<-all[all$type=="a",]
>> b<-all[all$type=="b",]
>> c<-all[all$type=="c",]
>> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> b.fit <- glm (data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> c.fit <- glm (data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>
> But it seems to me that I should be able to incorporate TYPE into the
> model.
>
> Something like:
>
>>summary(glm(data=example2,family=binomial(link="logit"),formula=cbind(W
> HITE,ALL-WHITE)~TYPE*A*B*C*D))
>
> [please see the output below]
>
> However, when I do this, it does not seem to give an expected result.
> Is this not the right way to do it?
> Or this is actually less powerful than running the three models
> separately?
>
> Will greatly appreciate your advice!
>
> Many thanks
> Mikhail
>
> -----
>
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -8.90E-01 1.91E-02 -46.553 <2.00E-16
> ***
> TYPE1 1.93E-01 2.47E-02 7.822 5.18E-15 ***
> TYPE2 1.19E+00 2.42E-02 49.108 <2.00E-16 ***
> A 1.89E-01 3.34E-02 5.665 1.47E-08 ***
> B 1.60E-01 4.41E-02 3.627 0.000286 ***
> C 2.24E-02 4.91E-02 0.455 0.64906
> D 1.96E-01 6.58E-02 2.982 0.002868 **
> TYPE1:A -2.19E-01 4.59E-02 -4.759 1.95E-06 ***
> TYPE2:A -9.08E-01 4.50E-02 -20.178 <2.00E-16 ***
> TYPE1:C 2.39E-01 5.93E-02 4.022 5.77E-05 ***
> TYPE2:B -1.82E+00 6.46E-02 -28.178 <2.00E-16 ***
> A:B -2.26E-01 8.52E-02 -2.649 0.008066 **
> TYPE1:C 8.07E-01 6.27E-02 12.87 <2.00E-16 ***
> TYPE2:C -2.51E+00 7.83E-02 -32.039 <2.00E-16 ***
> A:C -1.70E-01 9.51E-02 -1.783 0.074512 .
> B:C -3.01E-01 1.12E-01 -2.698 0.006985 **
> TYPE1:D -1.37E-01 9.20E-02 -1.489 0.136548
> TYPE2:D -1.13E+00 9.19E-02 -12.329 <2.00E-16 ***
> A:D -2.11E-01 1.27E-01 -1.655 0.097953 .
> B:D -2.15E-01 1.55E-01 -1.387 0.165472
> C:D -5.51E-01 2.76E-01 -1.997 0.045829 *
> TYPE1:A:B -2.15E-01 1.17E-01 -1.84 0.065714
> .
>
>
> TYPE2:A:B 7.21E-01 1.28E-01 5.635 1.75E-08
> ***
> TYPE1:A:C -3.26E-01 1.24E-01 -2.643 0.008221
> **
> TYPE2:A:C 9.70E-01 1.53E-01 6.36 2.02E-10
> ***
> TYPE1:B:C -3.21E-01 1.38E-01 -2.321 0.020313
> *
> TYPE2:B:C 1.35E+00 1.89E-01 7.133 9.85E-13
> ***
> A:B:C 1.80E-01 2.11E-01 0.852 0.394425
> TYPE1:A:D -1.92E-01 1.83E-01 -1.05 0.293758
> TYPE2:A:D 6.76E-01 1.80E-01 3.75 0.000177
> ***
> TYPE1:B:D -3.87E-01 2.16E-01 -1.796 0.072443
> .
> TYPE2:B:D 1.09E+00 2.30E-01 4.709 2.49E-06
> ***
> A:B:D 1.92E-01 2.73E-01 0.702 0.482512
> TYPE1:C:D -3.33E-02 3.18E-01 -0.105 0.916465
> TYPE2:C:D 1.20E-01 5.05E-01 0.238 0.811914
> A:C:D -7.37E+00 1.74E+04 -4.23E-04 0.999663
> B:C:D 3.14E-01 4.92E-01 0.638 0.523254
> TYPE1:A:B:C 3.21E-01 2.64E-01 1.218 0.223336
> TYPE2:A:B:C -8.43E-01 3.59E-01 -2.351 0.018747
> *
> TYPE1:A:B:D 3.27E-01 3.84E-01 0.85 0.3952
> TYPE2:A:B:D -6.59E-01 4.08E-01 -1.617 0.105883
> TYPE1:A:C:D 7.69E+00 1.74E+04 4.42E-04 0.999648
>
> TYPE2:A:C:D -1.60E+01 3.48E+04 -4.58E-04 0.999634
>
> TYPE1:B:C:D 2.49E-01 5.70E-01 0.437 0.662288
> TYPE2:B:C:D -7.08E-01 8.97E-01 -0.789 0.430007
> A:B:C:D 9.08E-03 2.47E+04 3.67E-07 1
> TYPE1:A:B:C:D -3.30E-01 2.47E+04 -1.34E-05 0.999989
> TYPE2:A:B:C:D 1.10E+00 4.94E+04 2.22E-05 0.999982
> --
> View this message in context:
> http://www.nabble.com/logistic-regression-tp18102137p18102137.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
View this message in context: http://www.nabble.com/logistic-regression-tp18102137p18108808.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list