[R] logistic regression

Wed Jun 25 12:01:58 CEST 2008

Dear Bill,

Thanks for your reply!

I agree that the residual deviance suggests that the model odds = f(A,B,C,D)
is totally saturated given the chunk of data TYPE=="a". 
But since I've actually got three chunks of data (TYPE=={"a","b","c"}) that
together form the whole, my model really looks like this:

                                       f1(A,B,C,D) if TYPE=="a",
odds (SUCCESS|TYPE) =   f2(A,B,C,D) if TYPE=="b",
                                       f3(A,B,C,D) if TYPE=="c",                                      
odds(SUCCESS|TYPE=="a")+odds(SUCCESS|TYPE=="b")+odds(SUCCESS|TYPE=="c") = 1.

I reasoned that a better way of dealing with this would be to model odds =
f(TYPE, A,B,C,D), but I'm not sure if I was right. What do you think? 

Many thanks!
Mikhail

Bill.Venables wrote:
> 
> It looks like A*B*C*D is a complete, totally saturated model, (the
> residual deviance is effectively zero, and the residual degrees of
> freedom is exactly zero - this is a clue).  So when you try to put even
> more parameters into the model and even higher way interactions,
> something has to give.   
> 
> I find 3-factor interactions are about as much as I can think about
> without getting a bit giddy.  Do you really need 4- and 5-factor
> interactions?  If so, your only option is to get more data.
> 
> 
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary):  +61 7 3826 7304
> Mobile:                         +61 4 8819 4402
> Home Phone:                     +61 7 3286 7700
> mailto:Bill.Venables at csiro.au
> http://www.cmis.csiro.au/bill.venables/ 
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Mikhail Spivakov
> Sent: Wednesday, 25 June 2008 9:31 AM
> To: r-help at r-project.org
> Subject: [R] logistic regression
> 
> 
> Hi everyone,
> 
> I'm sorry if this turns out to be more a statistical question than one
> specifically about R - but would greatly appreciate your advice anyway.
> 
> I've been using a logistic regression model to look at the relationship
> between a binary outcome (say, the odds of picking n white balls from a
> bag
> containing m balls in total) and a variety of other binary parameters:
> 
> _________________________________________________________________
> 
>> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> summary(a.fit)
> 
> glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D family =
> binomial(link = "logit"), data = a)
> 
> Deviance Residuals: 
>  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
> 
> Coefficients:
> 	Estimate	Std.	Error	z value	Pr(>|z|)
> (Intercept)	-0.69751	0.02697	-25.861	<2.00E-16	***
> A	-0.02911	0.05451	-0.534	0.593335	
> B	0.39842	0.06871	5.798	6.70E-09	***
> C	0.829	0.06745	12.29	<2.00E-16	***
> D	0.05928	0.11133	0.532	0.594401	
> A:B	-0.44053	0.13807	-3.191	0.001419	**
> A:C	-0.49596	0.13664	-3.63	0.000284	***
> B:C	-0.62194	0.14164	-4.391	1.13E-05	***
> A:D	-0.4031	0.2279	-1.769	0.076938	.
> B:D	-0.60238	0.25978	-2.319	0.020407	*
> C:D	-0.58467	0.27195	-2.15	0.031558	*
> A:B:C	0.5006	0.27364	1.829	0.067335	.
> A:B:D	0.51868	0.4683	1.108	0.268049	
> A:C:D	0.32882	0.51226	0.642	0.520943	
> B:C:D	0.56301	0.49903	1.128	0.259231	
> A:B:C:D	-0.32115	0.87969	-0.365	0.715059	
> 
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 2.2185e+02  on 15  degrees of freedom
> Residual deviance: 1.0385e-12  on  0  degrees of freedom
> AIC: 124.50
> 
> Number of Fisher Scoring iterations: 3
> 
> _________________________________________________________________
> 
> This seems to produce sensible results given the actual data.
> However, there are actually three types of balls in the experiment and I
> need to model the relationship between the odds of picking each of the
> type
> and the parameters A,B,C,D. So what I do now is split the initial data
> table
> and just run glm three times:
> 
>>all
> 
> [fictional data]
> 
> TYPE WHITE ALL A B C D 
> a	100	400	1	0	0	0
> b	200	600	1	0	0	0
> c	10	300	1	0	0	0
> ....
> a	30	100	1	1	1	1
> b	50	200	1	1	1	1
> c	20	120	1	1	1	1
> 
>> a<-all[all$type=="a",]
>> b<-all[all$type=="b",]
>> c<-all[all$type=="c",]
>> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> b.fit <- glm (data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> c.fit <- glm (data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
> 
> But it seems to me that I should be able to incorporate TYPE into the
> model. 
> 
> Something like:
> 
>>summary(glm(data=example2,family=binomial(link="logit"),formula=cbind(W
> HITE,ALL-WHITE)~TYPE*A*B*C*D))
> 
> [please see the output below]
> 
> However, when I do this, it does not seem to give an expected result.
> Is this not the right way to do it? 
> Or this is actually less powerful than running the three models
> separately?  
> 
> Will greatly appreciate your advice!
> 
> Many thanks
> Mikhail
> 
> -----
> 
> 	Estimate	Std.	Error	z value	Pr(>|z|)
> (Intercept)	-8.90E-01	1.91E-02	-46.553	<2.00E-16
> ***
> TYPE1	1.93E-01	2.47E-02	7.822	5.18E-15	***
> TYPE2	1.19E+00	2.42E-02	49.108	<2.00E-16	***
> A	1.89E-01	3.34E-02	5.665	1.47E-08	***
> B	1.60E-01	4.41E-02	3.627	0.000286	***
> C	2.24E-02	4.91E-02	0.455	0.64906	
> D	1.96E-01	6.58E-02	2.982	0.002868	**
> TYPE1:A	-2.19E-01	4.59E-02	-4.759	1.95E-06	***
> TYPE2:A	-9.08E-01	4.50E-02	-20.178	<2.00E-16	***
> TYPE1:C	2.39E-01	5.93E-02	4.022	5.77E-05	***
> TYPE2:B	-1.82E+00	6.46E-02	-28.178	<2.00E-16	***
> A:B	-2.26E-01	8.52E-02	-2.649	0.008066	**
> TYPE1:C	8.07E-01	6.27E-02	12.87	<2.00E-16	***
> TYPE2:C	-2.51E+00	7.83E-02	-32.039	<2.00E-16	***
> A:C	-1.70E-01	9.51E-02	-1.783	0.074512	.
> B:C	-3.01E-01	1.12E-01	-2.698	0.006985	**
> TYPE1:D	-1.37E-01	9.20E-02	-1.489	0.136548	
> TYPE2:D	-1.13E+00	9.19E-02	-12.329	<2.00E-16	***
> A:D	-2.11E-01	1.27E-01	-1.655	0.097953	.
> B:D	-2.15E-01	1.55E-01	-1.387	0.165472	
> C:D	-5.51E-01	2.76E-01	-1.997	0.045829	*
> TYPE1:A:B	-2.15E-01	1.17E-01	-1.84	0.065714
> .
> 
> 
> TYPE2:A:B	7.21E-01	1.28E-01	5.635	1.75E-08
> ***
> TYPE1:A:C	-3.26E-01	1.24E-01	-2.643	0.008221
> **
> TYPE2:A:C	9.70E-01	1.53E-01	6.36	2.02E-10
> ***
> TYPE1:B:C	-3.21E-01	1.38E-01	-2.321	0.020313
> *
> TYPE2:B:C	1.35E+00	1.89E-01	7.133	9.85E-13
> ***
> A:B:C	1.80E-01	2.11E-01	0.852	0.394425	
> TYPE1:A:D	-1.92E-01	1.83E-01	-1.05	0.293758	
> TYPE2:A:D	6.76E-01	1.80E-01	3.75	0.000177
> ***
> TYPE1:B:D	-3.87E-01	2.16E-01	-1.796	0.072443
> .
> TYPE2:B:D	1.09E+00	2.30E-01	4.709	2.49E-06
> ***
> A:B:D	1.92E-01	2.73E-01	0.702	0.482512	
> TYPE1:C:D	-3.33E-02	3.18E-01	-0.105	0.916465	
> TYPE2:C:D	1.20E-01	5.05E-01	0.238	0.811914	
> A:C:D	-7.37E+00	1.74E+04	-4.23E-04	0.999663	
> B:C:D	3.14E-01	4.92E-01	0.638	0.523254	
> TYPE1:A:B:C	3.21E-01	2.64E-01	1.218	0.223336	
> TYPE2:A:B:C	-8.43E-01	3.59E-01	-2.351	0.018747
> *
> TYPE1:A:B:D	3.27E-01	3.84E-01	0.85	0.3952	
> TYPE2:A:B:D	-6.59E-01	4.08E-01	-1.617	0.105883	
> TYPE1:A:C:D	7.69E+00	1.74E+04	4.42E-04	0.999648
> 
> TYPE2:A:C:D	-1.60E+01	3.48E+04	-4.58E-04	0.999634
> 
> TYPE1:B:C:D	2.49E-01	5.70E-01	0.437	0.662288
> TYPE2:B:C:D	-7.08E-01	8.97E-01	-0.789	0.430007
> A:B:C:D	9.08E-03	2.47E+04	3.67E-07	1
> TYPE1:A:B:C:D	-3.30E-01	2.47E+04	-1.34E-05	0.999989
> TYPE2:A:B:C:D	1.10E+00	4.94E+04	2.22E-05	0.999982
> -- 
> View this message in context:
> http://www.nabble.com/logistic-regression-tp18102137p18102137.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: http://www.nabble.com/logistic-regression-tp18102137p18108808.html
Sent from the R help mailing list archive at Nabble.com.