[R] comparing matched proportions using glm

Mon Oct 8 13:51:38 CEST 2007

Thanks very much for your reply, Chuck. I have a quick follow-up
question. You mention putting the data into a 2x2x3 table for a
log-linear model, but my lists have many more than 3 strata (thousands,
in fact). I am trying to work out whether the proportions in list 1
tend to equal the proportions in list 2, in a kind of matched-pairs
test of proportions. Is the log-linear approach feasible with a
2x2x1000 table, for example? Or would it be better to pursue the glm
route, using the surrogate Poisson model, as you suggested?
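
To make the two routes concrete, here is a minimal sketch using the three
example strata quoted below. It assumes the counts are arranged as a
2 x 2 x K array (outcome by list by stratum); the names tab, long, grp,
fit0 and fit1 are illustrative only and do not come from the thread. The
surrogate Poisson glm fits the same log-linear model that loglin() would,
and mantelhaen.test() is shown as one stratified alternative that does
not estimate a separate set of parameters per stratum, which may matter
with thousands of small strata.

```r
# Example counts from the thread: one element per stratum (matched sample).
a1 <- c(3, 6, 9); b1 <- c(4, 7, 1)   # list 1
a2 <- c(7, 5, 3); b2 <- c(9, 1, 1)   # list 2
K  <- length(a1)

# 2 x 2 x K table: outcome (a/b) by list (1/2) by stratum.
tab <- array(rbind(a1, b1, a2, b2), dim = c(2, 2, K),
             dimnames = list(outcome = c("a", "b"),
                             grp     = c("list1", "list2"),
                             stratum = paste0("s", seq_len(K))))

# Long format: one row per cell, as needed for the surrogate Poisson glm.
long <- as.data.frame(as.table(tab), responseName = "count")

# Null model: outcome is independent of list within every stratum,
# i.e. the two proportions are equal in each stratum.
fit0 <- glm(count ~ stratum * outcome + stratum * grp,
            family = poisson, data = long)

# Alternative: one common list-by-outcome association across strata.
fit1 <- update(fit0, . ~ . + outcome:grp)

# Likelihood-ratio test of the null against the common-association model.
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)

# Cochran-Mantel-Haenszel test of the same null on the 2 x 2 x K array.
cmh <- mantelhaen.test(tab)
print(cmh)
```

The Poisson fit carries a stratum-specific term for every margin, so the
number of coefficients grows with K; the CMH test works directly on the
array and only requires each stratum to contain more than one observation.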

Best regards,

Corry

>-----Original Message-----
>From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu] 
>Sent: 04 October 2007 21:47
>To: Corry Gellatly
>Cc: r-help at r-project.org
>Subject: Re: [R] comparing matched proportions using glm
>
>On Thu, 4 Oct 2007, Corry Gellatly wrote:
>
>>
>> Dear R users,
>>
>> Is it possible to use a generalized linear model to do a binomial 
>> comparison of one list of proportions with a matched list of 
>> proportions to test for a difference?
>>
>> So, for example:
>>
>> list 1          list 2
>>
>> a1  |  b1       a2  |  b2
>>
>> 3   |  4        7   |  9
>> 6   |  7        5   |  1
>> 9   |  1        3   |  1
>>
>>
>> I want to compare list 1 with list 2 and the samples are matched.
>
>
>Meaning that
>
> 	 3     4          7    9
>
>are the _counts_ in one stratum of three in all?
>
>And you have a hypothesis that claims the proportions are 
>equal in each stratum??
>
>The obvious candidate for that setup is a log-linear model for 
>the counts in a 2 by 2 by 3 table.
>
>See
>
> 	?loglin
>
>and
>
> 	?loglm (in MASS)
>
>and the references therein.
>
>You can do this type of work in glm() if you understand 
>surrogate Poisson models as outlined in
>
>McCullagh, P. and Nelder, J. A. (1989) Generalized Linear 
>Models. London: Chapman and Hall.
>
>HTH,
>
>Chuck
>
>> Obviously, I could add the columns and do a binomial test, i.e.
>> prop.test(c(18,15),c(30,26)). However, I have a large dataset, so this
>> would reduce the power of my analysis. I could compare the ratios, i.e.
>> a1/(a1+b1) compared to a2/(a2+b2) for the samples in each list.
>> However, this does not account for the difference in sample sizes
>> between samples in each list.
>>
>> I have tried a glm where I bind a2 and b2 as the y variable, i.e.
>> y<-cbind(a2,b2), and also bind a1 and b1 as the x variable, i.e.
>> x<-cbind(a1,b1), and run glm(y~x,binomial)
>>
>> I get this type of output:
>>
>> 	Call:
>> 	glm(formula = y ~ x, family = binomial)
>>
>> 	Deviance Residuals:
>> 	     Min        1Q    Median        3Q       Max
>> 	-3.20426  -0.72686  -0.01822   0.68320   4.05035
>>
>> 	Coefficients:
>> 	             Estimate Std. Error z value Pr(>|z|)
>> 	(Intercept)  0.178369   0.186421   0.957    0.339
>> 	xa1          0.008109   0.017430   0.465    0.642
>> 	xb1         -0.026666   0.018153  -1.469    0.142
>>
>> 	(Dispersion parameter for binomial family taken to be 1)
>>
>> 	    Null deviance: 565.14  on 467  degrees of freedom
>> 	Residual deviance: 559.69  on 465  degrees of freedom
>> 	AIC: 1883.3
>>
>> 	Number of Fisher Scoring iterations: 3
>>
>>
>> Is this output meaningful? It seems that y is not compared directly 
>> with x, but is instead regressed on a1 and b1 as separate covariates, 
>> which is not what I intended.
>>
>> I wonder if this is a suitable approach to the problem? I'll be very 
>> grateful for any advice or suggestions.
>>
>
>Charles C. Berry                            (858) 534-2098
>                                            Dept of Family/Preventive Medicine
>E mailto:cberry at tajo.ucsd.edu             UC San Diego
>http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>
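
On the glm(y ~ x, binomial) attempt quoted above: that call treats a1 and
b1 as two numeric covariates for the list 2 proportions, so the lists are
never compared directly. One possible alternative, sketched here under the
assumption that each row of the original data is one matched stratum, is
to stack the two lists in long format and include a stratum factor plus a
list indicator. The names d, grp, fit_null and fit_grp are illustrative
only.

```r
# Example counts from the thread: one element per matched stratum.
a1 <- c(3, 6, 9); b1 <- c(4, 7, 1)   # list 1
a2 <- c(7, 5, 3); b2 <- c(9, 1, 1)   # list 2
K  <- length(a1)

# Long format: one row per (stratum, list) with its own a and b counts.
d <- data.frame(stratum = factor(rep(seq_len(K), times = 2)),
                grp     = factor(rep(c("list1", "list2"), each = K)),
                a       = c(a1, a2),
                b       = c(b1, b2))

# Null model: each stratum has its own proportion, shared by both lists.
fit_null <- glm(cbind(a, b) ~ stratum, family = binomial, data = d)

# Alternative: list 2 differs from list 1 by a common log odds ratio.
fit_grp <- glm(cbind(a, b) ~ stratum + grp, family = binomial, data = d)

# Likelihood-ratio test for the list effect, adjusted for stratum.
cmp <- anova(fit_null, fit_grp, test = "Chisq")
print(cmp)

# Estimated log odds ratio (list 2 versus list 1) with its standard error.
print(coef(summary(fit_grp))["grplist2", ])
```

Using cbind(a, b) as the response weights each row by its sample size, so
differences in sample size between the lists are accounted for. With
thousands of strata this model estimates one coefficient per stratum, so a
conditional or Mantel-Haenszel type analysis may be worth considering
instead.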