[R] comparing matched proportions using glm
corry.gellatly at newcastle.ac.uk
Mon Oct 8 13:51:38 CEST 2007
Thanks very much for your reply, Chuck. I have a quick follow-up
question. You mention putting the data into a 2x2x3 table for a
log-linear model; however, my lists have many more than 3 strata, actually
thousands. I am trying to work out whether the proportions in list 1
tend to be equal to the proportions in list 2, in a kind of matched
pairs proportional test. Is the log-linear approach possible with a
2x2x1000 table, for example? Or would it be better to pursue the glm
route, using the surrogate Poisson model, as you suggested?
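
For illustration only, here is a minimal sketch of how the question above could be set up in R. It is not taken from the thread: it uses the three example strata quoted further down, and the object names (tab, d, fit0, fit1, which_list) are invented here. It shows two options: the Cochran-Mantel-Haenszel test via mantelhaen.test(), which takes a 2 x 2 x K array and so extends directly to many strata, and a binomial glm() with one factor level per stratum plus an indicator for the list.

```r
# Example counts from the original post (3 strata); a real data set
# would simply have longer vectors.
a1 <- c(3, 6, 9); b1 <- c(4, 7, 1)   # list 1
a2 <- c(7, 5, 3); b2 <- c(9, 1, 1)   # list 2
K  <- length(a1)

# 2 x 2 x K array: rows = list, columns = outcome (a or b), layers = strata.
tab <- array(rbind(a1, a2, b1, b2), dim = c(2, 2, K),
             dimnames = list(which_list = c("list1", "list2"),
                             outcome    = c("a", "b"),
                             stratum    = paste0("s", seq_len(K))))

# Cochran-Mantel-Haenszel test of a common odds ratio of 1 across strata.
cmh <- mantelhaen.test(tab)
print(cmh)

# Binomial glm alternative: one row per (stratum, list) combination.
d <- data.frame(stratum    = factor(rep(seq_len(K), times = 2)),
                which_list = factor(rep(c("list1", "list2"), each = K)),
                a = c(a1, a2), b = c(b1, b2))
fit0 <- glm(cbind(a, b) ~ stratum,              family = binomial, data = d)
fit1 <- glm(cbind(a, b) ~ stratum + which_list, family = binomial, data = d)

# Likelihood-ratio test for a list effect after adjusting for stratum.
print(anova(fit0, fit1, test = "Chisq"))
```

With thousands of small strata, the glm() version estimates one parameter per stratum, which can be slow and can bias the estimated list effect, so a conditional method such as the Mantel-Haenszel test may be the safer choice. That is a general caution, not something stated in the thread.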
>From: Charles C. Berry [mailto:cberry at tajo.ucsd.edu]
>Sent: 04 October 2007 21:47
>To: Corry Gellatly
>Cc: r-help at r-project.org
>Subject: Re: [R] comparing matched proportions using glm
>On Thu, 4 Oct 2007, Corry Gellatly wrote:
>> Dear R users,
>> Is it possible to use a generalized linear model to do a binomial
>> comparison of one list of proportions with a matched list of
>> proportions to test for a difference?
>> So, for example:
>> list 1 list 2
>> a1 | b1 a2 | b2
>> 3 | 4 7 | 9
>> 6 | 7 5 | 1
>> 9 | 1 3 | 1
>> I want to compare list 1 with list 2 and the samples are matched.
> 3 4 7 9
>are the _counts_ in one stratum of three in all?
>And you have an hypothesis that claims the proportions are
>equal in each stratum??
>The obvious candidate for that setup is a log-linear model for
>the counts in a 2 by 2 by 3 table.
> ?loglm (in MASS)
>and the references therein.
>You can do this type of work in glm() if you understand
>surrogate Poisson models as outlined in
>McCullagh, P. and Nelder, J. A. (1989) Generalized Linear Models.
>London: Chapman and Hall.
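
As an illustration of the surrogate Poisson idea mentioned above (a sketch, not code from the thread), the cell counts of the 2 x 2 x K table can be fitted with a Poisson glm(). The data are the three example strata from the original post; the names cells, m0, m1 and which_list are invented here. Model m0 states that outcome is independent of list within each stratum, and m1 adds a common list-by-outcome association.

```r
# Example counts from the original post (3 strata).
a1 <- c(3, 6, 9); b1 <- c(4, 7, 1)   # list 1
a2 <- c(7, 5, 3); b2 <- c(9, 1, 1)   # list 2
K  <- length(a1)

# One row per cell of the 2 x 2 x K table.
cells <- data.frame(
  count      = c(a1, b1, a2, b2),
  which_list = factor(rep(c("list1", "list2"), each = 2 * K)),
  outcome    = factor(rep(rep(c("a", "b"), each = K), times = 2)),
  stratum    = factor(rep(seq_len(K), times = 4))
)

# Conditional independence of list and outcome given stratum.
m0 <- glm(count ~ stratum * which_list + stratum * outcome,
          family = poisson, data = cells)

# Add a single list-by-outcome term (common odds ratio across strata).
m1 <- update(m0, . ~ . + which_list:outcome)

# Likelihood-ratio test of the common association (1 degree of freedom).
print(anova(m0, m1, test = "Chisq"))
```

Because the stratum-by-list margins are fitted exactly, this comparison should match the one from a binomial glm() of the a/b counts on stratum with and without a list term.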
>> Obviously, I could add the columns and do a binomial test, i.e.
>> prop.test(c(18,15),c(30,26)); however, I have a large dataset, so
>> this would reduce the power of my analysis. I could compare
>> a1/(a1+b1) with a2/(a2+b2) for the samples in each list;
>> however, this does not account for the difference in sample sizes
>> between samples in each list.
>> I have tried a glm where I bind a2 and b2 as the y variable, i.e.
>> y<-cbind(a2,b2), and also bind a1 and b1 as the x variable, i.e.
>> x<-cbind(a1,b1), and run glm(y~x,binomial)
>> I get this type of output:
>> glm(formula = y ~ x, family = binomial)
>> Deviance Residuals:
>>      Min        1Q    Median        3Q       Max
>> -3.20426  -0.72686  -0.01822   0.68320   4.05035
>> Coefficients:
>>              Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  0.178369   0.186421   0.957    0.339
>> xa1          0.008109   0.017430   0.465    0.642
>> xb1         -0.026666   0.018153  -1.469    0.142
>> (Dispersion parameter for binomial family taken to be 1)
>> Null deviance: 565.14 on 467 degrees of freedom
>> Residual deviance: 559.69 on 465 degrees of freedom
>> AIC: 1883.3
>> Number of Fisher Scoring iterations: 3
>> Is this output meaningful? It seems that y is not compared directly
>> with x, but rather compared with a1 and b1, which is not intended?
>> I wonder if this is a suitable approach to the problem? I'll be very
>> grateful for any advice or suggestions.
>Charles C. Berry (858) 534-2098
> Dept of
>E mailto:cberry at tajo.ucsd.edu UC San Diego
>http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San