[R] How to test a difference in ratios of count data in R
David Winsemius
dwinsemius at comcast.net
Wed Sep 28 22:54:46 CEST 2016
> On Sep 28, 2016, at 9:49 AM, Greg Snow <538280 at gmail.com> wrote:
>
> There are multiple ways of doing this, but here are a couple.
>
> To just test the fixed effect of treatment you can use the glm function:
>
> test <- read.table(text="
> replicate treatment n X
> 1 A 32 4
> 1 B 33 18
> 2 A 20 6
> 2 B 21 18
> 3 A 7 0
> 3 B 8 4
> ", header=TRUE)
>
> fit1 <- glm( cbind(X,n-X) ~ treatment, data=test, family=binomial)
> summary(fit1)
>
> Note that the default baseline value may differ between R and SAS,
> which would result in a reversed sign on the slope coefficient (and
> different intercept).
>
> To include replicate as a random effect you need an additional
> package; here I use lme4 and its glmer function:
>
> library(lme4)
> fit2 <- glmer( cbind(X, n-X) ~ treatment + (1|replicate), data=test,
> family=binomial)
> summary(fit2)
>
>
>
> On Tue, Sep 27, 2016 at 9:03 PM, Shuhua Zhan <szhan at uoguelph.ca> wrote:
>> Hello R-experts,
>> I am interested in determining whether the ratio of counts from two groups differs between two distinct treatments. For example, we have three replicates of treatment A and three replicates of treatment B. For each treatment, we have counts X from one group and counts Y from another group. My understanding is that the GLIMMIX procedure in SAS can test whether the proportion of counts in one group, X/(X+Y), differs significantly between treatments.
>>
>> I think this is the way you do it in SAS. The replicate and treatment variables are self-explanatory. The first number (n) is the total count X + Y; the second number (X) is the count X. Is there a way to do this in R? Since we have 20,000 datasets to test, it may be easier in R than in SAS to retrieve the significance test for each dataset (such as the one below), along with its Pr > F value and the mean ratio for each treatment.
>>
>>
>> data test;
>> input replicate treatment$ n X;
>> datalines;
>> 1 A 32 4
>> 1 B 33 18
>> 2 A 20 6
>> 2 B 21 18
>> 3 A 7 0
>> 3 B 8 4
>> ;
>>
Greg has already shown you how to do this in R with logistic regression.
# I usually think of Poisson regression when I hear of a desire to model ratios of counts that have a denominator. log(n) is supplied as an offset to correct for the variation in size of the subsamples.
fit1 <- glm( X ~ treatment+offset(log(n)), data=test, family=poisson)
summary(fit1)
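As a rough check on how to read the offset model, here is a minimal sketch (base R only; the names fitPois, rate_ratio and pooled are just illustrative) showing that the exponentiated treatment coefficient is the B:A rate ratio. With treatment as the only predictor, I would expect it to match the ratio of the pooled rates sum(X)/sum(n):

```r
# Data from the original post: X events out of n total counts
test <- read.table(text = "
replicate treatment n X
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
", header = TRUE)

# Poisson model for the rate X/n; log(n) enters as an offset
fitPois <- glm(X ~ treatment + offset(log(n)), data = test, family = poisson)

# Exponentiating the treatment coefficient gives the B:A rate ratio
rate_ratio <- unname(exp(coef(fitPois)["treatmentB"]))

# Pooled rates per treatment: total events divided by total counts
pooled <- with(test, tapply(X, treatment, sum) / tapply(n, treatment, sum))
pooled_ratio <- unname(pooled["B"] / pooled["A"])

print(c(model = rate_ratio, pooled = pooled_ratio))
```

This ignores the replicate structure, so it will not agree exactly with the mixed-model estimate.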
# And the lme4 analogue with replication:
library(lme4)
fit2 <- glmer( X ~ treatment + offset(log(n))+ (1|replicate), data=test,
family=poisson)
summary(fit2)
#----output----
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: X ~ treatment + offset(log(n)) + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid
    31.9     31.3    -13.0     25.9        3

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.0504 -0.4146 -0.3487  0.3956  1.0791

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.03159  0.1777
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7875     0.3372  -5.301 1.15e-07 ***
treatmentB    1.3365     0.3529   3.787 0.000152 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.838
Compare with the binomial model:
#============
fitBin <- glmer( cbind(X,n-X) ~ treatment + (1|replicate), data=test,
family=binomial)
coef(fitBin)
#----
$replicate
  (Intercept) treatmentB
1  -2.0487694   2.364695
2  -0.9908556   2.364695
3  -2.1844435   2.364695

attr(,"class")
[1] "coef.mer"
#-----
summary(fitBin)
#---------
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: cbind(X, n - X) ~ treatment + (1 | replicate)
   Data: test

     AIC      BIC   logLik deviance df.resid
    30.1     29.4    -12.0     24.1        3

Scaled residuals:
     Min       1Q   Median       3Q      Max
-0.88757 -0.35065 -0.03137  0.26897  0.67505

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.4123   0.6421
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.7442     0.5438  -3.208  0.00134 **
treatmentB    2.3647     0.4741   4.988 6.11e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.568
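The binomial estimates are on the logit scale, so they need back-transforming before they read as proportions. A minimal sketch, assuming a plain fixed-effects glm rather than the glmer fit above (so the numbers will differ somewhat from that output); fitBinGlm and the other names are only illustrative. It also pulls out the treatment p-value, which may be the piece you want when looping over many datasets:

```r
# Data from the original post: X events out of n total counts
test <- read.table(text = "
replicate treatment n X
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
", header = TRUE)

# Fixed-effects-only binomial model (no replicate random effect)
fitBinGlm <- glm(cbind(X, n - X) ~ treatment, data = test, family = binomial)
b <- coef(fitBinGlm)

# plogis() is the inverse logit: linear predictor -> proportion
prop_A <- unname(plogis(b["(Intercept)"]))
prop_B <- unname(plogis(b["(Intercept)"] + b["treatmentB"]))

# exp() of the treatment coefficient is the B:A odds ratio
odds_ratio <- unname(exp(b["treatmentB"]))

# Wald p-value for the treatment effect, taken from the coefficient table
p_value <- coef(summary(fitBinGlm))["treatmentB", "Pr(>|z|)"]

print(c(prop_A = prop_A, prop_B = prop_B,
        odds_ratio = odds_ratio, p_value = p_value))
```

For this fixed-effects fit the back-transformed proportions should simply be the pooled ones, 10/59 for A and 40/62 for B.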
The binomial model has a logit link. Your GLIMMIX procedure appears to assume a Gaussian (normal) distribution with an identity link by default. Running the model under those assumptions with lme4::glmer gives the results below, along with a deprecation warning that we can overlook in this case, since the results from calling lmer directly turned out to be identical.
#--------
fitNorm <- glmer( I(X/n) ~ treatment + (1|replicate), data=test,
family=gaussian)
#-------
Warning message:
In glmer(I(X/n) ~ treatment + (1 | replicate), data = test, family = gaussian) :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
coef(fitNorm); summary(fitNorm)
#-------
$replicate
  (Intercept) treatmentB
1 0.091096925  0.4925325
2 0.324579602  0.4925325
3 0.009323473  0.4925325

attr(,"class")
[1] "coef.mer"
Linear mixed model fit by REML ['lmerMod']
Formula: I(X/n) ~ treatment + (1 | replicate)
   Data: test

REML criterion at convergence: -4.2

Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.7864 -0.4278 -0.1152  0.5143  0.8246

Random effects:
 Groups    Name        Variance Std.Dev.
 replicate (Intercept) 0.027895 0.16702
 Residual              0.002356 0.04854
Number of obs: 6, groups:  replicate, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.14167    0.10042   1.411
treatmentB   0.49253    0.03963  12.427

Correlation of Fixed Effects:
           (Intr)
treatmentB -0.197
That's (probably) the model to compare to your SAS results if my reading of the SAS Proc GLIMMIX manual page is correct.
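Since glmer() with family=gaussian is deprecated, the same linear mixed model is better requested from lmer() directly. A minimal sketch, assuming lme4 is installed (fitLmm and diffs are just illustrative names). Because every replicate contains both treatments, I would expect the treatment estimate to equal the mean within-replicate difference in X/n:

```r
library(lme4)

# Data from the original post: X events out of n total counts
test <- read.table(text = "
replicate treatment n X
1 A 32 4
1 B 33 18
2 A 20 6
2 B 21 18
3 A 7 0
3 B 8 4
", header = TRUE)

# Linear mixed model on the raw proportions, random intercept per replicate
fitLmm <- lmer(I(X/n) ~ treatment + (1 | replicate), data = test)

# Fixed effects: mean proportion for A, and the B - A difference
print(fixef(fitLmm))

# Within-replicate differences in proportion (rows are ordered A, B per replicate)
diffs <- with(test, (X/n)[treatment == "B"] - (X/n)[treatment == "A"])
print(mean(diffs))
```

Note that lmer() reports t values without p-values, so a test of the treatment effect needs a further step.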
--
David.
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> run;
>>
>> ods select lsmeans;
>> proc glimmix data=test;
>> class replicate treatment;
>> model X/n = treatment / solution;
>> random intercept / subject=replicate;
>> lsmeans treatment / cl ilink;
>> run;
>>
>> I appreciate your help in advance!
>> Joshua
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com
>
David Winsemius
Alameda, CA, USA