[R] comparing proportions
Robert A LaBudde
ral at lcfltd.com
Fri Feb 11 07:15:53 CET 2011
You need to change models 2 and 3 to use ~ group
+ subject. You left subject out as a fixed factor.
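
That is, keeping the column names from your test.txt, something like (a
sketch of the corrected calls, not run against your data):

fit2 <- glm(cbind(success, failure) ~ group + subject,
            data = test, family = binomial)
fit3 <- lm(prop.success ~ group + subject,
           weights = success + failure, data = test)
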
At 05:17 PM 2/10/2011, array chip wrote:
>Robert, thank you!
>
>I tried all 3 models you suggested. Since each
>subject only has one line of data in my dataset,
>would including Subject as a factor in glm() or
>lm() lead to 0 df for residuals?
>
>Attached is my dataset and here is my version of the 3 models:
>
>test<-read.table("test.txt",sep='\t',header=T)
>library(lme4)
>
>## model 1: generalized mixed model
>fit1<-glmer(cbind(success,failure)~group+(1|subject),data=test,family=binomial)
>## model 2: generalized linear model
>fit2<-glm(cbind(success,failure)~group,data=test,family=binomial)
>## model 3: linear model
>fit3<-lm(prop.success~group,weights=success+failure,data=test)
> > fit1
>Generalized linear mixed model fit by the Laplace approximation
>Formula: cbind(success, failure) ~ group + (1 | subject)
> Data: test
> AIC BIC logLik deviance
> 54.75 57.89 -24.38 48.75
>Random effects:
> Groups Name Variance Std.Dev.
> subject (Intercept) 1.8256 1.3511
>Number of obs: 21, groups: subject, 21
>Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
>(Intercept) -0.6785 0.4950 -1.371 0.170
>groupgroup2 -0.7030 0.6974 -1.008 0.313
>
>
> > summary(fit2)
>
>Call:
>glm(formula = cbind(success, failure) ~ group, family = binomial,
> data = test)
>
>Deviance Residuals:
> Min 1Q Median 3Q Max
>-2.8204 -2.0789 -0.5407 1.0403 4.0539
>
>Coefficients:
> Estimate Std. Error z value Pr(>|z|)
>(Intercept) -0.4400 0.2080 -2.115 0.0344 *
>groupgroup2 -0.6587 0.3108 -2.119 0.0341 *
>---
>Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>(Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 78.723 on 20 degrees of freedom
>Residual deviance: 74.152 on 19 degrees of freedom
>
> > summary(fit3)
>
>Call:
>lm(formula = prop.success ~ group, data = test, weights = success +
> failure)
>
>Residuals:
> Min 1Q Median 3Q Max
>-1.1080 -0.7071 -0.2261 0.4754 1.9157
>
>Coefficients:
> Estimate Std. Error t value Pr(>|t|)
>(Intercept) 0.39175 0.08809 4.447 0.000276 ***
>groupgroup2 -0.14175 0.12364 -1.146 0.265838
>---
>Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>Residual standard error: 0.8676 on 19 degrees of freedom
>Multiple R-squared: 0.0647, Adjusted R-squared: 0.01548
>F-statistic: 1.314 on 1 and 19 DF, p-value: 0.2658
>
>
>
>As you can see, the 3 models gave quite
>different results: model 2 (the generalized
>linear model) gave a significant p-value, while
>model 1 (the generalized mixed model) and model 3
>(the regular linear model) did not. Also, as you can
>see from the data, prop.success has some values
>outside the range 0.15 to 0.85, so maybe the regular
>linear model is not appropriate?
>
>Thank you,
>
>John
>
>From: Robert A LaBudde <ral at lcfltd.com>
>To: array chip <arrayprofile at yahoo.com>
>Cc: Bert Gunter <gunter.berton at gene.com>; R-help at stat.math.ethz.ch
>Sent: Thu, February 10, 2011 12:54:44 PM
>Subject: Re: [R] comparing proportions
>
>1. If you use a random effects model, you should
>make Subject the random factor. I.e., a random
>intercepts model with 1|Subject. Group is a
>fixed effect: You have only 2 groups. Even if
>you had more than 2 groups, treating Group as
>random would return a standard deviation, not a
>P-value as you wanted. Finally, I doubt you
>believe the groups used are meaningless, and
>only the population of groups is of interest.
>Instead you consider them special, so Group is a fixed effect.
>
>2. The number of observations for each Subject
>is the number of trials, which you previously
>indicated were 7 to 10 in the cases listed.
>
>3. If you have no interest in the Subject
>effect, you can use a fixed Subject factor
>instead with glm() instead of glmer() or other
>mixed model function. This is a good idea so
>long as the number of subjects is, say, less
>than 10. Otherwise a mixed model would be a better idea.
>
>I suggest you fit all three models to learn
>about what you're doing: 1) glmer() or
>equivalent, with cbind(successes, failures) ~
>Group + (1|Subject); 2) glm() with
>cbind(successes, failures) ~ Subject + Group;
>and 3) lm(p ~ Subject + Group), where p is the
>proportion of successes for a particular subject and group.
>
>Then compare the results. They will probably all
>three give the same conclusion for the hypothesis
>about Group. I would guess the glmer()
>P-value will be the largest, then the glm(), and
>finally the lm(), but the last two may reverse.
>The lm() model may actually perform fairly well,
>as the Edgeworth series converges rapidly to
>normal for binomial distributions with p within
>0.15 to 0.85 and 10+ replicates, as I stated before.
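
A quick way to put the three Group p-values side by side, assuming the
fit1/fit2/fit3 objects defined in the code above and a recent lme4 (a
sketch, untested):

coef(summary(fit1))["groupgroup2", "Pr(>|z|)"]   # glmer
coef(summary(fit2))["groupgroup2", "Pr(>|z|)"]   # glm
coef(summary(fit3))["groupgroup2", "Pr(>|t|)"]   # lm
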
>
>I'd be interested in seeing the results of these
>3 fits myself just for curiosity.
>
>At 01:21 PM 2/10/2011, array chip wrote:
> > Robert and Bert, thank you both very much for
> > the response, really appreciated. I agree that
> > using a regular ANOVA (or regular t test) may not
> > be wise given the normality issue. So I am
> > debating between a generalized linear model using
> > glm(..., family=binomial) and a generalized linear
> > mixed-effects model using glmer(...,
> > family=binomial). I will forward to Robert the
> > off-list email I sent to Bert about whether to
> > use (1|subject) versus (1|group) in the mixed
> > model specification. If I use (1|group), both
> > models give me the same test for the fixed
> > effects, which is what I am mainly interested
> > in. So do I really need a mixed model here?
> >
> > Thanks again
> >
> > John
> >
> >
> > From: Bert Gunter <gunter.berton at gene.com>
> > To: Robert A LaBudde <ral at lcfltd.com>
> > Cc: array chip <arrayprofile at yahoo.com>
> > Sent: Thu, February 10, 2011 10:04:06 AM
> > Subject: Re: [R] comparing proportions
> >
> > Robert:
> >
> > Yes, exactly. In an off-list email exchange, he clarified this for me,
> > and I suggested exactly what you did, also with the caution that his
> > initial ad hoc suggestions were unwise. His subsequent posts to R-help
> > and the sig-mixed-models list were the result, although he appears to
> > have specified the model incorrectly in his glmer function (as
> > (1|Group) instead of (1|subject)).
> >
> > Cheers,
> > Bert
> >
> > On Thu, Feb 10, 2011 at 9:55 AM, Robert A LaBudde <ral at lcfltd.com> wrote:
> > > prop.test() is applicable to a binomial
> experiment in each of two classes.
> > >
> > > Your experiment is binomial only at the subject level. You then have
> > > multiple subjects in each of your groups.
> > >
> > > You have a random factor "Subjects" that must be accounted for.
> > >
> > > The best way to analyze is a generalized
> linear mixed model with a binomial
> > > distribution family and a logit or probit link. You will probably have to
> > > investigate overdispersion. If you have a small number of subjects, and
> > > don't care about the among-subject effect, you can model them as fixed
> > > effects and use glm() instead.
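
For the overdispersion check, one rough approach (a sketch, assuming a
data frame with the success/failure/group/subject columns used elsewhere
in this thread) is the Pearson chi-square over the residual degrees of
freedom:

library(lme4)
fit <- glmer(cbind(success, failure) ~ group + (1 | subject),
             data = test, family = binomial)
## a ratio much larger than 1 suggests overdispersion
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
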
> > >
> > > Your original question, I believe, related to doing an ANOVA assuming
> > > normality. In order for this to work with
> this kind of proportion problem,
> > > you generally won't get good results unless the number of replicates per
> > > subject is 12 or more, and the proportions
> involved are within 0.15 to 0.85.
> > > Otherwise you will have biased confidence
> intervals and significance tests.
> > >
> > >
> > >
> > > At 07:51 PM 2/9/2011, array chip wrote:
> > >>
> > >> Hi Bert,
> > >>
> > >> Thanks for your reply. If I understand correctly, prop.test() is not
> > >> suitable for my situation. The input to prop.test() is 2 numbers for
> > >> each group (# of successes and # of trials; for example, group 1 has
> > >> 5 successes out of 10 trials, group 2 has 3 successes out of 7 trials,
> > >> etc.). prop.test() tests whether the probability of success is the
> > >> same across groups.
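
(In that aggregate form the call would be something like

prop.test(x = c(5, 3), n = c(10, 7))

i.e. one successes/trials pair per group, using the example counts above.)
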
> > >>
> > >> In my case, each group has several subjects and each subject has 2
> > >> numbers (# successes and # trials). So:
> > >>
> > >> for group 1:
> > >> subject 1: 5 success, 10 trials
> > >> subject 2: 3 success, 8 trials
> > >> :
> > >> :
> > >>
> > >> for group 2:
> > >> subject a: 7 success, 9 trials
> > >> subject b: 6 success, 7 trials
> > >> :
> > >> :
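
Written out as the kind of per-subject data frame used elsewhere in this
thread (a sketch; the subject labels and group levels are made up, the
counts are the examples above):

test <- data.frame(
  subject = c("s1", "s2", "sa", "sb"),
  group   = c("group1", "group1", "group2", "group2"),
  success = c(5, 3, 7, 6),
  trials  = c(10, 8, 9, 7)
)
test$failure      <- test$trials - test$success
test$prop.success <- test$success / test$trials
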
> > >>
> > >> I want to test whether the probability of success in group 1 is the
> > >> same as in group 2. It's like comparing 2 groups of samples using a
> > >> t test; what I am uncertain about is whether a regular t test (or
> > >> non-parametric test) is still appropriate here when the response
> > >> variable is actually a proportion.
> > >>
> > >> I guess prop.test() cannot be used with my dataset, or am I wrong?
> > >>
> > >> Thanks
> > >>
> > >> John
> > >>
> > >> ________________________________
> > >> From: Bert Gunter <gunter.berton at gene.com>
> > >>
> > >> Sent: Wed, February 9, 2011 3:58:05 PM
> > >> Subject: Re: [R] comparing proportions
> > >>
> > >> 1. Is this a homework problem?
> > >>
> > >> 2. ?prop.test
> > >>
> > >> 3. If you haven't done so already, get and consult a basic statistical
> > >> methods book to help you with questions such as this.
> > >>
> > >> -- Bert
> > >>
> > >>
> > >> > Hi, I have a dataset that has 2 groups of samples. For each sample,
> > >> > the response measured is the number of successes (no.success)
> > >> > obtained out of the number of trials (no.trials). So a proportion of
> > >> > successes (prop.success) can be computed as no.success/no.trials.
> > >> > Now the objective is to test if there is a statistically significant
> > >> > difference in the proportion of successes between the 2 groups of
> > >> > samples (say n1=20, n2=30).
> > >> >
> > >> > I can think of 3 ways to do the test:
> > >> >
> > >> > 1. regular t test based on the variable prop.success
> > >> > 2. Mann-Whitney test based on the variable prop.success
> > >> > 3. do a binomial regression as:
> > >> > fit<-glm(cbind(no.success,no.trials-no.success) ~ group, data=data,
> > >> > family=binomial)
> > >> > anova(fit, test='Chisq')
> > >> >
> > >> > My questions are:
> > >> > 1. Is a t test appropriate for comparing 2 groups of proportions?
> > >> > 2. How about the Mann-Whitney non-parametric test?
> > >> > 3. Among the 3, which technique is most appropriate?
> > >> > 4. Any other technique you can suggest?
> > >> >
> > >> > Thank you,
> > >> >
> > >> > John
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Bert Gunter
> > >> Genentech Nonclinical Biostatistics
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
> >
> > --
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > 467-7374
> >
> > http://devo.gene.com/groups/devo/depts/ncb/home.shtml
>
>
>
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"
================================================================