[R] a "simple" statistic question
Joshua Wiley
jwiley.psych at gmail.com
Mon Mar 22 16:41:59 CET 2010
Dear Xiang,
Unequal sample sizes are not a problem for t-tests. If I understand
correctly, you do not want to pool your data because you believe the
variance differs between individual factories (i.e., it is
heterogeneous). Are you willing to pool the means? You could calculate
the variance for each factory individually and then pool the variances
using the weighted.mean() function (the variance of each factory
weighted by its sample size minus 1). Then you could compare the means
between all the factories from City A and City B, or between Big and
Small factories. Another option is to use an ANOVA (see ?aov), which
should let you keep your data broken down into subgroups.
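A minimal sketch of the steps above in R. It assumes that each sample is scored as good (1) or not good (0), so that a per-factory variance can be derived from the proportion of good samples; that scoring, and all object names below, are illustrative assumptions rather than something given in the question.

```r
# Data as posted in the question (only the columns used here).
factories <- data.frame(
  city  = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  size  = c("Big", "Big", "Small", "Small", "Big", "Big", "Big", "Big",
            "Small", "Small"),
  total = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good  = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)

# Proportion of good samples per factory.
factories$prop_good <- factories$good / factories$total

# Assumption: each sample is scored 1 (good) or 0 (not good), so the
# sample variance within a factory is p * (1 - p) * n / (n - 1).
n <- factories$total
p <- factories$prop_good
factory_var <- p * (1 - p) * n / (n - 1)

# Pool the per-factory variances, weighting each by its sample size minus 1.
pooled_var <- weighted.mean(factory_var, w = n - 1)

# Compare factory-level proportions between cities and between sizes.
# t.test() defaults to the Welch test, which does not assume equal variances.
city_test <- t.test(prop_good ~ city, data = factories)
size_test <- t.test(prop_good ~ size, data = factories)

# ANOVA that keeps both groupings in one model (see ?aov).
fit <- aov(prop_good ~ city + size, data = factories)
summary(fit)
```

Printing city_test and size_test shows the two comparisons. With only ten factories, either test has little power, so treat the output as a starting point.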
If you have specific theories, I would also recommend looking into
contrast weights. With contrasts, you end up doing what is basically a
one-sample t-test, but it tests whether your theory (given by the
weights you assign) fits the data well. The nice thing about this
approach is that you can include several predictions (e.g., that there
will be more good samples than bad samples, that big factories will be
better than small factories, and that City A will be better than
City B) all in one test.
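As a rough sketch of a contrast test in R: the weights below encode one hypothetical theory (City A better than City B, and Big better than Small), and the outcome is the factory-level proportion of good samples. The weights, the choice of outcome, and the object names are assumptions made for illustration only.

```r
# Data as posted in the question (only the columns used here).
factories <- data.frame(
  city  = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  size  = c("Big", "Big", "Small", "Small", "Big", "Big", "Big", "Big",
            "Small", "Small"),
  total = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good  = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)
factories$prop_good <- factories$good / factories$total

# One cell per City x Size combination.
cell <- paste(factories$city, factories$size, sep = ".")
cell_mean <- tapply(factories$prop_good, cell, mean)
cell_n    <- tapply(factories$prop_good, cell, length)

# Hypothetical contrast weights (must sum to zero): +1 for City A,
# -1 for City B, +1 for Big, -1 for Small, added together per cell.
lambda <- c(A.Big = 2, A.Small = 0, B.Big = 0, B.Small = -2)
lambda <- lambda[names(cell_mean)]  # align weights with the cell order

# Within-cell (error) variance and its degrees of freedom.
resid_ss <- sum((factories$prop_good - ave(factories$prop_good, cell))^2)
df_resid <- nrow(factories) - length(cell_mean)
mse      <- resid_ss / df_resid

# Contrast estimate, its standard error, and the t-test of L = 0.
L       <- sum(lambda * cell_mean)
se_L    <- sqrt(mse * sum(lambda^2 / cell_n))
t_stat  <- L / se_L
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
```

A contrast estimate L that is large and positive relative to its standard error would support the theory encoded by the weights. A different theory only requires a different set of weights that sum to zero.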
HTH,
Joshua
On Mon, Mar 22, 2010 at 7:47 AM, Xiang Gao <xianggao2006 at gmail.com> wrote:
> Hi, please suggest a method to answer the questions below:
>
>
> Factory_ID  Factory_Location  Factory_Size  Total_Sample  Good_Sample  Fair_Sample  Bad_Sample
> ----------  ----------------  ------------  ------------  -----------  -----------  ----------
> 1           City_A            Big           100           90           10           10
> 2           City_A            Big           120           55           35           30
> 3           City_A            Small         80            40           25           15
> 4           City_A            Small         75            50           15           10
> 5           City_B            Big           150           80           30           40
> 6           City_B            Big           120           55           25           40
> 7           City_B            Big           125           40           80           5
> 8           City_B            Big           100           60           25           15
> 9           City_B            Small         70            45           15           10
> 10          City_B            Small         85            65           5            15
>
> (1) Is there a statistically significant difference between City_A and City_B
> in the amount of Good_Quality_Sample that they produce?
> (2) Is there a statistically significant difference between Big and Small
> factories in the amount of Good_Quality_Sample that they produce?
>
> I don't think that a t-test works here because the Total_Sample (i.e., the
> total number of samples) from each factory is different.
> I don't want to pool data from individual factories together. For example, I
> don't want to pool Factory 1 and 2 together, because the variance among
> individual factories can be quite big in real data.
>
>
> Thank you
>
> Xiang
>
--
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/