Joshua Wiley jwiley.psych at gmail.com
Mon Mar 22 16:41:59 CET 2010

Dear Xiang,

Unequal sample sizes are not a problem for t-tests.  If I understand
correctly, you do not want to pool your data because you believe the
variances of the individual factories are heterogeneous.  Are you willing
to pool the means?  You could calculate the variance for each factory
individually and then pool the variances using the weighted.mean()
function (variance of each factory weighted by its sample size minus
1).  Then you could just compare the means between all the factories
from City A and City B, or between Big and Small factories.  Another
option could be to use an ANOVA (see ?aov).  This should let you keep
your data broken down into subgroups.
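
As a rough sketch of both suggestions, the code below pools per-factory variances with weighted.mean() and then fits an ANOVA with aov().  Several things here are assumptions rather than anything stated in the thread: the first unlabeled count column of the quoted table is taken to be Good_Quality_Sample, the response is taken to be the proportion of good samples per factory, and each sample is treated as a 0/1 "good" indicator so that a within-factory variance can be computed from the counts alone.  The column names in `d` are made up for the example.

```r
# Data transcribed from the quoted table.
# ASSUMPTION: the first unlabeled count column is Good_Quality_Sample.
d <- data.frame(
  location = factor(c(rep("City_A", 4), rep("City_B", 6))),
  size     = factor(c("Big", "Big", "Small", "Small",
                      "Big", "Big", "Big", "Big", "Small", "Small")),
  total    = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good     = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)
d$prop_good <- d$good / d$total

# ASSUMPTION: each sample is a 0/1 indicator of "good", so the sample
# variance within a factory is good * (total - good) / (total * (total - 1)).
d$var_good <- d$good * (d$total - d$good) / (d$total * (d$total - 1))

# Pooled variance per city: each factory's variance weighted by n - 1.
is_a <- d$location == "City_A"
pooled_var_a <- weighted.mean(d$var_good[is_a],  w = d$total[is_a]  - 1)
pooled_var_b <- weighted.mean(d$var_good[!is_a], w = d$total[!is_a] - 1)

# ANOVA on the factory-level proportions, keeping both grouping factors.
fit <- aov(prop_good ~ location + size, data = d)
summary(fit)
```

Because the design is unbalanced (four City_A factories against six in City_B), the sequential sums of squares that summary() reports depend on the order of the terms in the formula, so it may be worth fitting `size + location` as well and comparing.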

If you have specific theories, I would also recommend looking into
using contrast weights.  With contrasts, you would end up basically
doing a one-sample t-test, but it would be testing whether your theory
(given by the weights you assigned) fits the data well.  The nice thing
about it is that you can include a lot of predictions (e.g., that there
will be more good samples than bad samples, that big factories will
be better than small factories, and that City A will be better than
City B) all in one test.
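
A minimal sketch of such a contrast test, under the same assumptions as before (the first unlabeled count column is taken to be Good_Quality_Sample, and the response is the per-factory proportion of good samples).  The weights are purely illustrative: they encode only the two predictions "Big better than Small" and "City A better than City B", obtained by adding the two main-effect contrasts.

```r
# Data transcribed from the quoted table.
# ASSUMPTION: the first unlabeled count column is Good_Quality_Sample.
d <- data.frame(
  location = factor(c(rep("City_A", 4), rep("City_B", 6))),
  size     = factor(c("Big", "Big", "Small", "Small",
                      "Big", "Big", "Big", "Big", "Small", "Small")),
  total    = c(100, 120, 80, 75, 150, 120, 125, 100, 70, 85),
  good     = c(90, 55, 40, 50, 80, 55, 40, 60, 45, 65)
)
d$prop_good <- d$good / d$total
d$cell <- interaction(d$location, d$size)   # four location-by-size cells

# Illustrative contrast weights (they sum to zero):
# (City A > City B) + (Big > Small) = (1, 1, -1, -1) + (1, -1, 1, -1)
w <- c(City_A.Big = 2, City_A.Small = 0, City_B.Big = 0, City_B.Small = -2)
w <- w[levels(d$cell)]                      # align weights with cell order

cell_mean <- tapply(d$prop_good, d$cell, mean)
cell_n    <- tapply(d$prop_good, d$cell, length)

# Error variance from a one-way ANOVA on the cells.
fit <- aov(prop_good ~ cell, data = d)
mse <- deviance(fit) / df.residual(fit)

# Contrast estimate, its standard error, and the resulting t-test.
est     <- sum(w * cell_mean)
se      <- sqrt(mse * sum(w^2 / cell_n))
t_stat  <- est / se
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)
```

A positive `est` with a small `p_value` would be consistent with the predicted ordering.  With only ten factories the test has few error degrees of freedom, so it should be read cautiously.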

HTH,

Joshua

On Mon, Mar 22, 2010 at 7:47 AM, Xiang Gao <xianggao2006 at gmail.com> wrote:
>
>
> Factory_ID  Factory_Location  Factory_Size  Total_Sample
> ------------------------------------------------------------------------
> 1           City_A            Big           100           90    10    10
> 2           City_A            Big           120           55    35    30
> 3           City_A            Small          80           40    25    15
> 4           City_A            Small          75           50    15    10
> 5           City_B            Big           150           80    30    40
> 6           City_B            Big           120           55    25    40
> 7           City_B            Big           125           40    80     5
> 8           City_B            Big           100           60    25    15
> 9           City_B            Small          70           45    15    10
> 10          City_B            Small          85           65     5    15
> ------------------------------------------------------------------------
>
> (1) Is there a statistically significant difference between City_A and City_B
> in the amount of Good_Quality_Sample that they produce?
> (2) Is there a statistically significant difference between Big and Small
> factories in the amount of Good_Quality_Sample that they produce?
>
> I don't think that a t-test works here because the Total_Sample (i.e., the
> total number of samples) from each factory is different.
> I don't want to pool data from individual factories together. For example, I
> don't want to pool Factory 1 and 2 together, because the variance among
> individual factories can be quite big in real data.
>
>
> Thank you
>
> Xiang

