[BioC] Good practice for choosing biological groups to include in array analysis

Fri Aug 17 12:46:50 CEST 2012

Dear listers,

Apologies for asking a general statistics question here, but maybe 
someone will be willing to help me.

I'm analysing an Illumina dataset that comes from 4 biological groups. 
For simplicity let's call them:
Control
Drug A
Drug B - Concentration 1
Drug B - Concentration 2

Each group has 4 biological replicates, and the samples were hybridised 
across 2 chips so that each chip holds 2 samples from each group. As for 
the biological questions, Drug A is compared to Control, and the two 
concentrations of Drug B are compared to Control as well as to each 
other. Drug A is never compared to Drug B.

As far as I understand, for comparing Drug A to Control I have two options:

1) Extract data for Drug A and Control from the dataset and run a linear 
model on those;
2) Run a linear model on samples from all groups and set up contrasts to 
compare Drug A to Control.

Naturally, the second option uses more experimental units to estimate 
the residual variance, which gives a more stable variance estimate and 
results in more differentially expressed genes being detected between 
Drug A and Control.
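
To make the difference between the two options concrete, here is a 
minimal sketch in plain Python for a single probe. It is not the limma 
workflow itself, only an illustration under stated assumptions: the 
expression values are made up, the group labels (DrugA, DrugB_c1, 
DrugB_c2) and the helpers pooled_fit and contrast_t are hypothetical 
names, the model is a simple cell-means model with one common residual 
variance, and the chip effect is left out (a two-level chip factor would 
cost one more degree of freedom in either fit).

```python
from statistics import mean

# Made-up log2 expression values for ONE probe, 4 replicates per group.
# These numbers are illustrative only, not taken from any real dataset.
data = {
    "Control":  [7.9, 8.1, 8.0, 8.2],
    "DrugA":    [8.6, 8.4, 8.7, 8.5],
    "DrugB_c1": [8.0, 8.3, 8.1, 8.2],
    "DrugB_c2": [8.9, 9.1, 8.8, 9.0],
}


def pooled_fit(groups):
    """Fit a cell-means model; return (pooled residual variance, residual df)."""
    rss = 0.0
    n = 0
    for values in groups.values():
        m = mean(values)
        rss += sum((v - m) ** 2 for v in values)
        n += len(values)
    df = n - len(groups)  # samples minus one mean parameter per group
    return rss / df, df


def contrast_t(groups, a, b):
    """Return (difference a - b, ordinary t-statistic, residual df)."""
    s2, df = pooled_fit(groups)
    diff = mean(groups[a]) - mean(groups[b])
    se = (s2 * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
    return diff, diff / se, df


# Option 1: fit only the Drug A and Control samples.
subset = {k: data[k] for k in ("Control", "DrugA")}
diff_sub, t_sub, df_sub = contrast_t(subset, "DrugA", "Control")

# Option 2: fit all four groups, then take the Drug A vs Control contrast.
diff_all, t_all, df_all = contrast_t(data, "DrugA", "Control")

print("option 1: diff", diff_sub, "t", t_sub, "residual df", df_sub)
print("option 2: diff", diff_all, "t", t_all, "residual df", df_all)
```

In this sketch the estimated Drug A vs Control difference is identical 
under both options; only the variance estimate and its residual degrees 
of freedom change (6 for the two-group fit, 12 for the four-group fit). 
Whether the pooled variance is actually smaller depends on the data, and 
pooling is only reasonable if the Drug B groups have a similar variance 
to the other two.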

Now my question is: is there anything wrong (ethically, statistically, 
etc.) with the second option?

Many thanks for your help!

Aliaksei.