[BioC] Affy normalization question
Mark W Kimpel
mwkimpel at gmail.com
Sat Dec 22 23:08:45 CET 2007
Jim,
My understanding is that our lab normally randomizes by
1. treatment
2. RNA extraction
3. labeling
4. hybridization
In addition, we sometimes have multiple brain regions, and, for the
purpose of the MA run, each region is treated as an independent
experiment, thus there is no randomization across brain regions for the
above factors.
My question arises because of two recent situations. First, in one
experiment, for a reason not clear to me, the labeling and hybridization
groups were combined, and there is a clear batch effect when this
combined labeling-hybridization factor is put into limma. In such a
case, would separate normalization be suggested? It would make the batch
effect larger, but that would seem to be addressed by including the
batch effect as a factor.
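A minimal sketch of what "batch as a factor" buys, for a single gene and on made-up, noise-free numbers (the effect sizes, the unbalanced layout, and the helper names `solve` and `ols` are all assumptions for illustration, not anything from the experiment described). When treatments are unevenly spread across batches, a model without the batch term mixes the batch shift into the treatment estimate, while a model with it separates the two:

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical layout: batch 0 holds 3 controls and 1 treated sample,
# batch 1 holds 1 control and 3 treated samples.
treat = [0, 0, 0, 1, 0, 1, 1, 1]
batch = [0, 0, 0, 0, 1, 1, 1, 1]

# Assumed truth: baseline 8, treatment effect +1, batch effect +2 (log2 scale).
y = [8.0 + 1.0 * t + 2.0 * b for t, b in zip(treat, batch)]

# Model without batch: intercept + treatment.
naive_effect = ols([[1.0, t] for t in treat], y)[1]

# Model with batch as a factor: intercept + treatment + batch.
adjusted = ols([[1.0, t, b] for t, b in zip(treat, batch)], y)
adjusted_effect, batch_effect = adjusted[1], adjusted[2]

print("treatment effect, batch ignored :", round(naive_effect, 6))
print("treatment effect, batch in model:", round(adjusted_effect, 6))
print("estimated batch effect          :", round(batch_effect, 6))
```

This only covers the modeling step. It says nothing about whether the arrays should be normalized together or separately, and it would not help if batch and treatment were completely confounded, because the design matrix would then be singular.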
Secondly, in another experiment I need to perform an analysis across 5
brain regions to look for overall gene expression differences resulting
from genetic differences between strains. In that experiment the 4
factors mentioned at the beginning were randomized, so there is no
batch effect within a brain region, but there is one across brain regions. In
this experiment I am not trying to find differences across brain
regions, which would be impossible to separate out from a batch effect,
but rather between two treatments that are independent of brain region.
One way I have done this in the past has been to simply average all 5
brain regions together to come up with an average-brain expression
measure, but I wonder if it would be better to put brain region in as a
factor. Regardless of whether I average or not, I need to decide whether
to normalize all brain regions together or, because they were run as
separate MA experiments, to normalize them individually.
Really, the question seems to be whether RMA should be run on a group
of CEL files in the presence of a batch effect that is not chip-related.
If so, will it make the batch effect "go away" (not in my experience)?
And if not, how should the batch effect be incorporated in a model?
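A small sketch of why joint normalization need not make a batch effect "go away". It implements only quantile normalization, which is one step of RMA (background correction and median-polish summarization are left out), and runs it on made-up log-scale values for 4 genes on 4 arrays. The function name `quantile_normalize` and all numbers are assumptions for illustration. Batch 2 carries a global shift of +1 on every gene plus an extra +5 on gene 0 only:

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of per-gene values).

    Every array ends up with the same set of values: the mean, across
    arrays, of the values at each rank. Assumes no ties within an array.
    """
    n_genes = len(arrays[0])
    sorted_cols = [sorted(col) for col in arrays]
    rank_means = [sum(col[r] for col in sorted_cols) / len(arrays)
                  for r in range(n_genes)]
    normalized = []
    for col in arrays:
        order = sorted(range(n_genes), key=lambda g: col[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical data: two arrays per batch, genes in the same order.
batch1 = [[2.0, 4.0, 6.0, 8.0],
          [2.5, 4.5, 6.5, 8.5]]
# Batch 2: every gene shifted by +1, gene 0 shifted by a further +5.
batch2 = [[8.0, 5.0, 7.0, 9.0],
          [8.5, 5.5, 7.5, 9.5]]

norm = quantile_normalize(batch1 + batch2)


def batch_gap(gene):
    """Mean of batch 2 minus mean of batch 1 for one gene, after normalization."""
    b1 = (norm[0][gene] + norm[1][gene]) / 2
    b2 = (norm[2][gene] + norm[3][gene]) / 2
    return b2 - b1


gene0_gap = batch_gap(0)   # gene with the gene-specific batch shift
gene3_gap = batch_gap(3)   # gene with only the global shift

print("gene 0 batch gap after normalization:", gene0_gap)
print("gene 3 batch gap after normalization:", gene3_gap)
```

In this toy case the global shift is removed (gene 3), but the gene-specific shift persists (gene 0) and pushes the genes it overtakes in rank in the opposite direction, which is why a batch term in the downstream model can still be needed after joint normalization.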
Finally, I realize that by randomizing at each step mentioned at the
top, one spreads any variance out so that it cannot be picked up as a
batch effect. With the "n" we usually use, if one were to take each of
the 4 factors into account one would usually run out of degrees of
freedom. Nevertheless, the variance induced at each wet-lab step is
still there; it is just not apparent and presumably doesn't induce bias. It
does, however, decrease power, and I wonder if it wouldn't be better to
block by treatment, so that equal numbers from each treatment are in a
group, but that then each group is processed totally together. There the
batch effect would be large, but it would be present as only one
factor, which with large enough "n" one could take into account in a
statistical model. That, it seems, might increase power to detect
differential expression. Maybe this is counter-intuitive, and would
probably only work if "n" were large enough to provide enough degrees of
freedom, but it makes some sense to me. Am I nuts? (many people think
so, so don't be shy about saying so ;) ).
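The trade-off described above can be sketched for one gene with a balanced, blocked layout: one control and one treated sample in each of 4 batches that are processed together. All numbers (batch offsets, noise, effect size) are made up for illustration. The same data are analyzed twice, once ignoring batch and once with batch as an additive factor, using the closed-form residuals that hold for a balanced design:

```python
from math import sqrt
from statistics import mean

# Hypothetical per-batch offsets and small deterministic "noise".
offsets = [-1.5, -0.5, 0.5, 1.5]
noise_c = [0.1, -0.1, 0.1, -0.1]
noise_t = [-0.1, 0.1, 0.1, -0.1]

# Assumed truth: control baseline 8, treatment effect +1 (log2 scale).
control = [8.0 + o + e for o, e in zip(offsets, noise_c)]
treated = [9.0 + o + e for o, e in zip(offsets, noise_t)]
n_batches = len(offsets)

# In a balanced design the treatment estimate is the same in both models.
effect = mean(treated) - mean(control)

# Model 1: treatment only. Batch variance stays in the residual.
ss_unblocked = (sum((y - mean(control)) ** 2 for y in control)
                + sum((y - mean(treated)) ** 2 for y in treated))
df_unblocked = 2 * n_batches - 2

# Model 2: treatment + batch. Balanced two-way additive residuals are
# y - treatment mean - batch mean + grand mean.
grand = mean(control + treated)
batch_means = [(c + t) / 2 for c, t in zip(control, treated)]
ss_blocked = sum((c - mean(control) - b + grand) ** 2
                 + (t - mean(treated) - b + grand) ** 2
                 for c, t, b in zip(control, treated, batch_means))
df_blocked = 2 * n_batches - 1 - 1 - (n_batches - 1)


def t_stat(ss, df):
    """t statistic for the treatment effect given a residual SS and its df."""
    se = sqrt(ss / df * (2 / n_batches))  # SE of a difference of two means
    return effect / se


t_unblocked = t_stat(ss_unblocked, df_unblocked)
t_blocked = t_stat(ss_blocked, df_blocked)

print("effect estimate:", round(effect, 6))
print("batch ignored : df =", df_unblocked, " t =", round(t_unblocked, 2))
print("batch in model: df =", df_blocked, " t =", round(t_blocked, 2))
```

With these assumed numbers the batch term costs 3 degrees of freedom but removes almost all of the residual variance, so the t statistic grows. The sketch does not simulate the fully randomized alternative, and with a small batch effect or very few batches the lost degrees of freedom could outweigh the gain.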
Thanks so much for your helpful input,
Mark
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)
mwkimpel<at>gmail<dot>com
******************************************************************
James W. MacDonald wrote:
> Hi Mark,
>
> Mark W Kimpel wrote:
>> Not infrequently on this list the question arises as to how to perform
>> RMA on a large number of CEL files. The simple answer, of course, is
>> to use "justRMA" or buy more RAM.
>>
>> As I have learned more about the wet-lab side of microarray
>> experiments it has come to my attention that there is a technical
>> limitation in our lab as to how many chips can actually be run at one
>> time and that there is a substantial batch effect between batches.
>>
>> So, in my case at least, it seems to me that it would be incorrect to
>> normalize 60 CEL files at once when in fact they have been run in 4
>> batches of 16. Would it not be better to normalize them separately,
>> within-batch, and then include a batch effect in an analytical model?
>
> Ideally you would randomize the samples when you are processing them (we
> randomize at four different steps) so you don't have batches that are
> processed together all the way through.
>
> Whether or not you fit a batch effect in a linear model depends on how
> the samples were processed. If the lab processed all the same type of
> samples in each of the batches (please say they didn't), then any batch
> effect will be aliased with the sample types and fitting an effect won't
> really help.
>
> If the batches were at least semi-randomized, then with 60 samples you
> won't be losing that many degrees of freedom, and it probably won't hurt
> to do so, and it just might help.
>
>>
>> Is my situation unique or, in fact, is this the way most MA wet-labs
>> are set up? If the latter is correct, should the recommendation not be
>> to use justRMA on 80 CEL files if they have been run in batches?
>
> Regardless of how the lab is set up, once you get to large sample sets
> there will always be batches. If you do proper randomization of the
> samples during processing IMO there should be no need to do any
> post-processing adjustments for the batches.
>
> Best,
>
> Jim
>
>
>>
>> Thanks,
>> Mark
>