[BioC] RE: RMA normalization

Wed Sep 15 10:20:42 CEST 2004

Hi,

I've tried to get a discussion on this several times but have got very
few responses.

I'm looking at some data where the treatment has a very BIG effect. I
don't think this is unusual; it's just that a lot of people either don't
realise it or ignore it.

If we take the average Pearson correlation of treated versus untreated
as a crude indication of the number of changes (is this valid?) then in
our experiments this is 0.96. Comparing 25 treated versus 25 untreated
replicates (GCRMA, LIMMA, gene-wise FDR-corrected p<0.001) we get c. 30%
of transcripts on the chip changing!
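
To make it clear what I mean by "average Pearson correlation of treated
versus untreated", here is a minimal sketch in plain Python. The function
names and the toy numbers are made up for illustration, and it assumes each
array is a list of log-scale expression values in the same probe-set order.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_cross_group_correlation(treated, untreated):
    """Average Pearson r over every (treated, untreated) pair of arrays.

    Each group is a list of arrays; each array is a list of log-scale
    expression values, with probe sets in the same order everywhere.
    """
    pairs = list(product(treated, untreated))
    return sum(pearson(t, u) for t, u in pairs) / len(pairs)


# Toy data (hypothetical): three probe sets unchanged, one induced.
untreated = [[5.0, 7.0, 9.0, 6.0], [5.2, 6.9, 9.1, 6.1]]
treated = [[5.1, 7.1, 9.0, 10.0], [4.9, 7.0, 8.9, 10.2]]
print(round(mean_cross_group_correlation(treated, untreated), 3))
```

Only treated-versus-untreated pairs are averaged; within-group pairs are
left out, since they reflect replicate noise rather than the treatment.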

Looking at a couple of public datasets, I don't think our treatment
effect (as indicated by the Pearson correlation) is that unusual; it's
just that we have the statistical power to detect the changes. Also,
considering the biology, it seems reasonable to get these changes.

In the discussions on RMA/GCRMA there are 2 assumptions discussed:
1) few genes changing - obviously not
2) equal # up and down - despite the huge number of changes there are
only 20 more transcripts going up compared to down - so yes.

I've also looked at a number of control genes and can't find any real
bias, in fact there is quite a bit of (random?) variation, so if you
normalised on a few of these then you may get strange results...

Also it depends what you are looking for. Amongst the 25 replicates we
have different genotypes, and to look for differences here I GCRMA
normalise treated and untreated separately, but then don't make
comparisons untreated-treated, only between genotypes.

Finally, I will at some point try separate GCRMAs and then scaling. If
anyone has any scripts for mean, robust mean or median scaling a series
of separate exprs sets then I'd appreciate it.
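
For reference, the kind of scaling I have in mind could be sketched roughly
as below, in plain Python. This is only an illustration under assumptions:
the values are log2 (as GCRMA output is), so a multiplicative scaling factor
becomes an additive shift; each array is shifted individually; and the target
is the mean of the per-array centres. The names scale_to_common_centre and
trimmed_mean are hypothetical.

```python
from statistics import mean, median


def trimmed_mean(values, trim=0.1):
    """Robust mean: drop the lowest and highest `trim` fraction first.

    Assumes trim < 0.5 so that some values remain after trimming.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return mean(ordered[k:len(ordered) - k])


def scale_to_common_centre(expr_sets, centre=median):
    """Shift every array so its centre equals a common target.

    expr_sets maps a set name to a list of arrays (lists of log2 values).
    The target is the mean of all per-array centres across all sets.
    """
    centres = {
        name: [centre(a) for a in arrays]
        for name, arrays in expr_sets.items()
    }
    target = mean(c for cs in centres.values() for c in cs)
    return {
        name: [[v - c + target for v in a]
               for a, c in zip(arrays, centres[name])]
        for name, arrays in expr_sets.items()
    }


# Toy example (hypothetical values), two separately normalised sets.
sets = {
    "untreated": [[4.0, 6.0, 8.0], [5.0, 7.0, 9.0]],
    "treated": [[6.0, 8.0, 10.0]],
}
median_scaled = scale_to_common_centre(sets)
robust_scaled = scale_to_common_centre(sets, centre=trimmed_mean)
```

Passing centre=mean gives plain mean scaling. The shift changes only the
location of each array, not the shape of its distribution.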

Cheers,
Matt

Hello Hairong, Adai,
That suggestion was mine a few weeks ago.
My thinking currently is that we may reasonably expect different cell
types to have different distributions of RNA abundances; as an extreme
example, some cells specialize in making one protein for export. Then it
seems to me our best shot is to make the raw data comparable within each
cell type, and to make the different cell types comparable per identical
weight of RNA (ideally we'd like to find some way to normalize by the
number of cells).
Normalization within cell types might be done by quantiles; normalization
across cell types by the simpler (robust) mean until we can normalize by
cells. Is there a better way?
In practice I find substantial differences when normalizing across
different cell types, as opposed to normalizing within cell types
separately.
Does anyone else have experience with this?
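
As a rough illustration of the within-cell-type step, here is a minimal
quantile normalization sketch in plain Python. It is not the RMA
implementation: ties are broken by position, the function names are
hypothetical, and the toy values are made up. Each cell type is normalized
on its own, so different cell types keep different distributions.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution.

    arrays: list of equal-length lists, one per array. The reference
    distribution is the mean of the sorted values across arrays; each
    value is replaced by the reference value of the same rank.
    """
    n = len(arrays)
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(vals) / n for vals in zip(*sorted_arrays)]
    result = []
    for a in arrays:
        # Indices of a, ordered from smallest to largest value.
        order = sorted(range(len(a)), key=a.__getitem__)
        out = [0.0] * len(a)
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result


def normalize_within_groups(groups):
    """Quantile-normalize each cell type separately."""
    return {name: quantile_normalize(arrays)
            for name, arrays in groups.items()}


# Toy example (hypothetical values): two cell types, two arrays each.
groups = {
    "type_a": [[2.0, 4.0, 6.0], [1.0, 5.0, 3.0]],
    "type_b": [[10.0, 0.0, 1.0], [12.0, 2.0, 1.0]],
}
normalized = normalize_within_groups(groups)
```

After this step the arrays within a cell type share one distribution, while
the cell types still differ from each other, which is the point of
normalizing them separately.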

Regards

Mark Reimers

Date: Fri, 10 Sep 2004 15:56:00 +0100
From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
Subject: RE: [BioC] RMA normalization
To: Hairong Wei <HWei at ms.soph.uab.edu>
Cc: BioConductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID: <1094828160.3055.29.camel at ndmpc126.ihs.ox.ac.uk>
Content-Type: text/plain

I was under the impression that getting sufficient mRNA from a single
sample was difficult enough.

Sorry, I do not think I can be of much help as I have never encountered
this sort of problem, perhaps due to my own inability to distinguish the
terms mRNA, sample and tissue. But there are many other people on the
list who have a better appreciation of biology, and hopefully one of them
can advise you.

Could you give us the link to the message you are talking about?

On Fri, 2004-09-10 at 15:26, Hairong Wei wrote:
> Dear Adai:
> 
> Thanks for asking.  I got this phrase from the messages stored in the
> archive yesterday.  My understanding is this: suppose you have 100
> arrays, and 10 mRNA samples from 10 tissues.  Each set of 10 arrays is
> hybridized with mRNA from the same tissue.  When you run the RMA
> algorithm, you run the arrays hybridized with mRNA from the same
> tissue together (10 at a time), rather than running all 100 arrays
> together.  After running RMA for each tissue, the scaling is applied
> to arrays from different tissues.
> 
> The reason for doing this is that it is not reasonable to assume that
> the arrays from different tissues have the same distribution.
> 
> What is your idea for doing background correction and normalization of
> 100 arrays across 10 tissues?
> 
> Thank you very much in advance
> 
> Hairong Wei, Ph.D.
> Department of Biostatistics
> University of Alabama at Birmingham
> Phone:  205-975-7762

Mark Reimers,
senior research fellow, 
National Cancer Inst., and SRA,
9000 Rockville Pike, bldg 37, room 5068
Bethesda MD 20892
