[BioC] Normalising divergent samples - Discussion (Please!)

Wed Jul 21 13:00:57 CEST 2004

Again this has drawn a blank; I can't believe that nobody has any comment on it.

If you are working on this, or want to, please let me know. I'm a biologist and
have no serious hope of developing a new method, especially not alone; I'm more
interested in knowing whether something may be possible or is in the pipeline,
as I have some potentially useful applications.

Even if you are not working on this, I'd be interested in gathering opinions on
the following, even if they are just speculation.

What is known about the hybridisation behaviour of samples with fewer transcripts
present? Are there any studies on this (not sure how you would do this though)?
Specifically, how would one sample containing 5000 mRNAs compare to one with
10000 mRNAs on a 20k chip? Would you expect the overall intensity to be changed
or shifted? What would happen with the background? How would having fewer
transcripts affect different normalisations?

Has anyone tried to use the B2 oligo intensities in any way? Is it possible to
access them, and is their use (in terms of the way the oligo is spiked in)
consistent enough to control for hybridisation efficiency in any useful way?
Or does anyone know anything more about them than the mention in the Affy manuals?

Has anyone normalised (even within MAS5) to a small number of control genes on
Affy arrays? If so, how were they selected and how did it perform?
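
To make clear what I mean by normalising to control genes, here is a minimal
Python sketch. Everything in it is an assumption for illustration: the probe
names, the intensities, the background value of 50 and the function names are
all made up, and it is not how MAS5 or any existing package does it. It only
shows how a scaling factor taken from a few controls can differ from one taken
from the whole array when some transcripts are absent.

```python
import statistics

def control_scale_factor(sample, reference, control_ids):
    """Factor that brings the control probes of `sample` up to `reference` on average."""
    ref_mean = statistics.fmean(reference[p] for p in control_ids)
    sample_mean = statistics.fmean(sample[p] for p in control_ids)
    return ref_mean / sample_mean

def global_scale_factor(sample, reference):
    """Factor that equalises the mean intensity over all probes."""
    return statistics.fmean(reference.values()) / statistics.fmean(sample.values())

# Hypothetical intensities (made-up numbers, not real array data).
reference = {"ctrl_1": 1000.0, "ctrl_2": 400.0, "gene_a": 800.0,
             "gene_b": 600.0, "gene_c": 200.0, "gene_d": 1200.0}
# Second array: hybridised 20% dimmer, and gene_c / gene_d are absent,
# so those probes show only an assumed background of 50.
sample = {"ctrl_1": 800.0, "ctrl_2": 320.0, "gene_a": 640.0,
          "gene_b": 480.0, "gene_c": 50.0, "gene_d": 50.0}
controls = ["ctrl_1", "ctrl_2"]

control_factor = control_scale_factor(sample, reference, controls)
global_factor = global_scale_factor(sample, reference)
print("control-based factor:", control_factor)
print("global factor:       ", global_factor)
```

In this toy case the control-based factor only undoes the dimming, while the
global factor is larger because the absent transcripts pull the array mean
down. Whether real controls behave this consistently is exactly my question.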

How much of the difference between the intensity distributions on different
arrays is technical or interfering biological variation (RNA quality and
quantity, e.g. see the first question) versus meaningful differences in
expression levels? Considering this, would distorting (modifying?) the
distribution using quantile normalisation be worse than a simple scaling
normalisation? (Speculation is welcome, as I guess this cannot actually be
answered.)
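
A toy simulation of the worry, as a minimal Python sketch. The assumptions are
loud ones: intensities are simulated (log-normal signal plus a Gaussian
background around 50), sample B simply lacks every second transcript, and
saturation, cross-hybridisation and the like are ignored. The function names
are my own, not from any package. For the transcripts present in both samples
the true B/A ratio is 1, so any median ratio away from 1 after normalisation is
an artefact.

```python
import random
import statistics

def scale_normalise(sample, target_mean):
    """Simple scaling: multiply all intensities so the sample mean equals target_mean."""
    factor = target_mean / statistics.fmean(sample)
    return [x * factor for x in sample]

def quantile_normalise(samples):
    """Give every sample the same distribution: the mean of the sorted values at each rank."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [statistics.fmean(s[rank] for s in sorted_samples) for rank in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices, lowest intensity first
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalised.append(out)
    return normalised

random.seed(0)
n_probes = 2000
signal = [random.lognormvariate(6, 1) for _ in range(n_probes)]

# Sample A: every probe has its transcript present.
sample_a = [random.gauss(50, 5) + s for s in signal]
# Sample B: only even-numbered probes have a transcript; the rest are background only.
sample_b = [random.gauss(50, 5) + (s if i % 2 == 0 else 0.0)
            for i, s in enumerate(signal)]

shared = [i for i in range(n_probes) if i % 2 == 0]  # transcripts present in both

scaled_b = scale_normalise(sample_b, statistics.fmean(sample_a))
ratio_scaled = statistics.median(scaled_b[i] / sample_a[i] for i in shared)

quant_a, quant_b = quantile_normalise([sample_a, sample_b])
ratio_quantile = statistics.median(quant_b[i] / quant_a[i] for i in shared)

print("median B/A for shared transcripts, scaling: ", ratio_scaled)
print("median B/A for shared transcripts, quantile:", ratio_quantile)
```

In this simulation both methods push the shared transcripts in B upwards,
because both assume the two samples have the same overall amount (scaling) or
the same overall distribution (quantile) of signal. It says nothing about which
is worse on real chips.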

What (off-the-shelf) normalisation would you recommend or think to be best?

Finally, I guess it needs to be asked whether this is really a data analysis
problem or a case of expecting data magic. With so many unknown factors (at
present), will this kind of study ever produce really useful data?

Thanks (in hope).
Matt