[BioC] fold-change when no expression to high expression

Sat May 30 20:31:07 CEST 2009

Matthew McCormack wrote:
> Transcripts not expressed in control but which have high expression in 
> treatment theoretically have an infinite fold-change. Preprocessing 
> algorithms will provide numbers for fold-change for these genes, but to 
> do this there seems to be an assumption that all genes are expressed to 
> some small degree at all times and that the chip can reliably detect 
> this. If this is not the case, then it would seem that the fold-change 
> number the preprocessing algorithms provide for genes that go from no 
> expression to expression would be very unreliable and would not be able 
> to be compared with fold changes for other genes that have an 
> appreciable signal intensity in both control and treatment. These genes, 
> off-on genes, are biologically very important to identify. Not 
> identifying these genes because of the low or no control signal 
> intensity would provide misleading data from a biological viewpoint. Is 
> there any algorithm in Bioconductor that addresses this problem?
> 
> Matthew McCormack

Hi Matthew,

There is a discussion on this topic in chapter 5 of our "case studies" 
book [1]. More technically, also in [2], and very briefly in Section 12 
of the vignette of the vsn package.

Basically: these genes are of course very important. The variance 
stabilisation trick still allows one to report reproducible "generalised 
log-ratios" in these cases. These are estimators of the true log-ratios 
that are shrunken towards 0 (from +/- infinity); the amount of 
shrinkage depends on the sensitivity of the array, as estimated from the 
"background" component of the noise.

Note the word *estimator*: it is useful to distinguish your data-based 
estimate from the unknown, true value, and to know what stochastic and 
systematic effects might occur in between them.
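
To make the shrinkage concrete, here is a minimal sketch in Python. It is 
not the vsn implementation: it assumes a generalised log of the form 
asinh(x / c), and the scale c is a hypothetical, hand-picked stand-in for 
the background noise level that vsn would estimate from the data. The 
intensity values are made up for illustration.

```python
import math

def glog(x, c):
    """Generalised log, here asinh(x / c).

    For x much larger than c it behaves like log(2 * x / c);
    near zero it is roughly linear, so x = 0 (or slightly negative
    background-subtracted values) gives a finite result.
    """
    return math.asinh(x / c)

def glog_ratio(treatment, control, c):
    """Generalised log-ratio: difference of glog-transformed intensities."""
    return glog(treatment, c) - glog(control, c)

c = 50.0  # hypothetical background noise scale, not estimated from data

# Both intensities well above background: close to the ordinary log-ratio.
both_expressed = glog_ratio(8000.0, 2000.0, c)
ordinary = math.log(8000.0 / 2000.0)

# "Off-on" gene: control at zero. The ordinary log-ratio would be infinite,
# the generalised log-ratio is finite.
off_on = glog_ratio(8000.0, 0.0, c)

# A less sensitive array (larger background scale) shrinks the estimate more.
off_on_noisier = glog_ratio(8000.0, 0.0, 4 * c)

print(both_expressed, ordinary)
print(off_on, off_on_noisier)
```

With these assumed numbers, the off-on gene gets a large but finite 
generalised log-ratio, and the value moves closer to 0 as the assumed 
background scale grows.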

You are also right that the (log-)ratio is a compression of the data 
that loses information. If you do not want this information loss, you 
can always go back and look at the (glog) intensities in control and 
treatment.
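
As a small illustration of that information loss, the Python sketch below 
uses two hypothetical genes whose generalised log-ratios are nearly 
identical although their intensity levels differ tenfold. The glog form 
asinh(x / c), the scale c and all intensities are assumptions made for 
this example only.

```python
import math

def glog(x, c=50.0):
    """Generalised log as asinh(x / c); c is a hypothetical noise scale."""
    return math.asinh(x / c)

# Hypothetical (control, treatment) intensities for two genes.
genes = {
    "gene_a": (500.0, 2000.0),
    "gene_b": (5000.0, 20000.0),
}

summary = {}
for name, (control, treatment) in genes.items():
    h_control = glog(control)
    h_treatment = glog(treatment)
    # Keep both glog intensities next to the ratio instead of the ratio alone.
    summary[name] = {
        "glog_control": h_control,
        "glog_treatment": h_treatment,
        "glog_ratio": h_treatment - h_control,
    }

for name, row in summary.items():
    print(name, row)
```

Reporting the two glog intensities alongside the ratio keeps the 
distinction between the two genes that the ratio alone would hide.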

[1] Bioconductor Case Studies
http://www.springer.com/statistics/stats+life+sci/book/978-0-387-77239-4

[2] Huber W., von Heydebreck A. and Vingron M. (2004)
Error models for microarray intensities.
http://www.ebi.ac.uk/huber/docs/huber_vingron_2004.pdf

Best wishes
      Wolfgang

------------------------------------------------
Wolfgang Huber, EMBL, http://www.ebi.ac.uk/huber