[BioC] fold-change when no expression to high expression
Wolfgang Huber
huber at ebi.ac.uk
Sat May 30 20:31:07 CEST 2009
Matthew McCormack wrote:
> Transcripts not expressed in control but which have high expression in
> treatment theoretically have an infinite fold-change. Preprocessing
> algorithms will provide numbers for fold-change for these genes, but to
> do this there seems to be an assumption that all genes are expressed to
> some small degree at all times and that the chip can reliably detect
> this. If this is not the case, then it would seem that the fold-change
> number the preprocessing algorithms provide for genes that go from no
> expression to expression would be very unreliable and would not be able
> to be compared with fold changes for other genes that have an
> appreciable signal intensity in both control and treatment. These genes,
> off-on genes, are biologically very important to identify. Not
> identifying these genes because of the low or no control signal
> intensity would provide misleading data from a biological viewpoint. Is
> there any algorithm in Bioconductor that addresses this problem?
>
> Matthew McCormack
Hi Matthew,
There is a discussion of this topic in chapter 5 of our "case studies"
book [1]. A more technical treatment is in [2], and a very brief one in
Section 12 of the vignette of the vsn package.
Basically: these genes are of course very important. The variance
stabilisation trick still allows one to report reproducible "generalised
log-ratios" in these cases. These are estimators of the true log-ratios
that are shrunken towards 0 (from +/- infinity); the amount of
shrinkage depends on the sensitivity of the array, as estimated from the
"background" component of the noise.
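To make the shrinkage concrete, here is a minimal sketch in Python. It is
not the vsn implementation: vsn estimates its calibration and
transformation parameters from the data, whereas the function names and
the background scale c below are assumptions chosen by hand for
illustration. The generalised log is written as arsinh(x / c), which
behaves like a logarithm for x much larger than c and stays finite at
x = 0.

```python
import math

def glog2(intensity, c):
    """Generalised log, base 2, via arsinh.

    c is a hypothetical background-noise scale (an assumption for this
    sketch, not a value produced by vsn).
    """
    return math.asinh(intensity / c) / math.log(2)

def glog_ratio(treatment, control, c):
    """Generalised log-ratio: difference of glog-transformed intensities."""
    return glog2(treatment, c) - glog2(control, c)

def log2_ratio(treatment, control):
    """Ordinary log2 ratio; diverges when one intensity is zero."""
    if treatment <= 0 and control <= 0:
        return math.nan
    if control <= 0:
        return math.inf
    if treatment <= 0:
        return -math.inf
    return math.log2(treatment / control)

if __name__ == "__main__":
    c = 50.0  # assumed background scale, picked by hand
    # (treatment, control) pairs: an off-on gene and a gene expressed in both
    for treatment, control in [(2000.0, 0.0), (4000.0, 500.0)]:
        print(treatment, control,
              log2_ratio(treatment, control),
              glog_ratio(treatment, control, c))
```

For the off-on gene the ordinary log-ratio is infinite, while the
generalised log-ratio is finite and grows as c shrinks, that is, as the
array becomes more sensitive. For the gene with appreciable signal in both
conditions the two quantities nearly agree.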
Note the word *estimator*: it is useful to distinguish your data-based
estimate from the unknown, true value, and to know what stochastic and
systematic effects might occur in between them.
You are also right that the (log-)ratio is a compression of the data
that loses information. If you do not want this information loss, you
can always go back and look at the (glog) intensities in control and
treatment.
[1] Bioconductor Case Studies
http://www.springer.com/statistics/stats+life+sci/book/978-0-387-77239-4
[2] Huber W., Von Heydebreck A. and Vingron M. (2004)
Error models for microarray intensities.
http://www.ebi.ac.uk/huber/docs/huber_vingron_2004.pdf
Best wishes
Wolfgang
------------------------------------------------
Wolfgang Huber, EMBL, http://www.ebi.ac.uk/huber