[BioC] fold-change when no expression to high expression
J.delasHeras at ed.ac.uk
Thu May 28 13:52:27 CEST 2009
I don't think there is a standard way to deal with these genes, and
you are right: to eliminate them would be missing some of the
potentially most interesting data.
If you look at fold changes, there will always be some very large ones,
although never actually infinite, because there's always some
background intensity. Depending on the way you deal with background
intensities, the fold difference can be larger or smaller, but at any
rate larger than the rest.
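
To illustrate the point about background handling, here is a minimal sketch in Python. The intensities, the background estimates and the floor value are all hypothetical, chosen only to show how the same spot pair gives very different ratios depending on how much background is subtracted.

```python
import math

def log2_ratio(treated, control, background=0.0, floor=1.0):
    """log2 ratio after subtracting a background estimate.

    Corrected values are floored (hypothetical floor of 1.0) so that
    the logarithm is never taken of zero or a negative number.
    """
    t = max(treated - background, floor)
    c = max(control - background, floor)
    return math.log2(t / c)

# Hypothetical raw intensities: the control spot sits at background level.
raw_control, raw_treated = 100.0, 6400.0

no_correction = log2_ratio(raw_treated, raw_control)
partial = log2_ratio(raw_treated, raw_control, background=50.0)
full = log2_ratio(raw_treated, raw_control, background=100.0)

# The more background is removed, the larger the apparent change.
for label, value in [("none", no_correction), ("partial", partial), ("full", full)]:
    print(f"background {label:8s} log2 ratio = {value:.2f}")
```

The gene is picked out as a large change in all three cases; only the number attached to it moves.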
What I do is use that information (fold change) as is, as a ranking
device, if you wish. I don't like to call it fold change, though,
because when I verify my data by RT-PCR, the fold differences I
measure vary from those of the microarray (and unless you have control
spots to calibrate the array, this will be the rule). I don't mean to
tell you to change the way you call those ratios, but just highlight
that if you want to talk of real fold changes, the data from a
microarray will be unlikely to be accurate enough. With this premise,
I don't worry about whether a gene that goes from zero in the control
to X level of expression shows a "fold change" value of 40 or 400.
Once you identify it as possibly off/on gene, the actual value is
irrelevant.
Things may look a bit better if using one colour arrays like
Affymetrix or Nimblegen, where the oligos are synthesised on the array
itself. There it's easier to identify genes that are not expressed
(although there would always be a grey area), and you can express the
data as log(intensity) rather than as log(ratio). Again, the
expression values may not be as accurate as what you'll see with RT,
but it makes (in my opinion) dealing with "on/off" genes a bit more
reasonable.
In my work, a lot of what I do is actually based on looking for these
genes that either become silenced or activated after a given
treatment. I generally look for a signal threshold below which I can
be confident that a gene will not be expressed, and another threshold
above which I can be quite confident that a gene is expressed. Then
compare both. log(ratios) I only use as a ranking parameter, something
that gives me an idea of what genes show a larger change. There is a
grey area, between the two thresholds... I am aware of that, I know I
will miss things, probably some interesting ones too... but what I
find is usually solid and I would rather get a "cleaner" list by using two
thresholds.
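
The two-threshold idea described above can be sketched roughly as follows. This is only an illustration: the gene names, the log2 intensities and the threshold values (7.0 and 10.0) are invented, and in practice the thresholds would have to be chosen from the data at hand.

```python
def call_state(signal, off_below, on_above):
    """Classify one signal as 'off', 'on', or 'grey' (between thresholds)."""
    if signal < off_below:
        return "off"
    if signal > on_above:
        return "on"
    return "grey"

def on_off_genes(control, treated, off_below, on_above):
    """Return genes that switch cleanly between 'off' and 'on'.

    Genes with either measurement in the grey area are deliberately
    left out, which trades some sensitivity for a cleaner list.
    """
    switched = {}
    for gene in control:
        before = call_state(control[gene], off_below, on_above)
        after = call_state(treated[gene], off_below, on_above)
        if before == "off" and after == "on":
            switched[gene] = "activated"
        elif before == "on" and after == "off":
            switched[gene] = "silenced"
    return switched

# Hypothetical log2 intensities for four genes.
control = {"geneA": 6.1, "geneB": 12.3, "geneC": 8.5, "geneD": 11.0}
treated = {"geneA": 12.8, "geneB": 6.4, "geneC": 12.0, "geneD": 11.4}

result = on_off_genes(control, treated, off_below=7.0, on_above=10.0)
print(result)
```

Here geneC starts in the grey area, so it is not reported even though its signal rises; that is the kind of gene this approach knowingly misses.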
It's not a complicated issue, intellectually. I think everybody deals
with it in their own way. It is very important to know the limitations
of any method, and when you present the data, present meaningful
results. In my opinion to indicate a 1000 fold change when you're
really talking about "I can see this gene expressed at about 1000
times the level of the control background" is not very nice.
There is a point after which we're not talking fold-changes anymore,
so I prefer to call them ratios or log(ratios) or whatever. Maybe it's
just semantics, but it's the way I like to deal with the situation you
describe.
Adding a "fudge factor" is another way to deal with things like this.
When background correction was performed simply by subtracting a
local background, one often obtained negative signals... to solve
that, adding a small factor was commonly done. But I personally don't
like to do that. At the end of the day, you can pick out those
"on/off" genes with any (reasonable) method, and their ratios would
never be meaningful as "fold differences"...
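
As a rough sketch of why the exact value matters so little for these genes, the Python below compares the effect of different offsets on an "on/off" gene and on a gene expressed in both conditions. All intensities and offsets are hypothetical. The `glog2` helper uses one common form of the generalized log, log2((x + sqrt(x^2 + c^2)) / 2), with a hypothetical constant `c`; it is not claimed to match any particular package's implementation.

```python
import math

def log2_ratio_with_offset(treated, control, offset):
    """log2 ratio after adding the same small offset to both signals."""
    return math.log2((treated + offset) / (control + offset))

def glog2(x, c):
    """One common generalized-log form; defined for zero and negative x."""
    return math.log2((x + math.sqrt(x * x + c * c)) / 2.0)

offsets = [1.0, 50.0, 200.0]          # hypothetical "fudge factors"
on_off = (5000.0, 2.0)                # treated, control: control near zero
both_expressed = (5000.0, 1000.0)     # treated, control: both well above zero

on_off_ratios = [log2_ratio_with_offset(*on_off, k) for k in offsets]
both_ratios = [log2_ratio_with_offset(*both_expressed, k) for k in offsets]

# How much each ratio moves as the offset changes.
on_off_spread = max(on_off_ratios) - min(on_off_ratios)
both_spread = max(both_ratios) - min(both_ratios)
print(f"on/off gene: ratios vary by {on_off_spread:.2f} log2 units")
print(f"expressed in both: ratios vary by {both_spread:.2f} log2 units")

# The generalized log stays finite for a negative background-corrected
# signal and approaches log2(x) for large x.
print(glog2(-10.0, 50.0), glog2(5000.0, 50.0), math.log2(5000.0))
```

The ratio for the on/off gene is driven almost entirely by the choice of offset, while the ratio for the gene expressed in both conditions barely changes.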
Maybe I went on for too long, sorry :-)
Jose
Quoting Laurent Gautier <laurent at cbs.dtu.dk>:
> If I understand your question right, the issue is about fold-changes
> for which the denominator is very small/zero.
>
> You may consider adding a small offset to the signal ("fudge factor")
> making the denominator leave the "danger zone", or using a
> generalized-log transform (I think that the function glog() is in the
> package "vsn").
>
>
> L.
>
>
> Matthew McCormack wrote:
>> Transcripts not expressed in control but which have high expression
>> in treatment theoretically have an infinite fold-change.
>> Preprocessing algorithms will provide numbers for fold-change for
>> these genes, but to do this there seems to be an assumption that
>> all genes are expressed to some small degree at all times and that
>> the chip can reliably detect this. If this is not the case, then it
>> would seem that the fold-change number the preprocessing
>> algorithms provide for genes that go from no expression to
>> expression would be very unreliable and would not be able to be
>> compared with fold changes for other genes that have an
>> appreciable signal intensity in both control and treatment. These
>> genes, off-on genes, are biologically very important to identify.
>> Not identifying these genes because of the low or no control
>> signal intensity would provide misleading data from a biological
>> viewpoint. Is there any algorithm on BioConductor that addresses
>> this problem?
>>
>> Matthew McCormack
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.