[BioC] fold-change when no expression to high expression

Thu May 28 13:52:27 CEST 2009

I don't think there is a standard way to deal with these genes, and
you are right: to eliminate them would be missing some of the
potentially most interesting data.

If you look at fold changes, there will always be some very large ones  
although not quite as large as infinite, because there's always some  
background intensity. Depending on the way you deal with background  
intensities, the fold difference can be larger or smaller, but at any  
rate larger than the rest.
What I do is use that information (fold change) as is, as a ranking
device if you wish. I don't like to call it fold change, though,
because when I verify my data by RT-PCR, the fold differences I
measure vary from those of the microarray (and unless you have control
spots to calibrate the array, this will be the rule). I don't mean to
tell you to change what you call those ratios, just to highlight
that if you want to talk of real fold changes, the data from a
microarray are unlikely to be accurate enough. With this premise,
I don't worry about whether a gene that goes from zero in the control
to X level of expression shows a "fold change" value of 40 or 400.
Once you identify it as a possible off/on gene, the actual value is
irrelevant.
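
To illustrate why I don't trust the actual number, here is a minimal
sketch in Python. The intensities, the floor of 1.0 and the function
name are made up for illustration; this is not what any particular
preprocessing package does. It only shows that the same spot can give
a "fold change" of tens or of thousands depending on how much
background you subtract, while the gene stays at the top of the ranking
either way.

```python
def ratio(treated, control, background=0.0, floor=1.0):
    """Treated/control ratio after subtracting a background estimate.

    Signals are floored at `floor` (an arbitrary choice here) so that the
    ratio stays defined when the control is at or below background.
    """
    t = max(treated - background, floor)
    c = max(control - background, floor)
    return t / c


# Hypothetical raw intensities: control sits at background level,
# treated is clearly expressed.
control_raw, treated_raw = 120.0, 8000.0

no_correction = ratio(treated_raw, control_raw, background=0.0)
partial = ratio(treated_raw, control_raw, background=100.0)
full = ratio(treated_raw, control_raw, background=120.0)

for label, value in [("none", no_correction),
                     ("partial", partial),
                     ("full", full)]:
    print(f"background correction {label:8s} ratio = {value:.1f}")
```

The three ratios differ by about two orders of magnitude, yet all three
say the same thing: off in the control, on after treatment.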

Things may look a bit better if using one-colour arrays like
Affymetrix or Nimblegen, where the oligos are synthesised on the array
itself. There it's easier to identify genes that are not expressed  
(although there would always be a grey area), and you can express the  
data as log(intensity) rather than as log(ratio). Again, the  
expression values may not be as accurate as what you'll see with RT,  
but it makes (in my opinion) dealing with "on/off" genes a bit more  
reasonable.

In my work, a lot of what I do is actually based on looking for these
genes that either become silenced or activated after a given
treatment. I generally look for a signal threshold below which I can
be confident that a gene is not expressed, and another threshold
above which I can be quite confident that a gene is expressed. Then I
compare both. log(ratios) I only use as a ranking parameter, something
that gives me an idea of which genes show a larger change. There is a
grey area between the two thresholds... I am aware of that, I know I
will miss things, probably some interesting ones too... but what I
find is usually solid and I would rather get a "cleaner" list by using
two thresholds.
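
The two-threshold idea can be sketched roughly like this in Python.
The threshold values, gene names and function names are hypothetical,
and the signals are assumed to be log2 intensities from a one-colour
array; in practice you would pick the thresholds from your own data.

```python
def call_state(signal, off_below, on_above):
    """Classify one log2 signal as 'off', 'on', or 'grey' (in between)."""
    if signal < off_below:
        return "off"
    if signal > on_above:
        return "on"
    return "grey"


def on_off_genes(control, treated, off_below, on_above):
    """Return (gene, kind, log_ratio) for genes that switch state.

    control, treated: dicts mapping gene name -> log2 intensity.
    Genes with either sample in the grey area are deliberately skipped.
    The log ratio is used only to rank the hits, largest change first.
    """
    hits = []
    for gene, c_signal in control.items():
        if gene not in treated:
            continue
        t_signal = treated[gene]
        c_state = call_state(c_signal, off_below, on_above)
        t_state = call_state(t_signal, off_below, on_above)
        if c_state == "off" and t_state == "on":
            kind = "activated"
        elif c_state == "on" and t_state == "off":
            kind = "silenced"
        else:
            continue
        hits.append((gene, kind, t_signal - c_signal))
    hits.sort(key=lambda hit: abs(hit[2]), reverse=True)
    return hits


# Made-up log2 intensities for five genes.
control = {"geneA": 6.1, "geneB": 12.0, "geneC": 7.5,
           "geneD": 6.0, "geneE": 11.0}
treated = {"geneA": 12.4, "geneB": 6.3, "geneC": 11.5,
           "geneD": 6.2, "geneE": 11.3}

hits = on_off_genes(control, treated, off_below=7.0, on_above=10.0)
for gene, kind, log_ratio in hits:
    print(gene, kind, round(log_ratio, 2))
```

Here geneC is dropped because its control signal falls in the grey
area, which is exactly the kind of gene I accept missing in exchange
for a cleaner list.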

It's not a complicated issue, intellectually. I think everybody deals
with it in their own way. It is very important to know the limitations
of any method, and when you present the data, present meaningful
results. In my opinion, to indicate a 1000 fold change when you're
really talking about "I can see this gene expressed at about 1000
times the level of the control background" is not very nice.
There is a point after which we're not talking fold-changes anymore,  
so I prefer to call them ratios or log(ratios) or whatever. Maybe it's  
just semantics, but it's the way I like to deal with the situation you  
describe.

Adding a "fudge factor" is another way to deal with things like this.
When background correction was performed simply by subtracting a
local background, one often obtained negative signals... to solve
that, adding a small factor was commonly done. But I personally don't
like to do that. At the end of the day, you can pick out those
"on/off" genes with any (reasonable) method, and their ratios would
never be meaningful as "fold differences"...
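
For completeness, a rough Python sketch of the offset idea and of a
generalized-log transform of the kind Laurent mentions below. The
numbers and function names are invented, and the glog here is just one
common arsinh-style form written out by hand; I am not claiming it is
what the vsn package computes.

```python
import math


def log_ratio_with_offset(treated, control, offset):
    """log2 ratio after adding the same offset ("fudge factor") to both
    background-subtracted signals. The offset must be large enough that
    both shifted signals are positive."""
    return math.log2((treated + offset) / (control + offset))


def glog2(x, c):
    """An arsinh-style generalized log: log2((x + sqrt(x^2 + c^2)) / 2).

    It is defined for negative x (when c != 0) and approaches log2(x)
    when x is much larger than c.
    """
    return math.log2((x + math.sqrt(x * x + c * c)) / 2.0)


# Hypothetical background-subtracted signals: the control went negative.
control, treated = -15.0, 5000.0

for offset in (20.0, 50.0):
    value = log_ratio_with_offset(treated, control, offset)
    print(f"offset {offset:5.1f}: log2 ratio = {value:.2f}")

glog_difference = glog2(treated, 50.0) - glog2(control, 50.0)
print(f"glog difference (c = 50): {glog_difference:.2f}")
```

The result moves with the choice of offset or of c, which is my point:
the gene is picked out either way, but the value itself should not be
read as a fold difference.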

Maybe I went on for too long, sorry :-)

Jose

Quoting Laurent Gautier <laurent at cbs.dtu.dk>:

> If I understand your question right, the issue is about fold-changes
> for which the denominator is very small/zero.
>
> You may consider adding a small offset to the signal ("fudge factor")
> making the denominator leave the "danger zone", or using a
> generalized-log transform (I think that the function glog() is in the
> package "vsn").
>
>
> L.
>
>
> Matthew McCormack wrote:
>> Transcripts not expressed in control but which have high expression
>> in treatment theoretically have an infinite fold-change.
>> Preprocessing algorithms will provide numbers for fold-change for
>> these genes, but to do this there seems to be an assumption that
>> all genes are expressed to some small degree at all times and that
>> the chip can reliably detect this. If this is not the case, then it
>> would seem that the fold-change number the preprocessing
>> algorithms provide for genes that go from no expression to
>> expression would be very unreliable and would not be able to be
>> compared with fold changes for other genes that have an
>> appreciable signal intensity in both control and treatment. These
>> genes, off-on genes, are biologically very important to identify.
>> Not identifying these genes because of the low or no control
>> signal intensity would provide misleading data from a biological
>> viewpoint. Is there any algorithm on BioConductor that addresses
>> this problem?
>>
>> Matthew McCormack
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:   
>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK