[BioC] Invalid fold-filter

Sat Feb 18 18:55:16 CET 2006

Hi Ben,
   Basically the problem is that the observed intensity relates only 
indirectly to the copy number of the mRNA species being assayed. It 
is valid to make within-probe-set (or within-probe), between-array 
comparisons: if the probe has value x on array 1 and value y on 
array 2, with y>x, then we are pretty sure there is more mRNA for that 
species in the second sample than in the first. But within-array, 
between-spot comparisons are not valid, in the sense that just 
because one spot on array 1 is brighter than another spot on array 1 
does not mean that the underlying abundance of mRNA is ordered in the 
same way. There are lots of opportunities for attenuation: if the 
samples are amplified (which I think all are now), I don't believe that 
the amplification procedure is equally efficient for all mRNAs, and I 
don't believe all mRNAs label or hybridize with equal efficiency. 
A corollary of the observation that within-array, between-probe 
comparisons are not valid is that one should not filter on level. 
Just because the spot is low in intensity does not necessarily mean very 
much. It also seems to be the case (although I have not checked 
recently) that transcription factors are fairly low abundance.
   I won't disagree that spots with low intensities are more likely to 
be noise, and so are candidates for suspicion; I am just not sure how 
you could be sure that the baby has not departed with the bath water.
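
A minimal sketch of the point above, using made-up numbers rather than 
real array data. The probe names, abundances and per-probe efficiencies 
are all assumptions chosen for illustration; the only idea taken from 
the text is that each probe reports its true abundance scaled by an 
unknown probe-specific factor (amplification, labelling, hybridization).

```python
# Hypothetical true mRNA abundances for two probes on two arrays.
true_abundance = {
    "probe_A": (100.0, 200.0),  # more mRNA in sample 2
    "probe_B": (400.0, 300.0),  # less mRNA in sample 2
}

# Assumed probe-specific efficiency: probe_B is attenuated far more.
efficiency = {"probe_A": 0.9, "probe_B": 0.1}

# Observed intensity = efficiency * true abundance (noise ignored here).
observed = {
    probe: tuple(efficiency[probe] * a for a in abundances)
    for probe, abundances in true_abundance.items()
}

# Within probe, between arrays: the ordering of intensities matches the
# ordering of true abundances, because the efficiency factor cancels.
within_probe_ok = all(
    (observed[p][1] > observed[p][0])
    == (true_abundance[p][1] > true_abundance[p][0])
    for p in true_abundance
)

# Within array 1, between probes: the ordering of intensities can be
# the reverse of the ordering of true abundances.
brighter_on_array1 = max(observed, key=lambda p: observed[p][0])
more_abundant_on_array1 = max(true_abundance, key=lambda p: true_abundance[p][0])

print(within_probe_ok)
print(brighter_on_array1, more_abundant_on_array1)
```

In this toy setup the dimmer spot on array 1 belongs to the more 
abundant transcript, which is why a cut-off on raw level could discard 
it.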

   The times when I still use level are when there are too many genes 
that show appropriate levels of variation and I need to further reduce 
the gene set, or when I am looking for a biomarker (that is a longer 
discussion). But basically, if what you are looking for are good, 
reliable signatures that differentiate one group from the other, then 
you don't much care about the low-expressing genes. In some of those 
cases you are not trying to understand the biology (when comprehension 
is your objective, keeping everything is a good idea, IMHO) but rather 
are most interested in an objective measure that can be used to 
classify samples. 
I am sure there are other reasons as well.
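
One way to picture the two-step approach described above, as a rough 
sketch only: filter on variation across samples first, and apply a level 
cut-off afterwards only if the gene set is still too large. The function 
names, thresholds and expression values below are hypothetical, and the 
values are assumed to be on a log2 scale.

```python
from statistics import stdev


def filter_by_variation(expr, min_sd):
    """Keep genes whose across-sample standard deviation is >= min_sd."""
    return {gene: vals for gene, vals in expr.items() if stdev(vals) >= min_sd}


def filter_by_level(expr, min_level, min_samples):
    """Keep genes with at least min_samples values at or above min_level."""
    return {
        gene: vals
        for gene, vals in expr.items()
        if sum(v >= min_level for v in vals) >= min_samples
    }


# Made-up log2 expression values for three genes across four samples.
expr = {
    "flat_high": [10.0, 10.1, 9.9, 10.0],
    "variable_low": [4.0, 6.0, 4.2, 6.1],
    "variable_high": [8.0, 11.0, 8.2, 11.3],
}

# Step 1: non-specific filtering on variation (no phenotype used).
varying = filter_by_variation(expr, min_sd=0.5)

# Step 2 (optional): reduce further by level, e.g. for a classifier.
shortlist = filter_by_level(varying, min_level=7.0, min_samples=2)

print(sorted(varying))
print(sorted(shortlist))
```

Here the variation filter keeps the low-intensity but variable gene, and 
only the optional level step removes it, which mirrors the trade-off 
discussed above.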

  Best wishes
    Robert

Wittner, Ben, Ph.D. wrote:
> Robert,
> 
> Could you explain why you say below that filtering on low values is probably a
> bad idea in many cases? Also, in what cases do you filter low values and in what
> cases not?
> 
> I filter out probe-sets for which expression values for two or more classes of
> interest are low on the theory that such values are dominated by noise and fold
> changes calculated between two such classes are not meaningful.
> 
> Thanks.
> -Ben
> 
> 
>>   You can do non-specific filtering, but all you are really doing there
>>is to remove genes that are inherently uninteresting no matter what the
>>phenotype of the corresponding sample (if there is no variation in
>>expression for a particular gene across samples then it has no
>>information about the phenotype of the sample). Filtering on low values
>>is probably a bad idea although many do it (and I used to, and still do
>>sometimes depending on the task at hand).
>>
>>
>>  Best wishes
>>    Robert
> 
> 
> 
> ------------------------------------------------------
> Ben Wittner, 617-643-3166, wittner.ben at mgh.harvard.edu
> 
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org