[BioC] Invalid fold-filter
Robert Gentleman
rgentlem at fhcrc.org
Sat Feb 18 18:55:16 CET 2006
Hi Ben,
Basically the problem is that the actual observed intensity only
indirectly relates to copy number of the mRNA species being assayed. It
is valid to do within-probe-set (or within-probe), between-array comparisons
(which basically means that if the probe has value x on array 1 and value y
on array 2, with y>x, then we are pretty sure there is more
mRNA for that species in the second sample than the first). But within-array,
between-spot comparisons are not valid, in the sense that just
because one spot on array 1 is brighter than another spot on array 1
does not mean that the underlying abundance of mRNA is ordered in the
same way. There are lots of opportunities for attenuation (if the
samples are amplified, which I think all are now, I don't believe that
the amplification procedure has equal efficacy on all mRNAs; I don't
believe all mRNAs label with equal efficiency or hybridize with equal
efficiency). A corollary of the observation that within-array, between-probe
comparisons are not valid is that one should not filter on level.
Just because the spot is low in intensity does not necessarily mean very
much. It also seems to be the case (although I have not checked
recently) that transcription factors are fairly low abundance.
I won't disagree that spots that correspond to low intensities are
more likely to be noise, and so are candidates for suspicion; I am just
not sure how you could be sure that the baby has not departed with the
bath water.
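
To make the concern concrete, here is a minimal sketch with invented log2 intensities. The probe-set names, the values, and the cutoff of 7.0 are all assumptions chosen for illustration, not taken from any real array. It compares each probe set with itself across arrays (the comparison argued above to be valid) and then applies a simple level filter:

```python
from statistics import mean

# Toy log2 intensities: one list per probe set, one value per array.
# Every number and name here is invented for illustration.
expr = {
    "tf_low": [5.0, 5.1, 4.9, 6.4, 6.5, 6.6],          # dim spot, shifts between groups
    "house_hi": [12.0, 12.1, 11.9, 12.0, 12.1, 12.0],  # bright spot, no real change
}
groups = ["A", "A", "A", "B", "B", "B"]  # sample class of each array

LEVEL_CUTOFF = 7.0  # hypothetical intensity threshold


def group_means(values, labels):
    """Mean intensity of one probe set within each sample group."""
    return {
        g: mean(v for v, lab in zip(values, labels) if lab == g)
        for g in sorted(set(labels))
    }


def passes_level_filter(values, cutoff):
    """True if the probe set's mean intensity across arrays reaches the cutoff."""
    return mean(values) >= cutoff


results = {}
for name, values in expr.items():
    m = group_means(values, groups)
    # Within-probe-set, between-array difference (B minus A).
    results[name] = (m["B"] - m["A"], passes_level_filter(values, LEVEL_CUTOFF))

for name, (diff, kept) in results.items():
    print(f"{name}: B-A difference = {diff:.2f}, kept by level filter = {kept}")
```

In this toy data the dim probe set is the only one that changes between groups, and it is the one the level filter throws away.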
The times when I still use level are when there are too many genes
that show appropriate levels of variation and I need to further reduce
the gene set or when I am looking for a biomarker (that is a longer
discussion). But basically if what you are looking for are good reliable
signatures that differentiate one group from the other, then you don't
much care about the low-expressing genes. In some of those cases you are
not trying to understand the biology (and when comprehension is your
objective then keeping everything is a good idea, IMHO) but are rather most
interested in an objective measure that can be used to classify samples.
I am sure there are other reasons as well.
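
The order of operations described above (filter on variation first, fall back on level only when too many genes remain) could be sketched roughly as follows. The function names, thresholds, and data are hypothetical, and this is only one way to arrange it:

```python
from statistics import mean, variance


def variance_filter(expr, min_var):
    """Keep probe sets whose sample variance across arrays is at least min_var."""
    return {g: v for g, v in expr.items() if variance(v) >= min_var}


def level_filter(expr, min_mean):
    """Keep probe sets whose mean intensity across arrays is at least min_mean."""
    return {g: v for g, v in expr.items() if mean(v) >= min_mean}


def select(expr, min_var, max_genes, min_mean):
    """Non-specific variance filter first; level filter only if still too many."""
    kept = variance_filter(expr, min_var)
    if len(kept) > max_genes:
        kept = level_filter(kept, min_mean)
    return kept


# Invented log2 intensities, four arrays per probe set.
expr = {
    "g1": [5.0, 6.5, 5.1, 6.4],      # variable but dim
    "g2": [10.0, 11.5, 10.2, 11.4],  # variable and bright
    "g3": [12.0, 12.0, 12.1, 12.0],  # essentially flat
    "g4": [8.0, 9.6, 8.1, 9.5],      # variable, moderate
}

# Hypothetical thresholds: variance >= 0.25, at most 2 genes wanted, mean >= 7.0.
print(sorted(select(expr, min_var=0.25, max_genes=2, min_mean=7.0)))
```

If `max_genes` is raised to 3 in this example, the level filter is never applied and the dim but variable probe set is retained.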
Best wishes
Robert
Wittner, Ben, Ph.D. wrote:
> Robert,
>
> Could you explain why you say below that filtering on low values is probably a
> bad idea in many cases? Also, in what cases do you filter low values and in what
> cases not?
>
> I filter out probe-sets for which expression values for two or more classes of
> interest are low on the theory that such values are dominated by noise and fold
> changes calculated between two such classes are not meaningful.
>
> Thanks.
> -Ben
>
>
>> You can do non-specific filtering, but all you are really doing there
>>is to remove genes that are inherently uninteresting no matter what the
>>phenotype of the corresponding sample (if there is no variation in
>>expression for a particular gene across samples then it has no
>>information about the phenotype of the sample). Filtering on low values
>>is probably a bad idea although many do it (and I used to, and still do
>>sometimes depending on the task at hand).
>>
>>
>> Best wishes
>> Robert
>
>
>
> ------------------------------------------------------
> Ben Wittner, 617-643-3166, wittner.ben at mgh.harvard.edu
>
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org