[BioC] filtering

Jenny Drnevich drnevich at uiuc.edu
Thu Jul 12 17:13:15 CEST 2007


Hi Lev,

There have been several earlier discussions on this list about when to 
filter out data, and the consensus has been NOT to filter until after 
all pre-processing steps (e.g., normalization) have been done. One 
reason is that one array may have had a higher background than the 
others, so your scheme would remove more data values from that array 
than from the rest. Unequal numbers of values per array can be 
problematic for many normalization routines (e.g., quantile 
normalization, which expects the same set of probes on every array). 
I would also caution you against removing "badly measured signals" 
from your data set even after pre-processing. While these numbers may 
not be as accurate as larger numbers, they represent very low 
expression or no expression. Would you remove all the zeros from any 
set of data? My rationale is that had there been distinct expression, 
you would have measured it; therefore the low values near background 
are valid, if not completely accurate. In the worst-case scenario, you 
would miss genes that were not expressed in one treatment but were 
expressed in another, because you would have thrown out all the data 
from the non-expressed treatment. If the signals were "badly measured" 
in ALL samples, then I would remove that entire probe from the analysis 
(after pre-processing), but not if they were badly measured in only a 
few samples.
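A minimal sketch of the "remove the whole probe only if it is badly measured in ALL samples" rule, written in Python purely for illustration (in Bioconductor this would normally be done in R). It assumes the data have already been pre-processed and that each probe has one signal-to-noise (S/N) value per sample; the cutoff of 3, the probe names and the values are all hypothetical. No values are dropped from individual arrays, so a probe that is off in one treatment but on in another is kept.

```python
"""Sketch: drop a probe only if it is poorly measured in ALL samples.

Assumptions (hypothetical, not from the original post): the data have
already been pre-processed/normalized, each probe has one signal-to-noise
(S/N) value per sample, and "poorly measured" means S/N below an
arbitrary cutoff of 3.
"""

from typing import Dict, List

SN_CUTOFF = 3.0  # hypothetical cutoff; choose one suited to your platform


def keep_probe(sn_values: List[float], cutoff: float = SN_CUTOFF) -> bool:
    """Keep the probe if at least one sample is at or above the cutoff."""
    return any(v >= cutoff for v in sn_values)


def filter_probes(sn_by_probe: Dict[str, List[float]],
                  cutoff: float = SN_CUTOFF) -> Dict[str, List[float]]:
    """Return only the probes measured well in at least one sample.

    Values are never removed from individual samples, so every retained
    probe keeps a complete row across all arrays.
    """
    return {probe: vals for probe, vals in sn_by_probe.items()
            if keep_probe(vals, cutoff)}


if __name__ == "__main__":
    # Hypothetical S/N values: samples 1-2 are treatment A, 3-4 treatment B.
    sn = {
        "probe_expressed": [12.0, 15.0, 11.0, 14.0],
        "probe_B_only": [1.2, 0.8, 9.5, 10.1],  # off in A, on in B: kept
        "probe_never": [0.9, 1.1, 1.4, 0.7],    # low everywhere: removed
    }
    kept = filter_probes(sn)
    print(sorted(kept))
    # prints ['probe_B_only', 'probe_expressed']
```

A stricter variation would require the cutoff to be met in at least k samples, e.g., the size of the smallest treatment group, so that a single noisy array cannot rescue a probe on its own.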

That's my two cents,
Jenny



At 08:59 AM 7/12/2007, Lev Soinov wrote:
>   Dear List,
>   I have posted a similar question before, but would like to ask you again
>   about filtering strategies. I have some AB1700 data and filter on
>   signal-to-noise ratios before normalization. The rationale is to get rid
>   of badly measured signals before the actual processing of the data. Two
>   jpg histograms of log2 signal distributions, before (raw.jpg) and after
>   (filtered.jpg) filtering, can be seen at this location:
>   http://tmgarden.cloud.prohosting.com/images/
>   Could you please have a look at the distributions and comment on whether
>   it is correct to filter before normalization, as this changes the
>   distribution of signals a lot?
>   Thank you very much for your help.
>   Lev.
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu


