[BioC] filtering
Jenny Drnevich
drnevich at uiuc.edu
Thu Jul 12 17:13:15 CEST 2007
Hi Lev,
There have been several previous discussions on this list about when
to filter out data, and the consensus has been to NOT filter until
after all pre-processing steps (e.g., normalization) have been done.
One reason is that one array may have had a higher background than
others, so more data values would be removed from that array in your
scheme, which can be problematic for many normalization routines. I also
would caution you against removing "badly measured signals" from your
data set even after pre-processing. While these numbers may not be as
accurate as larger numbers, they represent very low expression or no
expression. Would you remove all the zeros from any set of data? My
rationale is that had there been distinct expression, you would have
measured it; therefore the low values near background are valid, if
not completely accurate. In the worst-case scenario, you would
miss genes that weren't expressed in one treatment but were expressed
in another treatment because you were throwing out all the data from
the non-expressed treatment. If the signals were "badly measured" in
ALL samples, then I would remove that entire probe from the analysis
(after pre-processing), but not if they were badly measured in only a
few samples.
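
A minimal sketch of that last rule, written in plain Python rather than R/Bioconductor purely for illustration. The probe names, the signal-to-noise values, and the cutoff of 3.0 are all made-up assumptions, not taken from the AB1700 data. The idea is only that a probe is dropped when it falls below the cutoff in every sample, and is kept if even one sample is above it.

```python
# Hypothetical data: one entry per probe, one S/N value per sample,
# taken AFTER pre-processing (normalization) has been done.
SN_THRESHOLD = 3.0  # assumed cutoff, chosen only for this illustration


def probes_to_keep(sn_by_probe, threshold=SN_THRESHOLD):
    """Return IDs of probes with at least one sample at or above threshold.

    A probe is removed only if it is below threshold in ALL samples.
    """
    return [
        probe
        for probe, sn_values in sn_by_probe.items()
        if any(sn >= threshold for sn in sn_values)
    ]


sn_by_probe = {
    "probe_A": [12.0, 15.1, 11.4, 13.8],  # well measured in every sample
    "probe_B": [0.8, 1.1, 9.7, 10.2],     # off in treatment 1, on in treatment 2
    "probe_C": [0.5, 1.2, 0.9, 1.4],      # below threshold in all samples
}

kept = probes_to_keep(sn_by_probe)
print(kept)  # ['probe_A', 'probe_B']
```

Note that probe_B survives even though half of its values are near background; filtering value by value would have discarded exactly the samples that make it interesting.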
That's my two cents,
Jenny
At 08:59 AM 7/12/2007, Lev Soinov wrote:
> Dear List,
> I have posted a similar question before, but would like to ask you again
> about filtering strategies. I have some AB1700 data and filter on signal to
> noise ratios before normalization. The rationale is to get rid of badly
> measured signals before actual processing of the data. Two jpg histograms of
> log2 signal distributions, before (raw.jpg) and after (filtered.jpg)
> filtering, can be seen in this location:
> http://tmgarden.cloud.prohosting.com/images/
> Could you please have a look at the distributions and comment on whether
> this is correct to filter before normalization as this changes the
> distribution of signals a lot?
> Thank you very much for your help.
> Lev.
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu