[BioC] Genefilter parameters for mouse 430 2
Richard Friedman
friedman at cancercenter.columbia.edu
Wed Mar 19 20:10:07 CET 2008
Dear Bioconductor Users,
I am using genefilter to filter an ExpressionSet of 4 Mouse 430 2 chips
preprocessed with gcrma prior to analysis with limma.
Here is a description of the ExpressionSet:
> xen2dataeset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 45101 features, 4 samples
element names: exprs
phenoData
sampleNames: A_xen_1_21.cel, A_xen_2_22.cel, D_nodal_1_27.cel,
D_nodal_2_28.cel
varLabels and varMetadata description:
sample: arbitrary numbering
featureData
featureNames: 1415670_at, 1415671_at, ..., AFFX-r2-P1-cre-5_at
(45101 total)
fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: mouse4302
>
Here is my session information.
> sessionInfo()
R version 2.6.1 (2007-11-26)
i386-apple-darwin8.10.1
locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] mouse4302probe_2.0.0 mouse4302cdf_2.0.0 mouse4302.db_2.0.2
[4] limma_2.12.0 geneplotter_1.16.0 lattice_0.17-2
[7] annotate_1.16.1 AnnotationDbi_1.0.6 RSQLite_0.6-3
[10] DBI_0.2-3 RColorBrewer_1.0-1 affyPLM_1.14.0
[13] xtable_1.5-2 simpleaffy_2.14.05 gcrma_2.10.0
[16] matchprobes_1.10.0 genefilter_1.16.0 survival_2.34
[19] annaffy_1.10.1 KEGG_2.0.1 GO_2.0.1
[22] affy_1.16.0 preprocessCore_1.0.0 affyio_1.6.1
[25] Biobase_1.16.3
loaded via a namespace (and not attached):
[1] KernSmooth_2.22-21 grid_2.6.1 tools_2.6.1
>
I have tried the filtering parameters in the article by Scholtens and
Heydebreck on p. 233 of the book by Gentleman et al.:
> f1<-pOverA(0.25,log2(100))
> f2<-function(x)(IQR(x)>0.5)
> ff<-filterfun(f1,f2)
> selected <-genefilter(xen2dataeset,ff)
> sum(selected)
[1] 289
This seemed a bit small, so I tried the effect of each of the
filters individually:
> selectedp025A <-genefilter(xen2dataeset,f1)
> sum(selectedp025A)
[1] 9681
> selectedIQRgtp5 <-genefilter(xen2dataeset,f2)
> sum(selectedIQRgtp5)
[1] 731
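The combined count (289) is smaller than either individual count because
filterfun keeps a probe only when every filter passes. Note also that with
four samples, pOverA(0.25, A) passes a probe as soon as one sample exceeds A.
A minimal base-R sketch of that behaviour, on simulated data and with
hypothetical stand-ins (p_over_a, iqr_gt) for the genefilter helpers:

```r
# Hypothetical base-R stand-ins for pOverA() and the IQR filter;
# the data are simulated and are NOT the Mouse 430 2 chips.
set.seed(1)
x <- matrix(rnorm(4000, mean = 6, sd = 2), nrow = 1000, ncol = 4)

# TRUE if at least a proportion p of the values exceeds A
p_over_a <- function(p, A) function(v) sum(v > A) / length(v) >= p
# TRUE if the interquartile range exceeds the cutoff
iqr_gt <- function(cutoff) function(v) IQR(v) > cutoff

f1 <- p_over_a(0.25, log2(100))
f2 <- iqr_gt(0.5)

pass_f1   <- apply(x, 1, f1)      # intensity filter alone
pass_f2   <- apply(x, 1, f2)      # IQR filter alone
pass_both <- pass_f1 & pass_f2    # both filters must pass (logical AND)

counts <- c(f1 = sum(pass_f1), f2 = sum(pass_f2), both = sum(pass_both))
print(counts)
```

The "both" count can never exceed the smaller of the two individual counts.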
My questions:
1. Is the log2(100) intensity cutoff good for all chips?
If not, can someone recommend a good intensity cutoff for mouse 4302?
2. Is the only effect of filtering to reduce the multiplier in the
false discovery analysis, or does it also reduce false positives in
other ways:
A. In the case of intensity filters, by reducing the number of large
fold changes that result from ratios of small numbers?
B. In the case of IQR filters, by eliminating large t-statistics
for genes with small variation across samples but fortuitously low
standard deviations?
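The "multiplier" in question 2 can be made concrete with the
Benjamini-Hochberg adjustment, which scales each sorted p-value by
(number of tests)/(rank). The sketch below uses made-up p-values and a
hypothetical filter (keep) that is unrelated to them; it only shows the
mechanics and is not a statement about what filtering does on real data.
Filtering changes the ranks as well as the number of tests, so the adjusted
values are not guaranteed to shrink.

```r
# Simulated p-values and a hypothetical filter; neither comes from the data above.
set.seed(2)
m     <- 1000
p_all <- runif(m)   # made-up p-values
keep  <- sample(c(TRUE, FALSE), m, replace = TRUE, prob = c(0.3, 0.7))

adj_all      <- p.adjust(p_all, method = "BH")        # adjusted over all m tests
adj_filtered <- p.adjust(p_all[keep], method = "BH")  # adjusted over retained tests only

# Difference in adjusted p-values for the retained tests
print(summary(adj_all[keep] - adj_filtered))
```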
Up until now I have not filtered, because the filtering parameters
looked arbitrary and I thought it was cheating to reduce the number of
tests used to compute the FDR. From reading and further reflection I
now believe otherwise. But although I now believe I should filter, I am
not at all sure which parameters to use, or how sensitive my final list
of differentially expressed genes will be to the choice of those
parameters. In particular, I wonder whether the intensity filter cutoff
should vary with chip type and preprocessing method (e.g., GCRMA).
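One way to gauge this sensitivity is to count how many probes survive over
a range of intensity cutoffs. A minimal sketch on simulated data, assuming a
base-R stand-in for pOverA; the data, the cutoff grid, and the helper name
p_over_a are all hypothetical:

```r
# Simulated log2 expression matrix (NOT the real chips): 1000 probes x 4 samples.
set.seed(3)
x <- matrix(rnorm(4000, mean = 6, sd = 2), nrow = 1000, ncol = 4)

# Hypothetical stand-in for pOverA(): TRUE if a proportion >= p of values exceeds A
p_over_a <- function(p, A) function(v) sum(v > A) / length(v) >= p

raw_cutoffs <- c(50, 100, 200, 400)   # arbitrary grid on the raw intensity scale
cutoffs     <- log2(raw_cutoffs)

# Number of probes retained at each cutoff
n_kept <- sapply(cutoffs, function(A) sum(apply(x, 1, p_over_a(0.25, A))))
names(n_kept) <- paste0("log2(", raw_cutoffs, ")")
print(n_kept)
```

With the real data, the same loop could call
genefilter(xen2dataeset, pOverA(0.25, A)) for each A, as in the session above.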
Any thoughts and guidance would be appreciated.
Thanks as always,
Rich
------------------------------------------------------------
Richard A. Friedman, PhD
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer
Department of Biomedical Informatics (DBMI)
Educational Coordinator
Center for Computational Biology and Bioinformatics (C2B2)
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/
"Sure I am willing to stop watching television
to get a better education."
-Rose Friedman, age 11