[BioC] nsFilter and GSEA
Paolo Innocenti
paolo.innocenti at ebc.uu.se
Fri Jan 11 14:32:19 CET 2008
Hi all,
I have a set of 15 Affymetrix chips:
4 treatments, each with 2 technical replicates of 2 biological
replicates (16 chips in total; one chip has been excluded, and all the
others are of really good quality).
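After running rma(), I try to filter the ExpressionSet:
> library(affy)         # provides rma()
> library(genefilter)   # provides nsFilter()
> eset <- rma(mydata)   # mydata: the AffyBatch with the 15 chips
> eset.f <- nsFilter(eset)$eset
nsFilter removes 13047 features for low variance, leaving only 171
features in my dataset. The filter log: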
$numDupsRemoved
[1] 3
$numLowVar
[1] 13047
$feature.exclude
[1] 3
$numRemoved.ENTREZID
[1] 786
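As a quick sanity check on the filter log, the counts can be added up in base R. This only uses the numbers printed above; it assumes the four log entries plus the 171 retained features account for every probeset that entered nsFilter.

```r
# Counts copied from the nsFilter filter log above.
removed <- c(ENTREZID = 786, feature.exclude = 3, dups = 3, lowVar = 13047)
kept <- 171

# Total probesets that entered nsFilter (removed + retained).
total <- sum(removed) + kept
total                                  # 14010

# Share of all probesets discarded by the variance filter alone.
round(removed[["lowVar"]] / total, 3)  # 0.931

# Share of all probesets that survive every filter.
round(kept / total, 3)                 # 0.012
```

So the variance step alone discards about 93% of the chip, and only about 1.2% of the probesets are left for the downstream analysis.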
This is quite strange, because an analysis of the same dataset a few
years ago found more than 1000 differentially expressed (DE) genes.
Now, I could just set a less stringent cutoff, but is it reasonable to
go on with the analysis using only 171 features? Is it realistic to get
these results with the default parameters of nsFilter? Obviously, it
depends on what I am expecting and on the experimental design... well,
I was expecting somewhat more dramatic changes in expression. At the end
of the analysis, I end up with ~60 differentially expressed probesets
(lfc=1, p.value=0.05, adjustment method=BH).
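Whether the default parameters are "realistic" depends on how var.cutoff is interpreted. In the current genefilter documentation, nsFilter uses var.func=IQR and var.cutoff=0.5, and filterByQuantile decides whether var.cutoff is a quantile of all IQR values or an absolute IQR threshold. Whether genefilter 1.16.0 uses the same defaults is an assumption worth checking with ?nsFilter. The base-R sketch below mimics both interpretations on simulated data (a hypothetical stand-in for exprs(eset), not real chips): a quantile cutoff of 0.5 keeps half of the rows, while an absolute cutoff of 0.5 on the log2 scale keeps only the few strongly varying rows.

```r
set.seed(1)

# Simulated stand-in for exprs(eset): 1000 "probesets" x 15 "chips" on a
# log2-like scale, where most rows barely vary between chips.
n_genes <- 1000
x <- matrix(rnorm(n_genes * 15, mean = 8, sd = 0.1), nrow = n_genes)
# Give the first 50 rows a much larger spread (genuinely variable genes).
x[1:50, ] <- x[1:50, ] + rnorm(50 * 15, sd = 1)

# Per-row interquartile range, as with var.func = IQR.
iqr <- apply(x, 1, IQR)

var.cutoff <- 0.5

# Interpretation 1: var.cutoff is a quantile of all IQR values.
keep_quantile <- iqr > quantile(iqr, var.cutoff)

# Interpretation 2: var.cutoff is an absolute IQR threshold (log2 units).
keep_absolute <- iqr > var.cutoff

sum(keep_quantile)   # 500: half of the rows
sum(keep_absolute)   # roughly the 50 high-spread rows only
```

If the retained fraction looks like interpretation 2 rather than 1, passing var.cutoff and filterByQuantile explicitly makes the intended behaviour unambiguous.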
Second question: is it informative to test gene sets (GSEA) on only 171
genes, or would it be better not to filter the ExpressionSet at all?
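One way to frame the GSEA question is to ask how many members of a typical gene set would survive such a filter. The sketch below assumes, purely for illustration, that retention is unrelated to set membership (in reality the variance filter is not random) and uses hypothetical set sizes of 10, 50 and 200 genes; the retained fraction 171/14010 comes from the filter log above.

```r
# Fraction of probesets retained by nsFilter (from the filter log above).
retained_fraction <- 171 / 14010

# Hypothetical gene-set sizes, chosen only for illustration.
set_sizes <- c(10, 50, 200)

# Expected surviving members per set if retention were random.
expected_members <- set_sizes * retained_fraction
round(expected_members, 1)   # 0.1 0.6 2.4
```

Under that assumption, most sets would be left with zero to two members, which is too few for a meaningful set-level test.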
Thanks,
Paolo
> sessionInfo()
R version 2.6.1 (2007-11-26)
i486-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] splines tools stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] annotate_1.16.1 xtable_1.5-2 AnnotationDbi_1.0.6
[4] RSQLite_0.6-4 DBI_0.2-4 statmod_1.3.1
[7] limma_2.12.0 drosgenome1_2.0.1 genefilter_1.16.0
[10] survival_2.34 Biobase_1.16.1
loaded via a namespace (and not attached):
[1] rcompgen_0.1-17