[BioC] nsFilter and GSEA

Fri Jan 11 14:32:19 CET 2008

Hi all,

I have a set of 15 Affymetrix chips:
4 treatments, 2 technical replicates of 2 biological replicates for each 
treatment (one chip has been excluded; all the others are of really good 
quality).

After running rma(), when I try to filter the ExpressionSet:

 >eset <- rma(mydata)
 >eset.f <- nsFilter(eset)$eset

it removes 13047 features for low variance, leaving 171 features in my 
dataset. The filter log reports:

$numDupsRemoved
[1] 3

$numLowVar
[1] 13047

$feature.exclude
[1] 3

$numRemoved.ENTREZID
[1] 786
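
As a quick sanity check on the filter log above, here is a minimal sketch in base R that adds up the reported counts. It assumes the four categories do not overlap and ignores the exact order in which the filtering steps are applied; the variable names are only illustrative.

```r
# Counts copied from the filter log above
removed <- c(numDupsRemoved = 3, numLowVar = 13047,
             feature.exclude = 3, numRemoved.ENTREZID = 786)
remaining <- 171

# Assumption: the four categories are disjoint, so they can simply be summed
total_removed <- sum(removed)
implied_total <- total_removed + remaining

# Rough share of features dropped by the variance step, relative to the
# features that either failed it or survived all filters
frac_low_var <- removed[["numLowVar"]] / (removed[["numLowVar"]] + remaining)

print(c(total_removed = total_removed, implied_total = implied_total))
print(round(frac_low_var, 3))
```

Under these assumptions the variance step alone removes well over 90% of the features that reach it, which is what makes the result look suspicious.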

This is quite strange, because another analysis of the same dataset (a few 
years ago) revealed more than 1000 DE genes.
I could simply set a less stringent cutoff, but is it reasonable to go 
on with the analysis with 171 features? Is it realistic to get these 
results with the default parameters of nsFilter? Obviously, it depends 
on what I am expecting and on the experimental design... well, I was 
expecting some more dramatic changes in expression. At the end of the 
analysis, I end up with ~60 differentially expressed probesets (lfc=1, 
p.value=0.05, adjustment method=BH).
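
To illustrate why the cutoff matters so much, here is a minimal sketch on simulated data (not my real ExpressionSet). It compares two possible readings of a variance cutoff of 0.5 applied to the per-feature IQR: as an absolute threshold, or as a quantile of the IQR distribution. I have not checked which of the two genefilter 1.16.0 actually applies to `var.cutoff`, so both readings are assumptions, and the matrix `expr` is only a stand-in for `exprs(eset)`.

```r
set.seed(1)

# Simulated log2 expression matrix: 1000 features x 15 arrays
expr <- matrix(rnorm(1000 * 15, mean = 8, sd = 0.2), nrow = 1000)

# Give the first 50 features a real shift in the last 7 arrays
expr[1:50, 9:15] <- expr[1:50, 9:15] + 2

# Per-feature interquartile range across arrays
row_iqr <- apply(expr, 1, IQR)

# Reading A (assumption): keep features whose IQR exceeds the absolute value 0.5
keep_abs <- row_iqr > 0.5

# Reading B (assumption): keep features whose IQR exceeds the 0.5 quantile
keep_q <- row_iqr > quantile(row_iqr, 0.5)

print(c(absolute = sum(keep_abs), quantile = sum(keep_q)))
```

With the quantile reading, half of the features always survive, whatever the data look like. With the absolute reading, the number of survivors depends entirely on the scale of the data, so a dataset with tight replicates can lose almost everything.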

Second question: is it informative to test for gene sets (GSEA) on 171 
genes, or would it be better not to filter the ExpressionSet?

Thanks,
Paolo

 > sessionInfo()
R version 2.6.1 (2007-11-26)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
  [1] annotate_1.16.1     xtable_1.5-2        AnnotationDbi_1.0.6
  [4] RSQLite_0.6-4       DBI_0.2-4           statmod_1.3.1
  [7] limma_2.12.0        drosgenome1_2.0.1   genefilter_1.16.0
[10] survival_2.34       Biobase_1.16.1

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17