[BioC] Non-specific filtering for HyperGeometric/GSEA test
Yuan Hao
yuan.hao at cantab.net
Tue May 11 01:41:27 CEST 2010
Dear list,
May I ask a question about the non-specific filtering used to define the
gene universe for a HyperGeometric/GSEA test?
I have fifteen Affymetrix samples. To remove probe sets that show
little variation across samples, I computed the IQR of each probe set
across samples with each of the following two pieces of code:
# code one
> cutoff <- 0.5
> Iqr <- apply (exprs(eset), 1, IQR)
> selected <- (Iqr > cutoff)
> filtered <- eset[selected, ]
> dim(filtered)
Features Samples
11490 15
# code two
> library(genefilter)
> filtered <- varFilter(eset, var.func=IQR, var.cutoff=0.5,
+                       filterByQuantile=TRUE)
> dim(filtered)
Features Samples
27337 15
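A minimal sketch of one way the two calls could differ, using simulated data in base R instead of the real eset. It assumes, based on the varFilter help page, that filterByQuantile=TRUE makes var.cutoff a quantile of the per-probe-set IQR values and not an absolute IQR threshold. The matrix `expr` and the names `keep_abs` and `keep_q` are made up for illustration.

```r
# Hypothetical stand-in for exprs(eset): 1000 probe sets x 15 samples,
# with a different spread for each probe set (row).
set.seed(1)
row_sd <- runif(1000, min = 0.1, max = 1)
expr <- matrix(rnorm(1000 * 15, mean = 8, sd = rep(row_sd, 15)), nrow = 1000)

# IQR of each probe set across samples, as in "code one"
iqr <- apply(expr, 1, IQR)

# Absolute cutoff: keep probe sets whose IQR exceeds 0.5
keep_abs <- iqr > 0.5

# Quantile cutoff (assumed meaning of filterByQuantile=TRUE):
# keep probe sets whose IQR exceeds the 0.5-quantile (median) of all IQRs
keep_q <- iqr > quantile(iqr, probs = 0.5)

sum(keep_abs)  # depends on the scale of the data
sum(keep_q)    # half of the rows, whatever the scale
```

Under that assumption, a quantile cutoff of 0.5 would always keep about half of the probe sets, while an absolute cutoff of 0.5 keeps a number that depends on the data.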
I realized that the difference between the two "filtered" results may
come from different definitions of IQR. In the first case, IQR is
computed with the 'quantile' function,
IQR(x) = quantile(x,3/4) - quantile(x,1/4), rather than by Tukey's
method, which I believe is used in the second case. I am aware that the
number of genes in the gene universe can have a significant effect on
the test result. However, I am not sure which IQR evaluation method is
the better choice for the HyperGeometric/GSEA test. I would appreciate it
very much if you could shed some light on this!
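To make the two IQR definitions concrete, here is a small base R sketch on one simulated vector of length 15, the number of samples here. It compares IQR(), the quantile-based formula, and Tukey's hinges from fivenum(). The vector `x` is hypothetical, and the sketch says nothing about which computation varFilter uses internally.

```r
# One hypothetical probe set measured across 15 samples
set.seed(2)
x <- rnorm(15)

# Default IQR in the stats package
iqr_default <- IQR(x)

# Quantile-based formula: IQR(x) = quantile(x, 3/4) - quantile(x, 1/4)
iqr_quantile <- unname(quantile(x, 3/4) - quantile(x, 1/4))

# Tukey's version: upper hinge minus lower hinge from the five-number summary
hinges <- fivenum(x)
iqr_tukey <- hinges[4] - hinges[2]

all.equal(iqr_default, iqr_quantile)
all.equal(iqr_default, iqr_tukey)
```

For a vector of length 15 the default quartiles and Tukey's hinges both fall halfway between the same pairs of order statistics, so the three values should agree up to rounding. For other sample sizes the hinge-based value can differ slightly.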
Regards,
Yuan