[BioC] IQR implementation in varFilter and nsFilter (genefilter package)

Patrick Aboyoun paboyoun at fhcrc.org
Fri May 29 20:26:45 CEST 2009


James,
Thanks for pointing out this inconsistency between varFilter and nsFilter 
when var.func = IQR. I just checked in changes to genefilter in the BioC 
2.4 (release) and BioC 2.5 (devel) branches that bring nsFilter in line 
with varFilter. Since quantiles do not have a single rigid definition, as 
the type argument to the quantile() function demonstrates, varFilter and 
nsFilter now both define IQR as

rowQ(eset, ceiling(0.75 * ncol(eset))) - rowQ(eset, floor(0.25 * ncol(eset)))

since this IQR calculation is relatively fast to compute and tends to 
work well when IQR-based filtering is appropriate. As before, end users 
can supply their own var.func, which could implement a different 
calculation of IQR.
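
To make the difference concrete, here is a rough base-R sketch of the 
order-statistic definition next to stats::IQR. The helper name 
order_stat_iqr is made up for illustration and is not part of genefilter; 
it emulates rowQ with sort() on a single row and assumes at least four 
values, so that floor(0.25 * n) is a valid index.

```r
# Hypothetical helper (not a genefilter function): IQR taken as the
# difference between two order statistics, mirroring the rowQ expression.
order_stat_iqr <- function(x) {
  s <- sort(x)                      # sort() drops NA values by default
  n <- length(s)
  s[ceiling(0.75 * n)] - s[floor(0.25 * n)]
}

x <- 1:8
fast   <- order_stat_iqr(x)         # 6th smallest minus 2nd smallest = 4
interp <- IQR(x)                    # interpolated quantiles (type 7) = 3.5

# Row-wise use on a small matrix (features in rows, samples in columns).
m <- rbind(1:8, c(2, 4, 6, 8, 10, 12, 14, 16))
row_fast   <- apply(m, 1, order_stat_iqr)
row_interp <- apply(m, 1, IQR)
```

Someone who prefers an interpolated quantile could pass something like 
function(x) diff(quantile(x, c(0.25, 0.75), type = 6)) as var.func; that 
particular choice of type is only an example.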


Cheers,
Patrick


James F. Reid wrote:
> Dear list,
>
> I have noticed that nsFilter and varFilter from the genefilter package 
> implement their respective default variance function (var.func = IQR) 
> in different ways and I don't know if this is intended or not. nsFilter 
> applies the IQR function to the rows of the matrix with apply(), 
> whereas varFilter uses its own rowIQRs function, which leads to 
> different results.
> If this is intended I think it should be made clearer in the help page 
> since both functions use the same default parameters for variance 
> filtering.
>
> Here is an example with the Biobase sample.ExpressionSet followed by 
> its sessionInfo()
>
> Best,
> James Reid.
>
> > library("Biobase")
>
> Welcome to Bioconductor
>
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> > library("genefilter")
> >
> > data(sample.ExpressionSet)
> >
> > ## nsFilter using only var.filter
> > nsF <- nsFilter(sample.ExpressionSet,
> +                 require.entrez = FALSE,
> +                 remove.dupEntrez = FALSE,
> +                 feature.exclude = FALSE)
> > varF <- varFilter(sample.ExpressionSet)
> >
> > nrow(nsF$eset) == nrow(varF)
> Features
>     TRUE
> > length(intersect(featureNames(nsF$eset), featureNames(varF)))
> [1] 245
> > sessionInfo()
>
> R version 2.9.0 (2009-04-17)
> x86_64-redhat-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C 
>
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] genefilter_1.24.0 Biobase_2.4.1
>
> loaded via a namespace (and not attached):
> [1] annotate_1.22.0     AnnotationDbi_1.6.0 DBI_0.2-4
> [4] RSQLite_0.7-1       splines_2.9.0       survival_2.35-4
> [7] tools_2.9.0         xtable_1.5-5
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
