[BioC] IQR implementation in varFilter and nsFilter (genefilter package)
Patrick Aboyoun
paboyoun at fhcrc.org
Fri May 29 20:26:45 CEST 2009
James,
Thanks for pointing out this inconsistency between varFilter and nsFilter
when var.func = IQR. I have just checked in changes to genefilter in the BioC
2.4 (release) and BioC 2.5 (devel) branches that bring nsFilter in line
with varFilter. Since quantiles have no single rigid definition, as the
type argument to the quantile() function demonstrates, varFilter and
nsFilter now both define IQR as

  rowQ(eset, ceiling(0.75 * ncol(eset))) - rowQ(eset, floor(0.25 * ncol(eset)))

because this IQR is fast to compute and tends to work well when
IQR-based filtering is appropriate. As before, end users can supply
their own var.func, which could implement a different calculation of IQR.
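To make the difference concrete, here is a minimal base-R sketch. The helper rowIQRviaOrderStats is hypothetical (not part of genefilter); it mimics the rowQ-based definition above using sort() instead of calling rowQ, and assumes a numeric matrix with no NAs and at least 4 columns. It is compared against stats::IQR, which uses quantile type 7 by default and is what an apply(..., 1, IQR) over the rows computes.

```r
## Hypothetical helper (not part of genefilter): per-row IQR taken from
## order statistics, mirroring
##   rowQ(eset, ceiling(0.75 * n)) - rowQ(eset, floor(0.25 * n))
## Assumes a numeric matrix with no NAs and at least 4 columns.
rowIQRviaOrderStats <- function(m) {
  n  <- ncol(m)
  hi <- ceiling(0.75 * n)  # index of the upper order statistic
  lo <- floor(0.25 * n)    # index of the lower order statistic
  stopifnot(lo >= 1)
  apply(m, 1, function(x) {
    s <- sort(x)           # s[k] is the k-th smallest value in the row
    s[hi] - s[lo]
  })
}

## A single row holding the values 1..26
m <- matrix(1:26, nrow = 1)
rowIQRviaOrderStats(m)   # order statistics 20 and 6: 20 - 6 = 14
apply(m, 1, IQR)         # stats::IQR, type 7: 19.75 - 7.25 = 12.5
```

The order-statistic version picks observed values without interpolation, so its IQR can differ from stats::IQR, and features near the filtering cutoff may then be kept by one definition and dropped by the other.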
Cheers,
Patrick
James F. Reid wrote:
> Dear list,
>
> I have noticed that nsFilter and varFilter from the genefilter package
> implement their shared default variance function (var.func = IQR)
> in different ways, and I don't know whether this is intended. The IQR
> computation in nsFilter applies IQR over the rows of the matrix,
> whereas varFilter uses its own rowIQRs function, which leads to
> different results.
> If this is intended, I think it should be made clearer in the help page,
> since both functions use the same default parameters for variance
> filtering.
>
> Here is an example with the Biobase sample.ExpressionSet, followed by
> its sessionInfo():
>
> Best,
> James Reid.
>
> > library("Biobase")
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> > library("genefilter")
> >
> > data(sample.ExpressionSet)
> >
> > ## nsFilter using only var.filter
> > nsF <- nsFilter(sample.ExpressionSet,
> + require.entrez = FALSE,
> + remove.dupEntrez = FALSE,
> + feature.exclude = FALSE)
> > varF <- varFilter(sample.ExpressionSet)
> >
> > nrow(nsF$eset) == nrow(varF)
> Features
> TRUE
> > length(intersect(featureNames(nsF$eset), featureNames(varF)))
> [1] 245
> > sessionInfo()
>
> R version 2.9.0 (2009-04-17)
> x86_64-redhat-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] genefilter_1.24.0 Biobase_2.4.1
>
> loaded via a namespace (and not attached):
> [1] annotate_1.22.0 AnnotationDbi_1.6.0 DBI_0.2-4
> [4] RSQLite_0.7-1 splines_2.9.0 survival_2.35-4
> [7] tools_2.9.0 xtable_1.5-5
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor