[BioC] hoeffd function (CRAN HMisc package) use in arrayQualityMetrics
Wolfgang Huber
whuber at embl.de
Tue Dec 17 21:39:37 CET 2013
Dear Jerry
Bottomline: I am not (at least not yet) sure the answer to your problem is a change to arrayQualityMetrics.
Why don't you remove the apparently large fraction of 'bad' features from your data before calling arrayQualityMetrics?
Long answer:
1. Another thing to do for you is to decide on your favourite alternative statistic to 'Hmisc::hoeffd', and patch the function "aqm.maplot" to use that one. The vignette "Advanced topics: Customizing arrayQualityMetrics reports and programmatic processing of the output" tells you how to do so. Any comments on that are welcome. (There was never an intention that default choices of arrayQualityMetrics would be always optimal for everybody.)
2. The example below is slightly construed, you simulate an "array" with 10 genes and then add 2 or 4 genes with NA, which amounts 17% or 29% of genes having NA. If the fraction of NA is smaller, then the effect on the result of 'hoeffd' is not so strong. The function (which resides in the Hmisc package) is designed to work with outliers. I am not familiar with your arrays and filtering procedure, but maybe an array with so many NA should be flagged as having "bad quality" anyway?
Best wishes
Wolfgang
Il giorno Dec 9, 2013, alle ore 5:48 pm, jerry davison [guest] <guest at bioconductor.org> ha scritto:
>
> ArrayQualityMetrics uses hoeffd to assess any linear relationship between M and A in MA plots and flags arrays where D > 0.15. Some of the matrices I'm evaluating have NA's; those result from filtering, for example in applying Illumina beadarray detection p-value rules.
>
> As you can see from evaluating the script below, NA's affect hoeffd. In my situation, arrays with no strong linear relationship are flagged by this AQM measure.
>
> Is this a bug in hoeffd? Should AQM address this, perhaps by removing NA's in M and A before applying hoeffd?
>
> # Filename: hoeffd.R
> #
> # Test Hmisc hoeffd() response to NA's in a matrix. Used by # AQM to assess outliers in the MA plots.
> #
> # J Davison 5dec2013
> #
> #--> source('hoeffd.R',echo=TRUE,max=Inf)
> #
>
> library(Hmisc)
> library(arrayQualityMetrics)
>
> set.seed(11)
> M = runif(10)*10
>
> set.seed(7)
> A = runif(10)*10
>
> df1 = data.frame(M, A)
> hoeffd(as.matrix(df1))$D
> # M A
> # M 1.0000 0.0437
> # A 0.0437 1.0000
>
> ### Add NA's
> df2 = rbind(df1, data.frame(M=rep(NA, 2), A=rep(NA, 2)))
> hoeffd(as.matrix(df2))$D
> # M A
> # M 1.0000 0.0959
> # A 0.0959 1.0000
>
> ### Add more NA's
> df3 = rbind(df1, data.frame(M=rep(NA, 4), A=rep(NA, 4)))
> hoeffd(as.matrix(df3))$D
> # M A
> # M 1.000 0.163
> # A 0.163 1.000
>
>
>
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] splines grid stats graphics
> [5] grDevices utils datasets methods
> [9] base
>
> other attached packages:
> [1] arrayQualityMetrics_3.18.0
> [2] Hmisc_3.13-0
> [3] Formula_1.1-1
> [4] survival_2.37-4
> [5] lattice_0.20-24
> [6] cluster_1.14.4
> [7] BiocInstaller_1.12.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.23.28 BeadDataPackR_1.14.0
> [3] Biobase_2.21.7 BiocGenerics_0.7.8
> [5] Biostrings_2.30.0 Cairo_1.5-2
> [7] DBI_0.2-7 IRanges_1.20.5
> [9] RColorBrewer_1.0-5 RSQLite_0.11.4
> [11] SVGAnnotation_0.93-1 XML_3.98-1.1
> [13] XVector_0.1.4 affy_1.39.6
> [15] affyPLM_1.38.0 affyio_1.29.5
> [17] annotate_1.39.0 beadarray_2.12.0
> [19] colorspace_1.2-4 gcrma_2.34.0
> [21] genefilter_1.43.0 hwriter_1.3
> [23] latticeExtra_0.6-26 limma_3.18.2
> [25] parallel_3.0.2 plyr_1.8
> [27] preprocessCore_1.23.0 reshape2_1.2.2
> [29] setRNG_2011.11-2 stats4_3.0.2
> [31] stringr_0.6.2 vsn_3.29.1
> [33] xtable_1.7-1 zlibbioc_1.7.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list