[BioC] hoeffd function (CRAN HMisc package) use in arrayQualityMetrics

Wolfgang Huber whuber at embl.de
Tue Dec 17 21:39:37 CET 2013


Dear Jerry

Bottomline: I am not (at least not yet) sure the answer to your problem is a change to arrayQualityMetrics.
Why don't you remove the apparently large fraction of 'bad' features from your data before calling arrayQualityMetrics?

Long answer: 
1. Another thing to do for you is to decide on your favourite alternative statistic to 'Hmisc::hoeffd', and patch the function "aqm.maplot" to use that one. The vignette "Advanced topics: Customizing arrayQualityMetrics reports and programmatic processing of the output" tells you how to do so. Any comments on that are welcome. (There was never an intention that default choices of arrayQualityMetrics would be always optimal for everybody.)

2. The example below is slightly construed, you simulate an "array" with 10 genes and then add 2 or 4 genes with NA, which amounts 17% or 29% of genes having NA. If the fraction of NA is smaller, then the effect on the result of 'hoeffd' is not so strong. The function (which resides in the Hmisc package) is designed to work with outliers. I am not familiar with your arrays and filtering procedure, but maybe an array with so many NA should be flagged as having "bad quality" anyway? 

	Best wishes
		Wolfgang 





 
Il giorno Dec 9, 2013, alle ore 5:48 pm, jerry davison [guest] <guest at bioconductor.org> ha scritto:

> 
> ArrayQualityMetrics uses hoeffd to assess any linear relationship between M and A in MA plots and flags arrays where D > 0.15.  Some of the matrices I'm evaluating have NA's; those result from filtering, for example in applying Illumina beadarray detection p-value rules.
> 
> As you can see from evaluating the script below, NA's affect hoeffd.  In my situation, arrays with no strong linear relationship are flagged by this AQM measure.
> 
> Is this a bug in hoeffd?  Should AQM address this, perhaps by removing NA's in M and A before applying hoeffd?
> 
> # Filename: hoeffd.R
> #
> # Test Hmisc hoeffd() response to NA's in a matrix.  Used by # AQM to assess outliers in the MA plots.
> # 
> # J Davison 5dec2013
> #
> #--> source('hoeffd.R',echo=TRUE,max=Inf)
> #
> 
> library(Hmisc)
> library(arrayQualityMetrics)
> 
> set.seed(11)
> M = runif(10)*10
> 
> set.seed(7)
> A = runif(10)*10
> 
> df1 = data.frame(M, A)
> hoeffd(as.matrix(df1))$D
> #        M      A
> # M 1.0000 0.0437
> # A 0.0437 1.0000
> 
> ### Add NA's
> df2 = rbind(df1, data.frame(M=rep(NA, 2), A=rep(NA, 2)))
> hoeffd(as.matrix(df2))$D
> #        M      A
> # M 1.0000 0.0959
> # A 0.0959 1.0000
> 
> ### Add more NA's
> df3 = rbind(df1, data.frame(M=rep(NA, 4), A=rep(NA, 4)))
> hoeffd(as.matrix(df3))$D
> #       M     A
> # M 1.000 0.163
> # A 0.163 1.000
> 
> 
> 
> 
> -- output of sessionInfo(): 
> 
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] splines   grid      stats     graphics 
> [5] grDevices utils     datasets  methods  
> [9] base     
> 
> other attached packages:
> [1] arrayQualityMetrics_3.18.0
> [2] Hmisc_3.13-0              
> [3] Formula_1.1-1             
> [4] survival_2.37-4           
> [5] lattice_0.20-24           
> [6] cluster_1.14.4            
> [7] BiocInstaller_1.12.0      
> 
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.23.28 BeadDataPackR_1.14.0 
> [3] Biobase_2.21.7        BiocGenerics_0.7.8   
> [5] Biostrings_2.30.0     Cairo_1.5-2          
> [7] DBI_0.2-7             IRanges_1.20.5       
> [9] RColorBrewer_1.0-5    RSQLite_0.11.4       
> [11] SVGAnnotation_0.93-1  XML_3.98-1.1         
> [13] XVector_0.1.4         affy_1.39.6          
> [15] affyPLM_1.38.0        affyio_1.29.5        
> [17] annotate_1.39.0       beadarray_2.12.0     
> [19] colorspace_1.2-4      gcrma_2.34.0         
> [21] genefilter_1.43.0     hwriter_1.3          
> [23] latticeExtra_0.6-26   limma_3.18.2         
> [25] parallel_3.0.2        plyr_1.8             
> [27] preprocessCore_1.23.0 reshape2_1.2.2       
> [29] setRNG_2011.11-2      stats4_3.0.2         
> [31] stringr_0.6.2         vsn_3.29.1           
> [33] xtable_1.7-1          zlibbioc_1.7.0       
> 
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list