[BioC] Interpreting mdqc output

Mark Dunning md392 at cam.ac.uk
Fri Oct 24 11:27:42 CEST 2008


Hello all,

I have been looking at the mdqc package for automatic quality assessment of
a large set of Affy SNP 6.0 data. I have already generated a set of QC stats
using Affy's own software, and Affy's approach is to exclude outlier arrays
using a fixed cut-off on the contrast QC scores (basically a measure of how
well separated the three genotype clouds are). I wanted to see whether mdqc
would give me the same answers.

Here are some of the contrast QC scores for the first 6 arrays (out of 140).
A value less than 0.4 in any of these columns could be a quality problem
according to Affy.

> allQC[1:6,]
  Contrast.QC Contrast.QC..Random. Contrast.QC..Nsp. Contrast.QC..Sty. Contrast.QC..Nsp.Sty.Overlap.
1        0.72                 0.72              0.79              1.00                          1.38
2        0.42                 0.42              0.72              0.35                          0.99
3        1.08                 1.08              0.97              1.28                          1.30
4        0.50                 0.50              0.75              0.79                          0.64
5        0.00                 0.00              0.00             -0.22                          0.00
6        0.47                 0.47              0.76              0.49                          0.71
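For comparison with mdqc, the fixed cut-off rule can be sketched in R roughly as follows. This is a minimal sketch only: `allQC6` and `flag_fixed_cutoff()` are hypothetical names, the data are just the six rows printed above, and it assumes that a value strictly below 0.4 in any column counts as a failure.

```r
# The first six rows of allQC, copied from the printout above
allQC6 <- data.frame(
  Contrast.QC                   = c(0.72, 0.42, 1.08, 0.50,  0.00, 0.47),
  Contrast.QC..Random.          = c(0.72, 0.42, 1.08, 0.50,  0.00, 0.47),
  Contrast.QC..Nsp.             = c(0.79, 0.72, 0.97, 0.75,  0.00, 0.76),
  Contrast.QC..Sty.             = c(1.00, 0.35, 1.28, 0.79, -0.22, 0.49),
  Contrast.QC..Nsp.Sty.Overlap. = c(1.38, 0.99, 1.30, 0.64,  0.00, 0.71)
)

# Hypothetical helper: return the row numbers of arrays with at least one
# contrast QC value strictly below the cut-off (0.4 per the Affy rule)
flag_fixed_cutoff <- function(qc, cutoff = 0.4) {
  below <- as.matrix(qc) < cutoff      # logical matrix, one column per QC score
  unname(which(rowSums(below) > 0))    # arrays failing in any column
}

flag_fixed_cutoff(allQC6)
```

On these six rows it flags arrays 2 and 5; array 2 fails only through its Contrast.QC..Sty. score of 0.35.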

As you can see, array 5 is clearly an outlier (below 0.4) in all five columns,
and we flagged it as such originally. However, when I run mdqc, it does not
call array 5 an outlier at the most stringent (99%) level, only at the 90% and
95% levels. Intuitively I would expect this array to have the most extreme
quality measure.

> mout=mdqc(allQC)
> mout
Method used: nogroups   Number of groups: 1 
Robust estimator: S-estimator
MDs exceeding the square root of the  90 % percentile of the Chi-Square distribution
 [1]   5   8  14  16  48  63  75  78  81  86  91 114 117 122 126 131 132 134 137 138
MDs exceeding the square root of the  95 % percentile of the Chi-Square distribution
 [1]   5   8  14  48  75  78  81  86  91 114 122 126 131 132 137 138
MDs exceeding the square root of the  99 % percentile of the Chi-Square distribution
[1]  48  78  81  86 122 126 131 137 138
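The kind of cut-off described in this printout can be approximated with base R as sketched below. This is an illustration under stated assumptions, not mdqc's own code: `flag_md_outliers()` is a hypothetical helper, and it uses the classical mean and covariance (`colMeans()`, `cov()`) instead of the S-estimator that mdqc reports, so its flagged sets would generally differ. Note that `mahalanobis()` returns squared distances, hence the `sqrt()` on both the distances and the chi-square quantiles (degrees of freedom = number of QC columns).

```r
# Hypothetical helper (not part of mdqc): flag rows whose Mahalanobis
# distance exceeds sqrt(qchisq(level, df = ncol(x))), mirroring the wording
# of the mdqc printout. Uses CLASSICAL estimates, whereas mdqc reports
# using a robust S-estimator.
flag_md_outliers <- function(x, levels = c(0.90, 0.95, 0.99)) {
  x  <- as.matrix(x)
  # mahalanobis() returns SQUARED distances, so take the square root
  md <- sqrt(mahalanobis(x, center = colMeans(x), cov = cov(x)))
  cutoffs <- sqrt(qchisq(levels, df = ncol(x)))
  flagged <- lapply(cutoffs, function(cut) which(md > cut))
  names(flagged) <- sprintf("%g%%", 100 * levels)
  list(md = md, cutoffs = cutoffs, flagged = flagged)
}

# Toy data: a 3 x 3 grid of points plus one point far out on the x-axis
toy <- rbind(cbind(rep(-1:1, times = 3), rep(-1:1, each = 3)), c(4, 0))
res <- flag_md_outliers(toy)
res$flagged  # row 10 is flagged at the 90% level only
```

With the classical covariance, `mahalanobis()` is likely to fail if two columns are exact copies. Contrast.QC and Contrast.QC..Random. are identical in the six rows shown, so one of them may need dropping before trying this on allQC.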


Which leads me (finally!) to my questions:-

-Is mdqc getting confused by the fact that array 5 is consistently low in
all qc measures?

-Does mdqc automatically assume that higher values indicate lower array
quality or vice-versa?
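On the second question, the Mahalanobis distance itself has no built-in notion of a good or bad direction: it is a quadratic form, so deviations of the same size above and below the centre give the same distance. What does matter is the correlation between the QC columns. The toy sketch below (base R `mahalanobis()` only, with a fixed centre and covariance whose values are hypothetical) shows a point that is consistently low in two correlated measures ending up much closer to the centre than a point whose measures disagree. This is offered as one possible explanation worth checking, not a diagnosis of what mdqc does with array 5.

```r
# Toy example with a FIXED centre and covariance (hypothetical values):
# two QC measures with correlation 0.9, each with unit variance
centre <- c(0, 0)
sigma  <- matrix(c(1.0, 0.9,
                   0.9, 1.0), nrow = 2)

pts <- rbind(low_in_both  = c(-2, -2),  # consistently low, as array 5 is
             high_in_both = c( 2,  2),  # consistently high by the same amount
             mixed        = c( 1, -1))  # the two measures disagree

d2 <- mahalanobis(pts, center = centre, cov = sigma)  # squared distances
round(d2, 2)  # low_in_both 4.21, high_in_both 4.21, mixed 20
```

Low and high deviations of the same size get the same distance, and the "mixed" point, although closer to the centre in plain Euclidean terms, is almost five times further away in squared Mahalanobis distance.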



Many thanks in advance for any input,

Cheers,

Mark

PS here is my sessionInfo()



> sessionInfo()
R version 2.8.0 alpha (2008-10-04 r46598) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mdqc_1.4.0      MASS_7.2-44     cluster_1.11.11


