[BioC] QA of two-color array data

Tue Oct 27 10:53:35 CET 2009

dear list,

i have very limited experience in the QA of microarray data, and i'd like
to hear from people with more experience whether there are quality
issues with the data i'm analyzing, and whether i could pre-process
these data differently to try to correct for the possible QA problems.

i'm re-analyzing a series of 12 two-color microarray experiments
deposited in GEO (acc. GSE13943). these are custom 4x44K Agilent arrays
with probes targeting exons and splice junctions in Drosophila
melanogaster. the experiments correspond to RNAi knock-downs of 4
RNA-binding proteins (hrp36, hrp38, hrp40 and hrp48; red channel)
against a non-specific RNAi control (green channel), with three
independent replicates for each knock-down.

after reading the raw data files into an RGList object called 'RG' i've
performed background correction, then within-array and between-array
normalization, as follows:

RGneMLE <- backgroundCorrect(RG, method="normexp", normexp.method="mle",
                             offset=50)

MA <- normalizeWithinArrays(RGneMLE[RGneMLE$genes$ControlType!=-1,],
                            method="loess", bc.method="none")

MA <- normalizeBetweenArrays(MA, method="scale")
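
for readers less familiar with the quantities involved, here is a minimal base-R sketch of what M and A are, and of what method="scale" is understood to do here, namely rescaling each array's M-values to a common median absolute value. the intensities are simulated, and the names R, G, Ms and scale_M are made up for illustration; this is not the limma code itself.

```r
set.seed(1)
# simulated background-corrected intensities: 1000 probes x 3 arrays (hypothetical data)
R <- matrix(rlnorm(3000, meanlog = 7, sdlog = 1), ncol = 3)
G <- matrix(rlnorm(3000, meanlog = 7, sdlog = 1), ncol = 3)

M <- log2(R) - log2(G)          # log-ratio, red over green
A <- (log2(R) + log2(G)) / 2    # average log-intensity

# hypothetical helper: divide each column of M by its median absolute value,
# relative to the geometric mean of those values across arrays
scale_M <- function(M) {
  mav <- apply(abs(M), 2, median, na.rm = TRUE)
  sweep(M, 2, mav / exp(mean(log(mav))), "/")
}

Ms <- scale_M(M)
apply(abs(Ms), 2, median)       # one value per array, equal after scaling
```

note that this kind of scaling only changes the spread of M per array; it leaves A untouched and cannot remove an intensity-dependent trend.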

i have produced the corresponding MA-plots of the latter pre-processed
MA data object for each of the 12 arrays which i've put on the web so
that you can take a look at them:

http://functionalgenomics.upf.edu/QA/MA-plots1.png

http://functionalgenomics.upf.edu/QA/MA-plots2.png
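
a rough base-R sketch of how such a grid of MA-plots can be drawn, with a lowess trend added to each panel so that any intensity-dependent bias stands out. the helper ma_grid, the simulated M and A matrices and the output file name are assumptions for illustration only; with real data M and A would be the matrices stored in the MA object.

```r
# hypothetical helper: one MA-plot per array, laid out on a rows x cols grid
# M, A: numeric matrices (probes x arrays)
ma_grid <- function(M, A, rows = 2, cols = 3) {
  op <- par(mfrow = c(rows, cols), mar = c(4, 4, 2, 1))
  on.exit(par(op))
  for (j in seq_len(ncol(M))) {
    ok <- is.finite(M[, j]) & is.finite(A[, j])
    plot(A[ok, j], M[ok, j], pch = ".", xlab = "A", ylab = "M",
         main = colnames(M)[j])
    abline(h = 0, col = "blue")                              # no-change line
    lines(lowess(A[ok, j], M[ok, j]), col = "red")           # trend of M against A
  }
  invisible(ncol(M))
}

# simulated data standing in for six arrays
set.seed(2)
A <- matrix(runif(6000, 6, 14), ncol = 6,
            dimnames = list(NULL, paste0("array", 1:6)))
M <- matrix(rnorm(6000, sd = 0.3), ncol = 6, dimnames = dimnames(A))

out <- file.path(tempdir(), "MA-plots-sketch.pdf")
pdf(out, width = 9, height = 6)
ma_grid(M, A)
dev.off()
```

a red trend that bends away from the blue zero line at low A is the kind of bias described in the first point further down.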

when i look at these plots i see the following two unexpected features:

- in the replicates of hrp36, replicate 1 of hrp38, replicate 1 of hrp40
and replicate 2 of hrp48 there are some small intensity-dependent biases
affecting the low average intensity values (A).

- across all replicates i see two clusters of probes with low M values
(i.e., higher green signal).
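
one way to check whether such low-M clusters correspond to a particular class of probes is to cross-tabulate a low-M flag against the probe annotation. the sketch below uses simulated values; the probe_class labels and the cutoff of -2 are assumptions for illustration, and with real data the annotation would have to come from the probe table (e.g. the ControlType column used above).

```r
set.seed(3)
# simulated M-values for one array: most probes near 0, one class shifted down
probe_class <- factor(sample(c("exon", "junction", "control"), 2000,
                             replace = TRUE, prob = c(0.6, 0.3, 0.1)))
M <- rnorm(2000, sd = 0.3)
is_ctrl <- probe_class == "control"
M[is_ctrl] <- M[is_ctrl] - 3

low <- M < -2                    # cutoff picked by eye from the MA-plot (assumption)
tab <- table(probe_class, low)
tab
prop.table(tab, margin = 1)      # fraction of each class inside the low-M cluster
```

if one class accounts for nearly all of the low-M probes, the clusters are more likely a property of those probes than a quality problem with the arrays.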

if i look at the image plots (generated with 'imageplot3by2(RG)'):

http://functionalgenomics.upf.edu/QA/image-Gb-1-6.png

http://functionalgenomics.upf.edu/QA/image-Gb-7-12.png

i see some lines running from the top to the bottom of the arrays, but
i don't know if this is related to the issues raised above.

i've run the arrayQualityMetrics package on these data with the
following command:

arrayQualityMetrics(expressionset=RG, outdir="aqm", force=TRUE)

and put the output here:

http://functionalgenomics.upf.edu/QA/aqm/QMreport.html

according to this report there are no outlier arrays, so i'm wondering
whether in fact there are no QA problems and i'm simply not using the
appropriate pre-processing algorithms for this kind of data.

thanks!
robert.