[BioC] QA of two-color array data

Robert Castelo robert.castelo at upf.edu
Wed Oct 28 10:31:01 CET 2009

Thanks Naomi, I guess this is embarrassingly obvious :-} I've redone the
plots without the Agilent control spots, and the two clusters with low
M-values have disappeared from these plots:


Now the intensity-dependent biases for some of the arrays stand out even
more clearly. I find them a bit weird, in the sense that the bias does not
affect the bulk of the low-intensity probes but only a subset of them.
I've googled this but found only success stories about such biases being
removed after background correction and normalization.

If I look at the MA-plots for the raw data (from the 'RG' object),
excluding control spots:


I see the bias affecting the bulk of the low-intensity probes in those
problematic arrays, so I guess the problem might be that I'm not using
appropriate background correction and/or normalization algorithms.
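For anyone who wants to reproduce raw MA-plots like these, here is a minimal base-R sketch that computes M = log2(R/G) and A = (log2 R + log2 G)/2 from raw foreground intensities while dropping every control spot. It assumes Agilent's ControlType coding (0 = regular probe, non-zero = control) and probe-by-array matrices such as RG$R and RG$G; the helper name raw_ma is hypothetical, not a limma function.

```r
# Hypothetical helper: raw M and A values for regular (non-control) probes.
# Assumes control_type uses Agilent's coding, where 0 marks a regular probe.
raw_ma <- function(R, G, control_type) {
  keep <- control_type == 0                   # drop all control spots
  R <- as.matrix(R)[keep, , drop = FALSE]
  G <- as.matrix(G)[keep, , drop = FALSE]
  list(M = log2(R) - log2(G),                 # log-ratio, red over green
       A = (log2(R) + log2(G)) / 2)           # average log-intensity
}

# Usage (assuming an RGList read from Agilent files):
# ma <- raw_ma(RG$R, RG$G, RG$genes$ControlType)
# plot(ma$A[, 1], ma$M[, 1], pch = ".", xlab = "A", ylab = "M")
```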

As shown in my previous email, I'm currently using 'normexp' with
'mle' (although, if I interpret a recent post from Gordon correctly, the
version I used actually employs 'saddlepoint' estimates instead of
'mle'), loess within-array normalization and scale between-array normalization.
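The idea behind loess within-array normalization can be sketched in base R: for each array, fit M as a smooth function of A and subtract the fitted trend. This is only an approximation of what limma's normalizeWithinArrays(method="loess") does (limma uses its own weighted fitting, span and robustness iterations); loess_normalize and its span default are hypothetical choices.

```r
# Hypothetical helper: global loess normalization of M against A, per array.
loess_normalize <- function(M, A, span = 0.3) {
  M <- as.matrix(M); A <- as.matrix(A)
  for (j in seq_len(ncol(M))) {
    ok <- is.finite(M[, j]) & is.finite(A[, j])   # skip missing/infinite
    fit <- loess(M[ok, j] ~ A[ok, j], span = span, degree = 1)
    M[ok, j] <- M[ok, j] - fitted(fit)           # remove intensity trend
  }
  M
}
```

Because a single curve is fitted per array, a bias confined to a subset of the low-intensity probes is only partly removed by this kind of step.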

Do you, or anybody on the list, have any hints on how I could preprocess
these data to try to remove those artifacts?



On Tue, 2009-10-27 at 11:38 -0400, Naomi Altman wrote:
> The weird spots are probably the Agilent quality control 
> spots.  Remove them and redo the plot.
> --Naomi
> At 05:53 AM 10/27/2009, Robert Castelo wrote:
> >dear list,
> >
> >I have very limited experience in the QA of microarray data and I'd like
> >to know the opinion of people with more experience in this task on
> >whether there are QA issues with the data I'm analyzing, and whether I
> >could pre-process these data differently to try to correct for the
> >possible QA problems.
> >
> >I'm re-analyzing a series of 12 two-color microarray experiments
> >deposited in GEO (acc. GSE13943). These are custom 4x44K Agilent arrays
> >with probes targeting exons and splice junctions in Drosophila
> >melanogaster. The experiments correspond to RNAi knock-downs of 4
> >RNA-binding proteins (hrp36, hrp38, hrp40 and hrp48; red channel)
> >against a non-specific RNAi control (green channel), in three independent
> >replicates for each knock-down experiment.
> >
> >After reading the raw data files into an RGList object called 'RG', I've
> >performed background correction and within- and between-array
> >normalization as follows:
> >
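> ># normexp background correction (normexp.method="mle" requested), offset 50
> >RGneMLE <- backgroundCorrect(RG, method="normexp", normexp.method="mle",
> >                             offset=50)
> >
> ># loess within-array normalization, dropping ControlType == -1 spots
> ># (negative controls); bc.method="none" since background is already corrected
> >MA <- normalizeWithinArrays(RGneMLE[RGneMLE$genes$ControlType!=-1,],
> >                             method="loess", bc.method="none")
> >
> ># scale between-array normalization
> >MA <- normalizeBetweenArrays(MA, method="scale")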
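The scale between-array step can be sketched in a few lines, assuming it follows the convention of limma's normalizeMedianAbsValues(): each array's M-values are rescaled so that its median absolute M equals the geometric mean of those medians across arrays. The helper scale_normalize is hypothetical and is meant only to show the arithmetic.

```r
# Hypothetical helper: rescale each array (column) of M to a common
# median absolute value, taken as the geometric mean across arrays.
scale_normalize <- function(M) {
  M <- as.matrix(M)
  mav <- apply(abs(M), 2, median, na.rm = TRUE)  # per-array median |M|
  target <- exp(mean(log(mav)))                  # geometric mean target
  sweep(M, 2, target / mav, `*`)                 # rescale each column
}
```

For two arrays whose median |M| are 1 and 4, the target is 2, so the first array is stretched by 2 and the second shrunk by 0.5.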
> >
> >I have produced MA-plots of the resulting pre-processed MA object for
> >each of the 12 arrays and put them on the web so that you can take a
> >look at them:
> >
> >http://functionalgenomics.upf.edu/QA/MA-plots1.png
> >
> >http://functionalgenomics.upf.edu/QA/MA-plots2.png
> >
> >When I look at these plots I see the following two unexpected features:
> >
> >- In the replicates of hrp36, replicate 1 of hrp38, replicate 1 of hrp40
> >and replicate 2 of hrp48, there are some small intensity-dependent biases
> >affecting the low average log-intensities (A).
> >
> >- Across all replicates I see two clusters of probes with low M-values
> >(i.e., higher green signal).
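One way to put a number on the first feature is a simple per-array check: compare the median M of the lowest-A probes with the median M of all probes. After a successful normalization this gap should be close to zero. This diagnostic (low_a_bias) and its 10% cut-off are hypothetical choices, not part of limma; a bias confined to a small subset of probes may be diluted in the median.

```r
# Hypothetical diagnostic: median M of the lowest-A probes minus the
# median M of all probes, computed separately for each array.
low_a_bias <- function(M, A, prob = 0.1) {
  M <- as.matrix(M); A <- as.matrix(A)
  sapply(seq_len(ncol(M)), function(j) {
    cut_j <- quantile(A[, j], prob, na.rm = TRUE)  # low-A threshold
    low <- which(A[, j] <= cut_j)                  # lowest-A probes
    median(M[low, j], na.rm = TRUE) - median(M[, j], na.rm = TRUE)
  })
}

# Usage (assuming an MAList 'MA' as produced above):
# round(low_a_bias(MA$M, MA$A), 2)
```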
> >
> >If I look at the image plots (generated with 'imageplot3by2(RG)'):
> >
> >http://functionalgenomics.upf.edu/QA/image-Gb-1-6.png
> >
> >http://functionalgenomics.upf.edu/QA/image-Gb-7-12.png
> >
> >I see some lines crossing from top to bottom, but I don't know whether
> >this is related to the issues raised above.
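To look at such stripes for a single array in base R, per-spot values can be placed on the physical grid and drawn with image(). This sketch assumes the spot coordinates are available as integer row and column indices (for Agilent files read with limma, typically RG$genes$Row and RG$genes$Col); spatial_matrix is a hypothetical helper. A column-specific artifact would then appear as a vertical stripe.

```r
# Hypothetical helper: arrange per-spot values on the array's row/column grid.
spatial_matrix <- function(values, row, col) {
  z <- matrix(NA_real_, nrow = max(row), ncol = max(col))
  z[cbind(row, col)] <- values        # place each spot at its (row, col)
  z
}

# Usage (assuming Row/Col columns in RG$genes):
# z <- spatial_matrix(log2(RG$Gb[, 1]), RG$genes$Row, RG$genes$Col)
# image(t(z)[, nrow(z):1], main = "log2 green background, array 1")
```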
> >
> >I've run the arrayQualityMetrics package on these data with the
> >following command:
> >
> >arrayQualityMetrics(expressionset=RG, outdir="aqm", force=TRUE)
> >
> >and put the output here:
> >
> >http://functionalgenomics.upf.edu/QA/aqm/QMreport.html
> >
> >According to this report there are no outlier arrays, so I'm
> >wondering whether there are in fact no QA problems and I'm simply
> >not using the appropriate pre-processing algorithms for this kind of
> >data.
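As a rough cross-check of the "no outlier arrays" result, a between-array distance on normalized M-values can be computed by hand: the mean absolute difference between every pair of arrays, with arrays flagged when their total distance exceeds a boxplot-style upper fence. This is a sketch in the spirit of distance-based QA summaries and is not claimed to reproduce arrayQualityMetrics' exact computation; both helper names are hypothetical.

```r
# Hypothetical helper: mean absolute difference of M between every array pair.
array_distances <- function(M) {
  M <- as.matrix(M)
  n <- ncol(M)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n))
    d[i, j] <- mean(abs(M[, i] - M[, j]), na.rm = TRUE)
  d
}

# Hypothetical rule: flag arrays whose summed distance lies above the
# upper boxplot fence (Q3 + 1.5 * IQR) of all summed distances.
flag_outliers <- function(d) {
  s <- rowSums(d)
  s > quantile(s, 0.75) + 1.5 * IQR(s)
}

# Usage (assuming an MAList 'MA'):
# which(flag_outliers(array_distances(MA$M)))
```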
> >
> >thanks!
> >robert.
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives: 
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
