[Bioc-devel] affyQA/QC

Sat Sep 23 13:02:57 CEST 2006

> Date: Fri, 22 Sep 2006 15:36:10 -0700
> From: Robert Gentleman <rgentlem at fhcrc.org>
> Subject: [Bioc-devel] affyQA/QC
> To: bioc-devel at stat.math.ethz.ch
>
> Hi,
>    I am trying to put together a set of, what one might regard, as
> standard plots and summary statistics that should be collected on any
> set of Affymetrix microarrays (at least ones for gene expression). The
> first pass is attached, I would appreciate any comments on it,
> especially with regard to things that I have missed, or things I have
> suggested that don't seem to be quite correct, or could be improved.
>
>   On an implementation note - I will be making use of existing software
> and intend to work with Craig Parman to put this into the existing
> affyQCReport package - users of that might want to let me know what
> functionality they are relying on, but this should be strict additions.
>
>
>   thanks
>     Robert
>
> --
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org

I'd also be interested in reactions to a standard set of plots and summaries.  Below is the set of
tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
implementing something close to this in the affylmGUI package for the BioC 1.9 release.

Best wishes
Gordon

-------------------
Set of affy QA plots and summaries:

Boxplots of chip-wise intensities:
\begin{Sinput}
> library(gcrma)
> x <- ReadAffy(filenames=targets$FileName,celfile.path="cel")
> narrays <- ncol(exprs(x))
> boxplot(x,names=targets$Target,las=2)
\end{Sinput}

Empirical distributions of chip-wise intensities:
\begin{Sinput}
> hist(x)
\end{Sinput}

RNA digestion plot:
\begin{Sinput}
> deg <- AffyRNAdeg(x)
> plotAffyRNAdeg(deg,col=1:narrays)
> legend("topleft",legend=1:narrays,col=1:narrays,lty=1)
\end{Sinput}

Affy QC parameters:
The bioB spike-ins should be present.
All the other measures should be consistent across chips.
\begin{Sinput}
> library(simpleaffy)
> qc <- qc.affy(x)
> qc.tab <- rbind(
+      Percent.present=qc@percent.present,
+      Scale.factor=qc@scale.factors,
+      Average.background=qc@average.background,
+      bioBCalls=qc@bioBCalls=="P",
+      t(qc@spikes),
+      t(qc@qc.probes))
> colnames(qc.tab) <- paste("Chip",1:narrays)
> options(digits=2)
> qc.tab
\end{Sinput}
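
As a rough illustration of checking these two criteria, here is a minimal sketch.  The vectors
bioB.calls and scale.factors are made-up values standing in for qc@bioBCalls and
qc@scale.factors; they do not come from real chips, and no particular cut-off is implied.
\begin{Sinput}
> # Hypothetical values for four chips (stand-ins for the qc slots above)
> bioB.calls <- c("P","P","M","P")
> scale.factors <- c(0.9,1.1,1.4,1.0)
> # Which chips fail to call bioB present?
> which(bioB.calls!="P")
> # Fold range of the scale factors across chips
> max(scale.factors)/min(scale.factors)
\end{Sinput}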

Image plots of probe level robust residuals.
Larger residuals are darker and indicate deviations from the additive model used to summarise
probes within each probe-set.
\begin{Sinput}
> library(affyPLM)
> pset <- fitPLM(x)
> oldpar <- par(mfrow=c(4,2),mar=c(1,1,2,1))
> image(pset, type="resids") # red=positive resids, blue=negative
> par(oldpar)
\end{Sinput}

Normalized Unscaled Standard Errors (NUSE) plot.
The standard error estimates obtained for each gene on each array from fitPLM
are standardized across arrays so that the median standard error for that
gene is 1 across all arrays.
An array with elevated SEs relative to other arrays is typically of
lower quality.
\begin{Sinput}
> NUSE(pset)
\end{Sinput}
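
The standardization described above can be sketched in base R.  This is only an illustration
on simulated numbers, not the affyPLM code: the matrix se.mat is a hypothetical stand-in for
the gene-by-array standard errors, with array 4 deliberately (and unrealistically strongly)
inflated so that the effect is visible.
\begin{Sinput}
> set.seed(1)
> # Hypothetical standard errors: 100 genes (rows) by 4 arrays (columns)
> se.mat <- matrix(runif(400,0.05,0.15),nrow=100,ncol=4)
> se.mat[,4] <- 3*se.mat[,4] # exaggerated poor-quality array
> # Divide each row by its median so every gene has median SE equal to 1
> nuse.mat <- sweep(se.mat,1,apply(se.mat,1,median),"/")
> boxplot(nuse.mat,names=paste("Chip",1:4))
> abline(h=1,lty=2)
\end{Sinput}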

Relative Log Expression (RLE) values.
RLE values are computed for each probeset by comparing the expression value
on each array against the median expression value for that probeset across all arrays.
If most genes are not changing in expression across arrays, then most of these
RLE values should be near 0.
When examining this plot, focus on the shape and position of each box.
Arrays of poorer quality typically show boxes that are not centered on 0,
are more spread out, or both.
\begin{Sinput}
> RLE(pset)
\end{Sinput}
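
For readers who want to see the RLE computation spelled out, here is a minimal base R sketch
on simulated data.  It is not the affyPLM implementation: expr.mat is a hypothetical matrix of
log2 expression values standing in for the probeset summaries, and array 3 is shifted upwards
on purpose so that its box sits off 0.
\begin{Sinput}
> set.seed(2)
> # Hypothetical log2 expression values: 100 probesets (rows) by 4 arrays
> expr.mat <- matrix(rnorm(400,mean=8,sd=1),nrow=100,ncol=4)
> expr.mat[,3] <- expr.mat[,3]+0.5 # simulate a shifted array
> # Subtract each probeset's median across arrays
> rle.mat <- sweep(expr.mat,1,apply(expr.mat,1,median),"-")
> boxplot(rle.mat,names=paste("Chip",1:4))
> abline(h=0,lty=2)
\end{Sinput}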