[Bioc-devel] affyQA/QC

Tue Sep 26 19:34:34 CEST 2006

Hi Gordon,
   Not much to say, just a few notes. This seems very similar to what I
have proposed, so I like it. The differences don't seem to be
substantive.

Gordon K Smyth wrote:
>> Date: Fri, 22 Sep 2006 15:36:10 -0700
>> From: Robert Gentleman <rgentlem at fhcrc.org>
>> Subject: [Bioc-devel] affyQA/QC
>> To: bioc-devel at stat.math.ethz.ch
>>
>> Hi,
>>    I am trying to put together a set of what one might regard as
>> standard plots and summary statistics that should be collected on any
>> set of Affymetrix microarrays (at least ones for gene expression). The
>> first pass is attached; I would appreciate any comments on it,
>> especially on things that I have missed, or things I have suggested
>> that don't seem to be quite correct or could be improved.
>>
>>   On an implementation note: I will be making use of existing software
>> and intend to work with Craig Parman to put this into the existing
>> affyQCReport package. Users of that package might want to let me know
>> what functionality they are relying on, but these should be strictly additions.
>>
>>
>>   thanks
>>     Robert
>>
>> --
>> Robert Gentleman, PhD
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M2-B876
>> PO Box 19024
>> Seattle, Washington 98109-1024
>> 206-667-7700
>> rgentlem at fhcrc.org
> 
> 
> I'd also be interested in reactions to a standard set of plots and summaries.  Below is the set of
> tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
> implementing something close to this in the affylmGUI package for the BioC 1.9 release.
> 
> Best wishes
> Gordon
> 
> -------------------
> Set of affy QA plots and summaries:
> 
> Boxplots of chip-wise intensities:
> \begin{Sinput}
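>> library(gcrma)   # also attaches affy, which provides ReadAffy()
>> # targets: data frame with FileName and Target columns (e.g. from limma's readTargets())
>> x <- ReadAffy(filenames = targets$FileName, celfile.path = "cel")
>> narrays <- ncol(exprs(x))   # number of arrays
>> boxplot(x, names = targets$Target, las = 2)   # las = 2: perpendicular axis labels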
> \end{Sinput}
> 
> Empirical distributions of chip-wise intensities:
> \begin{Sinput}
>> hist(x)
> \end{Sinput}
> 
> RNA degradation plot:
> \begin{Sinput}
>> deg <- AffyRNAdeg(x)
>> plotAffyRNAdeg(deg, col = 1:narrays)
>> legend("topleft", legend = 1:narrays, col = 1:narrays, lty = 1)
> \end{Sinput}
> 
> Affy QC parameters:
> The bioB spike-ins should be present.
> All the other measures should be consistent across chips.
> \begin{Sinput}
>> library(simpleaffy)
>> qc <- qc.affy(x)
>> qc.tab <- rbind(
> +      Percent.present = qc@percent.present,
> +      Scale.factor = qc@scale.factors,
> +      Average.background = qc@average.background,
> +      bioBCalls = qc@bioBCalls == "P",   # TRUE where bioB is called present
> +      t(qc@spikes),
> +      t(qc@qc.probes))
>> colnames(qc.tab) <- paste("Chip",1:narrays)
>> options(digits=2)
>> qc.tab
> \end{Sinput}
> 
> Image plots of probe-level robust residuals.
> Darker shading indicates larger residuals, that is, larger deviations from the additive model
> used to summarise the probes within each probe-set.
> \begin{Sinput}
>> library(affyPLM)
>> pset <- fitPLM(x)
>> oldpar <- par(mfrow = c(4, 2), mar = c(1, 1, 2, 1))   # 4 x 2 grid: one panel per array for 8 arrays
>> image(pset, type="resids") # red=positive resids, blue=negative
>> par(oldpar)
> \end{Sinput}

   I thought about this, but with a lot of arrays it seems better to
come back to these plots later and concentrate on the arrays that were
flagged for other reasons.
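For anyone who wants to see what "residuals from the additive model" means in practice, here is a minimal base-R sketch. It uses median polish (`stats::medpolish`, the summarisation step used by RMA) as a simpler stand-in for the robust regression that fitPLM actually fits, and the probe-by-array matrix `y`, its effect sizes and the corrupted cell are all simulated for illustration only.

```r
# Minimal sketch: median polish stands in for fitPLM's robust fit.
# All data are simulated; rows are probes of one probe-set, columns are arrays.
set.seed(1)
n_probes <- 11
n_arrays <- 4
probe_effect <- rnorm(n_probes, sd = 1)
chip_effect <- c(8.0, 8.2, 7.9, 8.1)

# Additive model on the log2 scale: probe effect + chip effect + noise
y <- outer(probe_effect, chip_effect, "+") +
  matrix(rnorm(n_probes * n_arrays, sd = 0.1), n_probes, n_arrays)

# Simulate a localized artefact: one probe on array 3 reads 3 log2 units too high
y[5, 3] <- y[5, 3] + 3

fit <- medpolish(y, trace.iter = FALSE)
res <- fit$residuals

# The corrupted cell has by far the largest absolute residual;
# this is the kind of deviation the residual image plots make visible
idx <- which(abs(res) == max(abs(res)), arr.ind = TRUE)
print(idx)
print(round(res, 2))
```

Because median polish uses medians rather than means, the single bad cell barely moves the fitted probe and chip effects, so almost all of the artefact ends up in its residual.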

> 
> Normalized Unscaled Standard Errors (NUSE) plot.
> The standard error estimates obtained for each gene on each array from fitPLM
> are standardized across arrays so that the median standard error for that
> gene is 1 across all arrays.
> An array with elevated SEs relative to other arrays is typically of
> lower quality.
> \begin{Sinput}
>> NUSE(pset)
> \end{Sinput}
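The standardization described above amounts to dividing each gene's row of standard errors by that row's median. A minimal base-R sketch follows; it works from the description here rather than affyPLM's own code, and the SE matrix `se` (gamma-distributed values with a deliberately inflated fourth array) is simulated for illustration.

```r
# Minimal sketch of NUSE-style standardization (simulated data, not affyPLM code)
set.seed(2)
n_genes <- 1000
n_arrays <- 4
se <- matrix(rgamma(n_genes * n_arrays, shape = 20, rate = 100),
             n_genes, n_arrays,
             dimnames = list(NULL, paste("Chip", 1:n_arrays)))
se[, 4] <- se[, 4] * 1.5   # make array 4 noisier than the rest

# Divide each gene's SEs by that gene's median SE across arrays
nuse <- sweep(se, 1, apply(se, 1, median), "/")

# Every gene now has median NUSE 1; the noisier array sits above 1
print(round(apply(nuse, 2, median), 2))
```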
> 
> Relative Log Expression (RLE) values.
> RLE values are computed for each probeset by subtracting the median (log-scale)
> expression value for that probeset across all arrays from its expression value
> on each array.
> If most genes are not changing in expression across arrays, then ideally
> most of these RLE values will be near 0.
> When examining this plot, focus on the shape and position of each box.
> Arrays of poorer quality typically show boxes that are not centered about 0,
> are more spread out, or both.
> \begin{Sinput}
>> RLE(pset)
> \end{Sinput}
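The RLE computation is a row-wise subtraction of each probeset's median. A minimal base-R sketch on simulated data: the log2 expression matrix `expr` and the 0.5 shift applied to the second array are made up to mimic a poorly normalized chip, and this is not affyPLM's implementation.

```r
# Minimal sketch of RLE values (simulated log2 expression, not affyPLM code)
set.seed(3)
n_genes <- 1000
n_arrays <- 4
gene_level <- rnorm(n_genes, mean = 8, sd = 2)
expr <- gene_level + matrix(rnorm(n_genes * n_arrays, sd = 0.2),
                            n_genes, n_arrays,
                            dimnames = list(NULL, paste("Chip", 1:n_arrays)))
expr[, 2] <- expr[, 2] + 0.5   # array 2 is shifted, e.g. poorly normalized

# Subtract each probeset's median across arrays (sweep's default FUN is "-")
rle_vals <- sweep(expr, 1, apply(expr, 1, median))

# Array 2's box is off-centre; the other arrays sit much closer to 0
meds <- apply(rle_vals, 2, median)
print(round(meds, 2))
```

Calling `boxplot(rle_vals)` on the result gives the per-array boxes the text describes.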
> 
> 

   Seems very similar.
