[Bioc-devel] affyQA/QC

Wed Sep 27 11:23:33 CEST 2006

Hi,

I don't know whether this would be of any use, but I've been developing a
web front-end to some of the Bioconductor tools, which so far covers a lot
of the affy quality assessment stuff. The biologists we work with aren't
comfortable with the command line. I've tried to design it so that new
components are as easy as possible to add (I added the simple affy qc
stuff in about half an hour). It uses the Perl Catalyst web framework and
templating to generate web pages and R scripts. It's only running on the
server at the moment, so it can be pretty slow, but we're planning on
sending the R jobs off to other machines soon.

If you want to have a look it's at http://bioinformatics.essex.ac.uk/ROME
(and trac/svn from http://rome.devjavu.com/ , though I wouldn't advise
trying to install it just yet). Upload of data is disabled during testing,
but if you register you'll get some demo data to play with. There's also a
lack of documentation, but if you log in as 'thing' with password 'thing'
and go to session->datafiles you can have a look at some datafiles and
image files I've already created.

If anyone fancies helping out with development, give me a shout.

Cheers,

Cass.

On Tue, 26 Sep 2006, Robert Gentleman wrote:

> Hi Gordon,
>    Not much to say - a few notes, but this seems very similar to what I
> have proposed - so I like it.  The differences don't seem to be that
> substantive.
>
>
> Gordon K Smyth wrote:
> >> Date: Fri, 22 Sep 2006 15:36:10 -0700
> >> From: Robert Gentleman <rgentlem at fhcrc.org>
> >> Subject: [Bioc-devel] affyQA/QC
> >> To: bioc-devel at stat.math.ethz.ch
> >>
> >> Hi,
> >>    I am trying to put together a set of what one might regard as
> >> standard plots and summary statistics that should be collected on any
> >> set of Affymetrix microarrays (at least ones for gene expression). The
> >> first pass is attached; I would appreciate any comments on it,
> >> especially with regard to things that I have missed, or things I have
> >> suggested that don't seem to be quite correct, or could be improved.
> >>
> >>   On an implementation note - I will be making use of existing software
> >> and intend to work with Craig Parman to put this into the existing
> >> affyQCReport package - users of that might want to let me know what
> >> functionality they are relying on, but the changes should be strictly additions.
> >>
> >>
> >>   thanks
> >>     Robert
> >>
> >> --
> >> Robert Gentleman, PhD
> >> Program in Computational Biology
> >> Division of Public Health Sciences
> >> Fred Hutchinson Cancer Research Center
> >> 1100 Fairview Ave. N, M2-B876
> >> PO Box 19024
> >> Seattle, Washington 98109-1024
> >> 206-667-7700
> >> rgentlem at fhcrc.org
> >
> >
> > I'd also be interested in reactions on a standard set of plots and summaries.  Below is the set of
> > tests I've been using recently (based on advice from Ken Simpson).  Keith Satterley is
> > implementing something close to this in the affylmGUI package for the BioC 1.9 release.
> >
> > Best wishes
> > Gordon
> >
> > -------------------
> > Set of affy QA plots and summaries:
> >
> > Boxplots of chip-wise intensities:
> > \begin{Sinput}
> >> library(gcrma)
> >> x <- ReadAffy(filenames=targets$FileName,celfile.path="cel")
> >> narrays <- ncol(exprs(x))
> >> boxplot(x,names=targets$Target,las=2)
> > \end{Sinput}
> >
> > Empirical distributions of chip-wise intensities:
> > \begin{Sinput}
> >> hist(x)
> > \end{Sinput}
> >
> > RNA digestion plot:
> > \begin{Sinput}
> >> deg <- AffyRNAdeg(x)
> >> plotAffyRNAdeg(deg,col=1:narrays)
> >> legend("topleft",legend=1:narrays,col=1:narrays,lty=1)
> > \end{Sinput}
> >
> > Affy QC parameters:
> > The bioB spike-ins should be present.
> > All the other measures should be consistent across chips.
> > \begin{Sinput}
> >> library(simpleaffy)
> >> qc <- qc.affy(x)
> >> qc.tab <- rbind(
> > +      Percent.present=qc@percent.present,
> > +      Scale.factor=qc@scale.factors,
> > +      Average.background=qc@average.background,
> > +      bioBCalls=qc@bioBCalls=="P",
> > +      t(qc@spikes),
> > +      t(qc@qc.probes))
> >> colnames(qc.tab) <- paste("Chip",1:narrays)
> >> options(digits=2)
> >> qc.tab
> > \end{Sinput}
> >
> > Image plots of probe level robust residuals.
> > Larger residuals are darker and indicate deviations from the additive model used to summarise
> > probes within each probe-set.
> > \begin{Sinput}
> >> library(affyPLM)
> >> pset <- fitPLM(x)
> >> oldpar <- par(mfrow=c(4,2),mar=c(1,1,2,1))
> >> image(pset, type="resids") # red=positive resids, blue=negative
> >> par(oldpar)
> > \end{Sinput}
>
>    I thought about this, but for a lot of arrays it seems like it would
> be better to come back and concentrate on those that were indicated for
> other reasons.
>
> >
> > Normalized Unscaled Standard Errors (NUSE) plot.
> > The standard error estimates obtained for each gene on each array from fitPLM
> > are standardized across arrays so that the median standard error for that
> > gene is 1 across all arrays.
> > An array with elevated SEs relative to other arrays is typically of
> > lower quality.
> > \begin{Sinput}
> >> NUSE(pset)
> > \end{Sinput}
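
The standardization described above can be sketched in base R on a small
made-up matrix of standard errors. This only illustrates the arithmetic
(divide each row by its median) and is not affyPLM's implementation; the
matrix `se` and its values are invented for the example, with the fourth
array deliberately given larger SEs.

```r
# Hypothetical standard errors: rows are probesets, columns are arrays.
# The values are invented; "Chip 4" is made noisier on purpose.
se <- matrix(c(0.10, 0.12, 0.11, 0.20,
               0.30, 0.28, 0.33, 0.55,
               0.05, 0.05, 0.06, 0.09),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("ps", 1:3), paste("Chip", 1:4)))

# Divide each row by its own median, so that the median standardized SE
# for every probeset is 1 across arrays.
row.med <- apply(se, 1, median)
nuse.vals <- sweep(se, 1, row.med, "/")

# Per-array medians of the standardized SEs; an array sitting well above 1
# has elevated SEs relative to the others.
nuse.med <- apply(nuse.vals, 2, median)
print(round(nuse.med, 2))
```

Calling boxplot(nuse.vals) on the same matrix draws one box per array,
which is the kind of picture the NUSE plot gives.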
> >
> > Relative Log Expression (RLE) values.
> > RLE values are computed for each probeset by comparing the expression value
> > on each array against the median expression value for that probeset across all arrays.
> > Assuming that most genes are not changing in expression across arrays, most of
> > these RLE values should ideally be near 0.
> > When examining this plot, focus on the shape and position of each of the boxes.
> > Arrays of poorer quality typically
> > show up with boxes that are not centered about 0 and/or are more spread out.
> > \begin{Sinput}
> >> RLE(pset)
> > \end{Sinput}
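
The RLE calculation described above can likewise be sketched in base R.
This is a minimal illustration on an invented matrix of log2 expression
values, not the affyPLM code; the name `expr` and all of the numbers are
assumptions made for the example, with the fourth array shifted upwards.

```r
# Hypothetical log2 expression values: rows are probesets, columns are arrays.
# The values are invented; "Chip 4" is shifted upwards on purpose.
expr <- matrix(c( 8.0,  8.1,  7.9,  8.6,
                  5.2,  5.0,  5.1,  5.9,
                 10.3, 10.4, 10.2, 10.9),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ps", 1:3), paste("Chip", 1:4)))

# Subtract each probeset's median across arrays from its values.
rle.vals <- sweep(expr, 1, apply(expr, 1, median), "-")

# Position (median) and spread (IQR) of the RLE values for each array.
rle.summary <- rbind(median = apply(rle.vals, 2, median),
                     IQR    = apply(rle.vals, 2, IQR))
print(round(rle.summary, 2))
```

An array whose median is far from 0, or whose IQR is much larger than the
others, corresponds to a box that is off-centre or more spread out in the
RLE plot.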
> >
> >
>
>
>    Seems very similar -
> >
>
>