[BioC] Analyzing arrays with no replicates

Sun Feb 20 23:30:15 CET 2005

McGee, Monnie wrote:
> 
> 1.  I used rma(affy.object) to obtain normalized, background
> corrected expression levels.  However, rma uses median polish to
> obtain expression values.  I read in the achieves of this list that
> median polish was not a good idea when there are no replicates, and
> that rlm would be better.  In order to use rlm, one needs an
> expression set, which means I have to specify some sort of summary
> method (using expresso) or take median polish (rma).  If not median
> polish, which summary method?  I have also read on the list that
> reviewers of some journals consider that median polish is the best
> method, and, if it is not used, the analysis is suspect (resulting in
> a reanalysis or a rejection).  So, if I don't use median polish (or
> rma) how do I justify the method I use?

I think you are confused. RMA doesn't care if there are any replicates 
or not. It is simply used to convert PM data to an expression measure 
for each gene on each chip.

> 
> 2. When I do my.eset <- rma(affy.object), I obtain an object of class
> "exprSet", but I think the phenoData attribute is incorrect;
> 
>> print(my.eset)
> 
> Expression Set (exprSet) with 54675 genes 4 samples phenoData object
> with 1 variables and 4 cases varLabels sample: arbitrary numbering
> 
>> pData(my.eset)
> 
> sample Control    1 Treat1      2 Treat2      3 Treat3      4
> 
> It seems to me that the phenoData object should have 4 variables and
> 1 case.  In other words, phenoData sees my treatments as cases (or
> patients) and says that I have one variable.

You can change the phenoData slot at any time. See ?exprSet for more 
information.

> 
> 
> 3. Once I have expression levels (and a correct eset), what is the
> best way to do non-specific filtering?  Clearly, I don't want to test
> all 54K genes.  I  tried the following, suggested in one of the R
> working papers:
> 
> 
>> library(genefilter) f1 <- pOverA(0.25, log2(100)) f2 <- function(x)
>> (IQR(x) > 0.5) ff <- filterfun(f1,f2) selected <-
>> genefilter(my.eset,ff) sum(selected)
> 
> [1] 57
> 
> Maybe 57 genes is a great number, but it seems much smaller than the
> examples I've seen in papers.  Also, one of the AFFX genes is in this
> filtered sample.  Since they are housekeeping genes, I would think
> that they aren't supposed to pass through the filter.

You will likely have to do some sort of pre-filtering before looking at 
your fold change data (see answers from Sean Davis and Naomi Altman). 
Without any replication there is no reason to expect that an AFFX gene 
would be removed by a filter, because there is no reason to expect that 
these probes are any less noisy than any other, and without any measure 
of variability you will not be able to distinguish differentially 
expressed genes from noisy genes.

Jim

-- 
James W. MacDonald
University of Michigan
Affymetrix and cDNA Microarray Core
1500 E Medical Center Drive
Ann Arbor MI 48109
734-647-5623

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.