[BioC] Re: replicates and low expression levels

Eric emblal at uky.edu
Mon Jun 2 10:37:59 MEST 2003


Hi,

To add to what Rafael Irizarry said: when we had multiple subjects/chips 
per treatment group in our recent publication (Blalock et al., 2003, J 
Neurosci.), we used P/A filtering to decide which probe sets to include in 
the 'final' statistical analysis. We did this on an entire-record basis: 
a probe set was removed from further consideration if it had 'too many' 
absence calls (the cutoff was arbitrarily set at 40% presence calls in at 
least one treatment group). Because whole probe sets were dropped rather 
than individual values, the F-statistics for each gene that remained were 
unchanged. However, this filtering has a huge effect on the multiple 
testing error when using the 'MAS' algorithms, because part of what is 
being removed is the unexpressed probe set contingent: that fairly large 
group of probe sets (in our case nearly 50%) that are not detectable/not 
expressed in the tissue of interest. I'd guess that this will be an issue 
with any 'general purpose' array designed to measure genome-wide 
expression.
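
To make the filtering rule concrete, here is a minimal sketch in plain 
Python. It is not the code used for the publication. The probe set names, 
group names and calls are made up, and it assumes my reading of the rule: 
a probe set is kept if at least 40% of its calls are Present in at least 
one treatment group, with Marginal counted as not present. The last lines 
show why the number of tests matters: at a fixed per-test alpha, the 
expected number of false positives among truly null probe sets scales 
with how many probe sets are tested.

```python
# Hypothetical detection calls: probe set -> treatment group -> one call per chip.
# "P" = present, "A" = absent, "M" = marginal (treated as not present here).
calls = {
    "probe_1": {"young": ["P", "P", "P", "A", "P"], "aged": ["P", "A", "P", "P", "P"]},
    "probe_2": {"young": ["A", "A", "A", "A", "A"], "aged": ["A", "M", "A", "A", "A"]},
    "probe_3": {"young": ["A", "A", "A", "A", "P"], "aged": ["P", "P", "A", "A", "A"]},
    "probe_4": {"young": ["A", "P", "A", "A", "A"], "aged": ["A", "A", "A", "M", "A"]},
}

MIN_PRESENT_FRACTION = 0.40  # the arbitrary 40% cutoff mentioned above


def present_fraction(group_calls):
    """Fraction of chips in one treatment group with a Present call."""
    return sum(1 for call in group_calls if call == "P") / len(group_calls)


def passes_filter(groups, threshold=MIN_PRESENT_FRACTION):
    """Keep a probe set if any single treatment group reaches the threshold."""
    return any(present_fraction(g) >= threshold for g in groups.values())


# Whole probe sets are kept or dropped; no individual values are altered,
# so the per-gene statistics of the survivors are untouched.
kept = [name for name, groups in calls.items() if passes_filter(groups)]

# Expected false positives if every tested probe set were truly null.
alpha = 0.05
expected_fp_before = alpha * len(calls)
expected_fp_after = alpha * len(kept)

print("kept:", kept)
print("tests before/after:", len(calls), len(kept))
print("expected null false positives before/after:",
      expected_fp_before, expected_fp_after)
```

The filter looks only at the detection calls, never at the expression 
values themselves, which is why it shrinks the list of tests without 
touching the statistic computed for any probe set that remains.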

Affy is all but telling you that they are not confident in the average 
difference score (ADS) and signal intensity (SI) numbers their algorithms 
produce when a probe set is called absent. My current understanding is 
that the MAS metrics are not 'stand alone'. Although Affy intends ADS and 
SI to be their quantitative measures of mRNA level, these measures go hand 
in glove with their respective absence calls. As for what the absence 
calls mean, there appears to be a shell game (three-card monte) going on 
with the 'why' of absence calls. You are correct that many probe sets are 
called absent because they have insufficient signal, but many probe sets 
are also called 'absent' because, although there is sufficient signal 
intensity, there is too much disagreement among probe pairs. Thus there 
are two reasons probe sets get called absent: 1) the signal is too dim, 
and 2) the probe set is not working the way the algorithm expects. Oh, 
and add an interaction of those two as well.
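
The two routes to an absent call can be illustrated with a toy example. 
As I understand Affymetrix's description of the MAS 5.0 detection call, 
each probe pair gets a discrimination score R = (PM - MM) / (PM + MM), and 
a one-sided Wilcoxon signed-rank test asks whether the scores sit above a 
small threshold tau (0.015 by default, as far as I know). The sketch below 
is a simplification under those assumptions: it swaps the signed-rank test 
for a crude median check, and all intensities are invented.

```python
from statistics import median

TAU = 0.015  # assumed default threshold for the discrimination score


def discrimination_scores(pairs):
    """R = (PM - MM) / (PM + MM) for each (PM, MM) probe pair."""
    return [(pm - mm) / (pm + mm) for pm, mm in pairs]


def looks_detectable(pairs, tau=TAU):
    """Crude stand-in for the signed-rank test: is the median score above tau?"""
    return median(discrimination_scores(pairs)) > tau


# Invented (PM, MM) intensities for three hypothetical probe sets.
probe_sets = {
    # 1) too dim: PM and MM both sit near background
    "dim": [(52, 50), (48, 51), (50, 49), (47, 50), (51, 52), (49, 48)],
    # 2) bright, but the probe pairs disagree with each other
    "bright_discordant": [(4000, 300), (350, 3900), (4200, 280),
                          (310, 4100), (3800, 3700), (290, 4000)],
    # bright and consistent: PM well above MM in every pair
    "bright_concordant": [(4000, 300), (3900, 350), (4200, 280),
                          (4100, 310), (3800, 400), (4000, 290)],
}

for name, pairs in probe_sets.items():
    scores = discrimination_scores(pairs)
    mean_pm = sum(pm for pm, _ in pairs) / len(pairs)
    print(name,
          "mean PM:", round(mean_pm),
          "median R:", round(median(scores), 3),
          "detectable:", looks_detectable(pairs))
```

The discordant probe set has plenty of raw signal yet still fails the 
check, which is the second kind of 'absent' described above.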

So if you are using another algorithm like RMA to look at your data, then 
the presence/absence calls could be dangerous: they take out probe sets 
that didn't work well for MAS, but those probe sets may have done just 
fine with RMA.

Hope that helps,
-E

>Message: 4
>Date: Fri, 30 May 2003 17:28:45 +0100
>From: "Crispin Miller" <CMiller at picr.man.ac.uk>
>Subject: [BioC] replicates and low expression levels
>To: <bioconductor at stat.math.ethz.ch>
>Message-ID:
>         <BAA35444B19AD940997ED02A6996AAE00B1448 at sanmail.picr.man.ac.uk>
>Content-Type: text/plain;       charset="iso-8859-1"
>
>Hi,
>Just a quick question about low expression levels on Affy systems - I hope 
>it's not too off-topic; it is about normalisation and data analysis...
>I've heard a lot of people advocating that it's a good idea to perform an 
>initial filtering on either Present Marginal or Absent calls, or on 
>gene-expression levels (so that only genes with an expression > 40, say, 
>after scaling to a TGT of 100 using the MAS5.0 algorithm, are part of the 
>further analysis). Firstly, am I right in thinking that this is to 
>eliminate data that are too close to the background noise level of the system?
>
>I wanted to canvass opinion as to whether people feel we need to do this if 
>we have replicates and are using statistical tests - rather than just 
>fold-changes - to identify 'interesting' genes. Does the statistical 
>testing do this job for us?
>
>Crispin
>


