[BioC] Re: replicates and low expression levels
Eric
emblal at uky.edu
Mon Jun 2 10:37:59 MEST 2003
Hi,
To add to what Rafael Irizarry said: when we had multiple subjects/chips
per treatment group in our recent publication (Blalock et al., 2003, J.
Neurosci.), we used P/A filtering to determine which probe sets were to be
included in the 'final' statistical analysis. Because we did this on an
entire-record basis (that is, a probe set was removed from further
consideration if it had 'too many' absence calls; the cutoff was set,
arbitrarily, at 40% presence calls in at least one treatment group), the
F-statistics for each gene that remained were unchanged. However, this
filtering has a huge effect on the multiple-testing error when using the
'MAS' algorithms, because part of what is being removed is the unexpressed
probe set contingent: the fairly large group of probe sets (in our case
nearly 50%) that are not detectable or not expressed in the tissue of
interest. (I'd guess this will be an issue with any general-purpose array
designed to measure genome-wide expression.)
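
A minimal Python sketch of a record-level filter like the one described above, together with its effect on a Bonferroni-corrected cutoff. The helper names are hypothetical, and it assumes the rule means "at least 40% Present (P) calls in at least one group", with Marginal calls counted as not present. The probe-set counts and the p-value in the usage lines are made up for illustration and are not taken from Blalock et al. Bonferroni is used only because it makes the dependence on the number of tests obvious.

```python
# Hypothetical helpers: a sketch, not the exact procedure used in the paper.

def passes_presence_filter(calls_by_group, min_fraction=0.4):
    """Keep a probe set if ANY treatment group has >= min_fraction 'P' calls.

    calls_by_group maps a group name to that group's per-chip MAS calls
    ('P', 'M' or 'A'). Counting 'M' as not present is an assumption.
    """
    return any(
        sum(call == "P" for call in calls) / len(calls) >= min_fraction
        for calls in calls_by_group.values()
    )


def bonferroni_cutoff(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# The filter looks at one probe set at a time, so a retained probe set's
# F-statistic (and raw p-value) is unchanged; only the number of tests shrinks.
kept = passes_presence_filter(
    {"control": ["A", "A", "P", "A", "A"], "treated": ["P", "P", "A", "P", "A"]}
)
dropped = not passes_presence_filter(
    {"control": ["A", "A", "A", "A", "A"], "treated": ["P", "A", "A", "A", "A"]}
)

# Hypothetical numbers: 12,000 probe sets before filtering, about half removed.
before = bonferroni_cutoff(0.05, 12000)  # about 4.2e-06
after = bonferroni_cutoff(0.05, 6000)    # about 8.3e-06
p_value = 6e-6  # hypothetical raw p-value of one retained probe set
print(kept, dropped, p_value < before, p_value < after)  # True True False True
```

The same raw p-value fails the cutoff computed over all probe sets but passes the one computed over the filtered set, which is the sense in which removing the unexpressed contingent changes the multiple-testing picture without touching any individual statistic.
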
Affymetrix is all but telling you that they are not confident in the
average difference score (ADS) and signal intensity (SI) numbers their
algorithms produce when the probe sets are rated absent. My current
understanding is that the MAS metrics are not 'stand-alone'. Although Affy
intends ADS and SI to be their quantitative measures of mRNA level, these
measures go hand in glove with their respective absence calls.

As for what the absence calls mean, there appears to be a shell game
(three-card monte) going on with the 'why' of absence calls. You are correct
that many probe sets are called absent because they have insufficient
signal, but many probe sets are also called absent because, although there
is sufficient signal intensity, there is also too much disagreement among
probe pairs. Thus there are two reasons probe sets get called absent: 1) the
signal is too dim, and 2) the probe set is not behaving the way the
algorithm expects. Oh, and add an interaction of those two as well.
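
Both failure modes can be illustrated with a simplified Python sketch of the MAS 5.0 detection call as Affymetrix has described it: each probe pair gets a discrimination score R = (PM - MM)/(PM + MM), and a one-sided Wilcoxon signed-rank test asks whether the median of R exceeds a small threshold tau. The values tau = 0.015, alpha1 = 0.04 and alpha2 = 0.06 are assumed here to be the published defaults; the exact test below is a textbook version rather than Affymetrix's code (it ignores, for example, their handling of saturated MM probes), and the function names and intensities are hypothetical.

```python
def detection_p_value(pm, mm, tau=0.015):
    """One-sided exact Wilcoxon signed-rank p-value for H1: median(R) > tau.

    Simplified sketch, not Affymetrix's implementation. R = (PM - MM) / (PM + MM)
    is each probe pair's discrimination score; intensities are assumed positive.
    Ties get midranks; zero differences are dropped, as is conventional.
    """
    d = [(p - m) / (p + m) - tau for p, m in zip(pm, mm)]
    d = [x for x in d if x != 0]
    n = len(d)
    if n == 0:
        return 1.0
    # Rank |d| with midranks for ties; store 2 * rank so every rank is an integer.
    order = sorted(range(n), key=lambda k: abs(d[k]))
    rank2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank2[order[k]] = i + j + 2  # 2 * midrank of sorted positions i..j
        i = j + 1
    observed = sum(r for r, x in zip(rank2, d) if x > 0)
    # Under H0 each pair's sign is + or - with probability 1/2, so count the
    # sign patterns (subsets of ranks) by their positive-rank sum.
    total = sum(rank2)
    counts = [1] + [0] * total
    for r in rank2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[observed:]) / 2 ** n


def detection_call(p, alpha1=0.04, alpha2=0.06):
    """Present below alpha1, Marginal from alpha1 up to alpha2, else Absent."""
    if p < alpha1:
        return "P"
    if p < alpha2:
        return "M"
    return "A"


# Hypothetical 11-pair probe sets.
bright_consistent = detection_p_value([2000] * 11, [500] * 11)
dim = detection_p_value([50] * 11, [50] * 11)
# Bright PM throughout, but five of the eleven MM probes are brighter than their PM.
bright_inconsistent = detection_p_value([3000] * 11, [1000] * 6 + [6000] * 5)
for name, p in [("bright_consistent", bright_consistent), ("dim", dim),
                ("bright_inconsistent", bright_inconsistent)]:
    print(name, round(p, 4), detection_call(p))
```

In this toy example the dim probe set (PM equal to MM) is called absent with p = 1.0, while the bright but inconsistent one is also called absent (p = 0.0625) even though every PM value is 3000, because the disagreeing probe pairs weaken the signed-rank evidence.
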
So if you are using another algorithm, like RMA, to look at your data, then
filtering on the presence/absence calls could be dangerous: it removes
probe sets that didn't work well for MAS, but those probe sets may have done
just fine with RMA.
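
To make the contrast concrete: RMA, as implemented in the Bioconductor affy package's rma(), works on PM intensities only (background-corrected, quantile-normalized, log2-transformed) and summarizes each probe set across chips with Tukey's median polish, so the MM disagreement that can drive an absent call never enters its summary. The Python sketch below covers only the median-polish step, following the sweep order of R's medpolish(); background correction and normalization are omitted, and the function names and the toy log2 PM matrix are hypothetical.

```python
from statistics import median


def median_polish(y, max_iter=10, eps=0.01):
    """Tukey median polish of a probes x chips matrix (list of rows).

    Fits y[i][j] ~ overall + row[i] + col[j] + resid[i][j], sweeping in the
    same order as R's medpolish(). Returns (overall, row, col, resid).
    """
    n_rows, n_cols = len(y), len(y[0])
    z = [list(map(float, r)) for r in y]
    overall, row, col = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians into the row effects.
        for i in range(n_rows):
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row[i] += m
        delta = median(col)
        col = [c - delta for c in col]
        overall += delta
        # Sweep column (chip) medians into the column effects.
        for j in range(n_cols):
            m = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= m
            col[j] += m
        delta = median(row)
        row = [r - delta for r in row]
        overall += delta
        new_sum = sum(abs(v) for r in z for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row, col, z


def rma_style_summary(log2_pm):
    """Hypothetical helper: per-chip value = overall + chip effect. MM is never an input."""
    overall, _, col, _ = median_polish(log2_pm)
    return [overall + c for c in col]


# Hypothetical log2 PM values: 3 probes x 3 chips, purely additive
# (probe affinity + chip expression), so median polish fits them exactly.
log2_pm = [
    [8.0, 9.0, 11.0],
    [9.0, 10.0, 12.0],
    [10.0, 11.0, 13.0],
]
print(rma_style_summary(log2_pm))  # [9.0, 10.0, 12.0]
```

Because the summary takes only PM values, the same probe set would get the same RMA-style value whatever its MM probes did, which is why a MAS absent call says little about how that probe set behaves under RMA.
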
Hope that helps,
-E
>Date: Fri, 30 May 2003 17:28:45 +0100
>From: "Crispin Miller" <CMiller at picr.man.ac.uk>
>Subject: [BioC] replicates and low expression levels
>To: <bioconductor at stat.math.ethz.ch>
>
>Hi,
>Just a quick question about low expression levels on Affy systems - I hope
>it's not too off-topic; it is about normalisation and data analysis...
>I've heard a lot of people advocating that it's a good idea to perform an
>initial filtering on either Present/Marginal/Absent calls or on
>gene-expression levels (so that only genes with an expression > 40, say,
>after scaling to a TGT of 100 using the MAS5.0 algorithm, are part of the
>further analysis). Firstly, am I right in thinking that this is to
>eliminate data that are too close to the background noise level of the system?
>
>I wanted to canvass opinion as to whether people feel we need to do this if
>we have replicates and are using statistical tests - rather than just
>fold-changes - to identify 'interesting' genes. Does the statistical
>testing do this job for us?
>
>Crispin
>