[BioC] Re: Affy Present Calls

Francois Collin fcollin at sbcglobal.net
Thu Oct 9 10:45:19 MEST 2003

Just a couple of comments to add.
To filter or not to filter will depend on the application.  If your problem is one of classification and you don't care which fragments are used in your classifier function, there is no harm in filtering out absent genes.  You will have plenty of genes to select from among the more reproducible, high-intensity fragments that are called present all the time.  If you are looking for marker genes, on the other hand, filtering on presence calls may very well hide some relevant markers.
What is of interest to me is a thorough characterization of probe sets with respect to the relationship between PM and MM and its effect on detection of the mRNA molecule.  When is the call even worthy of its name?  To me "call=P" means the PMs are greater than the MMs by some reasonable measure.  When is "call=P" equivalent to "the mRNA molecule is present"?  I would publish such a report (if I were an editor, that is).


Eric <emblal at uky.edu> wrote:

As a user I can second that- we definitely use P/A (as well as scaling factor) to see if a chip within a treatment group has gone awry. 

Regarding P/A calls as a filter- I'm fairly certain most users would agree that, among the probe sets found to be present in 100% of the chips in the study, there is a greater proportion of statistically significant findings than would be found among the probe sets that were 100% absent- of course this is with MAS5 as the probe level algorithm.

Interestingly, our own lab's observations are that, among probe sets in which >80% of the chips show presence calls in one treatment group and <20% of the chips show presence in the other treatment group (a relatively small group of 20 genes in the example I'm using- 10 chips per group), the significance proportion was actually worse than in the 'fully present cadre'. I've seen this in at least three other data sets with 7 or more chips per group. My initial assumption was that selecting for presence in one group and absence in the other would bias me towards finding significant results. Having seen that this selection actually reduces the 'significance proportion', my interpretation is that dividing up the data by P/A call like this isolates probe sets for which the data are noisier, but not necessarily smaller, in one group than the other.
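
For readers who want to reproduce this kind of grouping, here is a minimal Python sketch. It assumes the detection calls for one probe set are available as lists of "P"/"M"/"A" strings, one list per treatment group. The function names, the category labels, and the default 80%/20% cut-offs as parameters are illustrative assumptions, not the code used in the analysis described above.

```python
from typing import List


def present_fraction(calls: List[str]) -> float:
    """Fraction of chips on which the probe set was called present ('P')."""
    return sum(1 for c in calls if c == "P") / len(calls)


def classify_probe_set(group_a: List[str], group_b: List[str],
                       high: float = 0.8, low: float = 0.2) -> str:
    """Assign a probe set to a hypothetical P/A category.

    'fully_present': called P on every chip in both groups
    'fully_absent':  never called P in either group
    'split':         >high present in one group and <low in the other
    'mixed':         everything else
    """
    fa = present_fraction(group_a)
    fb = present_fraction(group_b)
    if fa == 1.0 and fb == 1.0:
        return "fully_present"
    if fa == 0.0 and fb == 0.0:
        return "fully_absent"
    if (fa > high and fb < low) or (fb > high and fa < low):
        return "split"
    return "mixed"
```

The proportion of significant tests can then be tallied separately within each category to compare the 'split' group against the fully present one.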

Regarding the move from the MAS4 algorithm to MAS5, I think the greatest tragedy was the eradication of the negative values by artificially altering the MM values. If the fragment was not present in the mix, then PM and MM should both be hybridizing randomly, and I would expect PM < MM about half of the time, so those values, while they may not make biological sense, are exactly what you would expect from the probe set design. Further, if, as you mention and we've seen in our own data, some negative values are good discriminators, then some negative values may be there for other reasons: cross-hybridization, for instance, or Affy's assumptions regarding how probe sets behave not holding true in every case. However, the MAS5 algorithm blinds users to such effects.
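
The "about half of the time" expectation can be checked with a small simulation. This is only a sketch under a strong assumption: when the target is absent, PM and MM intensities are treated as independent draws from the same log-normal background distribution. The distribution and its parameters are made up for illustration.

```python
import random


def fraction_pm_below_mm(n_pairs: int, seed: int = 0) -> float:
    """Simulate probe pairs with no target present and return the
    fraction of pairs in which PM < MM.

    Assumption: PM and MM are independent draws from one log-normal
    background distribution (parameters are arbitrary).
    """
    rng = random.Random(seed)
    below = 0
    for _ in range(n_pairs):
        pm = rng.lognormvariate(5.0, 0.5)  # background-only PM intensity
        mm = rng.lognormvariate(5.0, 0.5)  # background-only MM intensity
        if pm < mm:
            below += 1
    return below / n_pairs


if __name__ == "__main__":
    print(fraction_pm_below_mm(10000))
```

Under that assumption the fraction comes out close to 0.5, so roughly half of the PM - MM differences for an absent target are negative.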

I've also gone in and looked at the dichotomy between MAS5 and RMA (different probe level algorithms, same test: 1-way ANOVA). As I've said before, there is no shortage of discrepancies between the two (RMA finds 508 significant, MAS5 finds 409, and the two overlap on 146). We specifically looked at feature values in probe sets that were:
1) RMA: very significant (p < .001) with RMA and very non-significant (p > .9) with MAS5
2) MAS5: very significant (p < .001) with MAS5 and very non-significant (p > .9) with RMA
3) RMA & MAS5: high concordance (p < .001 in both)
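
The three categories above, and the overlap arithmetic, can be written down as a short Python sketch. It assumes one ANOVA p-value per probe set from each algorithm; the function names and return labels are hypothetical.

```python
from typing import Dict, Optional


def discordance_category(p_rma: float, p_mas5: float,
                         sig: float = 0.001,
                         nonsig: float = 0.9) -> Optional[str]:
    """Place a probe set in one of the three categories, or None."""
    if p_rma < sig and p_mas5 > nonsig:
        return "RMA"          # category 1
    if p_mas5 < sig and p_rma > nonsig:
        return "MAS5"         # category 2
    if p_rma < sig and p_mas5 < sig:
        return "RMA & MAS5"   # category 3
    return None               # not one of the extremes examined


def overlap_summary(n_a: int, n_b: int, n_both: int) -> Dict[str, int]:
    """Counts unique to each list and in their union, from list sizes."""
    return {
        "a_only": n_a - n_both,
        "b_only": n_b - n_both,
        "union": n_a + n_b - n_both,
    }
```

With the counts quoted above, `overlap_summary(508, 409, 146)` gives 362 probe sets significant only with RMA, 263 only with MAS5, and 771 significant with at least one of the two.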

We isolated images and extracted PM and MM values for the top 10 probe sets in each of the three categories. The plotted PM and MM values reveal different phenomena that go into the 'failure to agree'. First, in cases where tests performed with RMA found significant differences and MAS5 did not, this was often because of large variations in the behavior of the MM features. Where tests performed with MAS5 found significant differences and RMA did not, this was often because the MM subtraction amplified a difference that already existed in the PM, or because there was no difference in the PM and the entire result was due to MM differences between the two treatment groups. Of course the probe sets that showed concordance between the two probe level algorithms were well-behaved. I presented some of this at the 3rd annual Virtual Conference on Genomics and Bioinformatics, but I never thought it was really worth pursuing as a publication.

Do you think that there is enough interest out there to publish this, and if so, where?

At 12:01 PM 10/9/2003 +0200, you wrote:
Date: Wed, 8 Oct 2003 11:11:08 -0700 (PDT)
From: Francois Collin <fcollin at sbcglobal.net>
Subject: RE: [BioC] Affy: Present calls in an eset
To: Crispin Miller <CMiller at PICR.man.ac.uk>,
        bioconductor at stat.math.ethz.ch
Message-ID: <20031008181108.47125.qmail at web80406.mail.yahoo.com>
Content-Type: text/plain

Indeed %present calls is arguably the best of all data quality indicators that are suggested by Affymetrix.  If you rehybe the same hybe mix to chips under different conditions - change scanner, hybridization time or temperature, hybe station - %present calls can vary widely.  Genes don't appear and disappear out of the hybe mix, but probe affinities change under the different conditions.  Making sure that %present calls are consistent across a set of chips is a way to check that the processing and experimental conditions that affect hybridization kinetics were fairly consistent across a set of chips.
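
A consistency check of this kind might be sketched in Python as follows. It assumes the detection calls for each chip are available as a list of "P"/"M"/"A" strings keyed by chip name. The function names and the 10-percentage-point tolerance are arbitrary illustrative choices, not Affymetrix recommendations.

```python
from statistics import median
from typing import Dict, List


def percent_present(calls: List[str]) -> float:
    """Percentage of probe sets called present ('P') on one chip."""
    return 100.0 * sum(1 for c in calls if c == "P") / len(calls)


def flag_inconsistent_chips(calls_by_chip: Dict[str, List[str]],
                            tolerance: float = 10.0) -> List[str]:
    """Return chips whose %present differs from the median across
    chips by more than `tolerance` percentage points."""
    pct = {chip: percent_present(calls)
           for chip, calls in calls_by_chip.items()}
    center = median(pct.values())
    return sorted(chip for chip, p in pct.items()
                  if abs(p - center) > tolerance)
```

Flagged chips are candidates for a closer look at processing conditions rather than automatic exclusion.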
As for the ability of the Present calls to discriminate between samples in which a given mRNA fragment is present and samples in which it isn't, it will vary from probe set to probe set.  In an ideal probe set, in which all PM/MM probe pairs have similar non-specific binding affinities, the PM probe has good binding affinity to the target mRNA fragment, and the target doesn't bind to too many other probes on the chip, the calls will work well.  It is not clear for what proportion of probe sets the calls actually work as intended.  You can definitely find probe sets for which MM>>PM for several probe pairs in the set, and these fragments will never be called present.  The reverse is also true.
Very little has been published on the subject as far as I know.  There is the work by Ben Rubenstein mentioned earlier in this thread.  More work obviously needs to go into this question.  I think that one should be aware that by screening out absent calls, you may be losing many interesting target fragments.  In the days of MAS 4.0, I recall some genes with negative expression being very good discriminators of tumor class.
Eric Blalock, PhD
Dept Pharmacology, UKMC
859 323-8033

