[BioC] how deal with multiplicate affy probes?

Lawrence Paul Petalidis
Fri Mar 26 10:46:56 CET 2004

Hello Laurent,
Thank you for your message. Yes, I do use the target sequence for BLASTing,
as opposed to performing multiple BLASTs with each of the probe sequences,
as I find this faster. I am not suggesting that BLASTing all probe set
target sequences is an option (not yet, at least), and I am not performing
this exercise for all genes. However, for genes that one is particularly
interested in a priori, it is worth it.

For example, I have genomic annotation for 5 genes (i.e. I know whether they
have amplifications or deletions in my tumour samples) and am looking at
these genes on the Affy U133 chips very carefully (multiple probe set
issues, specificity) to assess correlations of expression with genomic
status. For the MDM2 gene, for example, a probe set annotated as _at (i.e.
supposedly unique) really seems to pick up a sea of different transcript
variants when assessed by BLAST (and verified by subsequent multiple
sequence alignment of the target sequence against the transcript variant
sequences found).
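A rough way to reproduce this kind of check without BLAST is to ask, for each known transcript variant, how many of a probe set's probes it contains as exact substrings. The Python sketch below swaps BLAST and multiple alignment for plain exact string matching in either orientation; the function name `variants_hit`, the `min_fraction` threshold and the toy sequences are hypothetical, and real probe and variant sequences would have to be obtained separately.

```python
from __future__ import annotations

# Complement table for DNA; sequences are assumed to contain only A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def variants_hit(probes: list[str], variants: dict[str, str],
                 min_fraction: float = 1.0) -> list[str]:
    """Names of variants containing at least `min_fraction` of the probes.

    A probe counts as present if it, or its reverse complement, occurs
    exactly in the variant sequence (no mismatches allowed).
    """
    hits = []
    for name, seq in variants.items():
        seq = seq.upper()
        present = sum(1 for p in probes
                      if p.upper() in seq or reverse_complement(p) in seq)
        if present / len(probes) >= min_fraction:
            hits.append(name)
    return hits


# Toy data (hypothetical): three 10-mer "probes" and three "variants".
seg_a, seg_b, seg_c = "GATTACAGAT", "CCTTGGAACC", "TGCATGCAAT"
probes = [seg_a, seg_b, seg_c]
variants = {
    "v1": "AA" + seg_a + "TT" + seg_b + "GG" + seg_c + "CC",
    "v2": seg_a + "CCCCC" + seg_b + seg_c,
    "v3": seg_a + "G" * 14,
}
print(variants_hit(probes, variants))       # all probes required -> ['v1', 'v2']
print(variants_hit(probes, variants, 0.3))  # 1 of 3 suffices -> ['v1', 'v2', 'v3']
```

A probe set whose probes are all found in several variants (here v1 and v2) is "unique" only at the gene level, not at the transcript level.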

Indeed, I agree, this is most likely because the information available at
the time did not include these novel variants, but I had the impression that
NetAffx was routinely updated against the current UniGene build. I am
somewhat perplexed, however, as it seems that although NetAffx is updated,
some of the information is still based on UniGene build 133, and in some
cases the probe set display tool is not sufficiently up to date.

In any case, my initial message did not refer to a widespread issue with the
technology, but aimed to start a discussion on the issue of _at probe set
uniqueness, an issue that I believe could have been dealt with in a slightly
better way in NetAffx.

Many thanks for your attention, Lawrence

Lawrence Paul Petalidis
Ph.D. Candidate

University of Cambridge
Department of Pathology

From: Laurent Gautier
Sent: 26 March 2004
To: Lawrence Paul Petalidis
Cc: Michael Seewald; Johnnidis, Jonathan;
bioconductor at stat.math.ethz.ch; maechler at stat.math.ethz.ch;
jgentry at jimmy.harvard.edu
Subject: RE: [BioC] how deal with multiplicate affy probes?

On Thu, 2004-03-25 at 18:34, Lawrence Paul Petalidis wrote:
> Hello,
> As a note following on from Michael Seewald's message, I totally agree
> there is a STRONG need to BLAST probe set sequences.

Do we really need to use BLAST (and then, how would we decide on cut-off
values)? The probes are short oligonucleotides (25-mers), so I think
perfect string matches are likely to be enough in many cases.
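As a rough illustration of the perfect-match approach, the Python sketch below looks for exact occurrences of a probe, and of its reverse complement, in a set of reference sequences. It assumes the sequences are plain A/C/G/T strings already loaded into memory; the function name and the toy sequences are hypothetical. Which orientation a genuine hit should have depends on how the probe sequences are listed for a given array, so both are reported.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def perfect_matches(probe: str,
                    references: dict[str, str]) -> list[tuple[str, int, str]]:
    """List every exact hit of `probe` in `references`.

    Each hit is (reference name, 0-based start, orientation), where the
    orientation is "+" if the probe occurs as written and "-" if its
    reverse complement occurs. No mismatches or gaps are allowed, so no
    BLAST-style score cut-off is needed.
    """
    probe = probe.upper()
    rc = reverse_complement(probe)
    hits = []
    for name, ref in references.items():
        ref = ref.upper()
        for query, strand in ((probe, "+"), (rc, "-")):
            start = ref.find(query)
            while start != -1:  # collect overlapping occurrences too
                hits.append((name, start, strand))
                start = ref.find(query, start + 1)
    return hits


# Toy reference sequences (hypothetical, not real transcripts).
refs = {
    "tx1": "GGGACGTTAGCAAACCCTTTGG",
    "tx2": "TTTCCAAAGGGTTTGCTAACGT",
}
print(perfect_matches("ACGTTAGCAAACCC", refs))
# -> [('tx1', 3, '+'), ('tx2', 8, '-')]
```

A probe with more than one hit (across references or orientations) is a candidate for cross-hybridization.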

>  I tend to use the probe
> set target sequence instead of the individual probe sequences, however.

At the risk of looking silly, may I ask you to explain in a bit more
detail? I am not certain I understand: do you mean that you prefer working
with the target sequence a given probe set is supposed to match, and that
you then BLAST it against the rest of the world?

>  You
> will be surprised to see the inconsistency of the Affy annotation; in many
> cases _at probes are really not unique at all.

I have spent some time damaging my sight by looking at how Affymetrix
probes match reference sequences, and I would not be so quick to throw
stones at them. What is there is not perfect (there are obvious
problems), but:
1) it was done some time ago (the Dorian Gray syndrome referred to in a
previous mail)... and your very own "BLASTs" (or whatever else) could
suffer from the same problem in time
2) in some cases I suspect that the people at Affymetrix combined
different sources of information to create the probes in a probe set
(e.g. a gene with tentatively 2 different isoforms and two independent
GenBank entries can lead to a single probe set by placing the probes at
appropriate locations... whether it is relevant to merge two different
isoforms into one can then be discussed, but that is a different matter)

>  So if you are really
> interested in a transcript, BLAST it to make sure you are actually seeing
> what you think you are.

The notion of "alternative mappings" implemented in the package
'altcdfenvs' is worth a look. Staring at probe matches is probably not
the idea of fun many people have, but apparently some are starting to do it
for their favorite genes. I believe that a community-based mapping could
benefit... well... the community...
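The 'altcdfenvs' package does this in R/Bioconductor. The Python sketch below neither uses nor reproduces its interface; it only illustrates the general idea of an alternative mapping under simple assumptions: a probe is assigned to a transcript only if it matches that transcript exactly (in either orientation) and no other, and probes matching no transcript or several transcripts are dropped. Probe IDs, transcript names and sequences are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def regroup_probes(probes: dict[str, str],
                   transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Map each transcript to the probes that match it and nothing else.

    probes: probe ID -> probe sequence.
    transcripts: transcript ID -> transcript sequence.
    A probe matches a transcript if it, or its reverse complement, occurs
    exactly in the transcript. Probes matching zero or several transcripts
    are left out of the new mapping.
    """
    upper_tx = {tid: tx.upper() for tid, tx in transcripts.items()}
    groups: dict[str, list[str]] = defaultdict(list)
    for probe_id, seq in probes.items():
        seq = seq.upper()
        rc = reverse_complement(seq)
        matched = [tid for tid, tx in upper_tx.items() if seq in tx or rc in tx]
        if len(matched) == 1:  # keep only transcript-specific probes
            groups[matched[0]].append(probe_id)
    return {tid: sorted(ids) for tid, ids in groups.items()}


# Toy data (hypothetical): p2 is shared by both transcripts, p4 matches none.
seg_a, seg_b, seg_c = "GATTACAGAT", "CCTTGGAACC", "TGCATGCAAT"
transcripts = {
    "geneX": "TT" + seg_a + "AA" + seg_b,
    "geneY": "GG" + seg_b + "CC" + seg_c,
}
probes = {"p1": seg_a, "p2": seg_b, "p3": seg_c, "p4": "CCCCCCCCCC"}
print(regroup_probes(probes, transcripts))
# -> {'geneX': ['p1'], 'geneY': ['p3']}
```

Because the mapping is just a table from transcript to probe IDs, it could be shared and corrected by others, which is the point of a community-based mapping.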


> Best regards to all, Lawrence
> From: bioconductor-bounces at stat.math.ethz.ch
> To: Johnnidis, Jonathan
> As a rule of thumb: If statistics based on a given probe set data tells
> that a transcript is significantly deregulated, you can usually trust it
> discard every other probe set for that transcript!
> The thing to look at is the probe design itself: download the probe set
> sequences from NetAffx and BLAST the single probes against the genome
> (e.g. in Ensembl). You will be surprised how many probes match up with
> introns or genomic regions that do not correspond to any cDNA!
> Two examples: there are 4 probe sets for human Wnt6 (HG-U133A/B); 2 match
> the sense (!) strand and have to be discarded. Out of >12 probe sets for
> human CD44, only 4 have probes that completely match the transcripts; >8
> can be discarded.
> Best,
> Michael
> PS: www.ensembl.org is always a good place to check probe sets. Their
> mapping of probe sets does not show the location of single probes, though...
> PPS: On affymetrix.com you can check out the "Details" view for a probe
> set. There you can discover that 2 probe sets of Wnt6 map to the (-)
> strand, which is bad. It doesn't tell you, however, that many probe sets
> match intron regions.
> On Sat, 20 Mar 2004, Johnnidis, Jonathan wrote:
> > I'm a new list member and am not quite sure if this question is
> > appropriate for the list, but will shoot anyway. I'm analyzing a bunch of
> > data from Affy MgU74Av2 chips and am a bit perplexed as to how to treat
> > conflicting expression data from multiple probe sets (that is, a gene
> > that has >1 probe set designed against it; for example, 97569_r_at and
> > 97658_r_at are both probes for the Insulin gene).
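Michael's Wnt6 and CD44 examples suggest a simple filter for deciding which of several probe sets for one gene to keep. The Python sketch below substitutes exact string matching against a single transcript sequence for BLAST against the genome, and it does not check for intronic hits; the function name, the labels it returns and the toy sequences are hypothetical. Which orientation counts as correct depends on the array type and on how the probe sequences are listed, so it is left as a parameter (`expected`) to be checked for the data at hand.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def classify_probe_set(probes: list[str], transcript: str,
                       expected: str = "+") -> str:
    """Label a probe set by how its probes match one transcript.

    expected: "+" if a correct probe is assumed to occur in the transcript
    as written, "-" if its reverse complement is (an assumption to verify).
    Returns "keep" when every probe matches in the expected orientation,
    "wrong strand" when not all do but every probe matches in the opposite
    orientation, and "partial" otherwise.
    """
    tx = transcript.upper()
    as_written = sum(p.upper() in tx for p in probes)
    flipped = sum(reverse_complement(p) in tx for p in probes)
    if expected == "+":
        same, opposite = as_written, flipped
    else:
        same, opposite = flipped, as_written
    if same == len(probes):
        return "keep"
    if opposite == len(probes):
        return "wrong strand"
    return "partial"


# Toy data (hypothetical).
seg_a, seg_b, seg_c = "GATTACAGAT", "CCTTGGAACC", "TGCATGCAAT"
transcript = "TT" + seg_a + "AA" + seg_b + "TT" + seg_c + "AA"
good = [seg_a, seg_b, seg_c]
wrong = [reverse_complement(s) for s in good]
partial = [seg_a, "CCCCCCCCCC", seg_c]
for name, probe_set in (("good", good), ("wrong", wrong), ("partial", partial)):
    print(name, classify_probe_set(probe_set, transcript))
# -> good keep
# -> wrong wrong strand
# -> partial partial
```

Applied to the probe sets for a single gene, this would leave only those labeled "keep" for comparison, instead of averaging conflicting signals.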
