[BioC] Does the strand of a microarray probe matter?
Kasper Daniel Hansen
khansen at stat.berkeley.edu
Mon Nov 24 02:36:37 CET 2008
On Nov 21, 2008, at 19:30, Nick Henriquez wrote:
> And sorry for perhaps not making this absolutely clear; so, to be
> completely certain there is no misunderstanding about this:
>
>
>
> Regardless of annotation, even if a piece of DNA encodes a gene on
> both strands only ONE of these will hybridise to your probe. The
> reverse-complement is NOT a perfect match, except in vanishingly
> rare cases, i.e. palindromic sequences of restriction enzymes. These
> are usually excluded from probe sets due to ambiguity/
> crosshybridising potential. RC sequences are completely different
> and do not crosshybridise with cDNA. Take any sequence (actgctgacag
> becomes ctgtcagcagt) and you will see that and why this is the case.
>
>
> Given that we know the sequence of the probe we can always tell from
> which strand the hybridising cDNA is derived. So there is no doubt
> whatsoever which gene was involved/altered in expression. If geneX
> is on the "opposite strand" geneX was NOT the gene which was altered
> in its expression, geneX is not detected by the probe in question.
> This annotation incontrovertibly proves that geneX is not measured by
> this probe. Therefore it was geneY encoded by the relevant strand of
> DNA. You may have to figure out what geneY is depending on quality
> of annotation but there are sufficient secondary databases to do
> that. You may even discover a "new gene".
>
This is only true if the assay does not lose strandedness. Let us say
your assay involves making double-stranded cDNA, as some high-throughput
sequencing protocols do. In that case you have no way of telling
which strand your original material came from.
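
To make the point concrete, here is a toy sketch in Python (not taken from
any Bioconductor package). It assumes hybridisation is an exact
reverse-complement match, which is a deliberate simplification, and the
function names (reverse_complement, labelled_targets, binds) are made up
for illustration. The sequence is the one from Nick's example.

```python
# Toy model: a probe "binds" if some labelled target contains its exact
# reverse complement. Real hybridisation tolerates mismatches; this does not.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def labelled_targets(transcript, keeps_strand):
    """Hypothetical sample prep: first-strand cDNA is the reverse complement
    of the transcript. If strandedness is lost (double-stranded cDNA), the
    second strand, in the transcript's own orientation, is present as well."""
    first_strand = reverse_complement(transcript)
    if keeps_strand:
        return [first_strand]
    return [first_strand, transcript]


def binds(probe, targets):
    """True if any target contains a perfect-match site for the probe."""
    site = reverse_complement(probe)
    return any(site in target for target in targets)


transcript = "actgctgacag"                       # expressed strand
sense_probe = transcript                         # same sequence as the transcript
antisense_probe = reverse_complement(transcript)  # "opposite strand" probe

for keeps_strand in (True, False):
    targets = labelled_targets(transcript, keeps_strand)
    print(
        "strand-preserving" if keeps_strand else "double-stranded",
        "| sense probe binds:", binds(sense_probe, targets),
        "| opposite-strand probe binds:", binds(antisense_probe, targets),
    )
```

Under these assumptions, a strand-preserving protocol lets only the sense
probe bind, whereas double-stranded material lets both probes bind, so the
probe's strand no longer tells you which strand was transcribed.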
Kasper
>
>
> If 10% of genes may be affected, that means 10% of the genes in your
> dataset. Usually we're not talking about thousands so it's fairly
> easy to check. E.g. by looking for "encoded by" in the annotation
> etc. If you use affy chips their expression console provides an
> excel/openoffice compatible output which will allow this, even if
> within R/BioC some of the annotated information might be lost. As
> long as the "strand identity" annotation is retained you will always
> see from BioC output whether geneX was in fact measured or not.
> Perhaps code can be adjusted to ignore "other strand" annotations
> altogether, I don't write code but it seems a relatively easy
> command to me, whatever the correct syntax " probes with "other
> strand" in the description=FALSE".
>
>
>
> Best, Nick
>
>
>
> From: seandavi at gmail.com [mailto:seandavi at gmail.com] On Behalf Of
> Sean Davis
> Sent: 20 November 2008 22:51
> To: Cei Abreu-Goodger
> Cc: n.henriquez at ion.ucl.ac.uk; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Does the strand of a microarray probe matter?
>
>
>
>
>
> On Thu, Nov 20, 2008 at 3:48 PM, Cei Abreu-Goodger <cei at ebi.ac.uk>
> wrote:
>
> Hi Nick, and others,
>
> Apologies for not making my question more clear, but I guess there
> have been some interesting answers anyway. I was in fact thinking of
> expression arrays. And my main interest was from the standpoint of
> probe annotation.
>
> It now does seem pretty clear that there are many regions in the
> genome that encode transcripts on both strands. If a probe is
> designed to such a region, the expression microarrays will be
> measuring both transcripts, and you will essentially have a
> "perfectly" cross-hybridizing probe.
>
>
> Not really. It depends on the protocol being used. For Illumina,
> you will end up with a product that goes on the array that is strand-
> specific. That is not true of all array platforms.
>
>
>
> Now, annotation-wise, what should we do? Ignore such probes? At
> least flag them up? The problem is, many bioconductor annotation
> packages only allow a single gene to be assigned to each probe. So,
> in many cases you may be led to believe that your experiment has
> measured differential expression for a particular gene (with its set
> of GO terms, KEGG pathways, etc) when in fact the changing gene was
> the one on the other strand.
>
>
> I don't think this comes up very often, but it is always possible
> that for any given gene there is another explanation for
> differential expression as observed. That is why for a given gene,
> it is important to validate using a different technology. Globally
> (as in sets of genes), it hopefully won't be too much a factor.
>
>
>
>
> These "problems" tend to show up on the list occasionally, for
> example when people find out that different databases (Ensembl/
> Biomart, NCBI, the manufacturer or a bioC annotation package) lists
> different genes for the same probe. Obviously not all, but many of
> these differences have been due to overlapping transcripts. In fact,
> Ensembl recently patched their probe mapping pipeline to be "strand-
> aware". If you think that this would affect a tiny portion of
> probes, think again: the proportion of Affymetrix probes affected on
> the human and mouse genomes was around 10%:
>
> http://osdir.com/ml/science.biology.ensembl.devel/2008-06/msg00052.html
>
> Also, from talking to some of the NuID/Illumina mapping people it
> seems that they simply don't consider the strand of the probe. But
> they do calculate a "uniqueness" score to avoid probes that map to
> multiple genes.
>
> In the end, I would ideally prefer "cross-hybridizing" probes (of
> whatever sort) to be annotated in a way that they could be
> identified. But I have no idea of how much a nightmare that would be
> for the developers of the current annotation packages...
>
>
> There is no attempt to map probes in bioconductor annotation
> packages (at least those maintained by the core). The annotation
> from which the annotation packages are derived comes directly from
> the manufacturers, generally. Herve Pages just posted some code to
> the list that will allow you to align your own probes to the genome
> or, more probably, to a transcript database of your choice. Then,
> you can use your own definitions for probes. I used to do this on a
> large scale for all arrays that we used, but I have backed away
> because the answers that one gets are very similar for the vast
> majority of probes.
>
> Sean
>
>
>
>
> Nick Henriquez wrote:
>
> Dear Cei, Steve,
>
> There are two versions of the correct answer depending on whether we
> are talking about an expression or CGH/SNP type array;
>
> If we are using an EXPRESSION array
>
> 1) It does not matter on which strand the gene resides.
> 2) It is not a matter of bad probe design. It is either a negative
> control or a misnomer derived from genome annotation.
>
> For ANY probe to hybridise it has to be the RC of cDNA and therefore
> the DNA homologue of the original RNA sequence. (I'll let you work that
> one out for yourself).
>
> If the probe WAS encoded on "the opposite strand" your labelled
> target would not hybridise as it would be the reverse complement of the
> actual sequence. The annotation "opposite strand" stems from the
> convention that we call one strand the "coding strand" and the other
> strand the non-coding or "opposite" strand. By definition then a gene
> cannot be encoded by the "opposite" strand. However, what often happens
> when sequencing genomes is that we find several genes encoded on one
> strand (which we will then call the coding strand) and then somewhat
> later also one or more genes on the "opposite" strand. This annotation
> is (wrongly in my opinion) retained when genomes are assembled and thus
> part of the annotation of the probes.
>
> So an opposite strand probe is at best a kind of negative control,
> at worst a misnomer annotation retained when the genome was assembled.
> Mostly we now try to use terms like + and - but even that has the
> drawback that we generally associate + with coding and - with
> noncoding. As we all know BOTH strands encode functional RNAs of
> various kinds including those coding for proteins.....
>
> If we are talking about DNA targets, e.g. a SNP array
>
> 1) It does not matter on which strand a gene resides, any overlap is a
> matter of coincidence- "genes" are rare events on the genome.
> 2) It is not a matter of bad probe design. Usually it simply does
> not matter and this is a sequence that was used historically without
> knowledge of the gene (often discovered later). Sometimes the sequence
> on the coding strand may have a problem with background or sequence
> similarity. To get around this one can try to use the RC (i.e.
> "opposite strand" sequence) which is often different enough. Of course
> if more than 2 similar sequences exist the problem remains as we can
> use this trick only once.
>
> Hope this helps,
>
> Nick
>
> N.V. Henriquez, Senior Research Associate
> Dept. Of Neurodegenerative Diseases
> Institute of Neurology, UCL, Queen Square House rm 124
> Queen Square
> London WC1N 3BG
>
>
>
>
> Message: 8
> Date: Wed, 19 Nov 2008 10:45:52 -0500
> From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
> Subject: Re: [BioC] Does the strand of a microarray probe matter?
> To: Cei Abreu-Goodger <cei at ebi.ac.uk>
> Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
> Message-ID: <7710F044-03D5-4572-8EE4-2DB96F4C348C at gmail.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Hi Cei,
>
> On Nov 19, 2008, at 3:51 AM, Cei Abreu-Goodger wrote:
>
> Hello all,
>
> Related issues have arisen before, where the probe of a particular
> array platform was annotated to a gene on the opposite strand. But
> I was just asked if this even matters, or should it simply be
> considered a case of bad probe design.
>
> Does the protocol for different manufacturer's arrays always
> produce amplified product of both strands for the transcript to be
> measured? I could imagine that protocols that amplify based on poly-
> A tails would tend to produce an anti-sense biased amplification
> product (older Affy arrays?), whereas those based on random priming
> could produce products of both strands (and so the actual strand
> that is on the array becomes meaningless).
>
> Does someone know what is the case in particular for Illumina
> Beadarrays?
>
>
>
> I've never worked on the bench-side of a microarray experiment, but
> for gene expression arrays I was under the impression that most
> protocols:
>
>   (i) extract the RNA from cell lysate using their poly-A tails
> as targets
> (ii) reverse transcribe to cDNA and amplify the cDNA w/ random
> primers.
> (iii) hybridize amplified cDNA to the array
>
> If that's the case, I don't think that the strand of the probe
> should be an issue.
>
> I'd be interested, of course, to hear other people's thoughts on
> this, too (while this info should be easily available from the
> manufacturer's site, or the Methods section of many papers, let's
> see if the lazy-web can help :-).
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> http://cbio.mskcc.org/~lianos
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>