[BioC] how deal with multiplicate affy probes?

Sean Davis sdavis2 at mail.nih.gov
Tue Mar 30 13:33:14 CEST 2004


Michael,

In reading my statement out of context, I should clarify a bit.  The problem
is that the space in which one is searching for blast or blat "hits" is
larger (unnecessarily large) in the genomic case as compared to the
transcript case.  That is, for expression analysis, one does not need or
even want to know if a probe hits some anonymous piece of DNA that is not
represented as a transcript (or for some researchers, as an annotated gene
in some curated gene effort).  In practice, what can happen is that a probe
may align to multiple places in the genome, only one of which represents a
"true" gene, others representing either common repeat elements (yes, I think
there are probably probes in production arrays that for one reason or
another have many hits in the genome) or pseudogenes.  One can argue about
the meaning of these hits, but unless there is a way of determining which of
the multiple hits is against an annotated gene, the probe is not
particularly useful for expression analysis.  Yes, in 2004, it is fairly
easy to determine if a hit is against an annotated stretch of DNA, but this
is an added step (and not an entirely trivial one: a probe that spans a
splice site aligns to the genome with a gap) as compared to just looking
for similarity between the probes and a library of transcripts.
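
To make the "added step" concrete, here is a minimal sketch of checking
genomic hits against annotated gene intervals. Everything in it is an
assumption for illustration: the probe IDs, gene names, coordinates, and the
helper names overlaps() and classify_probes() are hypothetical, and real
blast/blat output would need parsing first. It also treats each gene as one
unbroken interval, so it ignores exactly the splice-site problem mentioned
above.

```python
from collections import defaultdict

# Hypothetical genomic alignments: (probe_id, chromosome, start, end).
hits = [
    ("probe_A", "chr1", 1000, 1025),   # single hit inside a gene
    ("probe_B", "chr1", 1500, 1525),   # one hit inside a gene ...
    ("probe_B", "chr2", 300, 325),     # ... and one on anonymous DNA
    ("probe_C", "chr3", 50, 75),       # only anonymous DNA
    ("probe_D", "chr1", 1010, 1035),   # hits two different genes
    ("probe_D", "chr2", 5000, 5025),
]

# Hypothetical annotation: (gene_name, chromosome, start, end).
genes = [
    ("GENE1", "chr1", 900, 2000),
    ("GENE2", "chr2", 4900, 6000),
]


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify_probes(hits, genes):
    """Map each probe to (status, sorted list of annotated genes it hits).

    status is "unique" (exactly one annotated gene), "ambiguous" (more
    than one), or "unannotated" (no hit falls in an annotated gene).
    """
    genes_hit = defaultdict(set)
    for probe, chrom, start, end in hits:
        genes_hit[probe]  # make sure probes with no gene hit are recorded
        for gene, g_chrom, g_start, g_end in genes:
            if chrom == g_chrom and overlaps(start, end, g_start, g_end):
                genes_hit[probe].add(gene)

    result = {}
    for probe, gene_set in genes_hit.items():
        if len(gene_set) == 1:
            status = "unique"
        elif gene_set:
            status = "ambiguous"
        else:
            status = "unannotated"
        result[probe] = (status, sorted(gene_set))
    return result


if __name__ == "__main__":
    for probe, (status, gene_list) in sorted(classify_probes(hits, genes).items()):
        print(probe, status, gene_list)
```

In this toy data, probe_B has two genomic hits but only one falls in an
annotated gene, so it is still usable for expression analysis; probe_D is
not, because its hits fall in two different genes.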

For CGH, the opposite is true.  To what transcripts or genes a set of oligos
aligns is less interesting than the genomic DNA that they align to.

Hope that clarifies a bit.

Sean

On 3/30/04 4:08 AM, "Michael Seewald" <mseewald at gmx.de> wrote:

> 
> On Fri, 26 Mar 2004, Sean Davis wrote:
>> Finally, as noted above, blatting or blasting against the genome does not
>> get you the same information.
> 
> Sorry, I didn't get your point: If *everything* is mapped to the genome, both
> probes and transcripts, what do you miss? Shouldn't the curated genome be
> the reference everything else is linked to (in 2004)? I don't think the
> transcript-based mapping as done in GeneAnnot is the way to do it. It
> complicates things without necessity.
> 
> Best wishes,
> Michael
>


