[BioC] SQLForge and probes that map to multiple genes
Marc Carlson
mcarlson at fhcrc.org
Wed Jul 16 19:11:53 CEST 2008
Mark Cowley wrote:
> Hi Marc, Sean and list.
>
> If I can follow up on Marc's comment:
> "The thing that has me scratching my head is why you would want to map
> multiple genes onto a single probe in your annotation package?"
>
> The genomics annotation problem (what does this ProbeSet detect, and
> which ProbeSets detect my gene of interest) is inherently many-to-many:
> one ProbeSet can map to many 'genes' (or at least many
> different accessions that point to the same gene), and one 'gene'
> can map to multiple ProbeSets (perhaps different isoforms).
>
> Does SQLForge handle these inevitable situations nicely?
> Having read the SQLForge pdf documentation, and this post, it seems
> that you can only provide at most 2 accessions for each ProbeSet,
> perhaps a RefSeq accession, and if that is not known, a GenBank
> accession.
>
> If this has been discussed elsewhere, can someone please point me in
> the right direction?
>
> Cheers,
>
> Mark
> -----------------------------------------------------
> Mark Cowley, BSc (Bioinformatics)(Hons)
>
> Peter Wills Bioinformatics Centre
> Garvan Institute of Medical Research, Sydney, Australia
> -----------------------------------------------------
> On 15/07/2008, at 6:57 AM, Marc Carlson wrote:
>
Hi Mark,

In its current form, SQLForge will take as many IDs as you want to give
it, but it assumes that you intend to assign only one gene to a given
probe. That is, it assumes that the probe was designed to measure one
thing. All of us who make annotation packages understand that in
practice this does not always work out as intended. What confused me
was why you would want to deal with ambiguous probes by creating an
ambiguous database. It seems to me that it might be better not to make
a gene assignment at all if you really don't know what your probe is
measuring. If a probe is known to hybridize to more than one target,
then any measurement from that probe becomes very speculative, since
you have no way of knowing what proportion of the signal belongs to
which target. I agree with Sean that in a rare case like this you will
want to look at a recent BLAST alignment for your mystery probe. But
since such a probe ultimately remains a mystery, I am hesitant to
assign multiple identities to it inside an annotation package.
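
One way to act on this before building a package is to screen the
mapping yourself. The following is a minimal Python sketch and is not
part of SQLForge. It assumes the mapping is available as (probe, gene ID)
pairs; the function name `split_ambiguous` and the example IDs are made
up for illustration. Probes tied to more than one distinct gene are set
aside rather than given multiple identities.

```python
from collections import defaultdict


def split_ambiguous(pairs):
    """Separate probes with one distinct gene from probes with several.

    pairs: iterable of (probe_id, gene_id) tuples (hypothetical input format).
    Returns (unique, ambiguous): `unique` maps probe -> gene, `ambiguous`
    maps probe -> sorted list of candidate genes.
    """
    genes_by_probe = defaultdict(set)
    for probe_id, gene_id in pairs:
        genes_by_probe[probe_id].add(gene_id)

    unique, ambiguous = {}, {}
    for probe_id, genes in genes_by_probe.items():
        if len(genes) == 1:
            unique[probe_id] = next(iter(genes))
        else:
            ambiguous[probe_id] = sorted(genes)
    return unique, ambiguous


# Made-up example data: probe_2 is linked to two different genes.
pairs = [
    ("probe_1", "geneA"),
    ("probe_1", "geneA"),
    ("probe_2", "geneB"),
    ("probe_2", "geneC"),
    ("probe_3", "geneD"),
]
unique, ambiguous = split_ambiguous(pairs)
```

Note that the sketch compares gene IDs, not accessions. Several
accessions pointing to the same gene would first have to be resolved to
a common gene ID, otherwise the probe would be flagged as ambiguous.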

To clarify, SQLForge is not limited to two kinds of IDs for mapping.
One of the parameters (otherSrc) takes a vector of filenames, so you can
pass several different mappings at once if desired. Many major ID types
are supported as a way to tell SQLForge which gene to assign, but once
it has an assignment it retrieves all the remaining data for the
database from public sources. So your mapping files are just a hook
that lets SQLForge find the rest of the information. In most cases,
your initial mapping will probably be complete enough to make the extra
data passed through otherSrc redundant.
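
As an illustration only, the idea of a main mapping backed by
supplementary ones can be sketched in Python. This is an assumption about
the general idea, not a description of how SQLForge resolves its inputs
internally; the function name `first_available_id` and all IDs are
placeholders.

```python
def first_available_id(primary, *others):
    """Combine probe -> ID mappings, preferring earlier sources.

    Each mapping is a dict of probe_id -> identifier, with None meaning
    "no ID known". A later mapping only fills in probes that earlier
    mappings left without an ID.
    """
    combined = {}
    for mapping in (primary, *others):
        for probe_id, identifier in mapping.items():
            if combined.get(probe_id) is None:
                combined[probe_id] = identifier  # may still be None
    return combined


# Placeholder data: the primary mapping lacks an ID for probe_2.
primary = {"probe_1": "refseq_1", "probe_2": None}
extra = {"probe_1": "genbank_1", "probe_2": "genbank_2", "probe_3": "genbank_3"}
combined = first_available_id(primary, extra)
```

In this sketch the supplementary mapping contributes nothing for
probe_1, which mirrors the point that a complete initial mapping makes
the extra files redundant.
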
I hope this clarifies things,
Marc