[BioC] SQLForge and probes that map to multiple genes

Tue Jul 15 02:42:55 CEST 2008

Hi Marc, Sean and list.

If I can follow up on Marc's comment:
"The thing that has me scratching my head is why you would want to map  
multiple genes onto a single probe in your annotation package?"

The genomics annotation problem (what does this ProbeSet detect, and
which ProbeSets detect my gene of interest) is inherently many-to-many:
one ProbeSet can map to many 'genes' (or at least many different
accessions that point to the same gene), and one 'gene' can map to
multiple ProbeSets (perhaps different isoforms).
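
To make the many-to-many shape concrete, here is a minimal sketch in
base R. It is not SQLForge code. The probe IDs and accessions are
borrowed from Cei's example quoted below, and the table layout and
the object names (probe2acc, acc_by_probe, probe_by_acc) are my own
assumptions, used purely for illustration.

```r
# Hypothetical "long" table: one row per (ProbeSet, accession) pairing.
# IDs are reused from the example quoted later in this thread.
probe2acc <- data.frame(
  probe     = c("gnf1m00050_at", "gnf1m00050_at",
                "gnf1m00053_a_at", "gnf1m00056_a_at"),
  accession = c("NM_008929", "NM_172283",
                "NM_181666", "NM_181666"),
  stringsAsFactors = FALSE
)

# One ProbeSet -> many accessions
acc_by_probe <- split(probe2acc$accession, probe2acc$probe)

# One accession -> many ProbeSets
probe_by_acc <- split(probe2acc$probe, probe2acc$accession)

acc_by_probe[["gnf1m00050_at"]]   # both accessions for this ProbeSet
probe_by_acc[["NM_181666"]]       # both ProbeSets for this accession
```

A long table like this keeps every pairing, so neither direction of
the lookup has to discard anything.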

Does SQLForge handle these inevitable situations nicely?
Having read the SQLForge PDF documentation and this post, it seems
that you can provide at most 2 accessions for each ProbeSet:
perhaps a RefSeq accession and, if that is not known, a GenBank
accession.
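
As I understand it, the behaviour is a fallback rather than a merge.
The sketch below only mimics that logic in base R; it is not how
SQLForge is implemented. The probe names p1..p3 are hypothetical, NA
stands in for "not known", and the accessions are reused from the
example quoted below.

```r
# Hypothetical primary and secondary accessions for three ProbeSets.
# NA means no accession of that type is known for the ProbeSet.
primary   <- c(p1 = "NM_008929", p2 = NA,          p3 = "NM_178939")
secondary <- c(p1 = "NM_172283", p2 = "NM_172283", p3 = NA)

# Fallback: take the secondary accession only where the primary is missing.
chosen <- ifelse(is.na(primary), secondary, primary)
chosen
```

Under this logic p1 keeps only its primary accession, so its
secondary accession is silently dropped, which is the situation I am
asking about.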

If this has been discussed elsewhere, can someone please point me in  
the right direction?

Cheers,

Mark
-----------------------------------------------------
Mark Cowley, BSc (Bioinformatics)(Hons)

Peter Wills Bioinformatics Centre
Garvan Institute of Medical Research, Sydney, Australia
-----------------------------------------------------
On 15/07/2008, at 6:57 AM, Marc Carlson wrote:

> Sean Davis wrote:
>> On Mon, Jul 14, 2008 at 12:07 PM, Cei Abreu-Goodger  
>> <cei at sanger.ac.uk> wrote:
>>
>>> Hi Sean,
>>>
>>> Ok, so my example was even worse than I thought. And I had forgotten
>>> to mention that the otherSrc parameter wasn't what I needed. So, to
>>> return to my bad example, I now have two separate files, the first
>>> column in the first file, the second in the second file:
>>>
>>>
>>>> refseqs <- "gnf1m.test.tab"
>>>> refseqs2 <- "gnf1m.test2.tab"
>>>>
>>>> read.table(refseqs)
>>>>
>>>             V1        V2
>>> 1   gnf1m00050_at NM_008929
>>> 2 gnf1m00051_a_at NM_007487
>>> 3 gnf1m00052_a_at NM_178939
>>> 4 gnf1m00053_a_at NM_181666
>>> 5 gnf1m00054_a_at NM_026430
>>> 6 gnf1m00055_a_at NM_029916
>>> 7 gnf1m00056_a_at NM_181666
>>>
>>>> read.table(refseqs2)
>>>>
>>>             V1        V2
>>> 1   gnf1m00050_at NM_172283
>>> 2 gnf1m00051_a_at NM_172283
>>> 3 gnf1m00052_a_at NM_172283
>>> 4 gnf1m00053_a_at NM_172283
>>> 5 gnf1m00054_a_at NM_172283
>>> 6 gnf1m00055_a_at NM_172283
>>> 7 gnf1m00056_a_at NM_172283
>>>
>>> I now add the second file as an otherSrc:
>>>
>>>
>>>> makeMOUSECHIP_DB(affy=FALSE, prefix="test", fileName=refseqs,
>>>               baseMapType="refseq", otherSrc=c(refseqs2),
>>>               outputDir=".", version="0.9",
>>>               manufacturer="GNF-Affymetrix", chipName="gnf1m")
>>>
>>>
>>> But this still doesn't add the second gene's annotation to all the
>>> probes (the resulting package's annotation is exactly the same as in
>>> the first case). Is there any other way?
>>>
>>
>> I think that the way SQLForge works now, it will only use the
>> additional annotation if the first ID is not successfully mapped.
>> (Someone else should probably confirm my assertion about this.) Since
>> it appears that your first column contains all RefSeq IDs, you will
>> never get to the second column.  So, in short, I don't know how to
>> make SQLForge do what you want.
>>
>> Sean
>>
>>
>
> Hi Guys,
>
> Sean is correct about the purpose of the otherSrc parameter, and
> about the way that SQLForge currently works.  The thing that has me
> scratching my head is why you would want to map multiple genes onto
> a single probe in your annotation package?
>
>   Marc