[BioC] annotations for Codelink arrays

Diego Diez diez at kuicr.kyoto-u.ac.jp
Fri Apr 6 06:50:16 CEST 2007


Hi,

On Apr 6, 2007, at 6:06 AM, Weiwei Shi wrote:

> Hi, there:
> I am analyzing an expression profile using CodeLink RU1 arrays and
> assume I could use the package called r10kcod for annotation. I did
> some manual work before using biomaRt, and now I would like to try
> this package. I searched the archives and found the following old post
> (from 2005), which discusses a couple of issues such as one-to-multiple
> mapping. Here I am wondering how these problems have been solved in
> this new(?) package.

Well, the new packages available are built using the standard methods for
building annotation packages with AnnBuilder. That means that, on my
side, no special action is taken for probes that map to
multiple genes. I do what is best in terms of comparability
between different annotation packages, i.e. use the same methodology
and the same annotation sources as any other package in a specific
BioC release.

See comments below:

>
> Thanks,
>
> Weiwei
>
>
> On 10/17/05, John Zhang <jzhang at jimmy.harvard.edu> wrote:
>>
>>> So in this case, if some probes map to different Entrez Gene IDs (that
>>> is the case for some of the MULTIPLE probes on these chips, at least with
>>> the company mappings), then only one of the Entrez Gene IDs (the
>>> smallest) will be taken. I will have to check the company's mappings for
>>> these probes to Entrez Gene, or maybe not use them at all and rely on
>>> the AnnBuilder method (the best way, I think).
>>
>> One-to-many mappings are always a problem as far as annotation is
>> concerned. AnnBuilder makes a choice (which may not be the best one) for
>> the users when there are multiple Entrez Gene mappings for a given probe
>> id. I would like to invite comments on what would be the best way of
>> handling this situation.
>>

As John Zhang said, the problem is not restricted to Codelink arrays.
The designers of AnnBuilder had to make a choice for this problem,
and for me it is ok. So you will lose the information about multiple
mappings at the Entrez Gene level. But you still have the information
about multiple mappings at the accession level, which is stored in the
r10kcodACCNUM environment in the case of the r10kcod annotation package.
If you find an interesting gene with many ACCNUMs mapped to it, I would
take a look at the different mappings to see how reliable that probe is.
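
As a minimal sketch of that check, assuming the ACCNUM map behaves like a
plain R environment keyed by probe id, something like the following would
list the probes that carry more than one accession. The probe ids and
accession numbers are made up, and accnum_env is only a toy stand-in for
r10kcodACCNUM:

```r
# Toy stand-in for an environment-based annotation map such as r10kcodACCNUM.
# The probe ids and accession numbers below are invented for illustration.
accnum_env <- new.env()
assign("probe_1", c("NM_000001", "NM_000002", "XM_000003"), envir = accnum_env)
assign("probe_2", "NM_000004", envir = accnum_env)
assign("probe_3", NA_character_, envir = accnum_env)

# Pull every probe -> accession mapping out of the environment as a named list.
acc_list <- mget(ls(accnum_env), envir = accnum_env)

# Count the non-missing accessions attached to each probe.
n_acc <- sapply(acc_list, function(x) sum(!is.na(x)))

# Probes with more than one accession are the ones worth checking by hand.
multi_probes <- names(n_acc)[n_acc > 1]
print(acc_list[multi_probes])
```

With the real package loaded, and under the same assumption, replacing
accnum_env with r10kcodACCNUM should give the same kind of list.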

By the way, new packages have been made for the next BioC release and
are available for testing. You will need R-2.5
(devel) and BioC-2.0 (devel) to install the binary packages if you
want to give them a try, though.

Hope this helps,

Diego.


>>
>>>
>>> But how can I use a mixture of GenBank ids (for most of the probes) and
>>> UniGene ids (for some of them)? If I use "gb" as baseMapType I will not
>>> get the mapping for the UniGene ids. If I use "ug", the same happens for
>>> the GenBank ids. I cannot use the UniGene ids in otherSrc because it can
>>> only use Entrez ids. I worked on this a little with no good result. This
>>> is briefly what I do:
>>
>> Currently there is no parser for both GB and UniGene ids. I will look into
>> writing one. The workaround for now is probably to map by GB and UG
>> separately and then merge the results.
>>
>>>
>>> gb.txt: File with mappings from probe ids to GenBank ids.
>>> Sometimes I used a file ll.txt with mappings from probe ids to LocusLink
>>> ids (mappings from the company) in otherSrc.
>>
>> It is always a good idea to include otherSrc. AnnBuilder has a voting
>> mechanism that takes the mapping with the most votes from different
>> sources.
>>
>>
>>>
>>>> library(AnnBuilder)
>>>> myBase <- file.path("gb.txt")
>>>> myBaseType <- "gb"
>>>> mySrcUrls <- getSrcUrl("all", organism="Rattus norvegicus")
>>>> myDir <- tempdir()
>>>> ABPkgBuilder(baseName=myBase, srcUrls=mySrcUrls, baseMapType=myBaseType,
>>>>     pkgPath=myDir, organism="Rattus norvegicus", ... other parameters ...)
>>>
>>>
>>> Thank you again for your help. I think this package is great and the best
>>> way to deal with the nightmare of annotations out there.
>>>
>>> D.
>>>
>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>> D.
>>>>>
>>>>> On 13/10/2005, at 3:14, Robert Gentleman wrote:
>>>>>
>>>>>> Hi Tao,
>>>>>>   If the right set of mappings is available to get started, AnnBuilder
>>>>>> is pretty easy to use. We can help you with the first one or two, and
>>>>>> are happy to distribute them. If there is more widespread interest (and
>>>>>> they are stable) we can add them to the build process.
>>>>>>
>>>>>>   Robert
>>>>>>
>>>>>> Shi, Tao wrote:
>>>>>>
>>>>>>> Any plans to create annotation packages for Codelink arrays?
>>>>>>>
>>>>>>> ...Tao
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Robert Gentleman, PhD
>>>>>> Program in Computational Biology
>>>>>> Division of Public Health Sciences
>>>>>> Fred Hutchinson Cancer Research Center
>>>>>> 1100 Fairview Ave. N, M2-B876
>>>>>> PO Box 19024
>>>>>> Seattle, Washington 98109-1024
>>>>>> 206-667-7700
>>>>>> rgentlem at fhcrc.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> Jianhua Zhang
>>>> Department of Medical Oncology
>>>> Dana-Farber Cancer Institute
>>>> 44 Binney Street
>>>> Boston, MA 02115-6084
>>>>
>>
>>
>>
>
>
> -- 
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>


