[Bioc-devel] New package with methods for annotation packages

Martin Morgan mtmorgan at fhcrc.org
Sat Jan 8 01:38:06 CET 2011


On 01/07/2011 05:46 AM, Stefan McKinnon Edwards wrote:
> Hi all,
> 
> I have compiled a package of methods to ease the use of the annotation data packages from the Biocore Data Team (such as "org.Bt.eg.db"). It basically provides a routine for mapping biological entities from one identifier (e.g. Ensembl) to another (e.g. RefSeq) by the use of the aforementioned data packages. In the case with org.Bt.eg.db, one would have to map from Ensembl to Entrez and then to RefSeq, and meanwhile cleaning the result. With my package, it can be done with a single line. Here is an example:
> 
> R> library(AnnotationFuncs)
> R> library(org.Bt.eg.db)
> R> symbols <- c("SERPINA1","KERA","CD5")
> R> refseq <- translate(symbols, from=org.Bt.egSYMBOL2EG, to=org.Bt.egREFSEQ)
> R> refseq
> $SERPINA1
> [1] "NM_173882" "NP_776307"
> 
> $KERA
> [1] "NM_173910" "NP_776335"
> 
> $CD5
> [1] "NM_173899" "NP_776324"
> 
> R> pickRefSeq(refseq, priorities=c('NP','XP'), reduce='all')
> $SERPINA1
> [1] "NP_776307"
> 
> $KERA
> [1] "NP_776335"
> # End of example.
> 
> For this, I have two questions:


Hi Stefan --

I'd be interested, on or off list, in learning a little more about your
package implementation -- e.g., is it using SQL to query the underlying
tables, or relying on the AnnotationDbi framework? What other functions
are there in addition to those you illustrate?

> 1) Is there any other package on CRAN or BioConductor that provides the same functionality?

Vince mentioned GSEABase, which for the first mapping might be

> library(GSEABase)
> symbols <- GeneSet(c("SERPINA1","KERA","CD5"),
+                    geneIdType=SymbolIdentifier("org.Bt.eg.db"),
+                    setName="My Genes")
> mapIdentifiers(symbols, RefseqIdentifier())
setName: My Genes
geneIds: NM_173882, NP_776307, ..., NP_776324 (total: 6)
geneIdType: Refseq (org.Bt.eg.db)
collectionType: Null
details: use 'details(object)'

which already reveals some differences in functionality, e.g., GSEABase
returns only the mapped identifiers (a flat set), whereas translate()
returns the map itself (each input symbol with its identifiers).
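
That distinction can be sketched in base R alone, without either package.
The list below is copied by hand from the translate() output quoted above;
the names refseq_map and mapped_ids are made up for illustration, and this
is not how either package works internally.

```r
# The map, as printed by translate() in the quoted example:
# a named list with one element per input symbol.
refseq_map <- list(
  SERPINA1 = c("NM_173882", "NP_776307"),
  KERA     = c("NM_173910", "NP_776335"),
  CD5      = c("NM_173899", "NP_776324")
)

# The mapped identifiers: the same values with the symbol-to-identifier
# association dropped, comparable to the geneIds shown for the GeneSet.
mapped_ids <- unique(unlist(refseq_map, use.names = FALSE))
length(mapped_ids)   # 6, matching "total: 6" in the GSEABase output

# Only the map can still answer "which identifiers belong to KERA?"
refseq_map[["KERA"]]
```

Going from the map to the identifiers is a one-liner, but the reverse is
not possible, so which form is more convenient depends on what comes next.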

> 2) I was thinking of making a small Application Note to e.g. Oxford Journals Bioinformatics. Would there be any issue, if I already have posted the package on my personal website?

Best to check with the journal, but my experience has really been the
opposite -- no sense in advertising a package that is not accessible, or
that the reviewer can't access! And as an extension, since Bioconductor
provides added value (e.g., in terms of availability and developer
infrastructure such as svn, and name recognition) it is not unusual for
application notes to indicate that the package is submitted or available
via Bioconductor (provided of course that the package has been submitted...)

Martin

> Kind regards,
> 
> Stefan McKinnon Edwards
> PhD student
> Dept. of Genetics and Biotechnology
> Faculty of Agricultural Sciences
> Aarhus University
> Blichers Allé 20, Postboks 50
> DK-8830 Tjele
> 
> Tel.: +45 8999 1291
> Email: stefanm.edwards at agrsci.dk
> 
> Tel.: +45 8999 1900
> Web: www.agrsci.au.dk
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


