Hi Luigi,

You might also want to have a look at the homology annotation packages
available in Bioc. They are based on InParanoid. For instance, the package
for human is

hom.Hs.inp.db

Cheers,
--Tony

On Thu, Nov 5, 2009 at 11:10 PM, Luigi Marchionni <marchion@jhu.edu> wrote:

> Dear All,
> As I wrote to the list a couple of weeks ago, I took on the endeavor of
> creating an S4 package for storing genomics results data and analyzing
> them further.
> I already had working code to compare results across experiments, platforms
> and species.
> To be a good citizen I started using S4, and I started relying on the classes
> already existing in Bioc.
> Now I have come to the issue of mapping genes (and features) across
> species.
> I see that Hong Li maintains a package (homolog.db) containing such
> information, which depends on several other packages.
> I installed them and found the package difficult to use.
> Here are a few examples:
>
> The following retrieves the mapping between HomoloGene IDs and Entrez Gene
> IDs.
> Each list element obviously has a different length, but there is no
> easy way to tell which Entrez Gene ID belongs to which organism.
> I can tell that the first ID below (34) is human, but after that...
> If this has to be the structure, then each element of xx below should be
> named with the corresponding taxonomy ID.
> See the chunk of code below:
>
>
> ################################################################################
> > xx <- as.list(homologHOMOLOG2GENEID)
> > xx[1]
> $`3`
>  [1]      34  469356  490207  505968   11364   24158  406283
>  [8]   38864 1276346  181757  173979  181758
>
> ################################################################################
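For what it's worth, the naming suggested above is straightforward in base R once a gene-to-taxonomy mapping is available. This is only a sketch: `nameByTaxon()` is a hypothetical helper, and the toy `xx` and `zz` lists hold just the two Entrez IDs whose taxonomy IDs appear in this thread, not the real homolog.db contents.

```r
# Toy stand-ins (assumption, not the real homolog.db data):
# xx: HomoloGene ID -> Entrez Gene IDs; zz: Entrez Gene ID -> NCBI taxonomy ID.
xx <- list(`3` = c("34", "469356"))
zz <- list(`34` = "9606", `469356` = "9598")

# Hypothetical helper: name each Entrez ID in every HomoloGene cluster by its
# taxonomy ID (NA when the taxonomy is unknown).
nameByTaxon <- function(hom2gene, gene2tax) {
  lapply(hom2gene, function(ids) {
    tax <- vapply(ids, function(id) {
      taxid <- gene2tax[[id]]  # NULL if id is absent from the list
      if (is.null(taxid)) NA_character_ else taxid
    }, character(1))
    setNames(ids, unname(tax))
  })
}

xxNamed <- nameByTaxon(xx, zz)
names(xxNamed[["3"]])     # "9606" "9598"
xxNamed[["3"]][["9598"]]  # "469356", the chimpanzee ID
```

With taxonomy IDs as names, picking the member of a cluster for a given organism becomes a single `[[` lookup.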
>
> Using the code below, however, I can retrieve the mapping from Entrez
> Gene IDs to HomoloGene IDs.
> Let's consider the first two elements of xx[1] above:
>
>
> ################################################################################
> > yy <- as.list(homologHOMOLOG)
> > yy["34"]
> $`34`
> [1] 3
> > yy["469356"]
> $`469356`
> [1] 3
>
> ################################################################################
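Since the Entrez-to-HomoloGene list is just the inverse of the HomoloGene-to-Entrez list, one can be derived from the other. A minimal base-R sketch, assuming a toy `xx` that holds only the two Entrez IDs quoted above; note that this sketch returns the HomoloGene ID as a string, whereas the output above prints it unquoted.

```r
# Toy stand-in (assumption): HomoloGene ID -> Entrez Gene IDs, holding only
# the first two members of cluster 3 shown above.
xx <- list(`3` = c("34", "469356"))

# Flatten into (values = Entrez ID, ind = HomoloGene ID) pairs, then split the
# HomoloGene IDs by Entrez ID to obtain the inverse mapping.
pairs <- stack(xx)
yy <- split(as.character(pairs$ind), pairs$values)

yy[["34"]]      # "3"
yy[["469356"]]  # "3"
```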
>
> With a little coding I can now map one Entrez ID to another across
> species, although without knowing which species. So I use the species
> information stored in zz:
>
>
> ################################################################################
> > zz["34"]
> $`34`
> [1] 9606
> > zz["469356"]
> $`469356`
> [1] 9598
>
> ################################################################################
>
> OK, now I know that Entrez ID "34" in taxonomy "9606" (human) corresponds to
> Entrez ID "469356" in taxonomy "9598" (which I do not know by heart),
> through the HomoloGene ID "3". To find out the second taxonomy I can do:
>
>
> ################################################################################
> > ff <- as.list(homologORGANISM)
> > ff["9598"]
> $`9598`
> [1] "Pan troglodytes"
>
> ################################################################################
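The manual steps in the chunks above (Entrez ID to HomoloGene ID, HomoloGene ID to cluster members, members to taxonomy, taxonomy to organism name) can be chained in one helper. This is a sketch only: `mapEntrezAcross()` is a hypothetical name, and the four toy lists stand in for homologHOMOLOG2GENEID, homologHOMOLOG, zz and homologORGANISM using only the values quoted in this thread.

```r
# Toy stand-ins (assumption), holding only values quoted in this thread.
hom2gene <- list(`3` = c("34", "469356"))          # like homologHOMOLOG2GENEID
gene2hom <- list(`34` = "3", `469356` = "3")       # like homologHOMOLOG
gene2tax <- list(`34` = "9606", `469356` = "9598") # like zz
tax2org  <- list(`9606` = "Homo sapiens",          # like homologORGANISM
                 `9598` = "Pan troglodytes")

# Hypothetical helper: Entrez ID -> HomoloGene ID -> cluster members ->
# members whose taxonomy ID equals toTax.
mapEntrezAcross <- function(id, toTax) {
  hom <- gene2hom[[id]]
  if (is.null(hom)) return(character(0))
  members <- hom2gene[[hom]]
  memberTax <- vapply(members, function(g) {
    taxid <- gene2tax[[g]]
    if (is.null(taxid)) NA_character_ else taxid
  }, character(1))
  members[!is.na(memberTax) & memberTax == toTax]
}

hit <- mapEntrezAcross("34", "9598")
hit                         # "469356"
tax2org[[gene2tax[[hit]]]]  # "Pan troglodytes"
```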
>
> Good! I had to play around a little with the code, but I could map the
> human Entrez ID "34" to the chimpanzee one, "469356".
> However, I think this is a little too complicated. To install homolog.db
> (with dependencies=TRUE) I also had to install:
> org.Hs.ipi.db_1.1.1.tar.gz
> org.Hs.sp.db_1.1.1.tar.gz
> PAnnBuilder_1.9.0.tar.gz
> And the package does not point to any library implementing the code chunks
> above to map Entrez IDs across species.
>
> Look at the code below: I load my mapping library (in which the cross-mapping
> HomoloGene table takes 3.2 MB), then this table, and the taxonomy
> information:
>
>
> ################################################################################
> > library(moreFGS)
> > data(homol)
> > data(tax)
> > ls()
> [1] "ff"    "homol" "tax"   "xx"    "yy"    "zz"
>
> ################################################################################
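The thread does not show how the `homol` object is laid out. One plausible layout (an assumption, not necessarily what moreFGS uses) is a single flat data frame with one row per gene, which turns every cross-species question into a table subset. A base-R sketch built from toy list mappings that hold only values quoted in this thread:

```r
# Toy mappings (assumption), holding only values quoted in this thread.
hom2gene <- list(`3` = c("34", "469356"))
gene2tax <- c(`34` = "9606", `469356` = "9598")
tax2org  <- c(`9606` = "Homo sapiens", `9598` = "Pan troglodytes")

# One row per gene: Entrez ID, HomoloGene ID, taxonomy ID, organism name.
flat <- stack(hom2gene)
names(flat) <- c("EGID", "homologene")
flat$EGID       <- as.character(flat$EGID)  # index by name, not factor code
flat$homologene <- as.character(flat$homologene)
flat$taxid      <- unname(gene2tax[flat$EGID])
flat$organism   <- unname(tax2org[flat$taxid])

# A cross-species question is now a subset, e.g. the chimpanzee member of
# the cluster containing human Entrez ID "34":
hom <- flat$homologene[flat$EGID == "34"]
flat$EGID[flat$homologene == hom & flat$organism == "Pan troglodytes"]  # "469356"
```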
>
> Finally I load a library containing the taxSwitch() function:
>
>
> ################################################################################
> > library(funcBox)
> > args(taxSwitch)
> function (IDs, org1, org2, whatIn = "EGID", whatOut = "EGID")
> NULL
>
> ################################################################################
>
> Now look at this, for one ID:
>
>
> ################################################################################
> > taxSwitch("34","Homo","Pan","EGID","EGID")
> [1] "469356"
> > taxSwitch("469356","Pan","Homo","EGID","EGID")
> [1] "34"
> > taxSwitch("469356","Pan","Homo","EGID","symbol")
> [1] "ACADM"
> > taxSwitch("34","Homo","Mus","EGID","symbol")
> [1] "Acadm"
> > taxSwitch("Acadm","Mus","Homo","symbol","EGID")
> [1] "34"
> > taxSwitch("Acadm","Mus","Pan","symbol","EGID")
> [1] "469356"
> > taxSwitch("Acadm","Mus","Bos","symbol","EGID")
> [1] "505968"
> > taxSwitch("Acadm","Mus","Bos","symbol","Acc")
> [1] "NP_001068703"
> > taxSwitch("NP_001068703","Bos","Rattus","Acc","symbol")
> [1] "Acadm"
>
> ################################################################################
>
> Or more than one ID:
>
>
> ################################################################################
> > taxSwitch(c("34","37","3211"),"Homo","Mus","EGID","Acc")
> [1] "NP_031408" "NP_059062" "NP_032292"
> > taxSwitch(c("34","37","3211"),"Homo","Mus","EGID","symbol")
> [1] "Acadm"  "Acadvl" "Hoxb1"
>
> ################################################################################
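The funcBox source is not included in the thread, so the following is not taxSwitch() itself. It is a hypothetical `taxSwitchSketch()` with the same argument names, doing a vectorized lookup with match() on a flat table; the column names (`homologene`, `genus`, `EGID`, `symbol`, `Acc`) are assumptions. The toy table contains only the HomoloGene 3 (ACADM) values quoted above, with NA wherever the thread shows no value.

```r
# Toy lookup table (assumption): HomoloGene cluster 3 (ACADM) with only the
# values quoted in this thread; NA where the thread shows none.
homolTable <- data.frame(
  homologene = "3",
  genus  = c("Homo",  "Pan",    "Mus",       "Bos",          "Rattus"),
  EGID   = c("34",    "469356", NA,          "505968",       NA),
  symbol = c("ACADM", NA,       "Acadm",     NA,             "Acadm"),
  Acc    = c(NA,      NA,       "NP_031408", "NP_001068703", NA),
  stringsAsFactors = FALSE
)

# Hypothetical re-implementation, not the funcBox code: translate IDs of type
# whatIn in genus org1 into IDs of type whatOut in genus org2 via HomoloGene.
taxSwitchSketch <- function(IDs, org1, org2, whatIn = "EGID", whatOut = "EGID",
                            map = homolTable) {
  src <- map[map$genus == org1, ]
  dst <- map[map$genus == org2, ]
  hom <- src$homologene[match(IDs, src[[whatIn]])]  # IDs -> HomoloGene IDs
  dst[[whatOut]][match(hom, dst$homologene)]        # first hit in org2
}

taxSwitchSketch("Acadm", "Mus", "Bos", "symbol", "Acc")           # "NP_001068703"
taxSwitchSketch("NP_001068703", "Bos", "Rattus", "Acc", "symbol") # "Acadm"
taxSwitchSketch(c("34", "999"), "Homo", "Pan")  # "469356" NA ("999" is not in the toy table)
```

Because match() returns only the first hit, a cluster holding several genes of the same organism would need extra handling, for instance returning a list instead of a character vector.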
>
> And so on.
> I would be very happy to provide Bioconductor with the code to build the
> moreFGS library and the taxSwitch() function.
>
> Luigi
>
> PS: the session info is below
>
>
> ################################################################################
> > sessionInfo()
> R version 2.11.0 Under development (unstable) (2009-10-01 r49916)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets
> [6] methods   base
>
> other attached packages:
>  [1] moreFGS_1.0.2       homolog.db_1.1.1
>  [3] PAnnBuilder_1.9.0   RSQLite_0.7-3
>  [5] DBI_0.2-4           funcBox_0.0.3
>  [7] annotate_1.25.0     AnnotationDbi_1.9.0
>  [9] Biobase_2.7.0       limma_3.3.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0 xtable_1.5-5
>
> ################################################################################
>
> _______________________________________________
> Bioc-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>


