[BioC] find overlaping genes in ENSEMBL gene ID list and NCBI gene ID list

Vincent Carey stvjc at channing.harvard.edu
Tue Mar 8 20:49:39 CET 2011


There are various approaches using Bioconductor.  The fundamental
resource is the package org.Bt.eg.db, which you can acquire using
biocLite().

You can find associations between ENSEMBL ids and Entrez ids using
mappings in that package.

You may find GSEABase useful also.  For example

> dput(ensex)
c("ENSBTAG00000000005", "ENSBTAG00000000008", "ENSBTAG00000000009",
"ENSBTAG00000000010", "ENSBTAG00000000011", "ENSBTAG00000000012",
"ENSBTAG00000000013", "ENSBTAG00000000014", "ENSBTAG00000000015",
"ENSBTAG00000000016", "ENSBTAG00000000018", "ENSBTAG00000000019",
"ENSBTAG00000000020", "ENSBTAG00000000021", "ENSBTAG00000000022",
"ENSBTAG00000000023", "ENSBTAG00000000024", "ENSBTAG00000000025",
"ENSBTAG00000000026", "ENSBTAG00000000027")
> e1 = GeneSet(ensex, geneIdType=ENSEMBLIdentifier("org.Bt.eg.db"))
> e1
setName: NA
geneIds: ENSBTAG00000000005, ENSBTAG00000000008, ...,
ENSBTAG00000000027 (total: 20)
geneIdType: ENSEMBL (org.Bt.eg.db)
collectionType: Null
details: use 'details(object)'
> g1 = e1
> geneIdType(g1) = EntrezIdentifier("org.Bt.eg.db")
> g1
setName: NA
geneIds: 282136, 539250, ..., 512788 (total: 21)
geneIdType: EntrezId (org.Bt.eg.db)
collectionType: Null
details: use 'details(object)'

This shows that there are 21 Entrez IDs associated with the 20 ENSEMBL
ids given above.
After converting sets to common ID type, you can use intersect,
setdiff methods to answer some of the
questions you pose.


> sessionInfo()
R version 2.13.0 Under development (unstable) (2011-03-01 r54628)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
 [1] GSEABase_1.13.3       graph_1.29.3          annotate_1.29.3
 [4] org.Bt.eg.db_2.4.6    RSQLite_0.9-4         DBI_0.2-5
 [7] AnnotationDbi_1.13.15 Biobase_2.11.9        weaver_1.17.0
[10] codetools_0.2-8       digest_0.4.2

loaded via a namespace (and not attached):
[1] Matrix_0.999375-47 XML_3.2-0          grid_2.13.0        lattice_0.19-17
[5] xtable_1.5-6


On Tue, Mar 8, 2011 at 12:13 PM, Biase, Fernando <biase at illinois.edu> wrote:
> Hi everyone,
>
> I have a list of ENSEMBL gene _IDS  and a list with NCBI gene_IDs. I need to find which  ids correspond to genes in both list (overlapping genes) and each genes are in each one of them but not present in the other list (non-overlapping genes).
> Can anyone give me some advice on this task? Or indicate a material do read?
> In case it is relevant, the organism is Bos taurus.
>
> Thanks in advance,
> Fernando
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list