[BioC] find overlapping genes in ENSEMBL gene ID list and NCBI gene ID list
Vincent Carey
stvjc at channing.harvard.edu
Tue Mar 8 20:49:39 CET 2011
There are various approaches using Bioconductor. The fundamental
resource is the package org.Bt.eg.db, which you can acquire using
biocLite().
You can find associations between ENSEMBL ids and Entrez ids using
mappings in that package.
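As a rough sketch of such a lookup: assuming the ENSEMBL-to-Entrez map in the package is named org.Bt.egENSEMBL2EG (the usual naming pattern for org.*.eg.db packages; check ls("package:org.Bt.eg.db") to confirm), mget() returns the Entrez IDs for each ENSEMBL ID. The helper lookup_entrez and the toy map below are hypothetical and use made-up IDs, so that the snippet runs without the annotation package installed.

```r
# Hypothetical helper: look up Entrez IDs for a vector of ENSEMBL IDs.
# `map` may be a plain environment or an annotation map that supports mget().
lookup_entrez <- function(ensembl_ids, map) {
  mget(ensembl_ids, envir = map, ifnotfound = NA)
}

# Toy stand-in for the real map; every ID here is made up for illustration.
toy_map <- new.env()
assign("ENS_A", "101", envir = toy_map)
assign("ENS_B", "102", envir = toy_map)

# ENS_C is absent from the map, so its entry comes back as NA.
hits <- lookup_entrez(c("ENS_A", "ENS_B", "ENS_C"), toy_map)

# Flatten to a vector of unique Entrez IDs, dropping unmapped entries.
flat <- unlist(hits, use.names = FALSE)
entrez <- unique(flat[!is.na(flat)])
print(entrez)

# With org.Bt.eg.db attached, the call would presumably look like:
# hits <- lookup_entrez(ensex, org.Bt.egENSEMBL2EG)
```

Keeping the IDs as character strings avoids accidental mismatches when they are later compared with another list.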
You may also find GSEABase useful. For example:
> library(GSEABase)
> library(org.Bt.eg.db)
> dput(ensex)
c("ENSBTAG00000000005", "ENSBTAG00000000008", "ENSBTAG00000000009",
"ENSBTAG00000000010", "ENSBTAG00000000011", "ENSBTAG00000000012",
"ENSBTAG00000000013", "ENSBTAG00000000014", "ENSBTAG00000000015",
"ENSBTAG00000000016", "ENSBTAG00000000018", "ENSBTAG00000000019",
"ENSBTAG00000000020", "ENSBTAG00000000021", "ENSBTAG00000000022",
"ENSBTAG00000000023", "ENSBTAG00000000024", "ENSBTAG00000000025",
"ENSBTAG00000000026", "ENSBTAG00000000027")
> e1 = GeneSet(ensex, geneIdType=ENSEMBLIdentifier("org.Bt.eg.db"))
> e1
setName: NA
geneIds: ENSBTAG00000000005, ENSBTAG00000000008, ...,
ENSBTAG00000000027 (total: 20)
geneIdType: ENSEMBL (org.Bt.eg.db)
collectionType: Null
details: use 'details(object)'
> g1 = e1
> geneIdType(g1) = EntrezIdentifier("org.Bt.eg.db")
> g1
setName: NA
geneIds: 282136, 539250, ..., 512788 (total: 21)
geneIdType: EntrezId (org.Bt.eg.db)
collectionType: Null
details: use 'details(object)'
This shows that there are 21 Entrez IDs associated with the 20 ENSEMBL
ids given above, so the mapping is not one-to-one: at least one ENSEMBL
id corresponds to more than one Entrez ID.
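To see where the counts diverge, one option is to tabulate how many Entrez IDs each ENSEMBL ID maps to. The sketch below uses a made-up named list in the shape that a per-ID lookup would return; the IDs and the object names (hits, n_per_id, multi) are illustrative assumptions, not values from org.Bt.eg.db.

```r
# Toy lookup result: a named list with one element per ENSEMBL ID.
# All IDs are made up for illustration.
hits <- list(ENS_A = "101", ENS_B = c("102", "103"), ENS_C = "104")

# Number of Entrez IDs attached to each ENSEMBL ID.
n_per_id <- sapply(hits, length)

# ENSEMBL IDs that map to more than one Entrez ID.
multi <- names(n_per_id)[n_per_id > 1]
print(multi)

# Total number of Entrez IDs across all ENSEMBL IDs.
print(sum(n_per_id))
```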
After converting both sets to a common ID type, you can use the
intersect and setdiff methods to answer the questions you pose.
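A minimal sketch of that last step, using plain character vectors of Entrez IDs rather than GeneSet objects. The IDs and variable names are made up for illustration; in practice from_ensembl would hold the IDs obtained by converting the ENSEMBL list, and from_ncbi the original NCBI list.

```r
# Entrez IDs obtained from the ENSEMBL list after conversion (made up).
from_ensembl <- c("101", "102", "103", "104")
# Entrez IDs from the NCBI list (made up).
from_ncbi <- c("103", "104", "105")

# Genes present in both lists (overlapping genes).
both <- intersect(from_ensembl, from_ncbi)

# Genes present in one list only (non-overlapping genes).
only_ensembl <- setdiff(from_ensembl, from_ncbi)
only_ncbi <- setdiff(from_ncbi, from_ensembl)

print(both)
print(only_ensembl)
print(only_ncbi)
```

Both vectors are kept as character so that IDs stored as numbers in one list and strings in the other do not cause spurious mismatches.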
> sessionInfo()
R version 2.13.0 Under development (unstable) (2011-03-01 r54628)
Platform: x86_64-apple-darwin10.4.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices datasets tools utils methods
[8] base
other attached packages:
[1] GSEABase_1.13.3 graph_1.29.3 annotate_1.29.3
[4] org.Bt.eg.db_2.4.6 RSQLite_0.9-4 DBI_0.2-5
[7] AnnotationDbi_1.13.15 Biobase_2.11.9 weaver_1.17.0
[10] codetools_0.2-8 digest_0.4.2
loaded via a namespace (and not attached):
[1] Matrix_0.999375-47 XML_3.2-0 grid_2.13.0 lattice_0.19-17
[5] xtable_1.5-6
On Tue, Mar 8, 2011 at 12:13 PM, Biase, Fernando <biase at illinois.edu> wrote:
> Hi everyone,
>
> I have a list of ENSEMBL gene IDs and a list of NCBI gene IDs. I need to find which IDs correspond to genes in both lists (overlapping genes) and which genes are in one list but not in the other (non-overlapping genes).
> Can anyone give me some advice on this task, or point me to some material to read?
> In case it is relevant, the organism is Bos taurus.
>
> Thanks in advance,
> Fernando