[BioC] retrieving external Gene IDs from TranscriptDB Object
Ugo Borello
ugo.borello at inserm.fr
Mon May 6 12:18:50 CEST 2013
Dear Stefanie,
I just learned, thanks to Marc Carlson, an easy way to do what you want.
It is nicely described in this vignette, section 05 (and 03):
http://www.bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf
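
For the UCSC case, here is a rough sketch of how I understand the select() approach from that vignette; please treat it as an illustration and not as the definitive method. Assumptions on my side: I use the prebuilt TxDb.Hsapiens.UCSC.hg19.knownGene package in place of your myDB1 so that nothing has to be downloaded, I use org.Hs.eg.db for the gene symbols, and I assume the gene IDs stored in a knownGene-based database are Entrez Gene IDs. The code is written with the argument name "columns"; older AnnotationDbi releases (possibly including the one in this thread) call it "cols".

```r
# Sketch only: assumes these Bioconductor packages are installed.
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)  # assumption: stands in for myDB1
library(org.Hs.eg.db)                       # assumption: source of gene symbols

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Transcript names, exactly as in the original question
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)
sel <- names(cds_by_tx)[5:6]

# Step 1: transcript name -> gene ID stored in the TxDb
# (assumed to be Entrez Gene IDs for the knownGene table)
tx2gene <- AnnotationDbi::select(txdb, keys = sel,
                                 columns = "GENEID", keytype = "TXNAME")

# Step 2: gene ID -> gene symbol, skipped if no transcript has a gene ID
gene_ids <- unique(na.omit(tx2gene$GENEID))
if (length(gene_ids) > 0) {
  gene2symbol <- AnnotationDbi::select(org.Hs.eg.db, keys = gene_ids,
                                       columns = "SYMBOL", keytype = "ENTREZID")
  result <- merge(tx2gene, gene2symbol,
                  by.x = "GENEID", by.y = "ENTREZID", all.x = TRUE)
} else {
  result <- tx2gene
}
result
```

Step 1 should work the same way on the yeast database built from BioMart. For step 2 you would then need a yeast annotation package (org.Sc.sgd.db, I believe) and a keytype that matches the gene IDs stored in the TranscriptDb; keytypes() lists what is available.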
I hope this helps.
Ugo
> From: Stefanie Tauber <stefanie.tauber at univie.ac.at>
> Date: Mon, 6 May 2013 11:25:09 +0200
> To: <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] retrieving external Gene IDs from TranscriptDB Object
>
> Dear List,
>
> I have created a TranscriptDB for yeast as follows:
>
> library(GenomicFeatures)
> library(biomaRt)
>
> ## create yeast DB
> myDB <- makeTranscriptDbFromBiomart(biomart = "ensembl", dataset =
> "scerevisiae_gene_ensembl", circ_seqs = c(DEFAULT_CIRC_SEQS, "Mito"))
> myDBx <- cdsBy(myDB,by = "tx",use.names = TRUE)
>
> Now, I would like to retrieve the external gene ids.
> Is this the most generic way?
>
> # select mart and dataset
> mymart = useMart("ENSEMBL_MART_ENSEMBL", dataset =
> "scerevisiae_gene_ensembl", host="www.ensembl.org")
>
> # just a selection of transcripts
>
> sel = names(myDBx)[5:6]
>
> getBM(attributes=c("ensembl_transcript_id","external_gene_id"), values =
> sel, filters = "ensembl_transcript_id", mart = mymart)
>
>
> And, when creating a TranscriptDB From UCSC:
>
>
> myDB1 <- makeTranscriptDbFromUCSC(genome = "hg19",tablename = "knownGene")
> myDBx1 <- cdsBy(myDB1,by = "tx",use.names =TRUE)
>
> What would be here the most generic way to retrieve the external gene IDs
> for each transcript ID?
>
> Best,
> Stefanie
>
>
>> sessionInfo()
> R Under development (unstable) (2013-05-02 r62711)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] biomaRt_2.16.0 GenomicFeatures_1.12.1 AnnotationDbi_1.22.3
> [4] Biobase_2.20.0 GenomicRanges_1.12.2 IRanges_1.18.0
> [7] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] Biostrings_2.28.0 bitops_1.0-5 BSgenome_1.28.0 DBI_0.2-6
> [5] RCurl_1.95-4.1 Rsamtools_1.12.2 RSQLite_0.11.3 rtracklayer_1.20.1
> [9] stats4_3.1.0 tools_3.1.0 XML_3.96-1.1 zlibbioc_1.6.0
>