[BioC] How to get position for each gene ID/gene symbol instead of position for each transcript
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Aug 25 04:41:47 CEST 2010
Sorry, a correction to my previous message:
> You can do this pretty "simply" with GenomicFeatures, if you want to
> stick with that:
>
> R> txdb <- loadFeatures('your.transcript.db')
> R> xcripts <- transcriptsBy(txdb, by='gene')
>
> ## This part is really slow -- this will be the subject of my next email
> R> gene.bounds <- seqapply(xcripts, reduce)
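Should have used `range` instead of `reduce` here. `reduce` merges overlapping transcripts but can still leave several disjoint ranges per gene, while `range` returns a single span (per chromosome and strand) from the gene's leftmost start to its rightmost end: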
R> gene.bounds <- seqapply(xcripts, range)
The rest is the same ...
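To make the `range` vs `reduce` difference concrete, here is a toy base-R sketch that mimics both on plain integer intervals. It is only an illustration: the data frame, the gene names and the helpers `range_bounds()`/`reduce_bounds()` are hypothetical, and this is not how IRanges/GenomicRanges implement those methods. `split()` + `lapply()` stand in for the grouping that `transcriptsBy(..., by='gene')` and `seqapply()` do above.

```r
# Hypothetical toy transcripts: gene "A" has two overlapping transcripts
# plus one disjoint transcript; gene "B" has a single transcript.
xcripts_df <- data.frame(
  gene  = c("A", "A", "A", "B"),
  start = c(100L, 150L, 400L, 1000L),
  end   = c(200L, 250L, 500L, 1200L)
)

# range-like: exactly one interval per group, leftmost start to rightmost end
range_bounds <- function(df) {
  data.frame(start = min(df$start), end = max(df$end))
}

# reduce-like: merge overlapping or adjacent intervals, but keep real gaps,
# so a group can come back as several intervals
reduce_bounds <- function(df) {
  df <- df[order(df$start), ]
  starts <- df$start[1]
  ends <- df$end[1]
  for (i in seq_len(nrow(df))[-1]) {
    k <- length(ends)
    if (df$start[i] <= ends[k] + 1L) {
      # overlaps or touches the current merged interval: extend it
      ends[k] <- max(ends[k], df$end[i])
    } else {
      # gap: start a new merged interval
      starts <- c(starts, df$start[i])
      ends <- c(ends, df$end[i])
    }
  }
  data.frame(start = starts, end = ends)
}

# Group transcripts by gene, then apply each helper per gene
by_gene <- split(xcripts_df, xcripts_df$gene)
gene_ranges <- lapply(by_gene, range_bounds)
gene_reduced <- lapply(by_gene, reduce_bounds)
```

Gene A comes back as a single span (100-500) from the range-like helper but as two pieces (100-250 and 400-500) from the reduce-like one, which is why `reduce` does not give one bound per gene.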
> The names() of gene.bounds are the Entrez IDs of the genes. You can use
> the org.Hs.eg.db package to look up their symbols:
>
> R> library(org.Hs.eg.db)
> R> symbols <- mget(names(gene.bounds), org.Hs.egSYMBOL, ifnotfound=NA)
>
> symbols will now be a list (names are Entrez IDs, values are the gene
> symbols) that you can manipulate in "the standard R way".
>
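As one example of "the standard R way", the sketch below flattens that list into a named character vector and drops IDs with no symbol. The `symbols` list here is a hand-built stand-in for what `mget(..., ifnotfound=NA)` returns (7157 and 672 are the Entrez IDs for TP53 and BRCA1; `"999999999"` is a made-up ID standing in for a lookup that failed), so nothing here touches org.Hs.eg.db itself.

```r
# Hand-built stand-in for the mget(..., ifnotfound = NA) result:
# names are Entrez IDs, values are gene symbols (NA when not found).
symbols <- list("7157" = "TP53", "672" = "BRCA1", "999999999" = NA)

# Flatten to a named character vector. The Entrez IDs survive unchanged as
# names because every element has length 1 (longer elements would get
# numeric suffixes appended to their names).
sym_vec <- unlist(symbols)

# Keep only the IDs that actually have a symbol
sym_found <- sym_vec[!is.na(sym_vec)]
```

With the real objects, indexing the flattened vector by name, e.g. `unlist(symbols)[names(gene.bounds)]`, should line the symbols up with the entries of gene.bounds.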
> Hope that helps,
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>