[BioC] How can I extract promoter regions from a list NCBI genbank acc?
Marc Carlson
mcarlson at fhcrc.org
Tue Nov 6 00:29:11 CET 2012
Hi Yisong,
Here is one way to solve this problem:
## First, load the org package for mouse and get the Entrez Gene IDs
library(org.Mm.eg.db)
cols(org.Mm.eg.db)
ids = c("NM_144551", "NM_019413")
res = select(org.Mm.eg.db, keys=ids, cols="ENTREZID", keytype="REFSEQ")
res
egs = res[,"ENTREZID"]
## Then load the knownGene based package that you mentioned earlier.
## (Note that the gene IDs used by this package are Entrez Gene IDs!)
library("TxDb.Mmusculus.UCSC.mm9.knownGene")
mm9 = TxDb.Mmusculus.UCSC.mm9.knownGene ## a shorter name, just for convenience
cols(mm9)
## Now at this point you have not really told me what you wanted,
## so let's start with how you can get information about your genes of interest.
## Let's begin by getting out some of the transcript information for these genes:
res2 = select(mm9, keys=egs,
              cols=c("TXNAME","TXSTRAND","TXCHROM","TXSTART","TXEND"),
              keytype="GENEID")
res2
## From your description it sounds like you wanted a GRanges object with
## the ranges for your promoters.
## In that case you could do it like this:
proms = promoters(mm9)
## Then you can extract the txnames from before...
txnms = unique(res2[,"TXNAME"])
## And then subset to only the promoters that have the names that you
## want here
myproms = proms[mcols(proms)[,'tx_name'] %in% txnms,]
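
To make it clearer what those promoter ranges contain: when promoters() is called without extra arguments it uses its documented defaults of upstream=2000 and downstream=200, measured from the transcription start site (the transcript start on the "+" strand, the transcript end on the "-" strand). The following is only a base-R sketch of that arithmetic on a made-up table, not the package's own code; the helper promoter_window() and the toy coordinates are hypothetical and exist purely for illustration.

```r
## Toy transcript table; the names and coordinates are made up.
tx <- data.frame(
  tx_name = c("txA", "txB", "txC"),
  strand  = c("+", "-", "+"),
  start   = c(10000, 50000, 90000),
  end     = c(12000, 53000, 95000),
  stringsAsFactors = FALSE
)

## Hypothetical helper: promoter window around the transcription start site.
## On "+" the TSS is the transcript start; on "-" it is the transcript end.
promoter_window <- function(tx, upstream = 2000, downstream = 200) {
  plus <- tx$strand == "+"
  tss  <- ifelse(plus, tx$start, tx$end)
  data.frame(
    tx_name = tx$tx_name,
    strand  = tx$strand,
    start   = ifelse(plus, tss - upstream, tss - downstream + 1),
    end     = ifelse(plus, tss + downstream - 1, tss + upstream),
    stringsAsFactors = FALSE
  )
}

toy_proms <- promoter_window(tx)

## Same %in% subsetting idea as used for myproms above.
wanted <- c("txA", "txB")
my_toy_proms <- toy_proms[toy_proms$tx_name %in% wanted, ]
my_toy_proms
```

With the real data you would not need any of this; if you want a different window, promoters() accepts upstream and downstream arguments directly, e.g. promoters(mm9, upstream=1000, downstream=100).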
Now I feel that I should also mention that with the latest version of
Bioconductor, we have added a new package called OrganismDbi. So with
that version you can load a package like Homo.sapiens or Mus.musculus
and use select() on an object that refers to several annotation packages
at once. When that package is pointed to the correct resources, it can
simplify some of the steps above. For details on this approach, please
see the OrganismDbi package.
Hope that helps,
Marc
On 11/05/2012 12:23 AM, Yisong Zhen wrote:
> Dear All,
>
> I read the GenomicFeatures manual, but there is only an example of extracting
> promoters from a list of Entrez Gene IDs. How can I extract a group of promoters
> from a list of NCBI GenBank accessions, like "NM_144551" and "NM_019413"? I have
> already installed "TxDb.Mmusculus.UCSC.mm9.knownGene" locally. Thanks.
>
> Yisong