[BioC] How can I extract promoter regions from a list NCBI genbank acc?

Marc Carlson mcarlson at fhcrc.org
Tue Nov 6 00:29:11 CET 2012


Hi Yisong,

Here is one way to solve this problem:

## First, load the org package for mouse and get the Entrez Gene IDs
library(org.Mm.eg.db)
cols(org.Mm.eg.db)
ids = c("NM_144551", "NM_019413")
res = select(org.Mm.eg.db, keys=ids, cols="ENTREZID", keytype="REFSEQ")
res
egs  = res[,"ENTREZID"]

## Then load the knownGene-based package that you mentioned earlier
## (note that the gene IDs used by this package are Entrez Gene IDs!)
library("TxDb.Mmusculus.UCSC.mm9.knownGene")
mm9 = TxDb.Mmusculus.UCSC.mm9.knownGene  ## a shorter name, just for convenience
cols(mm9)

## At this point you have not really told me exactly what you want,
## so let's start with how you can get information about your genes of interest.
## Let's begin by getting some of the transcript information for these genes:
res2 = select(mm9, keys=egs,
              cols=c("TXNAME","TXSTRAND","TXCHROM","TXSTART","TXEND"),
              keytype="GENEID")
res2

## From your description it sounds like you want a GRanges object with
## the ranges for your promoters.
## In that case you could do it like this:
proms = promoters(mm9)
## Then you can extract the txnames from before...
txnms  = unique(res2[,"TXNAME"])
## And then subset to only the promoters that have the names that you want
myproms = proms[mcols(proms)[,'tx_name'] %in% txnms,]
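
To make it clearer what promoters() computes, here is a small base-R sketch of the strand-aware window arithmetic. It assumes the default arguments upstream=2000 and downstream=200. The helper name promoter_window and the coordinates in the toy data frame are made up for illustration and do not come from any annotation package.

```r
## Hypothetical helper: compute promoter windows around the TSS in a
## strand-aware way. Assumes upstream = 2000 and downstream = 200.
promoter_window = function(tx, upstream = 2000, downstream = 200) {
  plus = tx$TXSTRAND == "+"
  ## The TSS is the transcript start on "+" and the transcript end on "-"
  tss = ifelse(plus, tx$TXSTART, tx$TXEND)
  data.frame(
    TXNAME     = tx$TXNAME,
    TXCHROM    = tx$TXCHROM,
    TXSTRAND   = tx$TXSTRAND,
    PROM_START = ifelse(plus, tss - upstream, tss - downstream + 1),
    PROM_END   = ifelse(plus, tss + downstream - 1, tss + upstream),
    stringsAsFactors = FALSE
  )
}

## Made-up coordinates, shaped like the res2 table returned by select()
toy = data.frame(
  TXNAME   = c("txA", "txB"),
  TXSTRAND = c("+", "-"),
  TXCHROM  = c("chr1", "chr2"),
  TXSTART  = c(10000, 50000),
  TXEND    = c(12000, 53000),
  stringsAsFactors = FALSE
)
toy_proms = promoter_window(toy)
toy_proms
```

Each window is 2200 bases wide: 2000 bases upstream of the TSS plus 200 bases starting at the TSS itself. On the minus strand the window is mirrored, so "upstream" extends toward higher coordinates.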

Now I feel that I should also mention that with the latest version of 
Bioconductor, we have added a new package called OrganismDbi.  So with 
that version you can load a package like Homo.sapiens or Mus.musculus 
and use select() on an object that refers to several annotation packages 
at once.  When that package is pointed to the correct resources, it can 
simplify some of the steps above.  For details on this approach, please 
see the OrganismDbi package.
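
As a rough sketch of that approach, assuming the Mus.musculus package is installed, a single select() call can go from the RefSeq accessions straight to transcript information. Two caveats, both assumptions rather than anything verified against a particular installation: the argument is spelled cols= to match the code above (later Bioconductor releases rename it to columns=), and the transcript package that Mus.musculus points to may be built on a different genome build than mm9, in which case the coordinates will not match the ones obtained above.

```r
## Sketch only: requires the Mus.musculus package from Bioconductor
library(Mus.musculus)
ids = c("NM_144551", "NM_019413")
## One call spans the org package and the transcript package;
## on later releases use columns= in place of cols=
res3 = select(Mus.musculus, keys=ids,
              cols=c("ENTREZID","TXNAME","TXSTRAND","TXCHROM","TXSTART","TXEND"),
              keytype="REFSEQ")
res3
```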


Hope that helps,


   Marc


On 11/05/2012 12:23 AM, Yisong Zhen wrote:
> Dear All,
>
> I read the GenomicFeatures manual, but it only has an example of extracting
> promoters from a list of Entrez Gene IDs. How can I extract a group of promoters
> from a list of NCBI GenBank accessions, like "NM_144551" and "NM_019413"? I have
> already installed "TxDb.Mmusculus.UCSC.mm9.knownGene" locally.  Thanks.
>
> Yisong