[BioC] BSgenomes and protein sequences

Valerie Obenchain vobencha at fhcrc.org
Wed May 2 21:02:45 CEST 2012


Hi Boris,

You can accomplish this by extracting the coding regions from the 
BSgenome then translating the sequences. A similar example is on the 
extractTranscriptsFromGenome() man page. See ?extractTranscriptsFromGenome.

(1) Create your own TranscriptDb object with one of

     makeTranscriptDb()
     makeTranscriptDbFromUCSC()
     makeTranscriptDbFromBiomart()

or load an existing txdb

     library(GenomicFeatures)
     library(TxDb.Hsapiens.UCSC.hg19.knownGene)
     txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene


(2) Create a GRangesList of coding regions grouped by transcripts.
The use of use.names=TRUE uses transcript names as labels instead of the 
internal transcript ids.

     cdsbytx <- cdsBy(txdb, "tx", use.names=TRUE)

Extract the corresponding sequences from a BSgenome. Be sure to use a 
BSgenome that is compatible with the TranscriptDb (i.e., both hg19) :

     library(BSgenome.Hsapiens.UCSC.hg19)
     cds_seqs <- extractTranscriptsFromGenome(Hsapiens, cdsbytx)

Sanity check :

     stopifnot(identical(unname(sapply(width(cdsbytx), sum)), 
width(cds_seqs)))

(3) translate :

All BSgenome objects store the"+" strand only. The 
extractTRanscriptsFromGenome() functions is strand-aware and takes care 
of reverse complementing sequences on the "-" strand so the sequences 
returned can be passed to the translate() function. Notice that you see 
the stop codon at the end of (almost) all sequences.

     aa <- translate(cds_seqs)


Valerie

On 05/01/2012 11:40 AM, Zybaylov, Boris L wrote:
> Dear list,
>
>
>
> If I need to access all human transcripts I can use BSgenomes -
>
> but what do I need to access all human proteins (amino acid sequences, including hypothetical proteins); what would be the best way to do this?
>
>
>
> Do i have to translate all transcripts from BSgenome.Hsapiens. UCSC.hg19, or is there a better way?
>
>
>
> Thank you very much for you help!
>
>
>
> Dr. Boris Zybaylov
>
> Instructor
> Department of Biochemistry and Molecular Biology
> University of Arkansas Medical Sciences
> Little Rock, AR
> 1-501-686-7254
> Confidentiality Notice: This e-mail message, including a...{{dropped:10}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list