[BioC] BSgenomes and protein sequences
Valerie Obenchain
vobencha at fhcrc.org
Wed May 2 21:02:45 CEST 2012
Hi Boris,
You can accomplish this by extracting the coding regions from the
BSgenome then translating the sequences. A similar example is on the
extractTranscriptsFromGenome() man page. See ?extractTranscriptsFromGenome.
(1) Create your own TranscriptDb object with one of
makeTranscriptDb()
makeTranscriptDbFromUCSC()
makeTranscriptDbFromBiomart()
or load an existing txdb
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
(2) Create a GRangesList of coding regions grouped by transcripts.
The use of use.names=TRUE uses transcript names as labels instead of the
internal transcript ids.
cdsbytx <- cdsBy(txdb, "tx", use.names=TRUE)
Extract the corresponding sequences from a BSgenome. Be sure to use a
BSgenome that is compatible with the TranscriptDb (i.e., both hg19) :
library(BSgenome.Hsapiens.UCSC.hg19)
cds_seqs <- extractTranscriptsFromGenome(Hsapiens, cdsbytx)
Sanity check :
stopifnot(identical(unname(sapply(width(cdsbytx), sum)),
width(cds_seqs)))
(3) translate :
All BSgenome objects store the"+" strand only. The
extractTRanscriptsFromGenome() functions is strand-aware and takes care
of reverse complementing sequences on the "-" strand so the sequences
returned can be passed to the translate() function. Notice that you see
the stop codon at the end of (almost) all sequences.
aa <- translate(cds_seqs)
Valerie
On 05/01/2012 11:40 AM, Zybaylov, Boris L wrote:
> Dear list,
>
>
>
> If I need to access all human transcripts I can use BSgenomes -
>
> but what do I need to access all human proteins (amino acid sequences, including hypothetical proteins); what would be the best way to do this?
>
>
>
> Do i have to translate all transcripts from BSgenome.Hsapiens. UCSC.hg19, or is there a better way?
>
>
>
> Thank you very much for you help!
>
>
>
> Dr. Boris Zybaylov
>
> Instructor
> Department of Biochemistry and Molecular Biology
> University of Arkansas Medical Sciences
> Little Rock, AR
> 1-501-686-7254
> Confidentiality Notice: This e-mail message, including a...{{dropped:10}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list