[BioC] Question about translate function in Biostrings package

Pages, Herve hpages at fhcrc.org
Fri Mar 18 06:57:01 CET 2011


Hi LiGang,

It's not clear to me what translate() should do when the input
contains ambiguity letters. I can see that for some ambiguities
in the input, the output won't be affected. Like in your first
example, replacing M by either A or C produces the same output:

> translate(DNAString("AACTGTCGACCC"))
  4-letter "AAString" instance
seq: NCRP
> translate(DNAString("AACTGTCGCCCC"))
  4-letter "AAString" instance
seq: NCRP

So yes I could add support for this.

But what should translate() do in the general case, where the possible
codons do not all code for the same amino acid? Should the output
contain letters representing ambiguous amino acids? The problem is that,
last time I checked, I was not able to find "official" ambiguity codes
for amino acids that would cover every ambiguity that can arise in the
protein sequence from ambiguities in the DNA sequence.
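
To make the two cases concrete, here is a minimal base-R sketch. It is
not Biostrings code and not a committed design: translate_fuzzy() is a
hypothetical helper, it only knows the standard genetic code, and using
"X" as the placeholder for a codon that cannot be resolved is an
assumption made for illustration. Each ambiguous codon is expanded to all
the plain codons it could stand for; if they all code for the same amino
acid, that amino acid is emitted, otherwise the placeholder is.

```r
# Standard genetic code. Codons are enumerated with bases in the order
# T, C, A, G; the first base varies slowest and the third fastest.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                    stringsAsFactors = FALSE)
aa <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "")[[1]]
genetic_code <- setNames(aa, paste0(grid$b1, grid$b2, grid$b3))

# IUPAC nucleotide codes and the bases each one stands for.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper (NOT part of Biostrings): translate an upper-case
# DNA string, resolving ambiguous codons when that is possible.
translate_fuzzy <- function(dna) {
  n <- nchar(dna)
  stopifnot(n > 0, n %% 3 == 0)
  starts <- seq(1, n, by = 3)
  codons <- substring(dna, starts, starts + 2)
  residues <- vapply(codons, function(codon) {
    letters3 <- strsplit(codon, "")[[1]]
    stopifnot(all(letters3 %in% names(iupac)))
    # Bases allowed at each of the 3 positions of this codon.
    choices <- lapply(unname(iupac[letters3]),
                      function(s) strsplit(s, "")[[1]])
    # Every plain codon compatible with the ambiguous one.
    combos <- do.call(expand.grid, c(choices, stringsAsFactors = FALSE))
    plain <- paste0(combos[[1]], combos[[2]], combos[[3]])
    possible <- unique(unname(genetic_code[plain]))
    # Assumption: "X" marks a codon with more than one possible product.
    if (length(possible) == 1) possible else "X"
  }, character(1), USE.NAMES = FALSE)
  paste(residues, collapse = "")
}

translate_fuzzy("AACTGTCGMCCC")  # "NCRP": CGA and CGC both give R
translate_fuzzy("AACTGNTCG")     # "NXS": TGN could be C, * or W
```

The second call shows why the general case is harder: TGN covers TGT and
TGC (C), TGA (stop) and TGG (W), and a single placeholder letter loses
the information about which amino acids were actually possible.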

Can you please clarify what your question is?

Thanks,
H.


----- Original Message -----
From: "ligang" <luzifer.li at gmail.com>
To: bioconductor at stat.math.ethz.ch
Sent: Thursday, March 17, 2011 10:23:15 PM
Subject: [BioC] Question about translate function in Biostrings package

Dear list,

I'm using the "translate" function in the "Biostrings" package to translate
DNA sequences into proteins.

It works well when the sequence contains only the base letters "A/G/C/T".

But when the DNA sequence contains nucleotide ambiguity codes such as "N" or
"M", the "translate" function fails, for example:

translate(DNAString("AACTGTCGMCCC"))
#Error in translate(DNAStringSet(x)) : not a base at pos 9

translate(DNAString("AACTGNTCG"))
#Error in translate(DNAStringSet(x)) : not a base at pos 6


sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Chinese_People's Republic of China.936
[2] LC_CTYPE=Chinese_People's Republic of China.936
[3] LC_MONETARY=Chinese_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese_People's Republic of China.936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.18.2 IRanges_1.8.9    

loaded via a namespace (and not attached):
[1] Biobase_2.10.0 tools_2.12.1  

---
LiGang



