[BioC] RE: Question about translate function in Biostrings package

Simon Noël simon.noel.2 at ulaval.ca
Fri Mar 18 13:01:02 CET 2011


Hi,

Here is my understanding of the situation.

In DNA, a nucleic acid sequence sometimes contains ambiguities.  Because an amino acid (aa) may be encoded by several codons, swapping an A for a C, for example, sometimes makes no real difference.  That is where ambiguity letters are used.  Each organism has a preferred codon for each aa, which helps to detect a mutation when another codon for the same aa is used.  If you simply want an aa sequence, replacing an ambiguity letter with one of its possible bases makes no difference, as long as every possible base gives a codon for the same aa.  For phylogenetic analysis, it does make a difference.  From what I know of phylogenetic analysis and of what this package does, I don't think that is the intended use here.

One solution is to manually replace each ambiguity letter with one of its corresponding nucleotides.  After that, the function will work...  Another possibility is simply to add new parameters to it.  You say that there is no universal convention for the ambiguity letters, but users should know which convention applies to their own sequences.  So, if my understanding is correct, adding new parameters that specify which ambiguity letters may be found, and which nucleotide should replace each of them, should fix the function.
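To make the idea concrete, here is a minimal sketch in base R.  The helper name resolve_ambiguities() and its "replacements" argument are only my assumptions for illustration; nothing like this exists in Biostrings as far as this thread shows.

```r
# Hypothetical helper (not part of Biostrings): replace each ambiguity
# letter with one user-chosen base before the sequence is translated.
# 'replacements' is a named character vector: names are ambiguity letters,
# values are the concrete bases to substitute.
resolve_ambiguities <- function(dna, replacements) {
  stopifnot(is.character(dna), length(dna) == 1L)
  stopifnot(!is.null(names(replacements)),
            all(nchar(names(replacements)) == 1L),
            all(replacements %in% c("A", "C", "G", "T")))
  # chartr() maps the i-th character of 'old' to the i-th character of 'new'
  chartr(old = paste(names(replacements), collapse = ""),
         new = paste(replacements, collapse = ""),
         x   = dna)
}

resolved <- resolve_ambiguities("AACTGTCGMCCC", c(M = "A", N = "A"))
print(resolved)
# [1] "AACTGTCGACCC"
```

The resolved string contains only A/C/G/T, so it could then be passed to translate(DNAString(resolved)).  Which base to pick for each letter is left entirely to the user here.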

Am I right?

Simon Noël
CdeC

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Pages, Herve [hpages at fhcrc.org]
Sent: March 18, 2011 01:57
To: ligang
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Question about translate function in Biostrings package

Hi LiGang,

It's not clear to me what translate() should do when the input
contains ambiguity letters. I can see that for some ambiguities
in the input, the output won't be affected. Like in your first
example, replacing M by either A or C produces the same output:

> translate(DNAString("AACTGTCGACCC"))
  4-letter "AAString" instance
seq: NCRP
> translate(DNAString("AACTGTCGCCCC"))
  4-letter "AAString" instance
seq: NCRP

So yes I could add support for this.
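One way to check whether an ambiguity affects the output is to expand each codon into its concrete possibilities and look at the amino acids they encode. The following is only a rough sketch in base R, assuming the standard genetic code and the IUPAC nucleotide codes; possible_aa() is a hypothetical helper, not an existing Biostrings function.

```r
# Standard genetic code (NCBI table 1), with codons enumerated in
# T, C, A, G order at each position (third base varies fastest).
bases  <- c("T", "C", "A", "G")
grid   <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                      stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aa     <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                          "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
genetic_code <- setNames(aa, codons)

# IUPAC nucleotide ambiguity codes
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
              S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
              V = c("A", "C", "G"), H = c("A", "C", "T"),
              D = c("A", "G", "T"), B = c("C", "G", "T"),
              N = c("A", "C", "G", "T"))

# Hypothetical helper: all amino acids that a (possibly ambiguous)
# codon could encode once its ambiguity letters are expanded.
possible_aa <- function(codon) {
  letters3 <- strsplit(codon, "")[[1]]
  stopifnot(length(letters3) == 3L, all(letters3 %in% names(iupac)))
  combos   <- expand.grid(unname(iupac[letters3]), stringsAsFactors = FALSE)
  concrete <- paste0(combos$Var1, combos$Var2, combos$Var3)
  sort(unique(unname(genetic_code[concrete])))
}

possible_aa("CGM")  # a single amino acid: "R"
possible_aa("TGN")  # three outcomes: "C", "W" and the stop "*"

# Every codon of the first example resolves to exactly one amino acid
dna        <- "AACTGTCGMCCC"
starts     <- seq(1, nchar(dna), by = 3)
seq_codons <- substring(dna, starts, starts + 2)
protein    <- paste(vapply(seq_codons, possible_aa, character(1),
                           USE.NAMES = FALSE), collapse = "")
print(protein)
# [1] "NCRP"
```

A codon whose set has a single element leaves the output unaffected; a codon like TGN, which yields several amino acids, is the case with no obvious output letter.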

Otherwise, in general, what to do? Should the output contain letters
representing ambiguous amino acids? The problem is that last time I
checked I was not able to find "official" ambiguity codes for amino
acids that would represent all possible ambiguities in the protein
sequence resulting from all possible ambiguities in the DNA sequence.

Can you please clarify what your question is?

Thanks,
H.


----- Original Message -----
From: "ligang" <luzifer.li at gmail.com>
To: bioconductor at stat.math.ethz.ch
Sent: Thursday, March 17, 2011 10:23:15 PM
Subject: [BioC] Question about translate function in Biostrings package

Dear list,

I'm using the "translate" function in the "Biostrings" package to translate DNA
sequences into proteins.

It works well when the base letters are "A/G/C/T".

But when the DNA sequence contains nucleotide ambiguity codes such as "N"/"M",
the "translate" function does not work, for example:

translate(DNAString("AACTGTCGMCCC"))
#Error in translate(DNAStringSet(x)) : not a base at pos 9

translate(DNAString("AACTGNTCG"))
#Error in translate(DNAStringSet(x)) : not a base at pos 6
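The positions reported in these errors can be located up front with base R. This is just a small sketch; find_ambiguities() is a hypothetical helper name, not something provided by Biostrings.

```r
# Hypothetical helper: report the 1-based position and letter of every
# character that is not one of A, C, G or T.
find_ambiguities <- function(dna) {
  stopifnot(is.character(dna), length(dna) == 1L)
  pos <- gregexpr("[^ACGT]", dna)[[1]]
  pos <- as.integer(pos[pos > 0])   # gregexpr() returns -1 when nothing matches
  data.frame(pos = pos,
             letter = substring(dna, pos, pos),
             stringsAsFactors = FALSE)
}

hits_m <- find_ambiguities("AACTGTCGMCCC")  # one row: pos 9, letter "M"
hits_n <- find_ambiguities("AACTGNTCG")     # one row: pos 6, letter "N"
print(hits_m)
print(hits_n)
```

The positions 9 and 6 agree with the "not a base at pos" messages above.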


sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Chinese_People's Republic of China.936
[2] LC_CTYPE=Chinese_People's Republic of China.936
[3] LC_MONETARY=Chinese_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese_People's Republic of China.936

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.18.2 IRanges_1.8.9

loaded via a namespace (and not attached):
[1] Biobase_2.10.0 tools_2.12.1

---
LiGang

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
