[BioC] RE: Question about translate function in Biostrings package

ligang luzifer.li at gmail.com
Tue Mar 22 04:26:17 CET 2011


Simon Noël <simon.noel.2 at ...> writes:

> 
> Hi,
> 
> Here is my understanding of the situation.
> 
> DNA sequences sometimes contain ambiguities. Because an amino acid can be
> encoded by several codons, swapping an A for a C, for example, sometimes
> makes no real difference. That is where ambiguity letters are used. Each
> organism has a preferred codon for each amino acid, which helps to find
> mutations when another codon for the same amino acid is used. If you simply
> want an amino acid sequence, replacing each ambiguity letter with one of the
> possible nucleotides won't make any difference. For phylogenetic analysis,
> it does make a difference. From what I know of phylogenetic analysis and of
> what this package does, I don't think that is what is intended here.
> 
> One solution is to manually replace each ambiguity letter with one of its
> corresponding nucleotides. After that, the function will work fine. Another
> possibility is simply to add a new parameter to it. You say that there is
> no universal convention for the ambiguity letters, but the user should know
> which convention applies to their sequence. So if my understanding is
> correct, adding new parameters to specify which ambiguity letters may be
> found, and which nucleotide should replace each of them, should fix the
> function.
> 
> Am I right?
> 
> Simon Noël
> CdeC
> 
> ________________________________________
> From: bioconductor-bounces at ...
> [bioconductor-bounces at ...] on behalf of Pages, Herve [hpages at ...]
> Sent: 18 March 2011 01:57
> To: ligang
> Cc: bioconductor at ...
> Subject: Re: [BioC] Question about translate function in Biostrings package
> 
> Hi LiGang,
> 
> It's not clear to me what translate() should do when the input
> contains ambiguity letters. I can see that for some ambiguities
> in the input, the output won't be affected. Like in your first
> example, replacing M by either A or C produces the same output:
> 
> > translate(DNAString("AACTGTCGACCC"))
>   4-letter "AAString" instance
> seq: NCRP
> > translate(DNAString("AACTGTCGCCCC"))
>   4-letter "AAString" instance
> seq: NCRP
> 
> So yes I could add support for this.
> 
> Otherwise, in general, what to do? Should the output contain letters
> representing ambiguous amino acids? The problem is that last time I
> checked I was not able to find "official" ambiguity codes for amino
> acids that would represent all possible ambiguities in the protein
> sequence resulting from all possible ambiguities in the DNA sequence.
> 
> Can you please clarify what your question is?
> 
> Thanks,
> H.
> 
> ----- Original Message -----
> From: "ligang" <luzifer.li at ...>
> To: bioconductor at ...
> Sent: Thursday, March 17, 2011 10:23:15 PM
> Subject: [BioC] Question about translate function in Biostrings package
> 
> Dear list,
> 
> I'm using the "translate" function in the "Biostrings" package to translate
> DNA sequences into proteins.
> 
> It works well when the bases are "A/G/C/T".
> 
> But when the DNA sequence contains nucleotide ambiguity codes such as "N" or
> "M", the "translate" function does not work, for example:
> 
> translate(DNAString("AACTGTCGMCCC"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 9
> 
> translate(DNAString("AACTGNTCG"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 6
> 
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=Chinese_People's Republic of China.936
> [2] LC_CTYPE=Chinese_People's Republic of China.936
> [3] LC_MONETARY=Chinese_People's Republic of China.936
> [4] LC_NUMERIC=C
> [5] LC_TIME=Chinese_People's Republic of China.936
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Biostrings_2.18.2 IRanges_1.8.9
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 tools_2.12.1
> 
> ---
> LiGang
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at ...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 



Some tools, such as the translate tool at http://expasy.org/tools/dna.html,
return "X" for the DNA string "TTN".

My question is: could

translate(DNAString("TTN"))

return "X"?
In the Biostrings package, "X" is an acceptable letter in an AAString, for
example:
AAString("XXXARN")


Of course, it would be better if the 'translate' function could be more
flexible, for example:

translate(DNAString("TCN"))
## Because "TCA", "TCG", "TCC" and "TCT" all translate to 'Ser', could the
## above command return "S"?

translate(DNAString("TTY"))
## Because both "TTC" and "TTT" translate to 'Phe', could the above command
## return "F"?
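
The behaviour requested above can be sketched in plain R, without Biostrings. This is only an illustration under stated assumptions: the names translate_fuzzy, iupac and genetic_code are hypothetical and not part of the package, and the sketch assumes the standard genetic code and the IUPAC nucleotide ambiguity codes. Each codon is expanded to every unambiguous codon it could stand for. If all of them encode the same amino acid, that letter is returned, otherwise "X".

```r
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (T, C, A, G).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
genetic_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]],
  paste0(grid$b1, grid$b2, grid$b3)
)

# IUPAC nucleotide codes and the bases each one can stand for.
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
  S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
  V = c("A", "C", "G"), H = c("A", "C", "T"),
  D = c("A", "G", "T"), B = c("C", "G", "T"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper: translate a DNA string, resolving ambiguous codons
# when every possible reading gives the same amino acid, else returning "X".
translate_fuzzy <- function(dna) {
  dna <- toupper(dna)
  stopifnot(nchar(dna) > 0, nchar(dna) %% 3 == 0)
  stopifnot(all(strsplit(dna, "")[[1]] %in% names(iupac)))
  starts <- seq(1, nchar(dna), by = 3)
  codons <- substring(dna, starts, starts + 2)
  aa <- vapply(codons, function(codon) {
    letters3 <- strsplit(codon, "")[[1]]
    # All unambiguous codons this (possibly ambiguous) codon could represent.
    expanded <- expand.grid(unname(iupac[letters3]), stringsAsFactors = FALSE)
    readings <- paste0(expanded[[1]], expanded[[2]], expanded[[3]])
    candidates <- unique(genetic_code[readings])
    if (length(candidates) == 1) candidates else "X"
  }, character(1))
  paste(aa, collapse = "")
}

translate_fuzzy("TTN")           # "X": TTT/TTC are Phe, TTA/TTG are Leu
translate_fuzzy("TCN")           # "S"
translate_fuzzy("TTY")           # "F"
translate_fuzzy("AACTGTCGMCCC")  # "NCRP"
```

With this rule "TCN" and "TTY" resolve to a single amino acid, while "TTN" falls back to "X", which matches what the expasy tool returns.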

---
LiGang


