[BioC] RE: Question about translate function in Biostrings package
ligang
luzifer.li at gmail.com
Tue Mar 22 04:26:17 CET 2011
Simon Noël <simon.noel.2 at ...> writes:
>
> Hi,
>
> Here is my understanding of the situation:
>
> In DNA, there are sometimes ambiguities in the nucleic acid sequence. Because an
> amino acid may be encoded by several codons, switching an A for a C, for example,
> sometimes makes no real difference. That is where ambiguity letters are used.
> Each organism has a preferred codon for each amino acid, and that helps to detect
> mutations when another codon for the same amino acid is used. If you simply want
> an amino acid sequence, replacing the ambiguity letters by one of the possible
> nucleotides won't make any difference. If it's for phylogenetic analysis, there is
> a difference. From what I know of phylogenetic analysis and of what this package
> does, I don't think that is what is intended here.
>
> One solution is to manually replace each ambiguity letter by one of its
> corresponding nucleotides. After that, the function will work well... But another
> possibility is simply to add a new parameter to it. You say that there is no
> universal convention for the ambiguity letters... But the user should know which
> convention is used for his sequence. So if my understanding is correct, adding new
> parameters to specify which ambiguity letters may be found, and by which
> nucleotide each one should be replaced, should fix the function.
>
> Am I right?
>
> Simon Noël
> CdeC
>
> ________________________________________
> From: bioconductor-bounces at ...
> [bioconductor-bounces at ...] on behalf of Pages, Herve [hpages at ...]
> Sent: 18 March 2011 01:57
> To: ligang
> Cc: bioconductor at ...
> Subject: Re: [BioC] Question about translate function in Biostrings package
>
> Hi LiGang,
>
> It's not clear to me what translate() should do when the input
> contains ambiguity letters. I can see that for some ambiguities
> in the input, the output won't be affected. Like in your first
> example, replacing M by either A or C produces the same output:
>
> > translate(DNAString("AACTGTCGACCC"))
> 4-letter "AAString" instance
> seq: NCRP
> > translate(DNAString("AACTGTCGCCCC"))
> 4-letter "AAString" instance
> seq: NCRP
>
> So yes, I could add support for this.
>
> Otherwise, in general, what to do? Should the output contain letters
> representing ambiguous amino acids? The problem is that last time I
> checked I was not able to find "official" ambiguity codes for amino
> acids that would represent all possible ambiguities in the protein
> sequence resulting from all possible ambiguities in the DNA sequence.
>
> Can you please clarify what your question is?
>
> Thanks,
> H.
>
> ----- Original Message -----
> From: "ligang" <luzifer.li at ...>
> To: bioconductor at ...
> Sent: Thursday, March 17, 2011 10:23:15 PM
> Subject: [BioC] Question about translate function in Biostrings package
>
> Dear list,
>
> I'm using the "translate" function in the "Biostrings" package to translate DNA
> sequences into proteins.
>
> It works well when the bases are "A/G/C/T".
>
> But when the DNA sequence contains nucleotide ambiguity codes such as "N"/"M",
> the "translate" function does not work, for example:
>
> translate(DNAString("AACTGTCGMCCC"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 9
>
> translate(DNAString("AACTGNTCG"))
> #Error in translate(DNAStringSet(x)) : not a base at pos 6
>
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Chinese_People's Republic of China.936
> [2] LC_CTYPE=Chinese_People's Republic of China.936
> [3] LC_MONETARY=Chinese_People's Republic of China.936
> [4] LC_NUMERIC=C
> [5] LC_TIME=Chinese_People's Republic of China.936
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.18.2 IRanges_1.8.9
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.10.0 tools_2.12.1
>
> ---
> LiGang
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at ...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
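Simon's first suggestion, replacing each ambiguity letter by a nucleotide chosen by the user before calling translate(), can be sketched in base R. The helper name `resolve_ambiguities` and its `replacement` argument are hypothetical and not part of Biostrings; the sketch only rewrites the string with chartr(), so choosing a replacement that does not change the protein remains the user's responsibility.

```r
# Hypothetical helper (not part of Biostrings): replace each ambiguity
# letter by a single nucleotide chosen by the user, so that the result
# only contains A/C/G/T and can be passed to translate().
resolve_ambiguities <- function(dna, replacement) {
  # 'replacement' is a named character vector, e.g. c(M = "A", N = "C"):
  # names are the ambiguity letters, values the nucleotides to use instead.
  stopifnot(is.character(replacement),
            !is.null(names(replacement)),
            all(nchar(names(replacement)) == 1L),
            all(replacement %in% c("A", "C", "G", "T")))
  # chartr() substitutes characters position by position:
  # the i-th letter of 'old' becomes the i-th letter of 'new'.
  chartr(old = paste(names(replacement), collapse = ""),
         new = paste(replacement, collapse = ""),
         x = toupper(dna))
}

resolve_ambiguities("AACTGTCGMCCC", c(M = "A"))  # "AACTGTCGACCC"
```

The result is the sequence that Hervé translates to NCRP with translate(DNAString(...)) above. The choice is arbitrary, though: for a triplet such as TTN, different replacements give different amino acids (TTA is Leu, TTT is Phe).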
For some tools, such as the translate tool at
'http://expasy.org/tools/dna.html', the DNA string "TTN" is translated to "X".
So my question is: could
translate(DNAString("TTN"))
return "X"?
In the Biostrings package, "X" is an acceptable letter in an AAString, for example:
AAString("XXXARN")
Of course, it would be better if the 'translate' function were more flexible,
for example:
translate(DNAString("TCN"))
## because "TCA", "TCG", "TCC" and "TCT" all translate to Ser, could the above
## command return "S"?
translate(DNAString("TTY"))
## because both "TTC" and "TTT" translate to Phe, could the above command
## return "F"?
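The behaviour asked for here (return the amino acid when every codon an ambiguous triplet can stand for encodes the same one, and "X" otherwise) can be sketched in base R. This is a hypothetical illustration, not the Biostrings implementation: the function name `translate_ambiguous` is made up, and the sketch hard-codes the standard genetic code and the IUPAC nucleotide codes and ignores any trailing incomplete codon.

```r
# Standard genetic code; codons ordered T, C, A, G, first base varying slowest.
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16),
                 rep(rep(bases, each = 4), times = 4),
                 rep(bases, times = 16))
standard_code <- setNames(
  strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
           "")[[1]],
  codons)

# IUPAC nucleotide codes and the bases each one stands for.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
              S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
              V = c("A", "C", "G"), H = c("A", "C", "T"),
              D = c("A", "G", "T"), B = c("C", "G", "T"),
              N = c("A", "C", "G", "T"))

# Hypothetical function (not part of Biostrings): translate codon by codon,
# emitting "X" when the possible codons encode different amino acids.
translate_ambiguous <- function(dna) {
  chars <- strsplit(toupper(dna), "")[[1]]
  bad <- setdiff(chars, names(iupac))
  if (length(bad) > 0L)
    stop("not a nucleotide or IUPAC code: ", paste(bad, collapse = ", "))
  n_codons <- length(chars) %/% 3L  # a trailing partial codon is ignored
  aa <- character(n_codons)
  for (i in seq_len(n_codons)) {
    triplet <- chars[(3L * i - 2L):(3L * i)]
    # Every unambiguous codon this triplet may stand for.
    combos <- expand.grid(unname(iupac[triplet]), stringsAsFactors = FALSE)
    candidates <- paste0(combos[[1]], combos[[2]], combos[[3]])
    encoded <- unique(unname(standard_code[candidates]))
    aa[i] <- if (length(encoded) == 1L) encoded else "X"
  }
  paste(aa, collapse = "")
}

translate_ambiguous("TTN")           # "X" (TTA/TTG are Leu, TTT/TTC are Phe)
translate_ambiguous("TCN")           # "S"
translate_ambiguous("TTY")           # "F"
translate_ambiguous("AACTGTCGMCCC")  # "NCRP"
```

Returning "X" whenever the candidate codons disagree mirrors the ExPASy behaviour described above for "TTN"; it keeps the output within the AAString alphabet without needing an amino acid ambiguity convention.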
---
LiGang