[R] "ACCTGMX" to "1223400" in R?

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jul 20 04:46:27 CEST 2010


On Mon, Jul 19, 2010 at 5:31 PM, John1983 <sandhya_prabhakaran at yahoo.com> wrote:
>
> Hi,
>
> I am a newbie in R and was working on some DNA data represented as strings
> of A,C,T and G (also wild-character like M and X). I use the Bioconductor
> package in R. Currently I need to convert a string of the form "ACCTGMX" to
> "1223400" i.e. A is replaced by 1, C with 2, T with 3, G with 4 and any
> other character with a 0. I checked with 'replace' and also with a function
> called 'copySubstitute' found in the Biobase package but this is only for
> files.
> The data here is a string ("ACCTGMX" ) and we need to convert it to yet
> another string ("1223400"). Now I use the strsplit function to split
> "ACCTGM" into "A" "C" "C" "T" "G" "M" and then use 'which' to assign the
> corresponding numbers.
> Is there a faster way to do this or some function I can make use of?
>

Here are a few alternatives.  The first uses chartr, which translates
the ith character in the first string to the ith character in the
second string.  If speed is a consideration, note that this alternative
is the fastest by far.

The second alternative translates just ACGT using chartr and then uses
gsub to translate everything else to 0.  Like the first, it uses only
core R functionality.  It is intermediate in speed and simplicity
between the other two.

The third uses gsubfn (from the gsubfn package), which is like gsub but
allows the replacement to be a list.  In that case, if the match equals
a name in the list it is replaced with that component, and if no name
is matched then the unnamed component at the end is used as the
replacement.  This one has the advantage that it is particularly simple
to specify.

#1
chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "10200040000000000003000000", "ACCTGMX")
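
The 26-character replacement string above is easy to mistype, so here
is a minimal sketch that builds the same table from LETTERS instead.
The names codes, old and new are illustrative only and are not part of
the original answer.  Note that chartr translates only the characters
listed in the first string, so anything that is not an uppercase letter
(lowercase letters, "-", digits) would pass through unchanged.

```r
# Sketch: build the chartr() translation table programmatically.
# 'codes', 'old' and 'new' are hypothetical helper names.
codes <- c(A = "1", C = "2", T = "3", G = "4")

old <- paste(LETTERS, collapse = "")        # "ABC...Z"

new <- rep("0", length(LETTERS))            # default: every letter maps to 0
new[match(names(codes), LETTERS)] <- codes  # overwrite the A, C, T, G slots
new <- paste(new, collapse = "")

result1 <- chartr(old, new, "ACCTGMX")
print(result1)
```

Changing the encoding then only means editing codes, rather than
counting positions in a string of zeros.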

#2
gsub("[^1-4]", "0", chartr("ACGT", "1234", "ACCTGMX"))

#3
library(gsubfn)
gsubfn(".", list(A = 1, C = 2, T = 3, G = 4, 0), "ACCTGMX")
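
To check that the two base R alternatives agree and to compare their
speed on your own machine, something along these lines could be used.
This is only a sketch: f1, f2 and the simulated input x are assumptions
made for illustration, and the timings will vary by machine and R
version.

```r
# Sketch: wrap alternatives #1 and #2 as functions and compare them.
# 'f1', 'f2', 'tab' and 'x' are illustrative names, not from the original post.
new <- rep("0", length(LETTERS))
new[match(c("A", "C", "T", "G"), LETTERS)] <- c("1", "2", "3", "4")
tab <- paste(new, collapse = "")

f1 <- function(s) chartr(paste(LETTERS, collapse = ""), tab, s)
f2 <- function(s) gsub("[^1-4]", "0", chartr("ACGT", "1234", s))

# Simulated data: 10,000 strings of 50 uppercase characters each.
set.seed(1)
x <- replicate(10000,
               paste(sample(c("A", "C", "G", "T", "M", "X"), 50, replace = TRUE),
                     collapse = ""))

# Both functions are vectorized over a character vector.
same <- identical(f1(x), f2(x))
print(same)

print(system.time(f1(x)))   # chartr only
print(system.time(f2(x)))   # chartr followed by gsub
```

The two agree on input made up of uppercase letters.  They can differ
on other input: #2 turns every character outside ACGT into 0, while #1
leaves characters that are not uppercase letters untouched.  If the
input could already contain the digits 1 to 4, both would keep them
as they are.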


