[R] "ACCTGMX" to "1223400" in R?
David Winsemius
dwinsemius at comcast.net
Tue Jul 20 03:37:01 CEST 2010
On Jul 19, 2010, at 5:31 PM, John1983 wrote:
>
> Hi,
>
> I am a newbie in R and was working on some DNA data represented as strings
> of A,C,T and G (also wild-character like M and X). I use the Bioconductor
> package in R.
Well, I guess it's sort of a "meta" package, but it is really more of
a subculture. It also has its own mailing list.
> Currently I need to convert a string of the form "ACCTGMX" to
> "1223400" i.e. A is replaced by 1, C with 2, T with 3, G with 4 and any
> other character with a 0. I checked with 'replace' and also with a function
> called 'copySubstitute' found in the Biobase package but this is only for
> files.
> The data here is a string ("ACCTGMX") and we need to convert it to yet
> another string ("1223400"). Now I use the strsplit function to split
> "ACCTGM" into "A" "C" "C" "T" "G" "M" and then use 'which' to assign the
> corresponding numbers.
> Is there a faster way to do this or some function I can make use of?
> tst <- rep( "ACCTGMX", 5)
> newtst <- gsub("A", "1", tst)
> newtst <- gsub("C", "2", newtst)
> newtst <- gsub("T", "3", newtst)
> newtst <- gsub("G", "4", newtst)
> newtst <- gsub("[[:alpha:]]", "0", newtst)
> newtst
[1] "1223400" "1223400" "1223400" "1223400" "1223400"
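The four single-letter gsub calls above can probably be collapsed into one step with base R's chartr, which translates characters one-for-one. This is only a sketch of the same idea, reusing the tst vector from above; the catch-all gsub for the remaining letters is kept unchanged.

```r
tst <- rep("ACCTGMX", 5)

# Translate A, C, T, G to 1, 2, 3, 4 in a single pass
newtst <- chartr("ACTG", "1234", tst)

# Every letter still left (M, X, ...) becomes 0; digits are not
# matched by [[:alpha:]], so the codes assigned above are untouched
newtst <- gsub("[[:alpha:]]", "0", newtst)
newtst
```

Note that chartr is case-sensitive, so lower-case "a", "c", "t", "g" would fall through to the catch-all and end up as 0.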
There is also a rollapply function in the zoo package and an strapply
function in the gsubfn package that might be even more powerful, but I am
insufficiently talented to give you a one-liner using them.
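Short of a gsubfn one-liner, the split-and-look-up approach described in the question can be wrapped in a small base R function. A minimal sketch follows; the function name encode_dna and the lookup table codes are made up for illustration and do not come from Bioconductor or any other package.

```r
# Hypothetical helper: encode each string by looking every character
# up in a named vector; characters missing from the table become "0"
encode_dna <- function(x) {
  codes <- c(A = "1", C = "2", T = "3", G = "4")
  vapply(strsplit(x, ""), function(chars) {
    out <- codes[chars]        # NA for characters not in the table
    out[is.na(out)] <- "0"
    paste(out, collapse = "")
  }, character(1))
}

encode_dna(c("ACCTGMX", "GATTACA"))
```

For long vectors the chartr/gsub route is likely faster, since it avoids splitting every string; the lookup table is mainly easier to extend if more codes are needed.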
>
> Please advise.
>
> Thank you.
> --
--
David Winsemius, MD
West Hartford, CT