[R] R string functions

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Jun 16 00:35:29 CEST 2011


Hi,

On Wed, Jun 15, 2011 at 4:37 PM, karena <dr.jzhou at gmail.com> wrote:
> Hi,
>
> I have a string "GGGGGGCCCAATCGCAATTCCAATT"
>
> What I want to do is compute the percentage of each letter in the string.
> What string functions can I use to count the number of times each letter
> appears in the string?
>
> For example, the letter "A" appears 6 times and the letter "T" appears 5
> times; how can I use a string function to get these numbers?

The replies you've received so far are already helpful. In addition to
them, though, I'd suggest you check out the "Biostrings" package from
Bioconductor, since it looks like you are working with DNA:

http://www.bioconductor.org/packages/release/bioc/html/Biostrings.html

Many (many^2) of the things you will likely want to do with genomic
sequences are already implemented in that package, and in a memory- and
performance-efficient manner.

For this particular example:

R> library(Biostrings)
R> x <- DNAString("GGGGGGCCCAATCGCAATTCCAATT")
R> oligonucleotideFrequency(x, 1)
A C G T
6 7 7 5

## And just for fun:
R> oligonucleotideFrequency(x, 2)
AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
 3  0  0  3  3  3  1  0  0  2  5  0  0  2  0  2
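For comparison, here is a base-R sketch that produces the same tallies without Biostrings, and also computes the percentages asked about in the original question. This is a minimal illustration, not part of Biostrings. The variable names (`bases`, `counts`, `pct`, `dimers`, and so on) are made up for the example.

```r
# Base-R sketch (no Bioconductor): letter counts, percentages, and 2-mer counts.
s <- "GGGGGGCCCAATCGCAATTCCAATT"
bases <- c("A", "C", "G", "T")

# Split into single characters; fixing the factor levels keeps zero counts
# and the A/C/G/T order in the table.
chars <- strsplit(s, "")[[1]]
counts <- table(factor(chars, levels = bases))
print(counts)        # A 6, C 7, G 7, T 5

# Percentage of each letter (each count divided by the string length, times 100).
pct <- 100 * prop.table(counts)
print(pct)           # A 24, C 28, G 28, T 20

# Overlapping 2-mers: the i-th substring runs from position i to i + 1.
n <- nchar(s)
dimers <- substring(s, 1:(n - 1), 2:n)
dimer_levels <- paste0(rep(bases, each = 4), rep(bases, times = 4))
dimer_counts <- table(factor(dimers, levels = dimer_levels))
print(dimer_counts)  # same numbers as oligonucleotideFrequency(x, 2) above
```

Fixing the factor levels is what makes dinucleotides that never occur (e.g. AC) show up as 0 rather than being dropped from the table.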

Depending on how much genomic/sequence work you are planning to do, it
could be worth your while to invest some time looking into the
functionality that the Biostrings (and IRanges) packages provide.

Hope that helps,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
