# [R] Sequence analysis

arun smartpink111 at yahoo.com
Fri Apr 19 15:59:23 CEST 2013

```
Hi,

Maybe the Biostrings package from Bioconductor can help.

# Install, then load the package
source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
library(Biostrings)
?matchPattern()
?letterFrequency()
vec1<- "ababbbassdaa"
vec2<- "addffggssbbsbbs"  # second sequence from the question; used below
alphabetFrequency(DNAString(vec1))
#A C G T M R W S Y K V H D B N - +
#5 0 0 0 0 0 0 2 0 0 0 0 1 4 0 0 0

letterFrequency(DNAStringSet(vec1),letters="AC",OR=0)
#    A C
#[1,] 5 0

longestConsecutive(c(vec1,vec2),"b")
#[1] 3 2

matchPattern(DNAString("AB"),DNAString(vec1))
# Views on a 12-letter DNAString subject
#subject: ABABBBASSDAA
#views:
#   start end width
#[1]     1   2     2 [AB]
#[2]     3   4     2 [AB]

Also,

library(seqinr)
lapply(seq(s2c(vec2)),function(i) table(splitseq(s2c(vec2),word=i)))
#[[1]]
#
#a b d f g s
#1 4 2 2 2 4
#
#[[2]]
#
#ad bb bs df fg gs sb
# 1  1  1  1  1  1  1
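
# A rough base-R sketch, in case a package-free route is useful: vec2 contains
# letters such as "f" that are outside the DNA alphabet shown in the
# alphabetFrequency() output above, so the DNAString() calls may not fit it.
# count_ngrams() is a hypothetical helper name made up for this illustration;
# it is not part of Biostrings or seqinr. Unlike splitseq(), it counts
# overlapping words.
count_ngrams <- function(x, n) {
  # start position of every window of width n (none if x is shorter than n)
  starts <- seq_len(max(nchar(x) - n + 1, 0))
  table(substring(x, starts, starts + n - 1))
}

vec1 <- "ababbbassdaa"
vec2 <- "addffggssbbsbbs"

count_ngrams(vec1, 1)           # single-letter counts: a 5, b 4, d 1, s 2
count_ngrams(vec1, 2)["ab"]     # "ab" occurs 2 times
count_ngrams(vec1, 4)["abab"]   # "abab" occurs 1 time
count_ngrams(vec2, 1)           # same letter counts as the first seqinr table

# Side-by-side letter counts for the two sequences over their shared alphabet
alphabet <- sort(unique(c(strsplit(vec1, "")[[1]], strsplit(vec2, "")[[1]])))
sapply(list(vec1 = vec1, vec2 = vec2),
       function(x) table(factor(strsplit(x, "")[[1]], levels = alphabet)))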
---------------------------------------
A.K.

----- Original Message -----
From: ben1983 <ben_thompson at talk21.com>
To: r-help at r-project.org
Cc:
Sent: Friday, April 19, 2013 7:21 AM
Subject: [R] Sequence analysis

Hiya,
I am trying to look at the similarities between a number of
sequences. For example, I am trying to see how similar "ababbbassdaa" is to
"addffggssbbsbbs". I was wondering if there is some way for me to see how similar
they are in terms of, for example, number of a's, number of b's, how often a
and ab are consecutive, how often abab is together, etc.
Any advice would be really useful. Any kind of shove in the right
direction would be amazing! I've tried doing basic alignments but I think
this is losing quite a lot of information.
Many thanks,
Ben
