# [R] labels and counting

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Fri Dec 31 00:37:35 CET 2004

```
On 30-Dec-04 dax42 wrote:
> Hello,
>
> I have got the following problem:
> given is a large string sequence consisting of the four letters "A" "C"
> "G" and "T" (as before). Additionally, I have got a second string
> sequence of the same length giving a label for each character. The
> labels are "+" and "-".
>
> Now I would like to create an 8x8 matrix which contains the numbers on
> how often we see all possible pairwise combinations, for example "A"
> with the label "+" followed by "C" with the label "+" or "T"->"C" with
> the labels "-"->"+" etc.
>
> Of course I can just use loops to "walk" along the sequence, but as you
> have shown me so much better solutions in response to my last mail, I
> thought you might be able to help and improve my R skills even further
> ..
>
> Cheers, Winnie

Well, flattery and all that ...

Anyway, the following is an example of how it can be done.
You can cut&paste all the following.

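# Artificial example of pairs, one of "A","C","G","T" paired
#   with one of "-","+"
S<-sample(c("A","C","G","T"),1000,replace=TRUE)
# labels (avoid the name T, which is R's shorthand for TRUE)
Lab<-sample(c("-","+"),1000,replace=TRUE)
# paste() is vectorised, so it joins the two vectors element by element
U<-paste(S,Lab,sep="")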

U[1:10]
## [1] "C+" "T-" "G+" "T+" "C+" "T+" "T-" "C+" "C-" "C-"
## Shows the first few of the pairs

# constructs 4-character items, each consisting of a pair
#   (e.g. "C+") pasted to its successor (e.g. "T-")
V<-paste(U[1:999],U[2:1000],sep="")

V[1:7]
## [1] "C+T-" "T-G+" "G+T+" "T+C+" "C+T+" "T+T-" "T-C+"
## Shows the first few of these. Compare with U above.

## Now this is where the real gurus can show their mettle.
##
## One way to get the counts is simply

table(V)

## but this is not a nice layout. Another is the loop:

for(i in sort(unique(V))){print(paste(i,":",sum(V==i)))}

## and I had hoped to think of a solution that did not
## involve a vulgar loop but would also avoid the unhelpful
## layout of table(V). (This is not your 8x8 matrix, but
## converting the output of the loop to one should not be
## impossible ... )

Pending the elegant solution which someone will come up with,
working through the above and consulting "?" for anything
not understood will reveal a few things about R ...

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 30-Dec-04                                       Time: 23:37:35
------------------------------ XFMail ------------------------------

```
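The 8x8 matrix asked for in the question can probably be obtained by cross-tabulating each letter/label state against its successor, rather than by tabulating the pasted 4-character strings. The following is a minimal sketch of that idea, not the method from the post: the names `bases`, `labels`, `lev`, `state` and `counts` are illustrative assumptions, and the data are simulated in the same way as above.

```r
# Sketch only: simulated data as in the post, with illustrative names.
set.seed(1)  # arbitrary seed, used only to make the run reproducible
n      <- 1000
bases  <- sample(c("A", "C", "G", "T"), n, replace = TRUE)
labels <- sample(c("-", "+"), n, replace = TRUE)

# The 8 possible states, in the order A-, A+, C-, C+, G-, G+, T-, T+.
lev <- paste(rep(c("A", "C", "G", "T"), each = 2), c("-", "+"), sep = "")

# Storing the states as a factor with fixed levels keeps the table 8x8
# even if some state never occurs in the data.
state <- factor(paste(bases, labels, sep = ""), levels = lev)

# Cross-tabulate positions 1..(n-1) against their successors 2..n.
counts <- table(from = state[-n], to = state[-1])
counts
```

Rows are the current state and columns the following one, so `counts["A+", "C+"]` would be the number of times "A" labelled "+" is followed by "C" labelled "+". Fixing the factor levels is a design choice: without it, `table()` would silently drop any state that happens not to appear.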