[BioC] Calculating alignment scores from aligned sequences

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Apr 17 23:07:59 CEST 2012


Hi,

I thought I'd add to the pairwiseAlignment-related activity on the
list today, so here's a bit of navel gazing.

I *think* I want to calculate alignment scores for subsets of already
aligned sequences and I don't think there's an easy and efficient way
to do this without having to realign the subsets of the strings I'm
interested in.

For instance, say I have a pairwise alignment `pa`, and the result of
`compareStrings(pattern(pa), subject(pa))` looks something like
this:

"GTA?TTT?A-----TTTCATATC?TGT?TC?------------------------------------------------------CAT"

Given values for the substitutionMatrix, gapOpening, and
gapExtension penalties used during the pairwiseAlignment, one could
(in principle) calculate the alignment score of the first half vs.
the second half, or (say) of running windows of length N along the
alignment.
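
To make the idea concrete, here is a minimal sketch (in plain Python,
not Biostrings) of scoring an already-aligned pair column by column, so
that any sub-range can be scored without realigning. The function name
`column_scores`, the toy substitution table, and the convention that a
gap of length k costs gap_opening + k * gap_extension (both given as
positive costs) are assumptions made for illustration; check them
against the settings actually used for the alignment.

```python
def column_scores(pattern, subject, subst, gap_opening, gap_extension):
    """Per-column scores for two gapped, equal-length aligned strings.

    Assumption: penalties are positive costs, and a gap of length k
    costs gap_opening + k * gap_extension in total.
    """
    if len(pattern) != len(subject):
        raise ValueError("aligned strings must have equal length")
    scores = []
    prev_gap = None  # which sequence carried a gap in the previous column
    for p, s in zip(pattern, subject):
        if p == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if p == "-" or s == "-":
            gap_in = "pattern" if p == "-" else "subject"
            cost = gap_extension
            if gap_in != prev_gap:
                cost += gap_opening  # first column of a new gap run
            scores.append(-cost)
            prev_gap = gap_in
        else:
            scores.append(subst[(p, s)])
            prev_gap = None
    return scores


# Toy substitution table (hypothetical): +1 for a match, -1 for a mismatch.
subst = {(a, b): (1 if a == b else -1) for a in "ACGT" for b in "ACGT"}

cols = column_scores("ACG--T", "ACGTTT", subst, gap_opening=10, gap_extension=4)
total = sum(cols)            # score of the whole alignment
first_half = sum(cols[:3])   # score of columns 1..3 only
```

One design choice to be aware of: the opening cost is charged on the
first column of each gap run, so a sub-range that starts in the middle
of a gap is not charged for opening it.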

Like I said, I *think* this is what I want to do, and I was just
curious whether I'm missing something in Biostrings that I can wire
into ... although it's straightforward enough to write something
myself. In the gapOpening == gapExtension case, I can imagine
massaging the `compareStrings` output to an integer vector and then
doing a simple matrix multiplication into an augmented
substitutionMatrix, but I think the more common
gapOpening != gapExtension case suggests I should really write a
helper function in C(++).

Thanks,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


