[R] Performing basic Multiple Sequence Alignment in R?

Mike Marchywka marchywka at hotmail.com
Tue Dec 21 23:03:17 CET 2010

> From: tal.galili at gmail.com
> Date: Tue, 21 Dec 2010 20:17:18 +0200
> Subject: Re: [R] Performing basic Multiple Sequence Alignment in R?
> To: r-help at r-project.org
> Dear Mike and Thomas,
> From what I gathered here (Thanks to Joris Meys):
> http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
> There is an R interface to the MUSCLE algorithm in the bio3d package
> (function seqaln()).
> But not one for clustal.
> I will probably end up using pairwiseAlignment on pairs of alignments
> with some sort of stopping rules (I'll have to play with it to see how
> it works).



Certainly, if you are flexible and can use whatever is close in R, that
is fine. But I seem to recall that exact string matching was a fast and
interesting way to go, and maybe some of the authors above, in the interest
of promoting their work, would help implement an R version if there is demand.

I seem to recall I did something like building indexes of the strings to be aligned
first, then finding substrings that were unique within a given string, that is,
appeared exactly once in each of the sequences to be aligned (this was the most
restrictive criterion, but you can imagine how to make it more accommodating).
Now that you got me started, up-front tokenizing or compiling of the input
sequences (usually no more than indexing them in some way) made many later
operations, like alignment, go faster. This may have ended up being similar to
BLAST, but now I can't really recall. Anyway, my point here is that somewhere in R
there may be packages that generate intermediate forms useful across disciplines:
mining data from text, linguistics, or macromolecule analysis. In fact, the
indexing process helps find things that have migrated a long way from their
original place, and there are probably other non-alignment-related things you
could get out of the approach.
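
A minimal sketch of the indexing idea described above, written in Python for
brevity rather than R. It is not the original implementation: the function
names kmer_index and unique_anchors and the fixed substring length k are
assumptions made for illustration. It indexes every length-k substring of each
sequence, then keeps only those that occur exactly once in every sequence (the
most restrictive criterion mentioned above).

```python
def kmer_index(seq, k):
    """Map each length-k substring of seq to the list of its start positions."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def unique_anchors(seqs, k):
    """Return {substring: [start position in each sequence]} for substrings
    of length k that occur exactly once in every sequence."""
    if not seqs:
        return {}
    # Index each sequence once up front; later lookups are dictionary hits.
    indexes = [kmer_index(s, k) for s in seqs]
    anchors = {}
    for kmer in indexes[0]:
        # Keep the substring only if it has exactly one hit in every sequence.
        if all(len(idx.get(kmer, ())) == 1 for idx in indexes):
            anchors[kmer] = [idx[kmer][0] for idx in indexes]
    return anchors


if __name__ == "__main__":
    # Toy sequences, made up for this example.
    print(unique_anchors(["GATTACA", "TGATTAC", "GATTAGG"], 4))
```

The positions returned for each shared substring could then serve as fixed
points between which a pairwise or multiple aligner only has to handle the
shorter stretches in between. Relaxing the criterion (for example, allowing a
substring to be missing from some sequences) would be a change to the all(...)
test.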

> Thank you all for your answers.
> It is always helpful to hear from others if something was already
> implemented in R or not.
> Best,
> Tal
> On Tue, Dec 21, 2010 at 2:44 PM, Mike Marchywka wrote:
> e came
> here with a task and was pointed to bio packages but I
> thought there m
