[R] sequence clustering and assembly

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 15 14:33:11 CEST 2010

Hi Bogdan --

On 04/14/2010 08:19 PM, Bogdan Tanasa wrote:
> Dear all,
> please could you suggest any R functions or packages (or external
> programs), that

You'll likely have more luck on the Bioconductor mailing list.

> a. take as input a large number (> 10 000) of short 20-30 nt
> sequences, and do sequence assembly, to reconstruct larger (extended)
> 30-50 sequences ?

I don't know of any sequence assemblers in R; velvet would be a first-stop
third-party tool, but it sounds like you have some fairly specific

> b. take as input a larger number of sequences (100 000 - 1 mil) and
> cluster these sequences in distinct classes based on the sequence
> similarity  ?

The Biostrings package has various functions to calculate edit distance,
which might form the input to familiar R clustering algorithms. See the
installation instructions on the Bioconductor website.

This thread

might suggest some directions.
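
As a rough illustration of the edit-distance-then-cluster idea above, here
is a minimal sketch. It uses base R's adist() (Levenshtein distance, from
the utils package) as a stand-in for the Biostrings functions, so it runs
without Bioconductor. The toy reads, the average-linkage choice, and the cut
height of 5 edits are all assumptions made for illustration. A full pairwise
distance matrix grows quadratically with the number of sequences, so this is
only practical as written for a small subset of the reads.

```r
# Toy reads (hypothetical data): two families that differ by single edits.
reads <- c("ACGTACGTACGTACGTACGT",
           "ACGTACGTACGTACGTACGA",
           "ACGTACGTACCTACGTACGT",
           "TTGGCCAATTGGCCAATTGG",
           "TTGGCCAATTGGCCAATTGC",
           "TTGGCCTATTGGCCAATTGG")

# Levenshtein (edit) distance between every pair of reads.
# adist() returns a square matrix; as.dist() converts it for hclust().
d <- as.dist(adist(reads))

# Average-linkage hierarchical clustering on the edit distances.
hc <- hclust(d, method = "average")

# Cut the tree at a height of 5 edits (an arbitrary threshold for this toy set).
classes <- cutree(hc, h = 5)

# Group the reads by their assigned class.
split(reads, classes)
```

The cut height controls how similar sequences must be to share a class; a
sensible value depends on read length and expected error rate.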


> thanks a lot,
> bogdan

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
