[BioC] Group millions of the same DNA sequences?
Stijn van Dongen
stijn at ebi.ac.uk
Thu Nov 18 10:23:53 CET 2010
On Thu, Nov 18, 2010 at 09:21:07AM +0800, Xiaohui Wu wrote:
> Thank you, Aaron! So far, sort and uniq seem to be the easiest way to do this.
> For clustering, I don't think an assembler is suitable for my case. I want to
> cluster similar reads into groups, each containing a number of reads, and then
> do further analysis.
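For the exact-duplicate part of the question, `sort | uniq -c` amounts to counting identical strings. The sketch below does the same with a hash table, so only the distinct sequences need to be held in memory. It assumes the reads are already loaded as plain strings; the function name is hypothetical.

```python
from collections import Counter


def group_identical_reads(reads):
    """Count occurrences of each distinct sequence, like `sort | uniq -c`.

    Hypothetical helper: assumes `reads` is an iterable of plain strings.
    """
    counts = Counter(reads)
    # Most frequent first, ties broken alphabetically (like `sort | uniq -c | sort -rn`).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
    for seq, n in group_identical_reads(reads):
        print(n, seq)  # 3 ACGT, then 2 TTGA, then 1 GGCC
```

Collapsing duplicates first also shrinks the input for any later clustering step, since each distinct sequence then only has to be compared once.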
About the clustering, an approach like
"Fast approximate hierarchical clustering using similarity heuristics"
by Meelis Kull and Jaak Vilo
could be worthwhile. If the similarity measure satisfies the triangle inequality
(i.e. it is a metric), it should not be necessary to do all-against-all comparisons.
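To illustrate why a metric lets you skip comparisons, here is a minimal sketch. It is a generic pivot-based pruning scheme, not the algorithm from the Kull and Vilo paper. The assumptions are that reads have equal length, that Hamming distance is the metric, and that clusters are single-linkage groups of reads within `max_dist` of each other. Using the first reads as pivots is a hypothetical choice. The triangle inequality gives |d(p,x) - d(p,y)| <= d(x,y) for any pivot p, so a pair whose pivot distances differ by more than `max_dist` can be ruled out without comparing the reads themselves.

```python
def hamming(a, b):
    """Hamming distance between two equal-length reads (a metric)."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length (assumption of this sketch)")
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(reads, max_dist, n_pivots=2):
    """Single-linkage clusters where reads are linked if hamming <= max_dist.

    Hypothetical helper. Returns (clusters, number of full distance computations).
    """
    n = len(reads)
    pivots = reads[:n_pivots]  # hypothetical: first reads serve as pivots
    # Distance of every read to every pivot: n * n_pivots computations.
    to_pivot = [[hamming(p, r) for p in pivots] for r in reads]
    # Sort by distance to the first pivot so the pair scan can stop early.
    order = sorted(range(n), key=lambda i: to_pivot[i][0])

    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    computed = 0
    for a in range(n):
        i = order[a]
        for b in range(a + 1, n):
            j = order[b]
            # Later reads are even farther from pivot 0, so none can be
            # within max_dist of read i once this gap exceeds it.
            if to_pivot[j][0] - to_pivot[i][0] > max_dist:
                break
            # Other pivots give further lower bounds on d(i, j).
            if any(abs(x - y) > max_dist for x, y in zip(to_pivot[i], to_pivot[j])):
                continue
            computed += 1
            if hamming(reads[i], reads[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(reads[i])
    return list(groups.values()), computed


if __name__ == "__main__":
    reads = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "CCCCCCCC", "CCCCCCCA", "GGGGTTTT"]
    clusters, computed = threshold_clusters(reads, max_dist=1)
    print(clusters)
    print(computed, "of", len(reads) * (len(reads) - 1) // 2, "pairs compared")
```

On this toy input only 5 of the 15 pairs need a full comparison. How much is saved on real reads depends heavily on the data and on how the pivots are chosen.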
best,
Stijn
--
Stijn van Dongen                   forename pronunciation: [Stan]
EMBL-EBI                           Tel: +44-(0)1223-492675
Hinxton, Cambridge, CB10 1SD, UK   http://micans.org/stijn