Dr. Thomas Isenbarger
isen at plantpath.wisc.edu
Wed Dec 8 22:12:26 CET 2004
I have a matrix of similarity scores that I want to convert into a
matrix of dissimilarity scores so that I can apply some clustering
methods to the data. That is, high values in my matrix signify
similarity and low values (zero being the lowest) signify no
similarity. What functions/options in R or its packages are available
for making this kind of transformation of a matrix?
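The arithmetic itself is simple, and a language-neutral sketch may help
make the options concrete. Two common transformations are
d_ij = max(S) - s_ij, which just flips the scale, and
d_ij = 1 - s_ij / max(S), which also rescales everything into [0, 1].
The function names below are hypothetical, chosen for illustration. In R
the same arithmetic works directly on a matrix, e.g.
as.dist(max(S) - S), and the resulting "dist" object can be passed to
hclust().

```python
# Sketch: two common ways to turn a similarity matrix into a
# dissimilarity matrix. Matrices are plain lists of lists.

def max_minus(sim):
    """d_ij = max(S) - s_ij: flips the scale, keeps linear spacing."""
    top = max(max(row) for row in sim)
    n = len(sim)
    d = [[top - sim[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        d[i][i] = 0.0  # every object is at distance zero from itself
    return d


def one_minus_scaled(sim):
    """d_ij = 1 - s_ij / max(S): maps all dissimilarities into [0, 1]."""
    top = max(max(row) for row in sim)
    n = len(sim)
    d = [[1.0 - sim[i][j] / top for j in range(n)] for i in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    return d


# Toy symmetric similarity matrix (hypothetical values).
S = [[10.0, 8.0, 0.0],
     [8.0, 10.0, 2.0],
     [0.0, 2.0, 10.0]]

print(max_minus(S))         # [[0.0, 2.0, 10.0], [2.0, 0.0, 8.0], [10.0, 8.0, 0.0]]
print(one_minus_scaled(S))
```

Either choice preserves the ordering of the pairs, which is all that
single- or complete-linkage clustering depends on; average linkage and
methods such as k-medoids are also sensitive to the spacing, so the
choice matters more there.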
For context, I am a molecular biologist. I have a set of 700+
nucleotide sequences I want to group into clusters based on sequence
similarity. The set covers a wide range of sequences, some of
which are homologous to other sequences in the set. I want to use
clustering to identify these groups.
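If the goal is discrete groups rather than a full tree, one simple option
is single-linkage clustering cut at a fixed dissimilarity threshold. This
is equivalent to taking the connected components of a graph that joins
every pair closer than the threshold, so it can be sketched with a small
union-find. The function name, the toy matrix, and the 0.4 threshold are
hypothetical. In R this roughly corresponds to
cutree(hclust(d, method = "single"), h = threshold).

```python
# Sketch: single-linkage grouping at a threshold, via union-find.

def single_linkage_groups(dist, threshold):
    """Group items linked by a chain of pairwise distances <= threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy dissimilarity matrix for four sequences (hypothetical values).
D = [[0.0, 0.2, 0.9, 0.8],
     [0.2, 0.0, 0.7, 0.3],
     [0.9, 0.7, 0.0, 0.95],
     [0.8, 0.3, 0.95, 0.0]]

print(single_linkage_groups(D, 0.4))  # [[0, 1, 3], [2]]
```

Sequences 0 and 3 end up together even though their own distance is 0.8,
because both are close to sequence 1. This chaining is the usual
behaviour of single linkage and is often what is wanted when collecting
homologues, but it can also merge unrelated families through a single
intermediate sequence.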
If the sequences were related and could be trimmed to the same length, I
would do an alignment and then use phylip (or some other distance
method) to create a distance matrix, but since my sequences are
unrelated and cannot be trimmed to the same length, I am at a loss for
what to do.
For a set with so many unrelated sequences of different lengths, the
only thing I have been able to do is run an all-against-all BLAST to create
the matrix, but this gives high scores for similarities, not high
scores for dissimilarities. The only thought I had was to use the
reciprocal of the BLAST score as some perverse measure of distance.
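Two practical points about all-against-all BLAST scores may be worth
sketching. First, the score for A against B need not equal the score for
B against A, so the matrix usually has to be symmetrized, here by
averaging the two directions. Second, raw scores grow with sequence
length. Dividing by the self-scores (each sequence against itself) gives
d_ij = 1 - s_ij / sqrt(s_ii * s_jj). This is one common normalization,
swapped in here as an assumption rather than a standard BLAST output. It
avoids the infinite distances that 1 / s_ij produces wherever BLAST
reports no hit. The function names and bit scores below are hypothetical,
and the sketch assumes every self-score is positive.

```python
import math

# Sketch: symmetrize a BLAST score matrix, then compare a self-score
# normalized distance with the plain reciprocal.

def symmetrize(score):
    """A-vs-B and B-vs-A scores need not match; average the two."""
    n = len(score)
    return [[(score[i][j] + score[j][i]) / 2.0 for j in range(n)]
            for i in range(n)]


def self_normalized_distance(score):
    """d_ij = 1 - s_ij / sqrt(s_ii * s_jj), clamped to [0, 1].

    Assumes every self-score s_ii is positive."""
    s = symmetrize(score)
    n = len(s)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = s[i][j] / math.sqrt(s[i][i] * s[j][j])
                d[i][j] = min(1.0, max(0.0, 1.0 - sim))
    return d


def reciprocal_distance(score):
    """d_ij = 1 / s_ij: infinite wherever BLAST found no hit."""
    s = symmetrize(score)
    n = len(s)
    return [[0.0 if i == j else (math.inf if s[i][j] == 0 else 1.0 / s[i][j])
             for j in range(n)] for i in range(n)]


# Hypothetical bit scores for three sequences (rows = queries).
B = [[400.0, 90.0, 0.0],
     [110.0, 100.0, 0.0],
     [0.0, 0.0, 250.0]]

print(self_normalized_distance(B))  # [[0.0, 0.5, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]
print(reciprocal_distance(B))       # [[0.0, 0.01, inf], [0.01, 0.0, inf], [inf, inf, 0.0]]
```

With the normalized version, unrelated pairs all sit at the same maximum
distance of 1.0, which clustering routines handle without trouble. With
the reciprocal, the same pairs become infinite, and related pairs crowd
into a narrow band of very small distances.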
