[R] similarity matrix conversion to dissimilarity

Dr. Thomas Isenbarger isen at plantpath.wisc.edu
Wed Dec 8 22:12:26 CET 2004


I have a matrix of similarity scores that I want to convert into a 
matrix of dissimilarity scores so that I can apply some clustering 
methods to the data.  That is, high values in my matrix signify 
similarity and low values (zero being the lowest) signify no 
similarity.  What functions/options in R or its packages are available 
for making this kind of transformation of a matrix?

Specifically, I am a molecular biologist.  I have a set of 700+ 
nucleotide sequences that I want to group into clusters based on sequence 
similarities.  There is a wide range of sequences in the set, some of 
which are homologous to other sequences in the set.  I want to use 
clustering to identify these groups.

If the sequences were related and could be trimmed to the same length, I 
would do an alignment and then use phylip (or some other distance 
method) to create a distance matrix, but since my sequences are 
unrelated and cannot be trimmed to the same length, I am at a loss for 
what to do.

For a set with so many unrelated sequences of different lengths, the 
only thing I have been able to do is an all-against-all BLAST to create 
the matrix, but this gives high scores for similarities, not high 
scores for dissimilarities.  The only thought I had was to use the 
reciprocal of the BLAST score as some perverse measure of distance.
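
To make the question concrete, here is a minimal sketch in base R of two 
candidate transformations: subtracting each score from the maximum, and 
a shifted reciprocal.  The matrix S and its values are made up for 
illustration, and averaging S with its transpose is an assumption (BLAST 
scores need not be symmetric).  The shift by 1 in the reciprocal is also 
an assumption, added only so that a score of zero does not produce an 
infinite distance.

```r
# Toy similarity matrix standing in for all-against-all BLAST scores
# (hypothetical values; a real matrix would be 700+ x 700+).
S <- matrix(c(100, 80,  0,
               80, 90,  5,
                0,  5, 70),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Assumption: force symmetry by averaging the two directions.
S <- (S + t(S)) / 2

# Candidate 1: subtract from the largest score, so high similarity
# becomes low dissimilarity.  as.dist() keeps only the lower triangle
# and ignores the diagonal.
D1 <- as.dist(max(S) - S)

# Candidate 2: shifted reciprocal.  Adding 1 keeps zero scores finite.
D2 <- as.dist(1 / (1 + S))

# Either object can be passed to a clustering function that accepts
# a "dist" object, for example hierarchical clustering.
hc <- hclust(D1, method = "average")
groups <- cutree(hc, k = 2)   # cut the tree into two groups
print(groups)
```

Whether either transformation yields a sensible distance for BLAST scores 
is exactly what I am unsure about; the sketch shows only the mechanics.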

I am not subscribed to the list, so can I ask for responses directly to 
my email address?

Thank you,
Tom Isenbarger


--
isen at plantpath.wisc.edu
thomas a isenbarger
(608) 265-0850



