[R] similarity matrix conversion to dissimilarity

Doran, Harold HDoran at air.org
Wed Dec 8 23:43:38 CET 2004


Dear Sir:

I posed a similar question a few months back and received many
responses. Check the searchable R-help archives on CRAN for those helpful
emails. I searched for 'similarity matrix' and many results were
returned.

Harold

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dr. Thomas
Isenbarger
Sent: Wednesday, December 08, 2004 4:12 PM
To: r-help at stat.math.ethz.ch
Subject: [R] similarity matrix conversion to dissimilarity

I have a matrix of similarity scores that I want to convert into a
matrix of dissimilarity scores so that I can apply some clustering
methods to the data.  That is, high values in my matrix signify
similarity and low values (zero being the lowest) signify no similarity.
What functions/options in R or its packages are available for making
this kind of transformation of a matrix?
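A minimal sketch of one common transformation, assuming the scores sit in a square matrix where larger means more similar: symmetrize the matrix (all-against-all scores are not always identical in both directions), then subtract every entry from the largest score so identical items get distance 0. The function name `similarity_to_dissimilarity` is hypothetical. In R the same idea is roughly `as.dist(max(S) - S)`, whose result can be handed to `hclust()`.

```python
def similarity_to_dissimilarity(sim):
    """Hypothetical helper: turn a square similarity matrix (higher = more
    similar) into a dissimilarity matrix using d_ij = s_max - s_ij."""
    n = len(sim)
    # Average s_ij and s_ji so the result is symmetric.
    sym = [[(sim[i][j] + sim[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    s_max = max(max(row) for row in sym)
    dis = [[s_max - sym[i][j] for j in range(n)] for i in range(n)]
    # An item is at distance zero from itself.
    for i in range(n):
        dis[i][i] = 0.0
    return dis


sim = [[10, 8, 0],
       [8, 10, 2],
       [0, 2, 10]]
print(similarity_to_dissimilarity(sim))
# [[0.0, 2.0, 10.0], [2.0, 0.0, 8.0], [10.0, 8.0, 0.0]]
```

Subtracting from the maximum preserves the ordering of the scores. It does not make them a true metric, but most hierarchical clustering methods only need a symmetric, non-negative dissimilarity.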

Specifically, I am a molecular biologist.  I have a set of 700+
nucleotide sequences I want to group into clusters based on sequence
similarities.  There is a wide range of sequences in the set, some of
which are homologous to other sequences in the set.  I want to use
clustering to identify these groups.
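Once a dissimilarity matrix exists, one simple way to pull out groups of related sequences is single-linkage clustering at a fixed cutoff. This is equivalent to taking connected components of the graph that links every pair with distance at or below the cutoff. The sketch below assumes a symmetric matrix and a hypothetical `threshold` that would need tuning for real data. In R, something similar is `cutree(hclust(d, method = "single"), h = threshold)`.

```python
def single_linkage_clusters(dis, threshold):
    """Hypothetical helper: group items whose pairwise distance chains
    stay at or below `threshold` (single linkage), using union-find."""
    n = len(dis)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dis[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


dis = [[0, 2, 10],
       [2, 0, 8],
       [10, 8, 0]]
print(single_linkage_clusters(dis, 3))  # [[0, 1], [2]]
```

With 700+ sequences this checks roughly 245,000 pairs, which is cheap. Single linkage can chain loosely related sequences together, so average linkage is worth comparing against it.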

If the sequences were related and could be trimmed to the same length, I
would do an alignment and then use phylip (or some other distance
method) to create a distance matrix, but since my sequences are
unrelated and cannot be trimmed to the same length, I am at a loss for
what to do.

For a set with so many unrelated sequences of different lengths, the
only thing I have been able to do is an all-against-all BLAST to create the
matrix, but this gives high scores for similarities, not high scores for
dissimilarities.  The only thought I had was to use the reciprocal of
the BLAST score as some perverse measure of distance.
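A plain reciprocal 1/s breaks down when a score is zero (no similarity), and it is not comparable across sequences of very different lengths, since longer sequences tend to reach higher raw scores. One commonly used alternative, offered here as an assumption rather than a standard, divides each pairwise score by the geometric mean of the two self-hit scores and subtracts the result from 1, which gives a distance in [0, 1]. The sketch assumes the all-against-all matrix includes each sequence's score against itself on the diagonal. The helper name is hypothetical.

```python
import math


def normalized_blast_distance(scores):
    """Hypothetical helper: d_ij = 1 - s_ij / sqrt(s_ii * s_jj),
    using the symmetrized score s_ij and the self-hit scores s_ii, s_jj."""
    n = len(scores)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = (scores[i][j] + scores[j][i]) / 2.0
            norm = math.sqrt(scores[i][i] * scores[j][j])
            ratio = s / norm if norm > 0 else 0.0
            # Clamp so rounding or odd scores never give a negative distance.
            dist[i][j] = 1.0 - min(ratio, 1.0)
    return dist


scores = [[100, 50, 0],
          [50, 400, 0],
          [0, 0, 25]]
print(normalized_blast_distance(scores))
# [[0.0, 0.75, 1.0], [0.75, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

Pairs with no BLAST hit get the maximum distance of 1 instead of a division-by-zero error.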

I am not subscribed to the list, so can I ask for responses directly to
my email address?

Thank you,
Tom Isenbarger


--
isen at plantpath.wisc.edu
thomas a isenbarger
(608) 265-0850

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
