[R] Re: clustering polypeptide sequences
ucgamdo@ucl.ac.uk
ucgamdo at ucl.ac.uk
Mon Sep 8 14:13:09 CEST 2003
Hi Peter,
You didn't give a very specific example, but it seems to me that what
you wish to do is not really complicated. I suppose you have created a
table of sequences vs., say, hydrophobicity, charge, etc., something like:
seq hydroph arom
b0001 0.104762 0.000000
b0002 0.035122 0.065854
b0003 0.024193 0.070968
b0004 -0.096729 0.084112
b0005 -0.973469 0.091837
b0006 -0.402713 0.108527
b0007 0.680672 0.123950
b0008 -0.209779 0.072555
b0009 -0.013334 0.046154
b0010 0.952128 0.143617
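Suppose you have these data in a data frame called myseqs, with the sequence
names as row names so that only numeric columns remain [see help(read.table)
for how to load the data; assuming a hypothetical file myseqs.txt, you can try
> myseqs <- read.table("myseqs.txt", header = TRUE, row.names = 1) ]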
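If you just want to try things out without a file, here is a minimal sketch that builds the data frame from inline text. It uses only the first three rows of the table above, and it assumes you want the sequence ids as row names (row.names = 1) so that they are not treated as a data column.

```r
# Build a small example data frame from inline text
# (first three rows of the table above, for brevity).
myseqs <- read.table(text = "
seq    hydroph   arom
b0001  0.104762  0.000000
b0002  0.035122  0.065854
b0003  0.024193  0.070968
", header = TRUE, row.names = 1)

# Check the structure: two numeric columns, sequence ids as row names.
str(myseqs)
```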
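# you need to load the necessary libraries
# dist() and hclust() used to live in package 'mva'; since R 1.9.0 they are in
# 'stats', which is loaded by default, so library(mva) is no longer needed
library(cluster) # more clustering algorithms, e.g. pam()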
# then you need to calculate the 'distances' between sequences
# this creates the euclidean distance matrix, try help(dist) for more info
myseqs.d <- dist(myseqs)
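To see what dist() computes by default, here is a small sketch that compares it with the Euclidean distance worked out by hand for the first two sequences in the table. The variable names x, d and manual are just for illustration.

```r
# Two sequences from the table above, ids as row names.
myseqs <- data.frame(
  hydroph = c(0.104762, 0.035122),
  arom    = c(0.000000, 0.065854),
  row.names = c("b0001", "b0002")
)

# Default method of dist() is "euclidean".
d <- dist(myseqs)

# Same quantity by hand: square root of the summed squared differences.
x <- as.matrix(myseqs)
manual <- sqrt(sum((x["b0001", ] - x["b0002", ])^2))

# Compare the two values (all.equal allows for rounding error).
all.equal(as.numeric(d), manual)
```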
# then we perform a hierarchical clustering
myseqs.clus <- hclust(myseqs.d)
# now check out your results
plot(myseqs.clus) # hey! you see how easy it is?
# the documentation for hclust contains much more info
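If you want group labels rather than just the dendrogram, one option is cutree() from the stats package. The sketch below rebuilds the example table so it runs on its own; the choice of k = 2 is arbitrary and only for illustration.

```r
# Example data from the table above, sequence ids as row names.
myseqs <- data.frame(
  hydroph = c(0.104762, 0.035122, 0.024193, -0.096729, -0.973469,
              -0.402713, 0.680672, -0.209779, -0.013334, 0.952128),
  arom    = c(0.000000, 0.065854, 0.070968, 0.084112, 0.091837,
              0.108527, 0.123950, 0.072555, 0.046154, 0.143617),
  row.names = sprintf("b%04d", 1:10)
)

# Hierarchical clustering on the euclidean distances, as above.
myseqs.clus <- hclust(dist(myseqs))

# Cut the tree into k groups (k = 2 is an arbitrary choice here).
groups <- cutree(myseqs.clus, k = 2)

# Named integer vector: one cluster label per sequence.
print(groups)
table(groups)
```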
# other fancy clustering algorithms
myseqs.pam <- pam(myseqs, k = 2)
plot(myseqs.pam)
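Besides the plot, the object returned by pam() can be inspected directly. A minimal sketch, again rebuilding the example table so it runs on its own and keeping k = 2 only as an illustrative choice:

```r
library(cluster)

# Example data from the table above, sequence ids as row names.
myseqs <- data.frame(
  hydroph = c(0.104762, 0.035122, 0.024193, -0.096729, -0.973469,
              -0.402713, 0.680672, -0.209779, -0.013334, 0.952128),
  arom    = c(0.000000, 0.065854, 0.070968, 0.084112, 0.091837,
              0.108527, 0.123950, 0.072555, 0.046154, 0.143617),
  row.names = sprintf("b%04d", 1:10)
)

# Partitioning around medoids with two clusters.
myseqs.pam <- pam(myseqs, k = 2)

myseqs.pam$clustering         # cluster label for each sequence
myseqs.pam$medoids            # the representative sequence of each cluster
myseqs.pam$silinfo$avg.width  # average silhouette width, a rough quality measure
```

Trying a few values of k and comparing the average silhouette width is one common way to choose the number of clusters.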
I hope this is of some help.