[BioC] semi-supervised clustering

friedrich.leisch at stat.uni-muenchen.de
Fri Oct 26 16:44:43 CEST 2007


>>>>> On Fri, 26 Oct 2007 07:19:45 -0700 (PDT),
>>>>> Tim Smith (TS) wrote:

  > Hi Friedrich,
  > Thanks for the response!

  > I have tried the following:
  > ---------------------------------------------------------
  > nums <- sample(1:300,70)
  > x <- matrix(nums,10,7)
  > mygroups <- c(1,3,4)   # i.e. I would like these 3 rows in 'x' to cluster together

The group vector needs to be a factor or integer vector with one entry
per row of x; otherwise it is recycled, as in many other R functions.
So c(1,3,4) does not mean "rows 1, 3 and 4": for 10 rows it is
recycled to the group labels 1,3,4,1,3,4,1,3,4,1. See below for a
working example. In the usual setting the grouping would be a factor
variable in your data frame.
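A minimal base-R sketch of building such a vector when all you know is
which rows should stay together. make_groups() is a hypothetical helper,
not part of flexclust: it puts the listed rows into group 1 and gives
every other row a group of its own, which reproduces the mygroups vector
used in the working example below. The last line shows what R's usual
recycling makes of c(1,3,4) for 10 rows instead.

```r
## Hypothetical helper (not part of flexclust): one group label per row.
## Rows listed in `together` share group 1; every other row gets its own group.
make_groups <- function(n, together) {
  stopifnot(all(together %in% seq_len(n)))
  g <- integer(n)
  g[together] <- 1L
  others <- setdiff(seq_len(n), together)
  g[others] <- seq_along(others) + 1L
  g
}

mygroups <- make_groups(10, c(1, 3, 4))
mygroups
## [1] 1 2 1 1 3 4 5 6 7 8

## What plain recycling of c(1,3,4) to 10 rows would give instead
rep_len(c(1, 3, 4), 10)
## [1] 1 3 4 1 3 4 1 3 4 1
```

The resulting vector can then be passed as group= to kcca(); wrapping it
in factor() makes the grouping intent explicit.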



  > myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
  > clres <- kcca(x, k=3, myfam, group=mygroups)

  > --------------------------------------------------------

  > I get the following result:

  >> clres
  > kcca object of family 'kmeans' 

  > call:
  > kcca(x = x, k = 3, family = myfam, group = mygroups)

  > cluster sizes:

  > 1 2 
  > 3 7 


  > I have two questions:

  > i) How do I get the details of the clusters (i.e which points/rows
  > are in which cluster)?

cluster(clres)


  > ii) If k=3, then shouldn't there be 3 clusters?

If a cluster becomes empty during the iterations it is removed, so you
can end up with fewer clusters than you asked for. For grouped
clustering this happens more often than for regular k-means, because
all members of a group are re-assigned together, so a cluster can lose
several points in a single step.
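To see why joint re-assignment can empty a cluster, here is a small
base-R sketch of one assignment step. It assumes that the minSumClusters
rule sends every member of a group to the centroid with the smallest
summed distance over the group, and it uses squared Euclidean distance;
assign_groups() and the toy data are hypothetical and are not
flexclust's internal code.

```r
## Sketch (not flexclust's code): assign each group jointly to the
## centroid with the smallest sum of squared Euclidean distances.
assign_groups <- function(x, centers, group) {
  ## distance of every row of x to every centroid (rows x centroids)
  d <- sapply(seq_len(nrow(centers)), function(j)
    rowSums(sweep(x, 2, centers[j, ])^2))
  d <- matrix(d, nrow = nrow(x))
  ## sum the distances over the members of each group (one row per group)
  dsum <- rowsum(d, group)
  best <- max.col(-dsum, ties.method = "first")
  names(best) <- rownames(dsum)
  ## every observation inherits its group's centroid
  unname(best[as.character(group)])
}

x_toy       <- matrix(c(0, 4, 10, 11), ncol = 1)  # four 1-d points
centers_toy <- matrix(c(0, 5, 10), ncol = 1)      # three centroids

## one group per row: ordinary nearest-centroid assignment
assign_groups(x_toy, centers_toy, 1:4)
## [1] 1 2 3 3

## rows 1-2 and rows 3-4 form groups: centroid 2 ends up empty
assign_groups(x_toy, centers_toy, c(1, 1, 2, 2))
## [1] 1 1 3 3
```

Point 4 is closest to centroid 2 on its own, but its group partner at 0
pulls the whole group to centroid 1, so centroid 2 loses its only member
and would be dropped.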

A working example:

library(flexclust)
set.seed(12)
## same as above
nums <- sample(1:300,70)
x <- matrix(nums,10,7)

## Rows 1, 3 and 4 are in group 1, all other groups contain
## only one observation
mygroups <- c(1,2,1,1,3,4,5,6,7,8)

myfam <- kccaFamily("kmeans", groupFun = "minSumClusters")
clres <- kcca(x, k=3, myfam, group=mygroups)

R> clres
kcca object of family ‘kmeans’ 

call:
kcca(x = x, k = 3, family = myfam, group = mygroups)

cluster sizes:

1 2 3 
3 5 2 

R> table(cluster(clres),mygroups)
   mygroups
    1 2 3 4 5 6 7 8
  1 3 0 0 0 0 0 0 0
  2 0 1 0 1 1 1 1 0
  3 0 0 1 0 0 0 0 1


All members of group 1 end up in the same cluster. Here that is
cluster 1, but in general the cluster label is arbitrary and need not
be 1.
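Whether a grouping survived the clustering can be checked independently
of the cluster labels. groups_intact() below is a hypothetical base-R
helper; in practice you would pass it cluster(clres). Here it is fed an
assignment vector reconstructed from the table above (groups 2 to 8 each
contain a single row, so their rows can be read off directly).

```r
## Hypothetical check: does every group map to exactly one cluster?
groups_intact <- function(cl, group) {
  all(tapply(cl, group, function(z) length(unique(z)) == 1L))
}

## Cluster of each of the 10 rows, consistent with table(cluster(clres), mygroups)
cl <- c(1, 2, 1, 1, 3, 2, 2, 2, 2, 3)
mygroups <- c(1, 2, 1, 1, 3, 4, 5, 6, 7, 8)
groups_intact(cl, mygroups)
## [1] TRUE

## Moving row 3 alone to another cluster splits group 1
cl_split <- cl
cl_split[3] <- 2
groups_intact(cl_split, mygroups)
## [1] FALSE
```

Because only equality of labels within a group is tested, the result
does not depend on which cluster number the group happens to receive.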


hth,
fritz

-- 
-----------------------------------------------------------------------
Prof. Dr. Friedrich Leisch 

Institut für Statistik                          Tel: (+49 89) 2180 3165
Ludwig-Maximilians-Universität                  Fax: (+49 89) 2180 5308
Ludwigstraße 33
D-80539 München                 http://www.stat.uni-muenchen.de/~leisch


