[R] Get distribution of positive/negative examples for each cluster

Phil Spector spector at stat.berkeley.edu
Wed Jul 21 23:07:37 CEST 2010


Boya-
    table() is the function that does what you want:

> cdat = data.frame(membership=rep(1:3,rep(3,3)),
+                   label=as.character(c(0,0,1,0,1,1,1,1,1)))
> table(cdat)
           label
membership 0 1
          1 2 1
          2 1 2
          3 0 3

From there, you can rearrange it in a variety of ways:

> as.data.frame(table(cdat))
   membership label Freq
1          1     0    2
2          2     0    1
3          3     0    0
4          1     1    1
5          2     1    2
6          3     1    3

Or, to match the layout you asked for (one row per cluster, one column per label):

> reshape(as.data.frame(table(cdat)),idvar='membership',
+         v.names='Freq',timevar='label',direction='wide')
   membership Freq.0 Freq.1
1          1      2      1
2          2      1      2
3          3      0      3
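
If the share of each label within a cluster is more useful than the raw
counts, prop.table() on the same table gives it. What follows is a minimal
sketch rather than the only way to do it; it rebuilds the cdat example from
above, and the names tab, prop and wide are just illustrative.

```r
# Rebuild the example data: 9 points, 3 clusters, binary label
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

# Counts of each label within each cluster
tab <- table(cdat)

# Row-wise proportions: margin = 1 divides each row by its own total,
# so every cluster's proportions sum to 1
prop <- prop.table(tab, margin = 1)
print(round(prop, 2))

# A shorter route to a wide data frame of counts:
# row names are the cluster ids, columns are named "0" and "1"
wide <- as.data.frame.matrix(tab)
print(wide)
```

With margin = 1 the proportions are taken within clusters, so cluster 3
comes out as 0 for label 0 and 1 for label 1; margin = 2 would instead
give the share of each label that falls in each cluster.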


 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu


On Wed, 21 Jul 2010, Boya Sun wrote:

> Dear R experts,
>
> I have a labeled data set. Each data point is assigned a binary label, 0 or 1.
> Assume that I use some clustering algorithm to group the data into clusters
> (using some features of the data). Now I want to know how many data points
> are labeled 0 and how many are labeled 1 in each cluster.
>
> For example, assume that I have 9 labeled data points grouped into three clusters.
> The ids of the clusters are 1, 2, and 3.  The dataset is represented by the
> following matrix:
>
>        membership        Label
> d1    1                        0
> d2    1                        0
> d3    1                        1
> d4    2                        0
> d5    2                        1
> d6    2                        1
> d7    3                        1
> d8    3                        1
> d9    3                        1
>
> Now I want to get the following output, telling me how many data points are
> labeled 0 and 1 in each cluster:
>
> cluster_id    0-data    1-data
> 1                2            1
> 2                1            2
> 3                0            3
>
> The output does not have to be a matrix, it could be a summary of the
> statistics.
>
> How should I approach this problem? What R functions should I use to get
> such information?
>
> Thanks so much!
>
> Boya


