[R] Get distribution of positive/negative examples for each cluster
Phil Spector
spector at stat.berkeley.edu
Wed Jul 21 23:07:37 CEST 2010
Boya-
table() is the function that does what you want:
> cdat = data.frame(membership=rep(1:3,rep(3,3)),
+                   label=as.character(c(0,0,1,0,1,1,1,1,1)))
> table(cdat)
          label
membership 0 1
         1 2 1
         2 1 2
         3 0 3
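
Since the subject line asks for a distribution, it may also help to turn the counts into within-cluster proportions. The following is only a sketch on the same toy data; the names counts and props are made up for illustration, and prop.table() with margin = 1 normalizes each row (here, each cluster) to sum to 1.

```r
# Same toy data as in the example above
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

# Cross-tabulate cluster membership against label (counts)
counts <- table(cdat)

# margin = 1 divides each row by its row total, giving the
# proportion of 0/1 labels within each cluster
props <- prop.table(counts, margin = 1)

print(round(props, 2))
```

Using margin = 2 instead would give, for each label, how it is spread across the clusters.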
From the table() result, you can rearrange it in a variety of ways:
> as.data.frame(table(cdat))
  membership label Freq
1          1     0    2
2          2     0    1
3          3     0    0
4          1     1    1
5          2     1    2
6          3     1    3
Or, to conform with the layout you requested:
> reshape(as.data.frame(table(cdat)),idvar='membership',
+         v.names='Freq',timevar='label',direction='wide')
  membership Freq.0 Freq.1
1          1      2      1
2          2      1      2
3          3      0      3
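
A possibly shorter route to the same wide layout, offered as a sketch rather than a replacement for reshape(): as.data.frame.matrix() converts the two-way table directly into a data frame with one column per label. The names wide and cluster_id are assumptions chosen to mirror the requested output.

```r
# Same toy data as in the example above
cdat <- data.frame(membership = rep(1:3, rep(3, 3)),
                   label = as.character(c(0, 0, 1, 0, 1, 1, 1, 1, 1)))

# Treat the 2-way table as a matrix: rows are clusters, columns are labels
wide <- as.data.frame.matrix(table(cdat))

# Move the cluster ids from the row names into an ordinary column;
# check.names = FALSE keeps the column names "0" and "1" unchanged
wide <- data.frame(cluster_id = rownames(wide), wide,
                   check.names = FALSE, row.names = NULL)

print(wide)
```

The columns are named after the label values themselves ("0" and "1"), so they have to be accessed as wide[["0"]] or wide$`0`.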
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Wed, 21 Jul 2010, Boya Sun wrote:
> Dear R experts,
>
> I have a labeled data set. Each data point is assigned a binary label, 0 or 1.
> Assume that I use some clustering algorithm to group the data into clusters
> (using some features of the data). Now I want to know how many data points
> are labeled 0 and how many are labeled 1 in each cluster.
>
> For example, assume that I have 9 labeled data points grouped into three
> clusters. The ids of the clusters are 1, 2, and 3. The dataset is represented
> by the following matrix:
>
>      membership  Label
> d1   1           0
> d2   1           0
> d3   1           1
> d4   2           0
> d5   2           1
> d6   2           1
> d7   3           1
> d8   3           1
> d9   3           1
>
> Now I want to get the following output, telling me how many data points are
> labeled 0 and 1 in each cluster:
>
> cluster_id  0-data  1-data
> 1           2       1
> 2           1       2
> 3           0       3
>
> The output does not have to be a matrix, it could be a summary of the
> statistics.
>
> How should I approach this problem? What R functions should I use to get
> such information?
>
> Thanks so much!
>
> Boya