[R] cluster by unique value
Petr Savicky
savicky at praha1.ff.cuni.cz
Tue Jul 19 00:05:21 CEST 2011
On Mon, Jul 18, 2011 at 06:36:13AM -0400, Sarah Goslee wrote:
> Your data1 and your data1_class file differ in the first three
> columns. Assuming that's an error, here's one way to do it:
>
> > data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),layer2=c(2,3,2,2,1,2,3,2,2,2), layer3=c(1,1,1,1,1,1,1,1,1,4))
> > data1 <- cbind(data1, class=as.numeric(as.factor(do.call(paste, data1))))
> > data1
>    layer1 layer2 layer3 class
> 1     0.2      2      1     2
> 2     0.5      3      1     4
> 3     0.2      2      1     2
> 4     0.8      2      1     5
> 5     0.2      1      1     1
> 6     0.5      2      1     3
> 7     0.5      3      1     4
> 8     0.8      2      1     5
> 9     0.2      2      1     2
> 10    0.8      2      4     6
>
> You didn't give a reproducible example, and I didn't want to type in
> all the decimal places, but you should be able to get the idea from
> this example. Also, the class numbers are assigned on sorted character
> rows, from lowest to highest, and not starting with the first one, as
> in your example. If you do need the latter, some combination of
> unique() and subsetting or merge() may work for you.
Let me suggest the following modification, which assigns numbers
to the classes according to their first occurrence.
data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),
                    layer2=c(2,3,2,2,1,2,3,2,2,2), layer3=c(1,1,1,1,1,1,1,1,1,4))
x <- do.call(paste, data1)
data1 <- cbind(data1, class=as.numeric(factor(x, levels=unique(x))))
data1
   layer1 layer2 layer3 class
1     0.2      2      1     1
2     0.5      3      1     2
3     0.2      2      1     1
4     0.8      2      1     3
5     0.2      1      1     4
6     0.5      2      1     5
7     0.5      3      1     2
8     0.8      2      1     3
9     0.2      2      1     1
10    0.8      2      4     6
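
As a possible alternative, the same first-occurrence numbering can probably be obtained without going through a factor, since match() against unique() gives the position of each row's key among the distinct keys. The following is only a minimal sketch on the example data above; the variable name cls is made up for illustration, and the pasted key assumes that the default space separator cannot make two different rows look identical (true for numeric columns like these, not necessarily for character columns containing spaces).

```r
# Same example data as above
data1 <- data.frame(layer1 = c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),
                    layer2 = c(2, 3, 2, 2, 1, 2, 3, 2, 2, 2),
                    layer3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4))

# One key string per row, built from all columns
x <- do.call(paste, data1)

# unique(x) keeps the keys in order of first occurrence;
# match() returns the position of each key in that vector
cls <- match(x, unique(x))   # 'cls' is a hypothetical name

# Should agree with the factor-based class numbers
stopifnot(all(cls == as.numeric(factor(x, levels = unique(x)))))

data1 <- cbind(data1, class = cls)
data1
```

On this data, cls should be 1 2 1 3 4 5 2 3 1 6, i.e. the class column shown above.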
Hope this helps.
Petr Savicky.