[R] cluster by unique value

Tue Jul 19 00:05:21 CEST 2011

On Mon, Jul 18, 2011 at 06:36:13AM -0400, Sarah Goslee wrote:
> Your data1 and your data1_class file differ in the first three
> columns. Assuming that's an error, here's one way to do it:
> 
> > data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),layer2=c(2,3,2,2,1,2,3,2,2,2), layer3=c(1,1,1,1,1,1,1,1,1,4))
> > data1 <- cbind(data1, class=as.numeric(as.factor(do.call(paste, data1))))
> > data1
>    layer1 layer2 layer3 class
> 1     0.2      2      1     2
> 2     0.5      3      1     4
> 3     0.2      2      1     2
> 4     0.8      2      1     5
> 5     0.2      1      1     1
> 6     0.5      2      1     3
> 7     0.5      3      1     4
> 8     0.8      2      1     5
> 9     0.2      2      1     2
> 10    0.8      2      4     6
> 
> You didn't give a reproducible example, and I didn't want to type in
> all the decimal places, but you should be able to get the idea from
> this example. Also, the class numbers are assigned on sorted character
> rows, from lowest to highest, and not starting with the first one, as
> in your example.  If you do need the latter, some combination of
> unique() and subsetting or merge() may work for you.

Let me suggest the following modification, which numbers the classes
in order of their first occurrence, so that the first row always gets
class 1.

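  data1 <- data.frame(layer1=c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),
                      layer2=c(2,3,2,2,1,2,3,2,2,2),
                      layer3=c(1,1,1,1,1,1,1,1,1,4))
  # one string per row; identical rows give identical strings
  x <- do.call(paste, data1)
  # levels=unique(x) orders the levels by first occurrence instead of sorting them
  data1 <- cbind(data1, class=as.numeric(factor(x, levels=unique(x))))
  data1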

     layer1 layer2 layer3 class
  1     0.2      2      1     1
  2     0.5      3      1     2
  3     0.2      2      1     1
  4     0.8      2      1     3
  5     0.2      1      1     4
  6     0.5      2      1     5
  7     0.5      3      1     2
  8     0.8      2      1     3
  9     0.2      2      1     1
  10    0.8      2      4     6
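
As a sketch of an equivalent alternative, assuming the same example data
as above (the names cls_factor and cls_match are only illustrative),
match() against unique(x) should give the same numbering without going
through a factor:

```r
# Same example data as above
data1 <- data.frame(layer1 = c(.2, .5, .2, .8, .2, .5, .5, .8, .2, .8),
                    layer2 = c(2, 3, 2, 2, 1, 2, 3, 2, 2, 2),
                    layer3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 4))

# One string per row; identical rows give identical strings
x <- do.call(paste, data1)

# Factor-based numbering, levels ordered by first occurrence
cls_factor <- as.numeric(factor(x, levels = unique(x)))

# match() gives, for each row, its position among the unique rows,
# and unique() keeps the rows in order of first occurrence
cls_match <- match(x, unique(x))

# Both approaches agree on this data
stopifnot(all(cls_factor == cls_match))
```

Since match() already returns integers, no as.numeric() conversion is
needed in that variant.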

Hope this helps.

Petr Savicky.