[R] which rows are duplicates?
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Mar 31 14:36:33 CEST 2009
Dimitris Rizopoulos wrote:
> Wacek Kusnierczyk wrote:
>> Wacek Kusnierczyk wrote:
>>> Michael Dewey wrote:
>>>
>>>> At 05:07 30/03/2009, Aaron M. Swoboda wrote:
>>>>
>>>>> I would like to know which rows are duplicates of each other, not
>>>>> simply that a row is duplicate of another row. In the following
>>>>> example rows 1 and 3 are duplicates.
>>>>>
>>>>>
>>>>> > x <- c(1,3,1)
>>>>> > y <- c(2,4,2)
>>>>> > z <- c(3,4,3)
>>>>> > data <- data.frame(x,y,z)
>>>>> >
>>>>>   x y z
>>>>> 1 1 2 3
>>>>> 2 3 4 4
>>>>> 3 1 2 3
>>>>>
>>> i don't have any solution significantly better than what you have
>>> already been given.
>>
>> i now seem to have one:
>>
>> # dummy data
>> data = data.frame(x=sample(1:2, 5, replace=TRUE), y=sample(1:2, 5,
>> replace=TRUE))
>> # add a class column; identical rows have the same class id
>> data$class = local({
>>    rows = do.call('paste', c(data, sep='\r'))
>>    with(
>>       rle(sort(rows)),
>>       rep(1:length(values), lengths)[rank(rows)] ) })
>>
>> data
>> #   x y class
>> # 1 2 2     3
>> # 2 2 1     2
>> # 3 2 1     2
>> # 4 1 2     1
>> # 5 2 2     3
>>
>
> another approach (maybe a bit cleaner) seems to be:
>
> data <- data.frame(x=sample(1:2, 5, replace=TRUE), y=sample(1:2, 5,
> replace = TRUE))
>
> vals <- do.call('paste', c(data, sep = '\r'))
> data$class <- match(vals, unique(vals))
> data
>
wow, cool! this seems unbeatable ;)
i guess it can't be slower than any of the others.
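
for the record, here is a minimal sketch of the match/unique idea applied to the data from the original question, followed by one possible way to list which rows are duplicates of each other. the names 'groups' and 'dups' are made up for illustration, and the sketch assumes that '\r' never occurs in the data themselves:

```r
# the original example: rows 1 and 3 are identical
data <- data.frame(x = c(1, 3, 1), y = c(2, 4, 2), z = c(3, 4, 3))

# one key string per row; '\r' is assumed not to occur in the data
vals <- do.call('paste', c(data, sep = '\r'))

# class id = position of the row's key among the distinct keys
data$class <- match(vals, unique(vals))

# row numbers grouped by class id (hypothetical helper names)
groups <- split(seq_len(nrow(data)), data$class)

# keep only the classes that contain more than one row
dups <- groups[sapply(groups, length) > 1]
dups
```

here 'dups' should have a single element holding rows 1 and 3; classes are numbered in order of first appearance, not in sort order as in the rle-based version.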
vQ