[R] Finding (swapped) repetitions of numbers pairs across two columns

arun smartpink111 at yahoo.com
Fri Dec 28 03:49:09 CET 2012


Hi,

You could also use:
unique(t(apply(cbind(v1,v2),1,function(x) x[order(x)])))
#or
unique(t(apply(cbind(v1,v2),1,sort.int,method="quick")))
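As a small worked illustration on the five-element vectors from the original question (quoted further down), here is a minimal sketch of the sort-each-row-then-unique idea. The names pairs_sorted and res are only illustrative.

```r
v1 <- c(0, 1, 2, 3, 4)
v2 <- c(5, 3, 2, 1, 0)

# Sort within each row so that (3,1) and (1,3) both become (1,3).
# apply() returns the sorted rows as columns, hence the t().
pairs_sorted <- t(apply(cbind(v1, v2), 1, sort))

# unique() on a matrix drops repeated rows, keeping the first occurrence
res <- unique(pairs_sorted)
nrow(res)  # 4: the swapped pair 1,3 is kept only once
```

Note that the result holds the pairs in ascending order within each row, so a pair originally given as 3,1 comes back as 1,3.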

Comparing the different methods:
set.seed(51)
v1<-sample(0:9,1e5,replace=TRUE)
set.seed(49)
v2<-sample(0:9,1e5,replace=TRUE)
system.time(res1<-unique(t(apply(cbind(v1, v2), 1, sort))))
# user  system elapsed 
# 11.373   0.188  11.918 

system.time(res2<-unique(t(apply(cbind(v1,v2),1,sort.int,method="quick"))))
#   user  system elapsed 
#  7.088   0.120   7.446 

identical(res1,res2)
#[1] TRUE
system.time(res3 <- unique(t(apply(cbind(v1,v2),1,function(x) x[order(x)])))) # fastest of the three
#   user  system elapsed 
#  2.693   0.072   2.857 

identical(res1,res3)
#[1] TRUE
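
Not part of the timings above: a vectorized sketch that avoids the per-row apply() call altogether, assuming two numeric columns of equal length with no NAs. pmin() and pmax() are base R functions; res4 is just an illustrative name. No timing was measured for this variant, so wrap it in system.time() on your own machine to compare.

```r
set.seed(51)
v1 <- sample(0:9, 1e5, replace = TRUE)
set.seed(49)
v2 <- sample(0:9, 1e5, replace = TRUE)

# Element-wise minimum and maximum put every pair in ascending order
res4 <- unique(cbind(pmin(v1, v2), pmax(v1, v2)))

# Reference result from the apply()-based approach; unname() drops the
# dimnames that apply() carries over, so only the values are compared
res1 <- unname(unique(t(apply(cbind(v1, v2), 1, sort))))
all(res4 == res1)
```

Because the matrix built from pmin() and pmax() has no dimnames, identical() against res1 from the timings above may report FALSE even when the values agree, which is why the comparison here is done on the values only.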



A.K.



----- Original Message -----
From: Emmanuel Levy <emmanuel.levy at gmail.com>
To: R-help Mailing List <r-help at r-project.org>
Cc: 
Sent: Thursday, December 27, 2012 3:30 PM
Subject: [R] Finding (swapped) repetitions of numbers pairs across two columns

Hi,

I've had this problem for a while and tackled it in a quite dirty way,
so I'm wondering if a better solution exists:

If we have two vectors:

v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)

How can I remove one instance of the "3,1" / "1,3" duplicate?

At the moment I'm using the following solution, which is quite horrible:

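v1 = c(0,1,2,3,4)
v2 = c(5,3,2,1,0)
ft <- cbind(v1, v2)
# TRUE for rows where the first value is larger than the second
direction = apply( ft, 1, function(x) return(x[1]>x[2]))
# swap those rows so the smaller value always comes first
ft.tmp = ft
ft[which(direction),1] = ft.tmp[which(direction),2]
ft[which(direction),2] = ft.tmp[which(direction),1]
# collapse each row to a string key, deduplicate, then split back into numbers
uniques     = apply( ft, 1, function(x) paste(x, collapse="%") )
uniques     = unique(uniques)
ft.unique   = matrix(as.numeric(unlist(strsplit(uniques,"%"))), ncol=2, byrow=TRUE)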


Any better solution would be very welcome!

All the best,

Emmanuel





