[R] sorting in 'merge'

jiho jo.irisson at gmail.com
Mon Jan 21 11:46:18 CET 2008


Hello everyone,

I've been advised to use merge to extract information from two
data.frames that share a number of columns, but I cannot grasp how it
orders the result. With sort=FALSE, I would expect the rows to come back
in exactly the order of the input, but that does not always seem to be
the case, especially when the input contains repeated rows.

For example:

 > a = data.frame(field1=c(1,1,2,2),field2=c(1:2,1:2),var1=runif(4))
 > b = data.frame(field1=c(2,2,1,1),field2=c(1,2,2,1),var2=runif(4))
 > a
   field1 field2      var1
1      1      1 0.8327855
2      1      2 0.4309419
3      2      1 0.5134574
4      2      2 0.8063110
 > b
   field1 field2      var2
1      2      1 0.2739025
2      2      2 0.5147113
3      1      2 0.2958369
4      1      1 0.3703116

So b is in an irregular order. If I then merge:

 > merge(b,a)
   field1 field2      var2      var1
1      1      1 0.3703116 0.8327855
2      1      2 0.2958369 0.4309419
3      2      1 0.2739025 0.5134574
4      2      2 0.5147113 0.8063110

In that case the result is sorted, as expected. If I merge without
sorting:

 > merge(b,a,sort=F)
   field1 field2      var2      var1
1      2      1 0.2739025 0.5134574
2      2      2 0.5147113 0.8063110
3      1      2 0.2958369 0.4309419
4      1      1 0.3703116 0.8327855

It retains the order of b, which is what I want.
However, if I now add a repeated row to b

 > b = rbind(b,b[1,])
 > b
   field1 field2      var2
1      2      1 0.2739025
2      2      2 0.5147113
3      1      2 0.2958369
4      1      1 0.3703116
5      2      1 0.2739025
	
and merge it without sorting,

 > merge(b,a,sort=F)
   field1 field2      var2      var1
1      2      1 0.2739025 0.5134574
2      2      1 0.2739025 0.5134574
3      2      2 0.5147113 0.8063110
4      1      2 0.2958369 0.4309419
5      1      1 0.3703116 0.8327855

the result is still reordered: the repeated (2, 1) row is moved up next
to its first occurrence instead of staying last, where it is in b. I
would have expected the output to be:

merge(b,a,sort=F)
   field1 field2      var2      var1
1      2      1 0.2739025 0.5134574
2      2      2 0.5147113 0.8063110
3      1      2 0.2958369 0.4309419
4      1      1 0.3703116 0.8327855
5      2      1 0.2739025 0.5134574

Is it possible to get this output, perhaps with another function similar
to merge? And does anyone know the reason for the current behaviour of
merge?
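One common workaround, offered here as a sketch rather than anything documented for merge itself: record each row's position in b in a helper column before merging, then reorder the merged result by that column. The column name row_id is hypothetical. Note that rows of b with no match in a are dropped by this inner join unless all.x = TRUE is passed to merge.

```r
# Data as in the post (runif values, so the numbers differ from run to run)
a <- data.frame(field1 = c(1, 1, 2, 2), field2 = c(1:2, 1:2), var1 = runif(4))
b <- data.frame(field1 = c(2, 2, 1, 1), field2 = c(1, 2, 2, 1), var2 = runif(4))
b <- rbind(b, b[1, ])

# Hypothetical helper column: the original row position in b.
# It exists only in b, so merge() carries it along instead of matching on it.
b$row_id <- seq_len(nrow(b))

# merge() matches on the common columns field1 and field2
m <- merge(b, a, sort = FALSE)

# Put the rows back in b's original order (order() keeps ties stable),
# then drop the helper column and reset the row names
m <- m[order(m$row_id), ]
m$row_id <- NULL
rownames(m) <- NULL
print(m)
```

Because the helper column records the position explicitly, the result no longer depends on how merge happens to arrange duplicated keys internally.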

Thanks in advance.

PS: code

a = data.frame(field1=c(1,1,2,2),field2=c(1:2,1:2),var1=runif(4))
b = data.frame(field1=c(2,2,1,1),field2=c(1,2,2,1),var2=runif(4))
a
b
merge(b,a)
merge(b,a,sort=FALSE)
b = rbind(b,b[1,])
b
merge(b,a,sort=FALSE)
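A different route, sketched under the assumption that each (field1, field2) pair occurs at most once in a (true in this example): skip merge and look up a's rows with match(), which returns one position per row of b and so leaves b's rows and their order untouched. The helper names key_a, key_b and idx are hypothetical. Unlike merge, match() keeps only the first matching row of a, and a row of b with no match gets NA rather than being dropped.

```r
a <- data.frame(field1 = c(1, 1, 2, 2), field2 = c(1:2, 1:2), var1 = runif(4))
b <- data.frame(field1 = c(2, 2, 1, 1), field2 = c(1, 2, 2, 1), var2 = runif(4))
b <- rbind(b, b[1, ])

# One composite key per row, built from the common columns
key_a <- paste(a$field1, a$field2, sep = "_")
key_b <- paste(b$field1, b$field2, sep = "_")

# For each row of b, the position of the first row of a with the same key (NA if none)
idx <- match(key_b, key_a)

# Append a's value column to b without reordering b
res <- cbind(b, var1 = a$var1[idx])
rownames(res) <- NULL
print(res)
```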


JiHO
---
http://jo.irisson.free.fr/