# [R] sorting in 'merge'

jiho jo.irisson at gmail.com
Mon Jan 21 11:46:18 CET 2008

```Hello everyone,

I've been advised to use merge to extract information from two
data.frames with a number of common columns, but I cannot get a grasp
on how it sorts the result. With sort=FALSE, I would expect it to give
the result back sorted exactly as the input was but it seems it is not
always the case, especially when there are repeats in the input.

For example:

> a = data.frame(field1=c(1,1,2,2),field2=c(1:2,1:2),var1=runif(4))
> b = data.frame(field1=c(2,2,1,1),field2=c(1,2,2,1),var2=runif(4))
> a
field1 field2      var1
1      1      1 0.8327855
2      1      2 0.4309419
3      2      1 0.5134574
4      2      2 0.8063110
> b
field1 field2      var2
1      2      1 0.2739025
2      2      2 0.5147113
3      1      2 0.2958369
4      1      1 0.3703116

So b is in an irregular order, if I then merge:

> merge(b,a)
field1 field2      var2      var1
1      1      1 0.3703116 0.8327855
2      1      2 0.2958369 0.4309419
3      2      1 0.2739025 0.5134574
4      2      2 0.5147113 0.8063110

in that case the result is sorted, as expected. If i merge it without
sorting:

> merge(b,a,sort=F)
field1 field2      var2      var1
1      2      1 0.2739025 0.5134574
2      2      2 0.5147113 0.8063110
3      1      2 0.2958369 0.4309419
4      1      1 0.3703116 0.8327855

it retains the order in b, which is what I want.
However if I now add a repeated row to b

> b = rbind(b,b[1,])
> b
field1 field2      var2
1      2      1 0.2739025
2      2      2 0.5147113
3      1      2 0.2958369
4      1      1 0.3703116
5      2      1 0.2739025

and merge it, without sorting

> merge(b,a,sort=F)
field1 field2      var2      var1
1      2      1 0.2739025 0.5134574
2      2      1 0.2739025 0.5134574
3      2      2 0.5147113 0.8063110
4      1      2 0.2958369 0.4309419
5      1      1 0.3703116 0.8327855

the result is still somehow sorted according to the order of b. I
would have expected the output to be:

merge(b,a,sort=F)
field1 field2      var2      var1
1      2      1 0.2739025 0.5134574
2      2      2 0.5147113 0.8063110
3      1      2 0.2958369 0.4309419
4      1      1 0.3703116 0.8327855
5      2      1 0.2739025 0.5134574

Is it possible to get this output (another function similar to merge)?
What is the overall reason (if someone knows it) for the current
behaviour of merge?

PS: code

a = data.frame(field1=c(1,1,2,2),field2=c(1:2,1:2),var1=runif(4))
b = data.frame(field1=c(2,2,1,1),field2=c(1,2,2,1),var2=runif(4))
a
b
merge(b,a)
merge(b,a,sort=F)
b = rbind(b,b[1,])
b
merge(b,a,sort=F)

JiHO
---
http://jo.irisson.free.fr/

```