[R] sorting in 'merge'
jiho
jo.irisson at gmail.com
Mon Jan 21 11:46:18 CET 2008
Hello everyone,
I've been advised to use merge to extract information from two
data.frames with a number of common columns, but I cannot get a grasp
on how it sorts the result. With sort=FALSE, I would expect it to give
the result back sorted exactly as the input was but it seems it is not
always the case, especially when there are repeats in the input.
For example:
> a = data.frame(field1=c(1,1,2,2),field2=c(1:2,1:2),var1=runif(4))
> b = data.frame(field1=c(2,2,1,1),field2=c(1,2,2,1),var2=runif(4))
> a
field1 field2 var1
1 1 1 0.8327855
2 1 2 0.4309419
3 2 1 0.5134574
4 2 2 0.8063110
> b
field1 field2 var2
1 2 1 0.2739025
2 2 2 0.5147113
3 1 2 0.2958369
4 1 1 0.3703116
So b is in an irregular order, if I then merge:
> merge(b,a)
field1 field2 var2 var1
1 1 1 0.3703116 0.8327855
2 1 2 0.2958369 0.4309419
3 2 1 0.2739025 0.5134574
4 2 2 0.5147113 0.8063110
in that case the result is sorted, as expected. If i merge it without
sorting:
> merge(b,a,sort=F)
field1 field2 var2 var1
1 2 1 0.2739025 0.5134574
2 2 2 0.5147113 0.8063110
3 1 2 0.2958369 0.4309419
4 1 1 0.3703116 0.8327855
it retains the order in b, which is what I want.
However if I now add a repeated row to b
> b = rbind(b,b[1,])
> b
field1 field2 var2
1 2 1 0.2739025
2 2 2 0.5147113
3 1 2 0.2958369
4 1 1 0.3703116
5 2 1 0.2739025
and merge it, without sorting
> merge(b,a,sort=F)
field1 field2 var2 var1
1 2 1 0.2739025 0.5134574
2 2 1 0.2739025 0.5134574
3 2 2 0.5147113 0.8063110
4 1 2 0.2958369 0.4309419
5 1 1 0.3703116 0.8327855
the result is still somehow sorted according to the order of b. I
would have expected the output to be:
merge(b,a,sort=F)
field1 field2 var2 var1
1 2 1 0.2739025 0.5134574
2 2 2 0.5147113 0.8063110
3 1 2 0.2958369 0.4309419
4 1 1 0.3703116 0.8327855
5 2 1 0.2739025 0.5134574
Is it possible to get this output (another function similar to merge)?
What is the overall reason (if someone knows it) for the current
behaviour of merge?
Thanks in advance.
PS: code
a = data.frame(field1=c(1,1,2,2),field2=c(1:2,1:2),var1=runif(4))
b = data.frame(field1=c(2,2,1,1),field2=c(1,2,2,1),var2=runif(4))
a
b
merge(b,a)
merge(b,a,sort=F)
b = rbind(b,b[1,])
b
merge(b,a,sort=F)
JiHO
---
http://jo.irisson.free.fr/
More information about the R-help
mailing list