[R] sorting in 'merge'
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Mon Jan 21 12:22:47 CET 2008
jiho wrote:
> [...snip...]
> the result is still somehow sorted according to the order of b. I
> would have expected the output to be:
>
> merge(b,a,sort=F)
>   field1 field2      var2      var1
> 1      2      1 0.2739025 0.5134574
> 2      2      2 0.5147113 0.8063110
> 3      1      2 0.2958369 0.4309419
> 4      1      1 0.3703116 0.8327855
> 5      2      1 0.2739025 0.5134574
>
> Is it possible to get this output (another function similar to merge)?
> What is the overall reason (if someone knows it) for the current
> behaviour of merge?
>
>
Well, the documentation says that the order is "unspecified". That means
that expecting anything specific is likely to be wrong (and even if you
happen to guess correctly, the answer may be wrong next year!).

Merge algorithms generally require sorting of data for efficiency, and
putting things back in the original order (or any other order) adds
complexity. It is not even at all clear what the "original order"
actually means in cases of many-many matching (or alternating one-many
and many-one).
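
To make the many-many point concrete, here is a small sketch. The data
frames x and y and their columns are made up for illustration; they are
not the poster's a and b. The key "a" occurs twice on each side, so
merge() returns every pairing, and no single row of the result
corresponds to exactly one row of either input.

```r
# Hypothetical data: the key column k has repeated values on both sides.
x <- data.frame(k = c("a", "b", "a"), xval = 1:3)
y <- data.frame(k = c("a", "a", "b"), yval = c(10, 20, 30))

m <- merge(x, y, by = "k", sort = FALSE)

# Each of the two "a" rows in x pairs with each of the two "a" rows in y
# (4 rows), and the single "b" row in x pairs with the single "b" row in y.
nrow(m)
table(m$k)
```

Whether those four "a" rows should follow the order of x, the order of y,
or some interleaving of the two is a matter of choice, which is why an
explicit sort key is the safer route.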
To sort according to the original order of b, I'd just do it explicitly:

  m <- merge(cbind(id=seq_len(nrow(b)), b), a, sort=FALSE)
  m[order(m$id),]
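
For anyone who wants to try this without the original data, here is a
self-contained sketch of the same idea. The contents of a and b are
invented (random values, with one repeated key combination in b, as in
the example quoted above), and dropping the helper column and resetting
the row names afterwards is my own tidying step, not part of the
two-liner.

```r
set.seed(1)  # illustrative data only, not the poster's a and b
a <- data.frame(field1 = c(1, 1, 2, 2),
                field2 = c(1, 2, 1, 2),
                var1   = runif(4))
b <- data.frame(field1 = c(2, 2, 1, 1, 2),
                field2 = c(1, 2, 2, 1, 1),
                var2   = runif(5))

# Tag each row of b with its position before merging.
b_id <- cbind(id = seq_len(nrow(b)), b)

# merge() joins on the columns common to both inputs (field1, field2);
# the row order of the result is whatever merge() happens to produce.
m <- merge(b_id, a, sort = FALSE)

# Restore the order of b explicitly, then drop the helper column.
m <- m[order(m$id), ]
m$id <- NULL
rownames(m) <- NULL
m
```

Note that this only recovers the order of b cleanly when each row of b
matches at most one row of a. Rows of b with no match in a are dropped
unless all.x = TRUE is passed to merge().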
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907