[R] Trying to understand how to sort a DF on two columns

Stephen Ellison
Wed Aug 14 16:10:32 CEST 2019


> I want to sort a DF, temp, on two columns, patid and time. I have searched
> the internet and found code that I was able to modify to get my data sorted.
> Unfortunately I don't understand how the code works. I would appreciate it
> if someone could explain to me how the code works. Among other
> questions, despite reading, I don't understand how with() works, nor what it
> does in the current setting.
> 
> code:
> data4xsort<-temp[
>   with( temp, order(temp[,"patid"], temp[,"time"])),
> ]

With apologies for brevity-induced brusqueness:

1. You don't need 'with' in the code. You could say
data4xsort <- temp[order(temp[,"patid"], temp[,"time"]), ]
or
data4xsort <- temp[order(temp$patid, temp$time), ]

2. If you _did_ use 'with', you could say
data4xsort <- temp[with(temp, order(patid, time)), ]

Basically, 'with(x, ...)' says 'look in x first for anything named in ...'. So inside with(temp, ...), a bare patid means temp$patid.
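
To make the 'look in x first' idea concrete, here is a minimal sketch. The toy data frame and the clashing global variable are made up for illustration; only the column names patid and time come from the question.

```r
# Hypothetical toy data; 'patid' and 'time' mirror the column names in the question
temp <- data.frame(patid = c(2, 1, 2, 1), time = c(1, 2, 2, 1))

# A global variable with the same name as a column (illustration only)
patid <- c(99, 99, 99, 99)

# Inside with(), names are looked up in temp first, so the column wins
from_with <- with(temp, patid)              # 2 1 2 1, not 99 99 99 99

# Any expression works, not just a single name
combined <- with(temp, patid * 10 + time)   # 21 12 22 11
```

Outside 'with', a bare patid would pick up the global 99s, which is why you otherwise have to write temp$patid or temp[,"patid"].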

3. order. order is a bit of a mindbender. It gives you the numeric indices you need to rearrange an unsorted object into a sorted object.
If we said
a <- c(2,3,1)  
order(a)
by default, we get back
# [1] 3 1 2

These are indexes into a that put the elements of a in ascending order. a[3] is 1, a[1] is 2 and so on. 
So if we say
oo <- order(a) 
a[oo]

we get
[1] 1 2 3
... which is a, in ascending order. And to do that, we used oo as indexes in a.
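
As a quick check on the above, the following sketch (same vector a as in the example) confirms that indexing with order() gives the same result as sort(), and shows the decreasing argument of order() for the reverse direction.

```r
a <- c(2, 3, 1)

oo <- order(a)                          # 3 1 2: positions of the smallest, next, largest
same_as_sort <- identical(a[oo], sort(a))   # TRUE

# order() also accepts decreasing = TRUE
desc <- a[order(a, decreasing = TRUE)]  # 3 2 1
```

The reason to use order() rather than sort() is that the index vector can be reused on other objects, such as the rows of a data frame.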

4. For a data frame, you generally want to sort rows into a particular order. So let's say we have a data frame like
d <- data.frame(a=c(2,3,1,3,1,2), b=c(1,2,2,1,1,2))
d
  a b
1 2 1
2 3 2
3 1 2
4 3 1
5 1 1
6 2 2

We can say
oo.d <- with(d, order(a, b))   # which says: look in 'd' to find 'a' and 'b'
	# We could also have said oo.d <- order(d$a, d$b)

This gives us the row numbers of d, arranged to give us the row ordering we asked 'order' to generate. 
Now, if we say 
d[oo.d, ]     #where we need the empty second index so that the first is treated as a row index
# we get d, with rows sorted by a first and then b:
  a b
5 1 1
3 1 2
1 2 1
6 2 2
4 3 1
2 3 2

# You might notice that the default row numbers from d (the left-hand column above) are now identical to oo.d;
# this is particular to default row numbers, though.
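
To illustrate that last remark, here is a small sketch using the same data as d but with made-up, non-default row names (the letters are purely hypothetical). The ordering index is still numeric, while the row names simply travel with their rows.

```r
# Same values as d above, but with illustrative non-default row names
d2 <- data.frame(a = c(2, 3, 1, 3, 1, 2), b = c(1, 2, 2, 1, 1, 2),
                 row.names = c("u", "v", "w", "x", "y", "z"))

oo.d2 <- with(d2, order(a, b))       # 5 3 1 6 4 2, numeric positions as before
sorted_names <- rownames(d2[oo.d2, ])  # "y" "w" "u" "z" "x" "v"
```

So oo.d2 is always a set of row positions; it only looks like the row labels when the labels happen to be the default 1, 2, 3, ...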

5. If you want to pack that into one line without assigning the ordering to oo.d, it goes (for example)
d[ with(d, order(a, b)), ]

... which is pretty much what your code is doing.

The only thing I've missed is that when you wrap something like 
order(temp[,"patid"], temp[,"time"]) 
in 'with', 'with' is not doing anything useful for you. 
temp[,"patid"] has already told R where to look for patid, 
so R doesn’t need to look anywhere else. 
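
If you want to convince yourself that the forms are interchangeable, a sketch like the following compares them. The contents of temp are invented here; your real data will differ.

```r
# Hypothetical stand-in for your data frame
temp <- data.frame(patid = c(2, 1, 2, 1), time = c(1, 2, 2, 1))

s1 <- temp[with(temp, order(temp[, "patid"], temp[, "time"])), ]  # your original code
s2 <- temp[order(temp[, "patid"], temp[, "time"]), ]              # 'with' dropped
s3 <- temp[order(temp$patid, temp$time), ]                        # using $
s4 <- temp[with(temp, order(patid, time)), ]                      # 'with' doing real work

all_same <- identical(s1, s2) && identical(s2, s3) && identical(s3, s4)  # TRUE
```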


Does that help?

Steve Ellison

