[R] Simple order() data frame question.

Marc Schwartz marc_schwartz at me.com
Thu May 12 15:40:00 CEST 2011


On May 12, 2011, at 8:09 AM, John Kane wrote:

> Argh.  I knew it was at least partly obvious.  I never have been able to read the order() help page and understand what it is saying.
> 
> Thanks very much.
> 
> By the way, to me it is counter-intuitive that the command is
> 
>> df1[order(df1[,2],decreasing=TRUE),]
> 
> For some reason I keep expecting it to be 
> order( , df1[,2],decreasing=TRUE)
> 
> So clearly I don't understand what is going on but at least I am a lot better off.  I may be able to get this graph to work.  


John,

Perhaps it may be helpful to understand that order() does not actually sort() the data. 

It returns a vector of indices into the data, arranged so that using them to subset the vector (or, in this case, the column) yields its elements in sorted order.

So you want the output of order() to be used within the brackets for the row *indices*, to reflect the ordering of the column (or columns in the case of a multi-level sort) that you wish to use to sort the data frame rows.

set.seed(1)
x <- sample(10)

> x
 [1]  3  4  5  7  2  8  9  6 10  1


# sort() actually returns the sorted data
> sort(x)
 [1]  1  2  3  4  5  6  7  8  9 10


# order() returns the indices of 'x' in sorted order
> order(x)
 [1] 10  5  1  2  3  8  4  6  7  9


# This does the same thing as sort()
> x[order(x)]
 [1]  1  2  3  4  5  6  7  8  9 10
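The relationship between order() and sort() can be checked directly. The following is only a minimal sketch: 'x' is typed in literally (copied from the output above) because sample() can give different results for the same seed in different R versions, and the names 'idx' and 'idx_dec' are made up for illustration.

```r
# Literal copy of the vector 'x' shown above
x <- c(3, 4, 5, 7, 2, 8, 9, 6, 10, 1)

# order() gives positions; subsetting by them reproduces sort()
idx <- order(x)
stopifnot(identical(x[idx], sort(x)))

# The same holds for a decreasing sort
idx_dec <- order(x, decreasing = TRUE)
stopifnot(identical(x[idx_dec], sort(x, decreasing = TRUE)))

# With no tied values, the decreasing index is just the reversed
# increasing index
stopifnot(identical(idx_dec, rev(idx)))
```

If any of these checks failed, stopifnot() would raise an error; running the block silently means all three hold for this vector.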


set.seed(1)
df1 <- data.frame(aa = letters[1:10], bb = rnorm(10))

> df1
   aa         bb
1   a -0.6264538
2   b  0.1836433
3   c -0.8356286
4   d  1.5952808
5   e  0.3295078
6   f -0.8204684
7   g  0.4874291
8   h  0.7383247
9   i  0.5757814
10  j -0.3053884


# These are the indices of df1$bb in sorted order
> order(df1$bb)
 [1]  3  6  1 10  2  5  7  9  8  4


# Get df1$bb in increasing order
> df1$bb[order(df1$bb)]
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808


# Same thing as above
> sort(df1$bb)
 [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
 [7]  0.4874291  0.5757814  0.7383247  1.5952808


The output of sort() is a vector of values, not row positions, so it cannot be used to reorder the data frame rows. Instead, use order() to get the ordered indices and then use them to extract the data frame rows in the sort order that you desire:

> df1[order(df1$bb), ]
   aa         bb
3   c -0.8356286
6   f -0.8204684
1   a -0.6264538
10  j -0.3053884
2   b  0.1836433
5   e  0.3295078
7   g  0.4874291
9   i  0.5757814
8   h  0.7383247
4   d  1.5952808
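As a smaller illustration of why the row index matters, here is a sketch on a three-row data frame. The data frame 'scores' and its columns 'name' and 'score' are invented for this example and are not part of the data above.

```r
# Hypothetical toy data, not from the original example
scores <- data.frame(name  = c("ann", "bob", "cy"),
                     score = c(2.5, 0.5, 1.5),
                     stringsAsFactors = FALSE)

# sort() returns the values only; nothing links them back to a row
sorted_values <- sort(scores$score)

# order() returns row positions, so the whole row moves together
row_idx     <- order(scores$score)
sorted_rows <- scores[row_idx, ]

# The names stay attached to their own scores
sorted_rows$name
```

Here 'row_idx' is placed before the comma in scores[row_idx, ] because that slot selects rows; the empty slot after the comma means "all columns".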


> df1[order(df1$bb, decreasing = TRUE), ]
   aa         bb
4   d  1.5952808
8   h  0.7383247
9   i  0.5757814
7   g  0.4874291
5   e  0.3295078
2   b  0.1836433
10  j -0.3053884
1   a -0.6264538
6   f -0.8204684
3   c -0.8356286


Does that help?

Regards,

Marc Schwartz


