[R-sig-finance] Sorting

Martin Maechler maechler at stat.math.ethz.ch
Thu Nov 17 11:37:03 CET 2005


>>>>> "ToBra" == Brandt, T (Tobias) <TobiasBr at taquanta.com>
>>>>>     on Thu, 17 Nov 2005 09:56:56 +0200 writes:

    ToBra> Hi
    ToBra> I'd like to add that in my experience using "order" would be the preferred
    ToBra> method as "sort" can lead to unexpected results as I hope the following
    ToBra> example will show.

    >> a <- matrix(4:1, 2,2, byrow=TRUE)
    >> colnames(a) <- c('b', 'a')
    >> a
    ToBra>      b a
    ToBra> [1,] 4 3
    ToBra> [2,] 2 1
    >> sort(a[2,])
    ToBra> a b 
    ToBra> 1 2 
    >> # works as expected 

yes, since it's sorting the *vector* a[2,],
and for vectors it's clear what to do with names.
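
As a small illustration of the vector case (a sketch only; the named
vector `x` below is made up to mirror a[2,] and is not part of the
original example), both sort() and indexing by order() carry the names
along with the values:

```r
# Hypothetical named vector, mirroring a[2,] from the example above
x <- c(b = 2, a = 1)

# sort() on a plain named vector reorders the names together with the values
s <- sort(x)

# indexing with order() is the equivalent explicit form
o <- x[order(x)]

print(s)
print(identical(s, o))
```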

    >> # whereas the following can lead the unwary user astray

    >> sort(a[2,,drop=FALSE])
    ToBra>      b a
    ToBra> [1,] 1 2
    >> 

    ToBra> This led to some errors in my code which took me a
    ToBra> while to track down so I'd just like to put it out
    ToBra> there as a caveat.

good;  better would have been to go to the R-help (or maybe
R-devel) mailing list and ask / comment about it in public
(but read on)

    ToBra> Not having looked at the source code of "sort" in any
    ToBra> detail and going purely on input->black box->output
    ToBra> based experience it seems to me that "sort" only
    ToBra> sorts the associated "names" attribute and not the
    ToBra> "colnames".


Several weeks ago, we (the R Core development team) noticed this
and changed it in "R-devel" (which will become R-2.3.0 in April'06).

The new behavior is to drop the dimnames when sorting matrices,
since in general it doesn't make any sense to keep them;
only in your case of a matrix with just one row (or just one
column) would it make sense.
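
For the one-row case, here is a sketch that does not rely on what sort()
does with dimnames in any particular R version: index the columns with
order() and keep drop = FALSE. The matrix `a` is the one from the example
above; `r` and `res` are names chosen here purely for illustration.

```r
a <- matrix(4:1, 2, 2, byrow = TRUE)
colnames(a) <- c("b", "a")

r <- 2  # row whose values determine the column order

# Reorder the columns by the values in row r;
# drop = FALSE keeps the result a one-row matrix with its column names
res <- a[r, order(a[r, ]), drop = FALSE]
print(res)
```

Because the column names are permuted by the same index as the values,
each label stays attached to its own value.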

If this needs more discussion,  *PLEASE*  move it to the 
R-devel mailing list.

Regards,
Martin Maechler, ETH Zurich



    ToBra> Now the "drop=FALSE" part in the above example might seem somewhat
    ToBra> artificial but when dealing with timeseries this occurs quite often.  For
    ToBra> example, carrying on with the above example

    >> library(zoo)
    >> t <- zoo(a, c(2001,2002))
    >> t
    ToBra>      b a
    ToBra> 2001 4 3
    ToBra> 2002 2 1
    >> t2 <- window(t, 2002)
    >> t2
    ToBra>      b a
    ToBra> 2002 2 1
    >> sort(t2)
    ToBra>      b a
    ToBra> 2002 1 2
    >> sort(as.vector(t2))
    ToBra> [1] 1 2
    >> sort(as.matrix(t2))
    ToBra>      b a
    ToBra> [1,] 1 2
    >> sort(as.matrix(t2)[1,])
    ToBra> a b 
    ToBra> 1 2 
    >> sort(t2[1,])
    ToBra>      b a
    ToBra> 2002 1 2
    >> # none of which really gives the desired result, since there is always
    >> # some loss of information or erroneous information, i.e. loss of
    >> # dimension names or incorrectly labelled dimensions.
    >> # Using order, on the other hand, is transparent and preserves all the
    >> # information.
    >> t2[,order(t2)]
    ToBra>      a b
    ToBra> 2002 1 2
    >> 

    ToBra> Regards

    ToBra> Tobias 


    >> -----Original Message-----
    >> From: r-sig-finance-bounces at stat.math.ethz.ch 
    >> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of 
    >> Rainer Böhme
    >> Sent: 16 November 2005 12:29 PM
    >> To: seanpor at acm.org
    >> Cc: r-sig-finance at stat.math.ethz.ch; L.Isella
    >> Subject: Re: [R-sig-finance] Sorting
    >> 
    >> Hi Lorenzo,
    >> 
    >> # given this 2 x N matrix
    >> 
    >> N <- 10^5
    >> A <- matrix(rnorm(2*N),2)
    >> 
    >> # use the following statement to sort the column vectors by
    >> # their elements in the i-th row
    >> 
    >> i <- 1
    >> A <- A[,order(A[i,])]
    >> 
    >> Hope this helps,
    >> Rainer
    >> 
    >>> Good Morning Lorenzo,
    >>> 
    >>> I had a look at ?sort and it said "For ordering along more than one 
    >>> variable, e.g., for sorting data frames, see 'order'." So I had a look 
    >>> at ?order... and popped down to the examples and it talks about 
    >>> ordering a dataframe - not the clearest of examples, but hey :-) 
    >>> Sometimes the obvious is the hardest to see :-)
    >>> 
    >>> cheers!
    >>> Sean
    >>> 
    >>> 
    >>> On 16/11/05, L.Isella <L.Isella at myrealbox.com> wrote:
    >>>> Dear All,
    >>>> I have a long array (say 2xN, with N of the order of 10^5 at least) 
    >>>> made up of couples of numerical values.
    >>>> I would like to sort these N couples in increasing order of the value 
    >>>> of the numbers in the lower row.
    >>>> It is very easy to use sort() to take care of the sorting of the 
    >>>> lower row, but then I also have to sort the upper row so that the 
    >>>> values of each couple still match.
    >>>> I did it using a double loop, but for large N this is very slow.
    >>>> This is really a bottleneck in my code...
    >>>> Any suggestions?
    >>>> Regards
    >>>> 
    >>>> Lorenzo


