[R-sig-finance] Sorting
Martin Maechler
maechler at stat.math.ethz.ch
Thu Nov 17 11:37:03 CET 2005
>>>>> "ToBra" == Brandt, T (Tobias) <TobiasBr at taquanta.com>
>>>>> on Thu, 17 Nov 2005 09:56:56 +0200 writes:
ToBra> Hi
ToBra> I'd like to add that in my experience using "order" would be the preferred
ToBra> method as "sort" can lead to unexpected results as I hope the following
ToBra> example will show.
>> a <- matrix(4:1, 2,2, byrow=TRUE)
>> colnames(a) <- c('b', 'a')
>> a
ToBra> b a
ToBra> [1,] 4 3
ToBra> [2,] 2 1
>> sort(a[2,])
ToBra> a b
ToBra> 1 2
>> # works as expected
yes, since it's sorting the *vector* a[2,],
and for vectors it's clear what to do with names.
>> # whereas the following can lead the unwary user astray
>> sort(a[2,,drop=FALSE])
ToBra> b a
ToBra> [1,] 1 2
>>
ToBra> This led to some errors in my code which took me a
ToBra> while to track down so I'd just like to put it out
ToBra> there as a caveat.
good; better would have been to go to the R-help (or maybe
R-devel) mailing list and ask / comment about it in public
(but read on)
ToBra> Not having looked at the source code of "sort" in any
ToBra> detail and going purely on input->black box->output
ToBra> based experience it seems to me that "sort" only
ToBra> sorts the associated "names" attribute and not the
ToBra> "colnames".
Several weeks ago, we (the R Core development team) noticed this
and changed it in "R-devel" (which will become R-2.3.0 in April 2006).
The new behavior is to drop the dimnames when sorting matrices,
since in general it doesn't make sense to keep them;
keeping them would only make sense in your case of a matrix with
just one row (or just one column).
If this needs more discussion, *PLEASE* move it to the
R-devel mailing list.
Regards,
Martin Maechler, ETH Zurich
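A minimal sketch of the behavior described above, assuming R >= 2.3.0 (where, per the message, sort() drops the dimnames of a matrix). It reuses the matrix `a` from the quoted example; the names `row_vec`, `row_mat`, `s_vec` and `s_mat` are introduced here only for illustration.

```r
# Matrix from the quoted example: columns labelled "b" and "a".
a <- matrix(4:1, 2, 2, byrow = TRUE)
colnames(a) <- c("b", "a")

row_vec <- a[2, ]                 # named vector: names travel with the values
row_mat <- a[2, , drop = FALSE]   # 1 x 2 matrix: labels live in dimnames

s_vec <- sort(row_vec)   # values and their names are reordered together
s_mat <- sort(row_mat)   # assumed R >= 2.3.0: plain vector, dimnames dropped

names(s_vec)   # labels still attached to the right values
dim(s_mat)     # NULL once the matrix structure has been dropped
```

Because the result of sorting the one-row matrix no longer carries labels, it cannot be mislabelled; the labels simply have to be recovered another way, for instance via order().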
ToBra> Now the "drop=FALSE" part in the above example might seem somewhat
ToBra> artificial but when dealing with timeseries this occurs quite often. For
ToBra> example, carrying on with the above example
>> library(zoo)
>> t <- zoo(a, c(2001,2002))
>> t
ToBra> b a
ToBra> 2001 4 3
ToBra> 2002 2 1
>> t2 <- window(t, 2002)
>> t2
ToBra> b a
ToBra> 2002 2 1
>> sort(t2)
ToBra> b a
ToBra> 2002 1 2
>> sort(as.vector(t2))
ToBra> [1] 1 2
>> sort(as.matrix(t2))
ToBra> b a
ToBra> [1,] 1 2
>> sort(as.matrix(t2)[1,])
ToBra> a b
ToBra> 1 2
>> sort(t2[1,])
ToBra> b a
ToBra> 2002 1 2
>> # none of which really gives the desired result since there is always some
>> # loss of information or erroneous information, i.e. loss of dimension
>> # names or incorrectly labelled dimensions.
>> # Using order on the other hand is transparent and preserves all the
>> # information.
>> t2[,order(t2)]
ToBra> a b
ToBra> 2002 1 2
>>
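The order()-based indexing above can also be sketched without zoo. This is only an illustration under assumptions: a plain one-row matrix `m` with the row label "2002" stands in for the zoo object `t2`, and `m_sorted` is a name made up for this sketch.

```r
# One-row matrix standing in for the windowed series t2 (assumption:
# a base matrix with dimnames instead of a zoo object).
m <- matrix(c(2, 1), nrow = 1, dimnames = list("2002", c("b", "a")))

# order() returns the permutation that sorts the row; indexing the
# columns with it moves each value together with its column label.
# drop = FALSE keeps the result a matrix, so the row label survives too.
m_sorted <- m[, order(m[1, ]), drop = FALSE]

m_sorted
```

The design point is that order() never touches the data itself; the reordering happens in the subsetting step, which is what carries the dimnames along.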
ToBra> Regards
ToBra> Tobias
>> -----Original Message-----
>> From: r-sig-finance-bounces at stat.math.ethz.ch
>> [mailto:r-sig-finance-bounces at stat.math.ethz.ch] On Behalf Of
>> Rainer Böhme
>> Sent: 16 November 2005 12:29 PM
>> To: seanpor at acm.org
>> Cc: r-sig-finance at stat.math.ethz.ch; L.Isella
>> Subject: Re: [R-sig-finance] Sorting
>>
>> Hi Lorenzo,
>>
>> # given this 2 x N matrix
>>
>> N <- 10^5
>> A <- matrix(rnorm(2*N),2)
>>
>> # use the following statement to sort the column vectors by
>> # their elements in the i-th row
>>
>> i <- 1
>> A <- A[,order(A[i,])]
>>
>> Hope this helps,
>> Rainer
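A small, self-contained sketch of the approach above, with checks that the couples stay matched. The assumptions here are a tiny N and a fixed seed chosen only for reproducibility, i <- 2 to sort by the lower row as in the original question, and the illustrative names `ord` and `B`.

```r
set.seed(1)                       # assumption: fixed seed, for reproducibility only
N <- 10                           # small N for illustration; the thread uses 10^5
A <- matrix(rnorm(2 * N), nrow = 2)

i <- 2                            # sort by the lower row
ord <- order(A[i, ])              # permutation that sorts row i
B <- A[, ord, drop = FALSE]       # apply the same permutation to whole columns

# Row i is now non-decreasing, and every column of B is a column of A,
# so the upper and lower values of each couple still belong together.
stopifnot(!is.unsorted(B[i, ]))
stopifnot(identical(B[, 1], A[, ord[1]]))
```

A single vectorised order() call replaces the double loop from the question, which is why it scales to large N.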
>>
>>> Good Morning Lorenzo,
>>>
>>> I had a look at ?sort and it said "For ordering along more than one
>>> variable, e.g., for sorting data frames, see 'order'." So I had a look
>>> at ?order... and popped down to the examples and it talks about
>>> ordering a dataframe - not the clearest of examples, but hey :-)
>>> Sometimes the obvious is the hardest to see :-)
>>>
>>> cheers!
>>> Sean
>>>
>>>
>>> On 16/11/05, L.Isella <L.Isella at myrealbox.com> wrote:
>>>> Dear All,
>>>> I have a long array (say 2xN, with N of the order of 10^5 at least)
>>>> made up of couples of numerical values.
>>>> I would like to sort these N couples in increasing order of the value
>>>> of the numbers in the lower row.
>>>> It is very easy to use sort() to take care of the sorting of the
>>>> lower row, but then I also have to sort the upper row so that the
>>>> values of each couple still match.
>>>> I did it using a double loop, but for large N this is very slow.
>>>> This is really a bottleneck in my code...
>>>> Any suggestions?
>>>> Regards
>>>>
>>>> Lorenzo