[Rd] split() - unexpected sorting of results
Rui Barradas
ruipbarradas at sapo.pt
Sat Oct 21 06:35:29 CEST 2017
Hello,
To sort numbers that have been converted to character strings, there is
the stringr package, with functions str_sort() and str_order().
library(stringr)
set.seed(2447)
x <- sample(11L)
sort(as.character(x))
#[1] "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"
str_sort(as.character(x), numeric = TRUE)
#[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
str_order(as.character(x), numeric = TRUE)
#[1] 1 4 11 8 6 5 3 10 9 7 2
i <- str_order(as.character(x), numeric = TRUE)
as.character(x)[i]
#[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
Unfortunately, this does not solve the OP's problem: factor(),
as.factor(), split() and others use the base R sort order, and that
default can only be changed by changing their sources.
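What can be done without touching the sources is to supply the levels
explicitly, as Hervé suggests in the quoted message below. A minimal
sketch in base R, assuming every key parses as a number (keys_chr, vals
and lev are made-up names for illustration); with stringr loaded,
str_sort(unique(keys_chr), numeric = TRUE) could be used to build the
levels instead:

```r
# Sketch: control the order of split() groups when the keys are strings.
# Assumes every key parses as a number; all names here are illustrative.
keys_chr <- as.character(c(3, 11, 1, 10, 2, 9))
vals <- seq_along(keys_chr)

# Default: as.factor() sorts the strings, so "10" and "11" come before "2".
names(split(vals, keys_chr))

# Explicit levels in numeric order.
lev <- unique(keys_chr[order(as.numeric(keys_chr))])
by_numeric <- split(vals, factor(keys_chr, levels = lev))
names(by_numeric)
#[1] "1"  "2"  "3"  "9"  "10" "11"

# Explicit levels in order of first appearance (sorting of input == output).
by_input <- split(vals, factor(keys_chr, levels = unique(keys_chr)))
names(by_input)
#[1] "3"  "11" "1"  "10" "2"  "9"
```

Because the levels are given explicitly, the result no longer depends on
how character vectors are sorted in the current locale.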
Hope this helps,
Rui Barradas
On 21-10-2017 00:32, Hervé Pagès wrote:
> Hi,
>
> On 10/20/2017 12:53 PM, Peter Meissner wrote:
>> Thanks, for the explanation.
>>
>> Still, I think this is surprising behaviour which might be handled
>> better.
>
> Maybe a little surprising, but no more than:
>
> > x <- sample(11L)
>
> > sort(x)
> [1] 1 2 3 4 5 6 7 8 9 10 11
>
> > sort(as.character(x))
> [1] "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"
>
> The fact that sort(), as.factor(), split() and many other things behave
> consistently with respect to the underlying order of character vectors
> avoids other even bigger surprises.
>
> Also note that the underlying order of character vectors actually
> depends on your locale. One way to guarantee consistent results across
> platforms/locales is by explicitly specifying the levels when making
> a factor e.g.
>
> f <- factor(x, levels=unique(x))
> split(1:11, f)
>
> This is particularly sensible when writing unit tests.
>
> Cheers,
> H.
>
>>
>> Best, Peter
>>
>> On 20.10.2017 at 9:49 PM, "Iñaki Úcar" <i.ucar86 at gmail.com> wrote:
>>
>>> Hi Peter,
>>>
>>> 2017-10-20 21:33 GMT+02:00 Peter Meissner <retep.meissner at gmail.com>:
>>>> Hey,
>>>>
>>>> I found this - for me - quite surprising and puzzling behaviour of
>>>> split().
>>>>
>>>>
>>>> split(1:11, as.character(1:11))
>>>> split(1:11, 1:11)
>>>>
>>>>
>>>> When splitting by numerics everything works as expected - sorting of
>>>> input == sorting of output -- but when using a character vector
>>>> everything gets re-sorted alphabetically.
>>>>
>>>>
>>>> Although there are some references in the help files to what happens
>>>> when using split, I did not find any note on this - for me - rather
>>>> unexpected behaviour.
>>>
>>> As the documentation states,
>>>
>>> f: a ‘factor’ in the sense that ‘as.factor(f)’ defines the
>>> grouping, or a list of such factors in which case their
>>> interaction is used for the grouping.
>>>
>>> And, in fact,
>>>
>>> > as.factor(1:11)
>>> [1] 1 2 3 4 5 6 7 8 9 10 11
>>> Levels: 1 2 3 4 5 6 7 8 9 10 11
>>>
>>> > as.factor(as.character(1:11))
>>> [1] 1 2 3 4 5 6 7 8 9 10 11
>>> Levels: 1 10 11 2 3 4 5 6 7 8 9
>>>
>>> Regards,
>>> Iñaki
>>>
>>>> I would like it best if the ordering of split results stayed the same
>>>> no matter the input (sorting of input == sorting of output).
>>>>
>>>> If that is not possible, a note of caution in the help pages and maybe
>>>> an example might be valuable.
>>>>
>>>>
>>>> Best, Peter
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>
>>
>