[R] Symbol/String comparison in R

Kristjan Kure kristjan.kure.1 using gmail.com
Thu Apr 14 13:59:41 CEST 2022

Thank you for your response. This is the current status:

*I am looking for the fundamental reason behind these comparisons:*
"1040" <= "12000" # returns true
"1040" <= "10000" # returns false
"a" < "A" # returns true
"A" < "a" # returns false
"raining" <= "raining x" #true
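
A minimal sketch of how much these results depend on the collating sequence, assuming the session is allowed to switch its collation setting temporarily. It reruns some of the comparisons under the "C" locale, where strings are compared by byte value; the variable names are illustrative only, and results under other locales may differ.

```r
# Save the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Under the "C" locale, strings are compared by their byte values.
invisible(Sys.setlocale("LC_COLLATE", "C"))
c_a_lt_A   <- "a" < "A"                 # FALSE: byte 0x61 is greater than 0x41
c_A_lt_a   <- "A" < "a"                 # TRUE
c_digits_1 <- "1040" <= "12000"         # TRUE:  "0" sorts before "2"
c_digits_2 <- "1040" <= "10000"         # FALSE: "4" sorts after "0"
c_prefix   <- "raining" <= "raining x"  # TRUE:  a prefix sorts first

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```

Only the two case comparisons flip relative to the results listed above, which suggests that the upper/lower case ordering is the locale-dependent part.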

*Feedback so far:*
1) Bert: "lexicographic", "locale"
2) Timothy: https://en.wikipedia.org/wiki/ASCII

1) "Lexicographic" - The phrase lexicographic order means alphabetical
order. This will help me only when comparing:
"a" < "b" # I suppose it returns true, bluntly because (1 < 2)? Position 1
- A, position 2 - B
"b" < "a" # I suppose it returns false, bluntly because (2 < 1)? Position 2
- B, position 1 - A

2) Checking the alphabet or ASCII table won't help me understand why "a" <
"A" returns true.

3) The letter "A" has a smaller value than "a" (checking the oct, dec, hex
values in https://en.wikipedia.org/wiki/ASCII). On the other hand, if
alphabetical order is different in country X, is the whole ASCII table
obsolete?

4) "Locale" - I understand the order of letters can be different between
locales/alphabets. Still, it does not help with "a" < "A" comparison.
I tried the local() function in RStudio but did not get additional insight.
Or is there a locale table somewhere listing all symbols, including
lowercase and uppercase letters?
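
To make the "symbol by symbol from the left" idea in the points above concrete, here is a rough sketch of a purely code-point-based lexicographic comparison. The helper name codepoint_le is hypothetical, and this is not how R's `<=` is implemented; it only shows what the result would be if Unicode code points alone decided the order, with no locale collation involved.

```r
# Hypothetical helper: TRUE if x sorts at or before y when the two strings
# are compared code point by code point, starting from the left.
codepoint_le <- function(x, y) {
  cx <- utf8ToInt(x)  # integer code points, e.g. "A" -> 65, "a" -> 97
  cy <- utf8ToInt(y)
  n <- min(length(cx), length(cy))
  # Positions within the shared length where the two strings differ.
  differs <- which(cx[seq_len(n)] != cy[seq_len(n)])
  if (length(differs) > 0) {
    i <- differs[1]
    return(cx[i] < cy[i])  # the first difference decides the result
  }
  # One string is a prefix of the other: the shorter one sorts first.
  length(cx) <= length(cy)
}

codepoint_le("1040", "12000")         # TRUE:  "0" (48) < "2" (50) at position 2
codepoint_le("1040", "10000")         # FALSE: "4" (52) > "0" (48) at position 3
codepoint_le("raining", "raining x")  # TRUE:  a prefix sorts first
codepoint_le("a", "A")                # FALSE: 97 > 65
```

The last call returns FALSE, while "a" < "A" was observed to return TRUE, so code points alone cannot explain that result; the difference would have to come from the collating sequence of the locale.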

I understand it might be a rare occasion for this type of comparison, but I
still want to understand the why. Also, some functions might return strings
and then it might be helpful to understand what is really going on.
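
For the case where a function returns numbers as strings, one possible approach is to convert them before comparing. This is a small sketch with made-up values x and y standing in for such returned strings.

```r
# Hypothetical values, standing in for strings returned by some function.
x <- "1040"
y <- "10000"

# Character comparison: decided character by character, not by magnitude.
string_result <- x <= y

# Numeric comparison: convert first, then compare magnitudes.
numeric_result <- as.numeric(x) <= as.numeric(y)  # TRUE: 1040 <= 10000
```

as.numeric() returns NA (with a warning) for strings that are not valid numbers, so checking for NA after the conversion is a reasonable precaution.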

*If no one can answer how these comparisons fundamentally work should this
kind of string comparison return NaN in R?*

Best regards,
Kristjan

On Thu, Apr 14, 2022 at 1:20 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> For some issues it can be useful to learn by experiment. It gives you
> experience and shows you what sorts of error messages you can expect. In
> the console type things like this:
> a>B
> gives an error
> "a">"B"
> FALSE
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> Sent: Wednesday, April 13, 2022 10:00 PM
> To: Kristjan Kure <kristjan.kure.1 using gmail.com>
> Cc: R-help <r-help using r-project.org>
> Subject: Re: [R] Symbol/String comparison in R
>
> "I was not able to find answers to my questions (tried Google, Stack
> Overflow, etc). Please correct me if anything is wrong here."
>
> R has an extensive Help system. That should always be your first place to
> look. In this case, ?"<" (at the R prompt) brings you to the Help page for
> comparisons (as would ?Comparison, but only if the "C" is in upper case,
> unfortunately). Among lots of other stuff, it says:
>
> "Comparison of strings in character vectors is lexicographic within the
> strings using the collating sequence of the locale in use: see locales."
> ... (+ lots more).
>
> Incidentally, rseek.org and rdrr.io are another couple of good places to
> look for R documentation.
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Apr 13, 2022 at 5:10 PM Kristjan Kure <kristjan.kure.1 using gmail.com>
> wrote:
> >
> > Hi!
> >
> > Sorry, I am a beginner in R.
> >
> > I was not able to find answers to my questions (tried Google, Stack
> > Overflow, etc). Please correct me if anything is wrong here.
> >
> > When comparing symbols/strings in R - raw numeric values are compared
> > symbol by symbol starting from left? If raw numeric values are not
> > used is there an ASCII / Unicode table where symbols have
> > values/ranking/order and R compares those values?
> >
> > *2) Comparing symbols*
> > Letter "a" raw value is 61, letter "b" raw value is 62? Is this correct?
> >
> > # Raw value for "a" = 61
> > a_raw <- charToRaw("a")
> > a_raw
> >
> > # Raw value for "b" = 62
> > b_raw <- charToRaw("b")
> > b_raw
> >
> > # equals TRUE
> > "a" < "b"
> >
> > Ok, so 61 is less than 62 so it's TRUE. Is this correct?
> >
> > *3) Comparing strings #1*
> > "1040" <= "12000"
> >
> > raw_1040 <- charToRaw("1040")
> > raw_1040
> > #31 *30* (comparison happens with the second symbol) 34 30
> >
> > raw_12000 <- charToRaw("12000")
> > raw_12000
> > #31 *32* (comparison happens with the second symbol) 30 30 30
> >
> > The symbol in the second position is 30 and it's less than 32. Equals
> > to true. Is this correct?
> >
> > *4) Comparing strings #2*
> > "1040" <= "10000"
> >
> > raw_1040 <- charToRaw("1040")
> > raw_1040
> > #31 30 *34*  (comparison happens with third symbol) 30
> >
> > raw_10000 <- charToRaw("10000")
> > raw_10000
> > #31 30 *30*  (comparison happens with third symbol) 30 30
> >
> > The symbol in the third position is 34, which is greater than 30. Equals to
> > false. Is this correct?
> >
> > *5) Problem - Why does this equal FALSE?* *"A" < "a"*
> >
> > 41 < 61 # FALSE?
> >
> > # Raw value for "A" = 41
> > A_raw <- charToRaw("A")
> > A_raw
> >
> > # Raw value for "a" = 61
> > a_raw <- charToRaw("a")
> > a_raw
> >
> > Why is capitalized "A" not less than lowercase "a"? Based on raw
> > values it should be. What am I missing here?
> >
> > Thanks
> > Kristjan
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.