[R] Symbol/String comparison in R
Ebert,Timothy Aaron
tebert using ufl.edu
Thu Apr 14 16:39:55 CEST 2022
These outcomes are correct. Strings are compared character by character from the left, and the first position where they differ decides the result.
"abaca" < "acaba"
TRUE
"abaca" < "acabammmmm"
TRUE
This is important because it makes clear sort order: What is the right order for the numbers 1 through 10 sorted as character?
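To make the sort-order question concrete, here is a small sketch. It assumes method = "radix", which sorts character vectors as in the C locale, so the result should not depend on the locale of the machine it runs on.

```r
# The numbers 1 through 10, stored as character strings
nums <- as.character(1:10)

# Sorted as strings: compared character by character, so "10" lands right after "1"
sorted_chr <- sort(nums, method = "radix")

# Sorted as numbers, for contrast
sorted_num <- sort(as.numeric(nums))

print(sorted_chr)
print(sorted_num)
```

As character the order is "1" "10" "2" "3" ... "9"; converting to numeric first gives 1, 2, ..., 10.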
Another fun game is to explain
F > T
FALSE
f > t
error
"f">"t"
FALSE
I can also change the first outcome
F<-4
T<-2
F>T
TRUE
The key is to know whether I am comparing variables or strings, and that T and F are only shortcuts that default to TRUE and FALSE and can be reassigned by the user.
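A minimal sketch of the same game, wrapped in a function so the reassignment of T and F stays local and does not leak into the session. The function name compare_shortcuts is made up for this illustration.

```r
compare_shortcuts <- function() {
  # Default values: base::F is FALSE and base::T is TRUE, so this is 0 > 1
  default <- base::F > base::T

  # T and F are ordinary variables, so they can be reassigned
  # (TRUE and FALSE are reserved words and cannot)
  F <- 4
  T <- 2
  masked <- F > T   # now 4 > 2

  # Quoted letters are strings and are compared as text, not looked up as objects
  as_strings <- "f" > "t"

  c(default = default, masked = masked, as_strings = as_strings)
}

res <- compare_shortcuts()
print(res)
```

This is one reason many style guides suggest writing TRUE and FALSE in full.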
Tim
From: Kristjan Kure <kristjan.kure.1 using gmail.com>
Sent: Thursday, April 14, 2022 8:00 AM
To: Ebert,Timothy Aaron <tebert using ufl.edu>
Cc: Bert Gunter <bgunter.4567 using gmail.com>; R-help <r-help using r-project.org>
Subject: Re: [R] Symbol/String comparison in R
Thank you for your response. This is the current status:
I am looking for the fundamental reason behind these comparisons:
"1040" <= "12000" # returns true
"1040" <= "10000" # returns false
"a" < "A" # returns true
"A" < "a" # returns false
"raining" <= "raining x" #true
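The digit examples above can be traced with a small helper that reports the first position where two strings differ. This is only a sketch: first_difference is a hypothetical name, and it looks at Unicode code points via utf8ToInt(), which matches what R does in the C locale but not necessarily in other locales (so it does not explain the "a" < "A" result).

```r
# Hypothetical helper: position of the first differing character,
# or NA if one string is a prefix of the other (or they are equal)
first_difference <- function(x, y) {
  cx <- utf8ToInt(x)
  cy <- utf8ToInt(y)
  n <- min(length(cx), length(cy))
  pos <- which(cx[seq_len(n)] != cy[seq_len(n)])
  if (length(pos) == 0) return(NA_integer_)
  pos[1]
}

p1 <- first_difference("1040", "12000")        # differs at "0" vs "2"
p2 <- first_difference("1040", "10000")        # differs at "4" vs "0"
p3 <- first_difference("raining", "raining x") # no difference within the shorter string

print(c(p1, p2, p3))
```

When no position differs within the shorter string, the shorter string sorts first, which is why "raining" <= "raining x" is true.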
Feedback so far:
1) Bert: "lexicographic", "locale"
2) Timothy: https://en.wikipedia.org/wiki/ASCII
My comments:
1) "Lexicographic" - The phrase lexicographic order means alphabetical order. This will help me only when comparing:
"a" < "b" # I suppose it returns true, bluntly because (1 < 2)? Position 1 - A, position 2 - B
"b" < "a" # I suppose it returns false, bluntly because (2 < 1)? Position 2 - B, position 1 - A
2) Checking the alphabet or ASCII table won't help me understand why "a" < "A" returns true.
3) Letter "A" has a smaller value than "a" (checking the oct, dec, and hex values in https://en.wikipedia.org/wiki/ASCII). On the other hand, if alphabetical order is different in country X, is the whole ASCII table obsolete?
4) "Locale" - I understand the order of letters can be different between locales/alphabets. Still, it does not help with "a" < "A" comparison.
I tried the local() function in RStudio but did not get additional insight. Or is there a locale table somewhere listing all symbols and lowercase and uppercase letters?
I understand this type of comparison might be rare, but I still want to understand the why. Also, some functions might return strings instead of numbers, and then it helps to understand what is really going on.
If no one can answer how these comparisons fundamentally work, should this kind of string comparison return NaN in R?
Best regards,
Kristjan
On Thu, Apr 14, 2022 at 1:20 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
For some issues it can be useful to learn by experiment. It gives you experience and shows you what sorts of error messages you can expect. In the console type things like this:
a>B
gives an error
"a">"B"
FALSE
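The same experiment can be run without stopping a script by catching the error. This is a sketch under one assumption: the unquoted comparison is evaluated in a fresh environment (lookup_env, a name made up here) so that no objects called a or B from the workspace get in the way.

```r
# Evaluate a > B where neither a nor B exists as an object
lookup_env <- new.env(parent = baseenv())
unquoted <- tryCatch(
  eval(quote(a > B), envir = lookup_env),
  error = function(e) e   # return the error condition instead of stopping
)

# Quoted letters are strings, so this is a plain string comparison;
# its result depends on the collating sequence of the locale in use
quoted <- "a" > "B"

print(conditionMessage(unquoted))
print(quoted)
```

Unquoted names are looked up as objects, which is where the error comes from. The quoted result is FALSE in many locales, but it would be TRUE in the C locale, where uppercase letters sort before lowercase ones.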
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
Sent: Wednesday, April 13, 2022 10:00 PM
To: Kristjan Kure <kristjan.kure.1 using gmail.com>
Cc: R-help <r-help using r-project.org>
Subject: Re: [R] Symbol/String comparison in R
"I was not able to find answers to my questions (tried Google, Stack Overflow, etc). Please correct me if anything is wrong here."
R has an extensive Help system. That should always be your first place to look. In this case, ?"<" (at the R prompt) brings you to the Help page for comparisons (as would ?Comparison, but only if the "C" is in upper case, unfortunately). Among lots of other stuff, it says:
"Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales." ... (+ lots more).
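To see the effect of the collating sequence, one can temporarily switch LC_COLLATE to the C locale and repeat a comparison. This is a rough sketch: compare_in_c_locale is a made-up helper name, and it assumes that collation in the C locale follows byte values (so "A", hex 41, comes before "a", hex 61).

```r
# Hypothetical helper: evaluate x < y under the C collating sequence,
# then restore whatever collation was in use before
compare_in_c_locale <- function(x, y) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  x < y
}

print(Sys.getlocale("LC_COLLATE"))    # the collation currently in use
print("A" < "a")                      # result under the current locale
print(compare_in_c_locale("A", "a"))  # result under the C locale
```

In the C locale "A" < "a" is TRUE, matching the raw values. Many other locales instead sort lowercase before uppercase for the same letter, which would give "a" < "A".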
Incidentally, rseek.org and rdrr.io are another couple of good places to look for R documentation.
Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Wed, Apr 13, 2022 at 5:10 PM Kristjan Kure <kristjan.kure.1 using gmail.com> wrote:
>
> Hi!
>
> Sorry, I am a beginner in R.
>
> I was not able to find answers to my questions (tried Google, Stack
> Overflow, etc). Please correct me if anything is wrong here.
>
> When comparing symbols/strings in R - raw numeric values are compared
> symbol by symbol starting from left? If raw numeric values are not
> used is there an ASCII / Unicode table where symbols have
> values/ranking/order and R compares those values?
>
> *2) Comparing symbols*
> Letter "a" raw value is 61, letter "b" raw value is 62? Is this correct?
>
> # Raw value for "a" = 61
> a_raw <- charToRaw("a")
> a_raw
>
> # Raw value for "b" = 62
> b_raw <- charToRaw("b")
> b_raw
>
> # equals TRUE
> "a" < "b"
>
> Ok, so 61 is less than 62 so it's TRUE. Is this correct?
>
> *3) Comparing strings #1*
> "1040" <= "12000"
>
> raw_1040 <- charToRaw("1040")
> raw_1040
> #31 *30* (comparison happens with the second symbol) 34 30
>
> raw_12000 <- charToRaw("12000")
> raw_12000
> #31 *32* (comparison happens with the second symbol) 30 30 30
>
> The symbol in the second position is 30 and it's less than 32. Equals
> to true. Is this correct?
>
> *4) Comparing strings #2*
> "1040" <= "10000"
>
> raw_1040 <- charToRaw("1040")
> raw_1040
> #31 30 *34* (comparison happens with third symbol) 30
>
> raw_10000 <- charToRaw("10000")
> raw_10000
> #31 30 *30* (comparison happens with third symbol) 30 30
>
> The symbol in the third position is 34 is greater than 30. Equals to false.
> Is this correct?
>
> *5) Problem - Why does this equal FALSE?* *"A" < "a"*
>
> 41 < 61 # FALSE?
>
> # Raw value for "A" = 41
> A_raw <- charToRaw("A")
> A_raw
>
> # Raw value for "a" = 61
> a_raw <- charToRaw("a")
> a_raw
>
> Why is capitalized "A" not less than lowercase "a"? Based on raw
> values it should be. What am I missing here?
>
> Thanks
> Kristjan
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.