[R] Flummoxed by gsub().
David Winsemius
dwinsemius at comcast.net
Thu Aug 24 20:28:59 CEST 2017
> On Aug 24, 2017, at 10:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>
>> On Aug 23, 2017, at 2:29 AM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>
>>
>> On 23/08/17 18:33, Stefan Evert wrote:
>>
>>>> On 23 Aug 2017, at 07:45, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>>>
>>>> My reading of ?regex led me to believe that
>>>>
>>>> gsub("[:alpha:]","",x)
>>>>
>>>> should give the result that I want.
>>> That's looking for any of the characters a, l, p, h, : .
>>
>> OK. I see that now. I don't think that it's really stated anywhere that to search for (and possibly change) any one of a string of characters you enclose that string of characters in brackets [ ].
>
> That's explained on the ?regex page in the section on character classes. The source of confusion for you is that within regex character classes there is also a set of reserved constructions that all start with "[:" and end with ":]". It's a bit like needing to double or triple escape characters in regex: a leading "|" changes the parser settings (or "expectations" if one wants to anthropomorphize the process).
I meant a leading backslash "\" rather than a vertical bar ("|").
--
David.
>
>>
>> The first example from ?grep makes this "clear" (for some value of the word "clear") once you understand what this example is on about.
>>
>> So it's "obvious" once you've been shown, and totally opaque until then.
>
> Sometimes we all stumble over syntactic "special" detours. If you wanted to add a warning to the current ?regex text, you could submit a diff for the base package, perhaps with something like:
>
> "Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale."
>
> Replaced with:
>
> "Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale. Their names do include the "[:" and ":]" characters."
>
>
>>
>>> What you meant to say was
>>> gsub("[[:alpha:]]","",x)
>>> i.e. the character class [:alpha:] within a character set.
>>
>> Yup. Got it. Thanks very much.
>>
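
For reference, the distinction discussed in this thread can be sketched in R as follows. The sample string `x` is made up for illustration and is not from the original posts. The first call uses an explicit bracket set, `[alph:]`, to stand in for how `[:alpha:]` on its own was being read (any one of a, l, p, h, or ":"); running `[:alpha:]` literally may also produce a warning in some R versions, so it is left out here.

```r
# Hypothetical sample input, chosen only to make the difference visible.
x <- "alpha: 123 xyz"

# A plain bracket set: matches any one of the characters a, l, p, h or ':'.
# Per the thread, "[:alpha:]" written on its own is interpreted this way.
only_set <- gsub("[alph:]", "", x)

# The named class [:alpha:] placed inside a bracket expression:
# matches any alphabetic character.
all_letters <- gsub("[[:alpha:]]", "", x)

print(only_set)     # " 123 xyz"
print(all_letters)  # ": 123 "
```

The outer pair of brackets delimits the set; the inner `[:alpha:]` is the name of the class, which is why both pairs are needed.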