[R] Flummoxed by gsub().

David Winsemius dwinsemius at comcast.net
Thu Aug 24 19:20:15 CEST 2017


> On Aug 23, 2017, at 2:29 AM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
> 
> 
> On 23/08/17 18:33, Stefan Evert wrote:
> 
>>> On 23 Aug 2017, at 07:45, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>> 
>>> My reading of ?regex led me to believe that
>>> 
>>>    gsub("[:alpha:]","",x)
>>> 
>>> should give the result that I want.
>> That's looking for any of the characters a, l, p, h, : .
> 
> OK.  I see that now.  I don't think that it's really stated anywhere that to search for (and possibly change) any one of a string of characters you enclose that string of characters in brackets [  ].

That's explained on the ?regex page in the section on character classes. The source of confusion for you is that within regex character classes there is also a set of reserved constructions that all start with "[:" and end with ":]", and these are only recognized inside an enclosing pair of brackets. It's a bit like needing to double or triple escape characters in regex: a leading "\" changes the parser settings (or "expectations" if one wants to anthropomorphize the process).
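
A minimal sketch of the difference, using a made-up string x (an assumption for illustration, not the data from the original question):

```r
# Hypothetical test string, chosen so that both patterns match something.
x <- "alpha: 123 beta"

# Single brackets: an ordinary bracket expression, i.e. the set of
# literal characters ':', 'a', 'l', 'p', 'h'.
single <- gsub("[:alpha:]", "", x)

# Double brackets: the named class [:alpha:] used inside a bracket
# expression, i.e. any alphabetic character.
double <- gsub("[[:alpha:]]", "", x)

print(single)
print(double)
```

With the single-bracket pattern the letters 'b', 'e' and 't' survive, because they are not among the five literal characters in the set; the double-bracket pattern removes every letter.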

> 
> The first example from ?grep makes this "clear" (for some value of the word "clear") once you understand what this example is on about.
> 
> So it's "obvious" once you've been shown, and totally opaque until then.

Sometimes we all stumble over syntactic "special" detours. If you wanted to add a warning to the current ?regex text, you could submit a diff for the base package, perhaps with something like:

"Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale."

Replaced with:

"Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale. Their names do include the "[:" and ":]" characters."

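Because the "[:" and ":]" are part of each class name, the names behave like single items inside the outer brackets and can be combined or negated there. A small sketch, again with a made-up string (an assumption, not anything from the help page):

```r
# Hypothetical test string with letters, digits, spaces and punctuation.
x <- "abc 123 !?"

# Two named classes inside one bracket expression:
# removes every letter and every digit.
no_alnum <- gsub("[[:alpha:][:digit:]]", "", x)

# A leading "^" inside the outer brackets negates the set:
# removes everything that is not a letter.
only_alpha <- gsub("[^[:alpha:]]", "", x)

print(no_alnum)
print(only_alpha)
```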

> 
>> What you meant to say was
>> 	gsub("[[:alpha:]]","",x)
>> i.e. the character class [:alpha:] within a character set.
> 
> Yup.  Got it.  Thanks very much.
> 
> cheers,
> 
> Rolf
> 
> -- 
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law


