[R] removing characters from a string

Duncan Murdoch murdoch at math.aau.dk
Tue Apr 12 16:20:02 CEST 2005


Martin Maechler wrote:
> >>>>> "Vivek" == Vivek Rao <rvivekrao at yahoo.com>
> >>>>>     on Tue, 12 Apr 2005 05:54:55 -0700 (PDT) writes:
> 
> 
>     Vivek> Is there a simple way in R to remove all characters
>     Vivek> from a string other than those in a specified set? For
>     Vivek> example, I want to keep only the digits 0-9 in a
>     Vivek> string.
> 
>     Vivek> In general, I have found the string handling abilities
>     Vivek> of R a bit limited. (Of course it's great for stats in
>     Vivek> general). Is there a good reference on this? Or should
>     Vivek> R programmers dump their output to a text file and use
>     Vivek> something like Perl or Python for sophisticated text
>     Vivek> processing?
> 
>     Vivek> I am familiar with the basic functions such as nchar,
>     Vivek> substring, as.integer, print, cat, sprintf etc.
> 
> It depends on your "etc":
> 
> The above is pretty trivial using gsub(),
> but since you sound sophisticated enough to proclaim missing R
> abilities, I leave the exercise to you.

Part of the problem here is our help system.  gsub is documented within 
the grep topic, so when you look at the keyword==character topics, you 
don't see it explicitly.  (You do see "pattern matching and 
replacement", which should have been a hint.)  And if you were looking 
for "string handling" under the programming category, you're completely 
out of luck.
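
For the original question, here is a minimal sketch of the gsub() approach using a negated character class. The variable names and the sample strings are made up for illustration; typing ?gsub should bring up the grep help topic mentioned above.

```r
# Hypothetical input; any character vector would do.
x <- c("abc123def45", "tel: 555-0199")

# "[^0-9]" matches any single character that is not a digit.
# Replacing every match with "" leaves only the digits 0-9.
digits_only <- gsub("[^0-9]", "", x)

print(digits_only)
```

To keep a different set of characters, change what sits inside the brackets, e.g. "[^A-Za-z]" to keep only ASCII letters.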

Another reason some people might see R's string handling as limited is 
that it is sometimes more cumbersome to manipulate strings in R than in 
other languages.  For example, I vaguely recall that there's a good 
reason why R doesn't use "+" to concatenate strings, but I can't 
remember what it is.  And sometimes I'd like to strip whitespace or pad 
things to a given width; I generally need to define my own functions to 
do that each time.  R is capable of concatenation, stripping and 
padding, but is sometimes a little obscure in how it does them.
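
As a rough sketch of the three operations just mentioned, using only base R: paste() for concatenation, gsub() for stripping, and formatC() for padding. The helper name strip_ws is hypothetical (it is the kind of function one ends up defining by hand), and the sample strings are invented.

```r
# Concatenation: paste() with sep = "" joins strings with no separator.
full <- paste("foo", "bar", sep = "")

# Stripping: drop leading and trailing whitespace with a regular expression.
# strip_ws is a hypothetical helper name, not a base R function.
strip_ws <- function(s) gsub("^[[:space:]]+|[[:space:]]+$", "", s)
stripped <- strip_ws("  some text \t")

# Padding: formatC() pads a string to a given width.
# flag = "-" left-justifies (pads on the right);
# the default right-justifies (pads on the left).
left  <- formatC("ab", width = 5, flag = "-")
right <- formatC("ab", width = 5)

print(c(full, stripped, left, right))
```

None of this is hard once you know which function to reach for, which is rather the point: the names paste, gsub and formatC give little hint that they are the tools for concatenating, stripping and padding.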

Duncan Murdoch



