[R] removing characters from a string

Marc Schwartz MSchwartz at MedAnalytics.com
Tue Apr 12 15:13:54 CEST 2005


On Tue, 2005-04-12 at 05:54 -0700, Vivek Rao wrote:
> Is there a simple way in R to remove all characters
> from a string other than those in a specified set? For
> example, I want to keep only the digits 0-9 in a
> string.
> 
> In general, I have found the string handling abilities
> of R a bit limited. (Of course it's great for stats in
> general). Is there a good reference on this? Or should
> R programmers dump their output to a text file and use
> something like Perl or Python for sophisticated text
> processing?
> 
> I am familiar with the basic functions such as nchar,
> substring, as.integer, print, cat, sprintf etc.

Something like the following should work:

> x <- paste(sample(c(letters, LETTERS, 0:9), 50, replace = TRUE),
             collapse = "")

> x
[1] "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"

> gsub("[^0-9]", "", x)
[1] "8677"

Here gsub() replaces every character that is NOT a digit (the negated
character class [^0-9]) with the empty string "", leaving only the
digits.

See ?gsub for more information.
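
To generalize this to any "specified set", one option is to wrap the
pattern construction in a small function. The sketch below is only an
illustration: keep_chars() is a hypothetical helper name, not part of
base R, and it assumes the set is given as the body of a bracket
expression (for example "0-9" or "A-Za-z") without special characters
such as "]" or a leading "^".

```r
# Hypothetical helper (not part of base R): keep only the characters of x
# that belong to `set`, where `set` is the body of a regex bracket
# expression such as "0-9" or "A-Za-z".
keep_chars <- function(x, set) {
  # "[^...]" matches any single character NOT in the set;
  # gsub() replaces every such match with the empty string.
  gsub(paste0("[^", set, "]"), "", x)
}

x <- "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"

keep_chars(x, "0-9")                          # "8677"
keep_chars("Tel: +1 (555) 010-9999", "0-9")   # "15550109999"

# The POSIX class [:digit:] is an equivalent way to write the digit set.
gsub("[^[:digit:]]", "", x)                   # "8677"
```

Since gsub() is vectorized over x, the same call works on a whole
character vector at once.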

HTH,

Marc Schwartz



