[Rd] extending strsplit(): supply pattern to keep, not to split by
Bill Dunlap
bill at insightful.com
Tue Apr 4 17:54:17 CEST 2006
strsplit() is a convenient way to get a
list of items from a string when you
have a regular expression for what is not
an item. E.g.,
> strsplit("1.2, 34, 1.7e-2", split="[ ,] *")
[[1]]:
[1] "1.2" "34" "1.7e-2"
However, sometimes it is more convenient to
give a pattern for the items you do want.
E.g., suppose you want to pull all the numbers
out of a string which contains a mix of numbers
and words. Making a pattern for what a
number is is simpler than making a pattern
for what may come between the numbers.
> number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"
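As a quick sanity check of what that pattern accepts, here is a minimal
sketch that anchors it with ^ and $ and tests whole strings. The helper
name is.number.string is hypothetical, introduced only for this
illustration, and paste0() assumes a reasonably recent R.

```r
# The pattern from the message, repeated so the example is self-contained.
number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"

# Hypothetical helper: TRUE when the whole string is a number under the pattern.
is.number.string <- function(x) {
  grepl(paste0("^(", number.pattern, ")$"), x)
}

is.number.string(c("34", "1.2", ".5", "-1.7e-2", "200mg", "e5"))
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```

Without the anchors the pattern matches anywhere inside a string, which is
the behaviour wanted when extracting numbers from mixed text.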
I propose adding a keep=FALSE argument to
strsplit() to do this. If keep is FALSE,
then the split argument matches the stuff to
omit from the output; if keep is TRUE then
split matches the stuff to put into the
output. Then we could do the following to
get a list of all the numbers in a string
(done in a version of strsplit() I'm working on
for S-PLUS):
> strsplit("1.2, 34, 1.7e-2", split=number.pattern,keep=TRUE)
[[1]]:
[1] "1.2" "34" "1.7e-2"
> strsplit("Ibuprofin 200mg", split=number.pattern,keep=TRUE)
[[1]]:
[1] "200"
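For comparison, the same results can be approximated in R without changing
strsplit(), assuming a version of R that provides regmatches(). This is
only a sketch of the behaviour, not the S-PLUS implementation mentioned
above; the function name strkeep is hypothetical.

```r
# The pattern from the message, repeated so the example is self-contained.
number.pattern <- "[-+]?(([0-9]+(\\.[0-9]*)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?"

# Hypothetical stand-in for strsplit(x, split = pattern, keep = TRUE):
# gregexpr() locates every match and regmatches() extracts the matched
# text, giving one character vector per element of x.
strkeep <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x))
}

strkeep("1.2, 34, 1.7e-2", number.pattern)
# [[1]]
# [1] "1.2"    "34"     "1.7e-2"

strkeep("Ibuprofin 200mg", number.pattern)
# [[1]]
# [1] "200"
```

Like strsplit(), this returns a list with one element per input string, so
it can be applied to a whole character vector at once.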
Is this a reasonable thing to want strsplit to do?
Is this a reasonable parameterization of it?
----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146
"All statements in this message represent the opinions of the author and do
not necessarily reflect Insightful Corporation policy or position."