[Rd] strsplit and the empty string
Christian Brechbühler
brechbuehler at gmail.com
Wed Jun 18 16:59:39 CEST 2008
On Wed, Jun 18, 2008 at 8:45 AM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> asked for opinions:
>
> When the pattern
> matches the beginning of the search string, the empty string is added to
> the result, but that's not the case when the pattern matches the end of
> the search string:
>
> strsplit(" hello dolly ")
> [1] "" "hello" "dolly"
With R version 2.6.1 Patched (2007-11-26 r43541), I get
Error in strsplit(" hello dolly ") :
argument "split" is missing, with no default
But strsplit(" hello dolly ", " ") reproduces your results.
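For reference, a minimal sketch of the call with the separator supplied. The variable names and the use of fixed = TRUE are my own choices for illustration (they are not in the original question); fixed = TRUE just treats " " as a literal string rather than a regular expression.

```r
# Leading and trailing separator: only the leading one yields an empty string.
lead_and_trail <- strsplit(" hello dolly ", " ", fixed = TRUE)[[1]]
print(lead_and_trail)

# Trailing separator only: no empty string at the end of the result.
trail_only <- strsplit("hello dolly ", " ", fixed = TRUE)[[1]]
print(trail_only)
```

Note that strsplit returns a list with one element per input string, hence the [[1]].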
> The man for strsplit explains the algorithm:
>
> "
> The algorithm applied to each input string is
>
>
> repeat {
>     if the string is empty
>         break.
>     if there is a match
>         add the string to the left of the match to the output.
>         remove the match and all to the left of it.
>     else
>         add the string to the output.
>         break.
> }
>
> Note that this means that if there is a match at the beginning of
> a (non-empty) string, the first element of the output is '""', but
> if there is a match at the end of the string, the output is the
> same as with the match removed.
> "
The algorithm, the comment after it, and your results are consistent.
Whether it is intuitive is a matter of taste. I agree it's not as
symmetric as one might like.
> If the pattern matches, (second if above), the match is added to the
> output, and removed from the input -- which after this step is the empty
> string;
Close. The string to the left of the match, "dolly", is added to the output.
I agree, the input is now the empty string.
> in the next step, there is no match (else above), so the rest of
> the input string (= the empty string) *should* be added, but it is not
> what happens.
No, in the next step, the string is empty (first 'if' above), and we break.
The else branch never applies in your example.
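To make the trace above concrete, here is a rough sketch that follows the pseudocode from the man page step by step. split_sketch is a hypothetical helper of mine, not part of R and not how strsplit is actually implemented; it assumes a single input string and a non-empty, literal (fixed) separator.

```r
# Hypothetical helper: mirrors the documented pseudocode for one string.
split_sketch <- function(x, split) {
  stopifnot(length(x) == 1L, nzchar(split))  # assumption: non-empty literal separator
  out <- character(0)
  repeat {
    if (!nzchar(x)) break                    # "if the string is empty: break"
    m <- regexpr(split, x, fixed = TRUE)     # position of first match, or -1
    pos <- as.integer(m)
    if (pos > 0L) {
      len <- attr(m, "match.length")
      out <- c(out, substr(x, 1L, pos - 1L)) # add the string left of the match
      x <- substr(x, pos + len, nchar(x))    # remove the match and all to its left
    } else {
      out <- c(out, x)                       # no match: add the rest and stop
      break
    }
  }
  out
}

sketch_result <- split_sketch(" hello dolly ", " ")
print(sketch_result)
print(identical(sketch_result,
                strsplit(" hello dolly ", " ", fixed = TRUE)[[1]]))
```

After "dolly" is added, the remaining input is empty, so the loop exits at the first test and the else branch is never reached for this input.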
> (i see no good
> reason for including the empty string at the beginning but not at the
> end of the output; no other language i know would do that this way)
I checked Perl, and it does exactly the same:
print join "==", split / /, " hello dolly "
==hello==dolly
(that's 3 elements: "", "hello", and "dolly").
Cheers,
/Christian