[Rd] Feature request: non-dropping regmatches/strextract
wdunlap sending from tibco.com
Thu Aug 15 22:39:10 CEST 2019
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my
example a bit:
> x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
    proto=data.frame(Name=character(), Address=character(),
    stringsAsFactors=FALSE))
     Name          Address
1 Groucho groucho@marx.com
2           chico@marx.com
3   Harpo
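
For anyone following along, here is a minimal sketch (not taken from the original messages) of the two ideas raised in the quoted discussion below: keeping one slot per input element when there is no match, and mapping strcapture()'s "" to NA. The helper name regmatches_keep and its nomatch argument are hypothetical and not part of base R; the example strings assume the addresses were written with "@" before the list archive obfuscated them.

```r
# Hypothetical helper, not base R: behaves like regmatches(x, regexpr(...)),
# but returns one element per input and fills non-matches with `nomatch`.
regmatches_keep <- function(x, pattern, nomatch = NA_character_, ...) {
  m <- regexpr(pattern, x, ...)
  pos <- as.integer(m)                 # drop the match attributes
  hit <- !is.na(pos) & pos > -1L       # TRUE where a match was found
  out <- rep(nomatch, length(x))
  out[hit] <- regmatches(x, m)         # regmatches() returns matches only
  out
}

x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
pat <- "[[:alpha:].]+@[[:alpha:].]+"

regmatches(x, regexpr(pat, x))         # drops the "Harpo" element
regmatches_keep(x, pat)                # keeps it, filled with NA
regmatches_keep(x, pat, nomatch = "")  # keeps it, filled with ""

# Mapping strcapture()'s "" to NA after the fact, using the three-group
# pattern from the quoted message.
d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
                proto = data.frame(Name = character(), Junk = character(),
                                   Address = character(),
                                   stringsAsFactors = FALSE))
d <- d[c("Name", "Address")]
d[] <- lapply(d, function(col) {
  col[col %in% ""] <- NA_character_    # %in% is safe if col already has NAs
  col
})
d
```

Doing the replacement in a wrapper leaves the default behavior of regmatches() and strcapture() untouched, which is the backwards-compatibility concern raised in the quoted text.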
On Thu, Aug 15, 2019 at 1:04 PM William Dunlap <wdunlap using tibco.com> wrote:
> I don't care much for regmatches and haven't tried strextract, but I think
> replacing the character(0) by NA_character_ is almost always inappropriate
> if the match information comes from gregexpr.
> I think strcapture() does a pretty good job of what I think you are trying
> to do. Perhaps adding an argument to map no match to NA instead of ""
> would give you just what you wanted.
> > x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
> > d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
>     x, proto=data.frame(Name=character(), Junk=character(),
>     Address=character(), stringsAsFactors=FALSE))
> > d[c("Name", "Address")]
>      Name          Address
> 1 Groucho groucho@marx.com
> 2           chico@marx.com
> 3   Harpo
> > str(.Last.value)
> 'data.frame':   3 obs. of  2 variables:
>  $ Name   : chr  "Groucho" "" "Harpo"
>  $ Address: chr  "groucho@marx.com" "chico@marx.com" ""
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> On Thu, Aug 15, 2019 at 11:31 AM Cyclic Group Z_1 <
> cyclicgroup-z1 using yahoo.com> wrote:
>> I do think keeping the default behavior is desirable for backwards
>> compatibility; my suggestion is not to change default behavior but to add
>> an optional argument that allows a different behavior. Although this can be
>> implemented in a user-defined function, retaining empty matches facilitates
>> programmatic use, and seems to be something that should be available in
>> base R. It is available, for example, in MATLAB, a comparable array
>> language.
>> Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the
>> spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL
>> (the default) results in drops for vector outputs and character(0) for list
>> outputs and nomatch = NA results in insertion of NA_character_, and nomatch
>> = '' results in insertion of empty string.
>> I can submit proposed patch code if others think this is a good idea.
>> What are your thoughts on the proposed alteration to (currently
>> nonexported) strextract? I assume (maybe wrongly) that the plan is to
>> eventually export that function.
>> Thank you,