[Rd] Feature request: non-dropping regmatches/strextract
Toby Hocking
tdhock5 @end|ng |rom gm@||@com
Thu Aug 29 23:00:18 CEST 2019
If you want "to extract regex matches into a new column in a data.frame",
then there are some package functions which do exactly that. Three examples
are namedCapture::df_match_variable, rematch2::bind_re_match, and
tidyr::extract. For a more detailed discussion, see my R Journal submission
(under review) about regular expression packages:
https://raw.githubusercontent.com/tdhock/namedCapture-article/master/RJwrapper.pdf
Comments and suggestions are welcome.
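
For reference, the length mismatch discussed in the quoted message can be
reproduced in base R alone. The following is only a minimal sketch: the
helper name extract_or_na and the example data are hypothetical and are not
part of utils or of any of the packages named above.

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# returns one element per input string, NA_character_ where there is no match.
extract_or_na <- function(pattern, x, perl = FALSE) {
  m <- regexpr(pattern, x, perl = perl)
  hit <- !is.na(m) & m != -1          # positions with a first match
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches() returns only the hits
  out
}

# Made-up example data.
df <- data.frame(id = c("a1", "b", "c22"), stringsAsFactors = FALSE)

# regmatches() with regexpr() drops the non-matching element, so the
# result is shorter than the input and cannot be assigned as a column.
dropped <- regmatches(df$id, regexpr("[0-9]+", df$id))

# The non-dropping version has one entry per row.
df$num <- extract_or_na("[0-9]+", df$id)
```

This sketch keeps only the first match per string; the package functions
listed above additionally handle capture groups and output column names.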
On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel <
r-devel using r-project.org> wrote:
> A very common use case for regmatches is to extract regex matches into a
> new column in a data.frame (or data.table, etc.) or otherwise use the
> extracted strings alongside the input. However, the default behavior is to
> drop empty matches, which results in mismatches in column length if
> reassignment is done without subsetting.
>
> For consistency with other R functions and compatibility with this use
> case, it would be nice if regmatches did not automatically drop empty
> matches and would instead insert an NA_character_ value (similar to
> stringr::str_extract). This alternative regmatches could be implemented
> through an optional drop argument, a new function, or mentioned in the
> documentation (a la resample in ?sample).
>
> Alternatively, at the moment, there is a non-exported function strextract
> in utils which is very similar to stringr::str_extract. It would be great
> if this function, once exported, were to include a drop argument to prevent
> dropping positions with no matches.
>
> An example solution (last option):
>
> strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE,
>                        drop = TRUE) {
>   m <- regexec(pattern, x, perl = perl, useBytes = useBytes)
>   result <- regmatches(x, m)
>
>   if (isTRUE(drop)) {
>     unlist(result)
>   } else if (isFALSE(drop)) {
>     result[lengths(result) == 0] <- NA_character_
>     unlist(result)
>   } else {
>     stop("Invalid argument for `drop`")
>   }
> }
>
> Based on Ricardo Saporta's response to "How to prevent regmatches drop non
> matches?"
>
> --CG
>