[Rd] Feature request: non-dropping regmatches/strextract

Cyclic Group Z_1 cyc||cgroup-z1 @end|ng |rom y@hoo@com
Sat Aug 17 00:16:26 CEST 2019


Using strcapture seems like a great workaround for use cases of this kind, at least in base R. I also agree that filling with NA makes less sense for regmatches(..., gregexpr(...)), since its output is a list and non-matching elements are already retained there as empty entries. In the meantime, I suppose the stringr package can be used when non-dropping vector outputs are desired.
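To make the difference concrete, here is a minimal sketch using base R only. The input vector and the column name "num" are made up for illustration, and strcapture requires R >= 3.4.0:

```r
# Illustrative input: the second element contains no digits.
x <- c("apple 12", "banana", "cherry 7")

# regmatches() with regexpr() silently drops the non-matching element,
# so the result is shorter than the input.
dropped <- regmatches(x, regexpr("[0-9]+", x))

# strcapture() returns one row per input element and fills non-matches
# with NA. The pattern needs one capture group per column of 'proto'.
kept <- strcapture("([0-9]+)", x,
                   proto = data.frame(num = character(),
                                      stringsAsFactors = FALSE))

print(dropped)
print(kept$num)
```

Here dropped has length 2 while kept has three rows, so only the strcapture result can be lined up with x without extra bookkeeping.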

However, I do think that non-dropping regex extraction in the vector outputs of regmatches(..., regexpr(...)) or strextract would be a great (optional) feature to have in base R. It would be consistent with the rest of the language: missing values, denoted by NA, are generally not dropped from vectors elsewhere, and NA seems conceptually appropriate for an empty match. It would also help R reach greater feature parity with MATLAB and Pandas in this respect (granted, Pandas is not technically a language on its own).
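As a rough sketch of the behavior I have in mind, a wrapper along the following lines works today. The name regmatches_keep is hypothetical and not part of base R, and this is not a proposed implementation:

```r
# Hypothetical helper (not in base R): like regmatches(x, regexpr(...)),
# but returns a vector of the same length as x, with NA where there is
# no match or where the input itself is NA.
regmatches_keep <- function(pattern, x, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(NA_character_, length(x))
  # regexpr() gives -1 for no match and NA for NA input; keep only real hits.
  hit <- !is.na(m) & m != -1L
  out[hit] <- regmatches(x, m)
  out
}

res <- regmatches_keep("[0-9]+", c("apple 12", "banana", NA, "cherry 7"))
print(res)
```

The result has one element per input element, so it can be assigned directly to a data frame column.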

Although I have written personal wrappers and used stringr to accomplish the non-dropping behavior in the past, I have nevertheless found the behavior of base R string operations mildly astonishing (in the sense of POLA, the principle of least astonishment) and think others may have as well. As the stringr documentation puts it, "they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R." Since consistent, robust string operations are often a standard base feature of other data science and scientific programming languages, I think this minor change would be a great improvement to the language and would hopefully help promote adoption of R, especially given the surge in text-based data analysis in recent years.

Alternatively, although I generally don't use the Tidyverse packages very often, stringr seems like a great candidate for inclusion in base or recommended R if the R Core team and the package developer see fit (just a suggestion and probably a long shot).

However, I will try not to belabor this point further. In any case, thank you!

Best,
CG