[Rd] error handling in strcapture

William Dunlap wdunlap at tibco.com
Wed Sep 21 21:11:32 CEST 2016


Michael, thanks for looking at my first issue with utils::strcapture.

Another issue is how it deals with lines that don't match the pattern.
Currently it gives an error:

> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
proto=list(Name="", Number=0))
Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),  :
  number of matches does not always match ncol(proto)

First, isn't the 'number of matches' the number of parenthesized
subpatterns in the regular expression?  I thought that if the entire
pattern matches, then any subpattern without a match would be shown
as a match at position 0 with length 0.  Hence either the pattern is
compatible with the prototype or it isn't; that does not depend on
the text input.  E.g.,

> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
[[1]]
[1] 1 1 1 0
attr(,"match.length")
[1] 6 6 6 0
attr(,"useBytes")
[1] TRUE

[[2]]
[1] 1 1 0 1
attr(,"match.length")
[1] 2 2 0 2
attr(,"useBytes")
[1] TRUE

[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
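The same point can be checked programmatically.  The sketch below
(variable names are only for illustration) counts what regexec()
returns per line: a matching line always yields one entry for the whole
match plus one per parenthesized subpattern, with an unmatched
alternative coming back from regmatches() as an empty string, while a
non-matching line yields the single value -1.

```r
pattern <- "^(([[:alpha:]]+)|([[:digit:]]+))$"
x <- c("Twelve", "12", "Z280")
m <- regexec(pattern, x)

# Matching lines: 1 (whole match) + 3 (subpatterns); non-matching line: just -1
lengths(m)
# gives 4 4 1

# The unmatched alternative ([[:alpha:]]+) in "12" is extracted as ""
regmatches(x, m)[[2]]
# gives c("12", "12", "", "12")

# Number of subpatterns, read from any line that did match
matched <- vapply(m, function(mi) mi[1L] != -1L, logical(1))
unique(lengths(m[matched])) - 1L
# gives 3
```

So the count only disagrees with ncol(proto) for lines that fail to
match at all, which is a different problem from a pattern/prototype
mismatch.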

Second, an error message like 'some lines were bad' is not very helpful.
Should it instead put NAs in all the columns of the corresponding output
row when an input line doesn't match the pattern, and perhaps warn the
user that there were problems?  The user could then look for rows of NAs
to see where the problems were.
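A minimal sketch of that behaviour follows.  It is not utils::strcapture;
the name strcapture_na is hypothetical, and it assumes each prototype
column has a basic class with a matching as.<class>() coercion function
(character, numeric, integer, logical).  Lines that don't match (and NA
input lines) get an all-NA row plus a single warning; only a genuine
mismatch between the pattern's subpatterns and the prototype is an error.

```r
# Hypothetical variant of strcapture(): non-matching lines become NA rows
strcapture_na <- function(pattern, x, proto, perl = FALSE, useBytes = FALSE) {
  m <- regexec(pattern, x, perl = perl, useBytes = useBytes)
  # regexec() reports a non-matching line as the single value -1 (NA for NA input)
  matched <- vapply(m, function(mi) !is.na(mi[1L]) && mi[1L] != -1L, logical(1))
  ncap <- length(proto)
  # One row per input line, pre-filled with NA so unmatched lines stay NA
  cap <- matrix(NA_character_, nrow = length(x), ncol = ncap)
  if (any(matched)) {
    # Drop the whole-match element, keeping only the parenthesized subpatterns
    sub <- lapply(regmatches(x[matched], m[matched]), `[`, -1L)
    # This check depends only on the pattern and proto, not on the text
    if (any(lengths(sub) != ncap))
      stop("number of parenthesized subpatterns does not equal length(proto)")
    cap[matched, ] <- do.call(rbind, sub)
  }
  if (any(!matched))
    warning(sprintf("%d line(s) did not match 'pattern'; their rows are NA",
                    sum(!matched)))
  # Assumption: coerce each column with as.<class>() of the proto element
  out <- lapply(seq_len(ncap), function(j) {
    as_fun <- match.fun(paste0("as.", class(proto[[j]])[1L]))
    as_fun(cap[, j])
  })
  names(out) <- names(proto)
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Usage: warns that 1 line did not match instead of stopping
d <- strcapture_na("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
                   proto = list(Name = "", Number = 0))
# d$Name is c("One", NA, "Three"), d$Number is c(1, NA, 3)
which(!complete.cases(d))  # the problem line: 2
```

Keeping the length check separate from the match check means the error
is only raised for the pattern/prototype incompatibility, which is the
distinction made in the first point.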

Bill Dunlap
TIBCO Software
wdunlap tibco.com




More information about the R-devel mailing list