[Rd] error handling in strcapture

William Dunlap wdunlap at tibco.com
Tue Oct 4 22:40:43 CEST 2016


I noticed a problem with strcapture in R-devel (2016-09-27 r71386)
when the text contains a missing value and perl=TRUE.

{
      # NA in text input should map to row of NA's in output, without warning
      r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)",
                        c("One 1", NA, "Fifty 50"),
                        data.frame(Initial=factor(), Number=numeric()))
      e9p <- structure(list(Initial = structure(c(2L, NA, 1L),
                                                .Label = c("F", "O"),
                                                class = "factor"),
                            Number = c(1, NA, 50)),
                       row.names = c(NA, -3L),
                       class = "data.frame")
      all.equal(e9p, r9p)
}
#Error in if (any(ind)) { : missing value where TRUE/FALSE needed
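
A minimal sketch of how a message like this can arise, in general R terms. The vector `ind` below is a hypothetical stand-in, not the actual variable inside the failing function, which was not inspected here. The point is only that any() returns NA when no element is TRUE and at least one is NA, and that if (NA) then stops with this kind of message.

```r
# Hypothetical stand-in for the internal 'ind'; the real vector was not inspected.
ind <- c(FALSE, NA, FALSE)

# No element is TRUE and one is unknown, so any() cannot decide and returns NA.
any(ind)

# Using that NA as an 'if' condition is an error; capture its message here.
res <- tryCatch(if (any(ind)) "some" else "none",
                error = function(e) conditionMessage(e))
res

# Dropping the NAs before the test gives a definite TRUE/FALSE.
safe <- if (any(ind, na.rm = TRUE)) "some" else "none"
safe
```

Whether na.rm = TRUE is the right repair depends on how the real code should treat missing text; the sketch only illustrates the mechanism.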


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Sep 21, 2016 at 2:32 PM, Michael Lawrence <lawrence.michael at gene.com> wrote:

> The new behavior is that it yields NAs when the pattern does not match
> (like strptime) and for empty captures in a matching pattern it yields
> the empty string, which is consistent with regmatches().
>
> Michael
>
> On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdunlap at tibco.com> wrote:
> > If there are any matches then strcapture can see if the pattern has the same
> > number of capture expressions as the prototype has columns and give an
> > error if not.  That seems appropriate.
> >
> > If there are no matches, then there is no easy way to see if the prototype
> > is compatible with the pattern, so should strcapture just assume the best
> > and fill in the prototype with NA's?
> >
> > Should there be warnings?  This is kind of like strptime(), which silently
> > gives NA's when the format does not match the text input.
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Wed, Sep 21, 2016 at 2:10 PM, Michael Lawrence
> > <lawrence.michael at gene.com> wrote:
> >>
> >> Hi Bill,
> >>
> >> Thanks, another good suggestion. strcapture() now returns NAs for
> >> non-matches. It's nice to have someone kicking the tires on that
> >> function.
> >>
> >> Michael
> >>
> >> On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
> >> <r-devel at r-project.org> wrote:
> >> > Michael, thanks for looking at my first issue with utils::strcapture.
> >> >
> >> > Another issue is how it deals with lines that don't match the pattern.
> >> > Currently it gives an error
> >> >
> >> >> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
> >> > proto=list(Name="", Number=0))
> >> > Error in strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), :
> >> >   number of matches does not always match ncol(proto)
> >> >
> >> > First, isn't the 'number of matches' the number of parenthesized
> >> > subpatterns in the regular expression?  I thought that if the entire
> >> > pattern matches then the subpatterns without matches would be
> >> > shown as matches at position 0 with length 0.  Hence either the
> >> > pattern is compatible with the prototype or it isn't, it does not depend
> >> > on the text input.  E.g.,
> >> >
> >> >> regexec("^(([[:alpha:]]+)|([[:digit:]]+))$", c("Twelve", "12", "Z280"))
> >> > [[1]]
> >> > [1] 1 1 1 0
> >> > attr(,"match.length")
> >> > [1] 6 6 6 0
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > [[2]]
> >> > [1] 1 1 0 1
> >> > attr(,"match.length")
> >> > [1] 2 2 0 2
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > [[3]]
> >> > [1] -1
> >> > attr(,"match.length")
> >> > [1] -1
> >> > attr(,"useBytes")
> >> > [1] TRUE
> >> >
> >> > Second, an error message like 'some lines were bad' is not very helpful.
> >> > Should it put NA's in all the columns of the current output row if the
> >> > input line didn't match the pattern and perhaps warn the user that there
> >> > were problems?  The user could then look for rows of NA's to see where
> >> > the problems were.
> >> >
> >> > Bill Dunlap
> >> > TIBCO Software
> >> > wdunlap tibco.com
> >> >
> >> > ______________________________________________
> >> > R-devel at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
>
