[Rd] Bounty on Error Checking

Duncan Murdoch murdoch.duncan at gmail.com
Fri Jan 4 18:43:38 CET 2013


On 04/01/2013 10:38 AM, Matthew Dowle wrote:
> On 04.01.2013 15:22, Duncan Murdoch wrote:
> > On 04/01/2013 10:15 AM, Matthew Dowle wrote:
> >> On 04.01.2013 14:56, Duncan Murdoch wrote:
> >> > On 04/01/2013 9:51 AM, Matthew Dowle wrote:
> >> >> On 04.01.2013 14:03, Duncan Murdoch wrote:
> >> >> > On 13-01-04 8:32 AM, Matthew Dowle wrote:
> >> >> >>
> >> >> >> On Fri, Jan 3, 2013, Bert Gunter wrote
> >> >> >>> Well...
> >> >> >>>
> >> >> >>> On Thu, Jan 3, 2013 at 10:00 AM, ivo welch <ivo.welch <at> anderson.ucla.edu> wrote:
> >> >> >>>>
> >> >> >>>> Dear R developers---I just spent half a day debugging an R
> >> >> >>>> program, which had two bugs---I selected the wrongly named
> >> >> >>>> variable, which turns out to have been a scalar, which then
> >> >> >>>> happily multiplied as if it was a matrix; and another wrongly
> >> >> >>>> named variable from a data frame, that triggered no error when
> >> >> >>>> used as a[["name"]] or a$name. There should be an option to
> >> >> >>>> turn on that throws an error inside R when one does this. I
> >> >> >>>> cannot imagine that there is much code that wants to reference
> >> >> >>>> non-existing columns in data frames.
> >> >> >>>
> >> >> >>> But I can -- and do it all the time: To add a new variable, "d"
> >> >> >>> to a data frame, df, containing only "a" and "b" (with 10 rows,
> >> >> >>> say):
> >> >> >>>
> >> >> >>> df[["d"]] <- 1:10
> >> >> >>
> >> >> >> Yes but that's `[[<-`. Ivo was talking about `[[` and `$`; i.e.,
> >> >> >> select only not assign, if I understood correctly.
> >> >> >>
> >> >> >>>
> >> >> >>> Trying to outguess documentation to create error triggers is a
> >> >> >>> very bad idea.
> >> >> >>
> >> >> >> Why exactly is it a very bad idea? (I don't necessarily disagree,
> >> >> >> just asking for more colour.)
> >> >> >>
> >> >> >>> R already has plenty of debugging tools -- and there is even a
> >> >> >>> "debug" package. Perhaps you need a better programming
> >> >> >>> editor/IDE. There are several listed on CRAN, RStudio, etc.
> >> >> >>
> >> >> >> True, but that relies on you knowing there's a bug to hunt for.
> >> >> >> What if you don't know you're getting incorrect results,
> >> >> >> silently? In a similar way that options(warn=2) turns known
> >> >> >> warnings into errors, to enable you to be more strict if you
> >> >> >> wish,
> >> >> >
> >> >> > I would say the point of options(warn=2) is rather to let you find
> >> >> > the location of the warning more easily, because it will abort the
> >> >> > evaluation.
> >> >>
> >> >> True, but as well as that, I sometimes like to run production
> >> >> systems with options(warn=2). I'd rather some tasks halt at the
> >> >> slightest hint of trouble than write a warning silently to a log
> >> >> file that may not be looked at. I think of that as being more
> >> >> strict, more robust, since options(warn=2) is set even when there is
> >> >> no warning, to catch one if it arises in future; not just to find it
> >> >> more easily once you know there is a warning.
> >> >>
> >> >> > I would not recommend using code that issues warnings.
> >> >>
> >> >> Not sure what you mean here.
> >> >
> >> > I just meant that I consider warnings to be a problem (as you do),
> >> > so they should all be fixed.
> >>
> >> I see now, good.
> >>
> >> >
> >> >>
> >> >> >
> >> >> >> an option to turn on warnings from `[[` and `$` if the column is
> >> >> >> missing (select only, not assign) doesn't seem like a bad option
> >> >> >> to have. Maybe it would reveal some previously silent bugs.
> >> >> >
> >> >> > I agree that this would sometimes be useful, but a very common
> >> >> > convention is to do something like
> >> >> >
> >> >> > if (is.null(obj$element)) {  do something }
> >> >> >
> >> >> > These would all have to be re-written to something like
> >> >> >
> >> >> > if (missing.field(obj, "element")) { do something }
> >> >> >
> >> >> > There are several hundred examples of the first usage in base R; I
> >> >> > imagine thousands more in contributed packages.
> >> >>
> >> >> Yes but Ivo doesn't seem to be writing that if() in his code. We're
> >> >> only talking about an option that users can turn on for their own
> >> >> code, iiuc. Not anything that would affect or break thousands of
> >> >> packages. That's why I referred to the fact that all packages now
> >> >> have namespaces, in the earlier post.
> >> >>
> >> >> > I don't think the benefit of the change is worth all the work that
> >> >> > would be necessary to implement it.
> >> >>
> >> >> It doesn't seem to be a lot of work. I already posted a working
> >> >> straw man, for example, as a first step.
> >> >
> >> > I understood the proposal to be that evaluating "obj$element" would
> >> > issue a warning if element didn't exist. If that were the case, then
> >> > the common test
> >> >
> >> > is.null(obj$element)
> >> >
> >> > would issue a warning in the cases where it now returns TRUE.
> >>
> >> Yes, but only for obj$element appearing in Ivo's own code. Not if a
> >> package does that (including base). That's why I thought masking "[["
> >> and "$" in .GlobalEnv might achieve that without affecting packages
> >> or base, although I don't know how such an option could be made
> >> available by R. Maybe options(strictselect=TRUE) would create those
> >> masks in .GlobalEnv, and options(strictselect=FALSE) would remove
> >> them. A package maintainer might choose to set that in their package
> >> to make it stricter (which would create those masks in the package's
> >> namespace too).
> >>
> >> Or users could just create those masks themselves, since it's only a
> >> few lines, without affecting packages or base.
> >
> > options() are global
>
> I realise that. I was thinking that inside the options() function it
> could see if strictselect was being changed and then create the masks
> in .GlobalEnv. But I can see that is ugly; I was just thinking out
> loud. I wasn't suggesting that "[[" would look at the value of
> strictselect.
>
> > but a package could change the meaning of $ or
> > [[.  It could even export those new definitions so that people who
> > wanted the strict usage could use it.  It would be hard to get the
> > same performance as the base definitions, but for debugging purposes
> > that might not matter.
>
> So in principle this would be a (small) good idea then? Is it an
> option that R could provide, i.e. something for which a patch file
> for R would be considered by R core?

I don't think there's any need for this to be in base R, so I would 
guess it wouldn't be accepted.  It would be great if someone maintained 
a package full of versions of standard functions and operators that wrap 
the regular versions in extra checking, but that someone doesn't have to 
be in the core group.  Conceivably some of the very low level things 
couldn't be replaced by a package; then I think a minimal patch would be 
considered.

Duncan Murdoch
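A minimal sketch of the kind of checked wrapper discussed in this thread, assuming a hand-rolled approach (no existing package or options() setting is implied): strict `$` and `[[` are defined in the global environment, as Matthew suggested, so that selecting a missing element of a list or data frame raises an error instead of silently returning NULL. Code inside package namespaces, including base, still resolves to the base operators, so the is.null(obj$element) idiom keeps working there; `$<-` and `[[<-` are left alone, so Bert's df[["d"]] <- 1:10 still works. The missing.field() helper borrows the hypothetical name from Duncan's example and is not an existing function.

```r
# Minimal sketch, assuming hand-rolled masks (not an existing package):
# strict versions of `$` and `[[` defined in the global environment.
# Package namespaces (including base) find base::`$` and base::`[[` before
# the global environment, so only top-level code and functions defined at
# top level are affected. `$<-` and `[[<-` are deliberately not masked.
# Requires R >= 3.5.0 for ...length().

`$` <- function(x, name) {
  # `$` receives the field name unevaluated, e.g. the symbol `a` in df$a
  nm <- as.character(substitute(name))
  if (is.list(x) && !(nm %in% names(x)))
    stop(sprintf("no element named '%s'", nm), call. = FALSE)
  # Call the primitive with the name inlined as a string; a plain
  # base::`$`(x, nm) would look up an element literally called "nm"
  do.call(base::`$`, list(x, nm))
}

`[[` <- function(x, i, ..., exact = TRUE) {
  # Only the single-name case (e.g. df[["b"]]) is checked;
  # calls such as df[[i, j]] pass straight through
  if (is.list(x) && is.character(i) && length(i) == 1L &&
      ...length() == 0L && !(i %in% names(x)))
    stop(sprintf("no element named '%s'", i), call. = FALSE)
  base::`[[`(x, i, ..., exact = exact)
}

# Hypothetical helper (name taken from the thread) to replace the
# is.null(obj$element) idiom, which now raises an error at top level
missing.field <- function(obj, name) !(name %in% names(obj))
```

To switch the checks off again, remove the masks with rm(list = c("$", "[[")). Each selection pays for an extra closure call, which fits Duncan's point that matching base performance matters less when the goal is debugging.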



More information about the R-devel mailing list