[Rd] Bounty on Error Checking

Duncan Murdoch murdoch.duncan at gmail.com
Fri Jan 4 16:22:48 CET 2013


On 04/01/2013 10:15 AM, Matthew Dowle wrote:
> On 04.01.2013 14:56, Duncan Murdoch wrote:
> > On 04/01/2013 9:51 AM, Matthew Dowle wrote:
> >> On 04.01.2013 14:03, Duncan Murdoch wrote:
> >> > On 13-01-04 8:32 AM, Matthew Dowle wrote:
> >> >>
> >> >> On Thu, Jan 3, 2013, Bert Gunter wrote:
> >> >>> Well...
> >> >>>
> >> >>> On Thu, Jan 3, 2013 at 10:00 AM, ivo welch <ivo.welch <at> anderson.ucla.edu> wrote:
> >> >>>>
> >> >>>> Dear R developers---I just spent half a day debugging an R program,
> >> >>>> which had two bugs---I selected the wrongly named variable, which
> >> >>>> turns out to have been a scalar, which then happily multiplied as if
> >> >>>> it was a matrix; and another wrongly named variable from a data frame,
> >> >>>> that triggered no error when used as a[["name"]] or a$name. There
> >> >>>> should be an option to turn on that throws an error inside R when one
> >> >>>> does this.  I cannot imagine that there is much code that wants to
> >> >>>> reference non-existing columns in data frames.
> >> >>>
> >> >>> But I can -- and do it all the time: To add a new variable, "d" to a
> >> >>> data frame, df, containing only "a" and "b" (with 10 rows, say):
> >> >>>
> >> >>> df[["d"]] <- 1:10
> >> >>
> >> >> Yes but that's `[[<-`. Ivo was talking about `[[` and `$`; i.e.,
> >> >> select only not assign, if I understood correctly.
> >> >>
> >> >>>
> >> >>> Trying to outguess documentation to create error triggers is a very
> >> >>> bad idea.
> >> >>
> >> >> Why exactly is it a very bad idea? (I don't necessarily disagree,
> >> >> just asking for more colour.)
> >> >>
> >> >>> R already has plenty of debugging tools -- and there is even a
> >> >>> "debug" package. Perhaps you need a better programming editor/IDE.
> >> >>> There are several listed on CRAN, RStudio, etc.
> >> >>
> >> >> True, but that relies on you knowing there's a bug to hunt for. What
> >> >> if you don't know you're getting incorrect results, silently? In a
> >> >> similar way that options(warn=2) turns known warnings into errors, to
> >> >> enable you to be more strict if you wish,
> >> >
> >> > I would say the point of options(warn=2) is rather to let you find
> >> > the location of the warning more easily, because it will abort the
> >> > evaluation.
> >>
> >> True but as well as that, I sometimes like to run production systems
> >> with options(warn=2). I'd prefer some tasks to halt at the slightest
> >> hint of trouble than write a warning silently to a log file that may
> >> not be looked at. I think of that as being more strict, more robust,
> >> since options(warn=2) is set even when there is no warning, to catch
> >> one if it arises in future, not just to find it more easily once you
> >> know there is a warning.
> >>
> >> > I would not recommend using code that issues warnings.
> >>
> >> Not sure what you mean here.
> >
> > I just meant that I consider warnings to be a problem (as you do), so
> > they should all be fixed.
>
> I see now, good.
>
> >
> >>
> >> >
> >> >> an option to turn on warnings from `[[` and `$` if the column is
> >> >> missing (select only, not assign) doesn't seem like a bad option to
> >> >> have. Maybe it would reveal some previously silent bugs.
> >> >
> >> > I agree that this would sometimes be useful, but a very common
> >> > convention is to do something like
> >> >
> >> > if (is.null(obj$element)) {  do something }
> >> >
> >> > These would all have to be re-written to something like
> >> >
> >> > if (missing.field(obj, "element")) { do something }
> >> >
> >> > There are several hundred examples of the first usage in base R; I
> >> > imagine thousands more in contributed packages.
> >>
> >> Yes but Ivo doesn't seem to be writing that if() in his code. We're
> >> only talking about an option that users can turn on for their own
> >> code, iiuc. Not anything that would affect or break thousands of
> >> packages. That's why I referred to the fact that all packages now
> >> have namespaces, in the earlier post.
> >>
> >> > I don't think the benefit of the change is worth all the work that
> >> > would be necessary to implement it.
> >>
> >> It doesn't seem to be a lot of work. I already posted a working
> >> straw man, for example, as a first step.
> >
> > I understood the proposal to be that evaluating "obj$element" would
> > issue a warning if element didn't exist.  If that were the case, then
> > the common test
> >
> > is.null(obj$element)
> >
> > would issue a warning in the cases where it now returns TRUE.
>
> Yes, but only for obj$element appearing in Ivo's own code. Not if a
> package does that (including base). That's why I thought masking "[["
> and "$" in .GlobalEnv might achieve that without affecting packages or
> base, although I don't know how such an option could be made available
> by R.
> Maybe options(strictselect=TRUE) would create those masks in .GlobalEnv,
> and options(strictselect=FALSE) would remove them. A package maintainer
> might choose to set that in their package to make it stricter (which
> would create those masks in the package's namespace too).
>
> Or users could just create those masks themselves, since it's only a
> few lines. Without affecting packages or base.

options() are global, but a package could change the meaning of $ or 
[[.  It could even export those new definitions so that people who 
wanted the strict usage could use it.  It would be hard to get the same 
performance as the base definitions, but for debugging purposes that 
might not matter.
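
For illustration only, here is a rough sketch of what such strict definitions might look like. The names strict_dollar and strict_subset2 are hypothetical (nothing like them exists in base R), and the masks are created inside local() purely so the demonstration does not alter the rest of the session. It assumes that "strict" means an error whenever a list or data frame is asked for an element name it does not have.

```r
# Hypothetical strict selectors (names are illustrative, not part of base R).
# They stop() where base `$` and `[[` would quietly return NULL.
strict_dollar <- function(x, name) {
  name <- as.character(substitute(name))   # `$` receives an unevaluated name
  if (is.list(x) && !(name %in% names(x))) {
    stop(sprintf("no element named '%s'", name), call. = FALSE)
  }
  base::`[[`(x, name)                      # exact matching, unlike base `$`
}

strict_subset2 <- function(x, i, ...) {
  if (is.list(x) && is.character(i) && length(i) == 1L && !(i %in% names(x))) {
    stop(sprintf("no element named '%s'", i), call. = FALSE)
  }
  base::`[[`(x, i, ...)
}

df <- data.frame(a = 1:3, b = 4:6)

# Mask only inside local() so the rest of the session keeps base behaviour.
res <- local({
  `$`  <- strict_dollar
  `[[` <- strict_subset2
  list(
    ok      = df$a,
    dollar  = tryCatch(df$d,      error = function(e) conditionMessage(e)),
    bracket = tryCatch(df[["d"]], error = function(e) conditionMessage(e)),
    idiom   = tryCatch(is.null(df$d), error = function(e) "error")
  )
})
```

The last entry shows the cost discussed earlier in the thread: under the masks, the common is.null(obj$element) test raises an error instead of returning TRUE. Code inside packages should not see masks like these, since it resolves `$` and `[[` through its namespace and base before the global environment.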

Duncan Murdoch


